Lab 04: Pandas & scikit-learn Foundations

In this lab you will practice a compact, reproducible workflow for working with tabular data and building baseline ML models:

  • Part 01 - Pandas: reading/writing CSV, dtypes, indexing, sorting, grouping, correlations, simple feature engineering, handling missing values, and a tiny ML prep example.
  • Part 02 - scikit-learn: training a baseline RandomForest classifier on Iris (reduced to a binary target), generating predictions and classification reports, computing confusion matrices and the true positive rate (TPR), working with predicted probabilities and custom decision thresholds, plotting the ROC curve, computing the area under it (AUROC), and comparing against a DummyClassifier baseline.
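The Part 01 topics can be sketched end to end on a toy table. This is a minimal sketch, not the lab notebook's code: the column names (`city`, `temp_c`, `humidity`, `rain`) and their values are hypothetical, and the CSV is round-tripped through an in-memory buffer (`io.StringIO`) instead of a file on disk.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data (not the lab's dataset); one temp_c value is missing.
raw_csv = """city,temp_c,humidity,rain
Prague,21.5,0.60,0
Brno,,0.72,1
Prague,18.0,0.80,1
Ostrava,25.1,0.55,0
"""

# Reading/writing CSV: round-trip through an in-memory buffer instead of a file.
df = pd.read_csv(io.StringIO(raw_csv))
buffer = io.StringIO()
df.to_csv(buffer, index=False)
df = pd.read_csv(io.StringIO(buffer.getvalue()))

# dtypes: make the categorical column explicit.
df["city"] = df["city"].astype("category")
print(df.dtypes)

# Indexing: label/boolean-based (.loc) vs. position-based (.iloc).
print(df.loc[df["rain"] == 1, ["city", "humidity"]])
print(df.iloc[0])

# Sorting and grouping.
print(df.sort_values("humidity", ascending=False))
print(df.groupby("city", observed=True)["humidity"].mean())

# Pairwise correlations between numeric columns (NaNs are skipped pairwise).
print(df[["temp_c", "humidity", "rain"]].corr())

# Missing values: impute temp_c with the column mean.
df["temp_c"] = df["temp_c"].fillna(df["temp_c"].mean())

# Simple feature engineering: a derived column plus one-hot encoded city.
df["temp_f"] = df["temp_c"] * 9 / 5 + 32
features = pd.get_dummies(df.drop(columns="rain"), columns=["city"])

# Tiny ML prep: numeric feature matrix X and target vector y.
X = features.to_numpy(dtype=float)
y = df["rain"].to_numpy()
print(X.shape, y.shape)
```

The resulting `X` and `y` are plain NumPy arrays, which is the form scikit-learn estimators expect.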
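A compact sketch of the Part 02 pipeline follows. The lab text does not say how Iris is reduced to a binary problem; this example assumes virginica (class 2) vs. the other two species, and the split size, random seeds, and the 0.3 threshold are illustrative choices rather than values prescribed by the lab.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split

# Assumption: binary target = "is virginica" (class 2) vs. the other species.
X, y_multi = load_iris(return_X_y=True)
y = (y_multi == 2).astype(int)

# Stratified 70/30 split keeps the 1:2 positive/negative ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Baseline RandomForest; the fixed seed makes reruns reproducible.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Hard predictions and a per-class precision/recall/F1 report.
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))

# Binary confusion matrix: rows = true class, columns = predicted class.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
tpr = tp / (tp + fn)  # true positive rate = recall = sensitivity
print(f"TPR = {tpr:.3f}")

# Probability of the positive class, then a custom decision threshold.
proba = clf.predict_proba(X_test)[:, 1]
threshold = 0.3  # illustrative: a lower threshold flags more samples as positive
y_pred_custom = (proba >= threshold).astype(int)
print(confusion_matrix(y_test, y_pred_custom))

# ROC curve (FPR vs. TPR over all thresholds) and the area under it (AUROC).
fpr_curve, tpr_curve, _ = roc_curve(y_test, proba)
auroc = roc_auc_score(y_test, proba)

fig, ax = plt.subplots()
ax.plot(fpr_curve, tpr_curve, label=f"RandomForest (AUROC = {auroc:.3f})")
ax.plot([0, 1], [0, 1], linestyle="--", label="Chance")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.legend()
plt.show()

# Baseline: always predict the majority class ("not virginica").
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(X_train, y_train)
dummy_acc = dummy.score(X_test, y_test)
dummy_auroc = roc_auc_score(y_test, dummy.predict_proba(X_test)[:, 1])
print(f"Dummy accuracy = {dummy_acc:.3f}, dummy AUROC = {dummy_auroc:.3f}")
```

On this split the dummy reaches about 0.667 accuracy (30 of 45 test samples are negatives) while its TPR is 0 and its AUROC is 0.5, which is why comparing TPR and AUROC against the baseline says more than accuracy alone.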

Environment & Installation

Requirements:

  • Python 3.10+, pip
  • JupyterLab, pandas, numpy, scikit-learn, matplotlib
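To confirm these requirements from inside the interpreter or a JupyterLab kernel, a small check like the following may help. It is a hypothetical helper (`missing_packages`), not part of the lab material; note that it uses import names rather than pip names, so scikit-learn is checked as `sklearn`.

```python
import importlib.util
import sys

# Import names for the required packages (pip's "scikit-learn" imports as "sklearn").
REQUIRED = ["jupyterlab", "pandas", "numpy", "sklearn", "matplotlib"]


def missing_packages(names):
    """Return the top-level modules that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


python_ok = sys.version_info >= (3, 10)
missing = missing_packages(REQUIRED)
print(f"Python {sys.version.split()[0]}: {'OK' if python_ok else 'too old, need 3.10+'}")
print("Missing packages:", ", ".join(missing) if missing else "none")
```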

# Create and activate a virtual environment
python -m venv .venv
# macOS/Linux
source .venv/bin/activate
# Windows (PowerShell)
.\.venv\Scripts\Activate.ps1

# Upgrade pip and install dependencies
python -m pip install --upgrade pip
pip install jupyterlab pandas numpy scikit-learn matplotlib

# Launch JupyterLab
jupyter lab

Use a fresh venv for each lab to avoid dependency conflicts. If your system Python is older than 3.10, install a newer release first.

Getting the notebooks

Download the notebooks and place them in a working folder (e.g., labs/). Then open them in JupyterLab.

Running tips

  • Restart the kernel if libraries were upgraded mid-session (Kernel → Restart Kernel…).
  • If plots don’t show, make sure you ran both the cell that imports matplotlib and the cell that generates the plot.
  • If you see errors about missing packages, re-check that your venv is active (the shell prompt usually shows (.venv)).
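The first and last tips can also be checked from a notebook cell. This sketch uses hypothetical helper names (`running_in_venv`, `stale_imports`) and relies on the standard-library behavior that inside a `venv` `sys.prefix` differs from `sys.base_prefix`; it also assumes each library's `__version__` matches its installed distribution version, which is typical for pandas, NumPy, and scikit-learn.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def running_in_venv() -> bool:
    """True if this interpreter belongs to a virtual environment (e.g. python -m venv)."""
    # In a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def stale_imports(packages):
    """Map dist name -> (loaded, installed) for already-imported modules
    whose in-memory version differs from the version now installed."""
    stale = {}
    for dist_name, module_name in packages.items():
        module = sys.modules.get(module_name)
        if module is None:  # not imported in this kernel yet, nothing can be stale
            continue
        loaded = getattr(module, "__version__", None)
        try:
            installed = version(dist_name)
        except PackageNotFoundError:
            continue
        if loaded is not None and loaded != installed:
            stale[dist_name] = (loaded, installed)
    return stale


print("Interpreter:", sys.executable)
print("Inside a venv:", running_in_venv())
print("Stale imports:", stale_imports({"pandas": "pandas", "numpy": "numpy", "scikit-learn": "sklearn"}))
```

If the interpreter path does not point into `.venv`, the kernel is not using your environment; if `stale_imports` reports anything, restart the kernel.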

HW04: Progress report 1

Provide a factual account of the progress achieved so far and check it against the milestones you previously set in your PRD. The document should include:

  • Name of the project and its members
  • Summary of what has been achieved so far
  • Evaluation of your milestones; re-plan if progress falls outside the set scope
  • Risk evaluation of your project (i.e., what could go wrong?)
  • Brief reflection on your progress so far and the executive decisions that will move the project forward
  • Feel free to add: GUI concepts/screenshots, dataset sample, …

The expected length of the document is one A4 page. Submit your progress report to BRUTE as a PDF file.

courses/becm33mle/tutorials/lab_scikit.txt · Last modified: 2025/10/21 11:02 by parildav