Lab 04: Pandas & scikit-learn Foundations

In this lab you will practice a compact, reproducible workflow for handling tabular data and building baseline ML models:

Part 01 - Pandas: reading/writing CSV, dtypes, indexing, sorting, grouping, correlations, simple feature engineering, handling missing values, and a tiny ML prep example.
Part 02 - scikit-learn: training a baseline RandomForest on Iris (binary target), generating predictions and classification reports, computing confusion matrices and TPR, working with probabilities and custom thresholds, plotting the ROC curve, computing AUROC, and comparing to a DummyClassifier baseline.
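
To give a feel for how the two parts fit together, here is a minimal sketch of the workflow, not the notebooks' actual code. Several choices are assumptions made only for illustration: the binary target ("virginica vs. rest"), the injected missing value, the hypothetical `petal_ratio` feature, the 0.3 threshold, the 70/30 split, and `random_state=42`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# --- Part 01 style: pandas preparation ---
df = load_iris(as_frame=True).frame.copy()        # features plus a "target" column
df.loc[0, "sepal length (cm)"] = np.nan           # inject one missing value (illustration only)
df["sepal length (cm)"] = df["sepal length (cm)"].fillna(df["sepal length (cm)"].median())
df["petal_ratio"] = df["petal length (cm)"] / df["petal width (cm)"]  # hypothetical feature
class_means = df.groupby("target")["petal_ratio"].mean()              # one mean per class

# --- Part 02 style: baseline model on a binary target ---
X = df.drop(columns="target")
y = (df["target"] == 2).astype(int)               # assumption: 1 = virginica, 0 = rest
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]         # probability of the positive class

threshold = 0.3                                   # custom threshold instead of the default 0.5
y_pred = (proba >= threshold).astype(int)
print(classification_report(y_test, y_pred))

# Confusion matrix entries and the true positive rate (recall of the positive class)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
tpr = tp / (tp + fn)

# Points of the ROC curve (ready for plotting) and the area under it
fpr_points, tpr_points, roc_thresholds = roc_curve(y_test, proba)
auroc = roc_auc_score(y_test, proba)

# A DummyClassifier that always predicts the class prior, for comparison
dummy = DummyClassifier(strategy="prior").fit(X_train, y_train)
dummy_auroc = roc_auc_score(y_test, dummy.predict_proba(X_test)[:, 1])

print(f"TPR at threshold {threshold}: {tpr:.2f}")
print(f"AUROC model: {auroc:.3f}, dummy: {dummy_auroc:.3f}")
```

Lowering the threshold below 0.5 trades more false positives for a higher TPR. The dummy baseline outputs a constant score, so its AUROC sits at chance level, which is the reference point the trained model should clearly beat.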

Environment & Installation

Requirements:

Python 3.10+, pip
JupyterLab, pandas, numpy, scikit-learn, matplotlib

# Create and activate a virtual environment
python -m venv .venv
# macOS/Linux
source .venv/bin/activate
# Windows (PowerShell)
.\.venv\Scripts\Activate.ps1
 
# Upgrade pip and install dependencies
# (python -m pip guarantees that the venv's interpreter is used)
python -m pip install --upgrade pip
python -m pip install jupyterlab pandas numpy scikit-learn matplotlib
 
# Launch JupyterLab
jupyter lab

Prefer a fresh venv per lab to avoid dependency conflicts. If your system Python is older than 3.10, install a newer release before creating the venv.
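
The version requirement can also be checked from Python itself. The following is a small sketch; the helper name `meets_minimum` and the constant `MIN_PYTHON` are made up for this example and are not part of the lab materials.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version listed in the requirements


def meets_minimum(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old for this lab"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```

Run it with the same `python` command you intend to use for `python -m venv`, since a machine can have several interpreters installed.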

Getting the notebooks

Download the notebooks and place them in a working folder (e.g., labs/). Then open them in JupyterLab.

lab_scikit_jupiter_notebooks.zip

Running tips

Restart the kernel if you upgraded libraries mid-session (Kernel → Restart).
If plots don’t show, ensure you ran the cell importing matplotlib and the cell that generates the plot.
If you see errors about missing packages, re-check that your venv is active (the shell prompt usually shows (.venv)).
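
The last tip can be verified from inside a notebook cell. This is a rough sketch using only the standard library; `in_virtualenv`, `missing_modules`, and `LAB_MODULES` are hypothetical names chosen for this example.

```python
import importlib.util
import sys

# Import names of the lab's dependencies. Note that the pip package
# "scikit-learn" is imported under the name "sklearn".
LAB_MODULES = ["jupyterlab", "pandas", "numpy", "sklearn", "matplotlib"]


def in_virtualenv():
    """True when the interpreter runs inside a venv (its prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


def missing_modules(names=LAB_MODULES):
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a venv:", in_virtualenv())
    print("Missing modules:", missing_modules() or "none")
```

If the interpreter path does not point into `.venv`, the kernel is probably running from a different environment than the one where the packages were installed.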

HW04: Progress report 1

Provide a factual account of the progress achieved so far and compare it against the milestones you previously set in the PRD document. The document should include:

Name of the project and its members
Summary of what has been achieved so far
Evaluation of the milestones, with a revised plan if you are outside the set scope
Risk evaluation of your project (i.e. what could go wrong?)
Brief reflection on your progress so far, stating the key decisions that will move the project forward
Feel free to add: GUI concepts/screenshots, dataset sample, …

The expected length of the document is one A4 page. Submit your progress report to BRUTE as a .pdf file.
