====== Lab 04: Pandas & scikit-learn Foundations ======
In this lab you will practice a compact, reproducible workflow for working with **tabular data** and building **baseline ML models**:
* **Part 01 - Pandas:** reading/writing CSV, dtypes, indexing, sorting, grouping, correlations, simple feature engineering, handling missing values, and a tiny ML prep example.
* **Part 02 - scikit-learn:** training a baseline **RandomForest** on Iris (binary target), generating predictions and **classification reports**, computing **confusion matrices** and **TPR**, working with **probabilities** and **custom thresholds**, plotting the **ROC curve**, computing **AUROC**, and comparing to a **DummyClassifier** baseline.
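The two parts above can be previewed in one short script. This is a minimal sketch, not the notebooks' actual code: it assumes the binary target is "virginica vs. the rest", and the names ''is_virginica'' and ''threshold'', the 70/30 split, and the random seeds are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (classification_report, confusion_matrix,
                             roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split

# Part 01 flavour: load Iris as a pandas DataFrame and inspect it.
iris = load_iris(as_frame=True)
df = iris.frame  # four feature columns plus an integer "target" column
print(df.dtypes)
print(df.groupby("target").mean())                       # per-class feature means
print(df.corr()["target"].sort_values(ascending=False))  # correlation with the label

# Assumption: binary target = 1 for virginica (class 2), 0 otherwise.
df["is_virginica"] = (df["target"] == 2).astype(int)

X = df[iris.feature_names]
y = df["is_virginica"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Part 02 flavour: baseline RandomForest and its default predictions.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

# Confusion matrix (rows = true class, columns = predicted class) and TPR.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
tpr = tp / (tp + fn)
print("TPR:", tpr)

# Probabilities for the positive class and a custom decision threshold.
proba = model.predict_proba(X_test)[:, 1]
threshold = 0.3  # illustrative value; lower thresholds flag more positives
y_pred_custom = (proba >= threshold).astype(int)

# ROC curve points and AUROC, compared against a trivial baseline.
fpr_curve, tpr_curve, roc_thresholds = roc_curve(y_test, proba)
auroc = roc_auc_score(y_test, proba)

dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
dummy_auroc = roc_auc_score(y_test, dummy.predict_proba(X_test)[:, 1])
print("AUROC model:", auroc, "AUROC dummy:", dummy_auroc)
```

To draw the ROC curve, ''fpr_curve'' and ''tpr_curve'' can be passed to ''matplotlib.pyplot.plot''. The dummy classifier gives every sample the same score, so it acts as the chance-level reference.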
===== Environment & Installation =====
Requirements:
* **Python 3.10+**, **pip**
* **JupyterLab**, **pandas**, **numpy**, **scikit-learn**, **matplotlib**
<code bash>
# Create and activate a virtual environment
python -m venv .venv

# macOS/Linux
source .venv/bin/activate

# Windows (PowerShell)
.\.venv\Scripts\Activate.ps1

# Upgrade pip and install dependencies
pip install --upgrade pip
pip install jupyterlab pandas numpy scikit-learn matplotlib

# Launch JupyterLab
jupyter lab
</code>
Prefer a fresh venv per lab to avoid dependency conflicts. If your system Python is older than 3.10, install a newer release before creating the venv.
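To confirm that the installation succeeded, a short check like the following can be run in a notebook cell or a Python shell. It is a sketch using only the standard library; the helper names ''installed_versions'' and ''python_is_recent'' are hypothetical and not part of the lab materials.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip in the installation step.
REQUIRED = ["jupyterlab", "pandas", "numpy", "scikit-learn", "matplotlib"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def python_is_recent(minimum=(3, 10)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


print("Python", sys.version.split()[0], "OK" if python_is_recent() else "too old")
for name, ver in installed_versions(REQUIRED).items():
    print(f"{name:15s} {ver if ver is not None else 'MISSING'}")
```

Any package reported as ''MISSING'' usually means the ''pip install'' step ran in a different environment than the one that is currently active.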
===== Getting the notebooks =====
Download the notebooks and place them in a working folder (e.g., ''labs/''). Then open them in JupyterLab.
* {{ :courses:becm33mle:labs:lab_scikit_jupiter_notebooks.zip |}}
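If you prefer to unpack the archive from Python rather than with a file manager, a sketch along these lines should work. The function name ''extract_notebooks'' is hypothetical, and the archive path in the usage comment assumes the zip was saved to the current directory.

```python
import zipfile
from pathlib import Path


def extract_notebooks(archive, target_dir="labs"):
    """Extract a zip archive into target_dir and list the notebooks it contained."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
        names = zf.namelist()
    # Return only the notebook files, sorted for stable output.
    return sorted(n for n in names if n.endswith(".ipynb"))


# Usage (assumes the downloaded file sits in the current directory):
# print(extract_notebooks("lab_scikit_jupiter_notebooks.zip", "labs"))
```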
===== Running tips =====
* Restart the kernel if libraries were upgraded mid-session (Kernel → Restart).
* If plots don’t show, ensure you ran the cell importing matplotlib and the cell that generates the plot.
* If you see errors about missing packages, re-check that your venv is active (the shell prompt usually shows ''(.venv)'').
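When the shell prompt is not a reliable indicator (for example inside a notebook), the interpreter itself can report whether it runs from a virtual environment. The sketch below relies on the standard-library fact that ''sys.prefix'' differs from ''sys.base_prefix'' inside a venv; the function name ''in_virtualenv'' is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


# In a notebook this describes the kernel's interpreter, which is the one
# that must have the lab's packages installed.
print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtualenv())
```

If this prints ''False'', or the interpreter path does not point into ''.venv'', the kernel is probably using the system Python rather than the lab environment.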
===== HW04: Progress report 1 =====
Provide a factual account of the progress achieved so far and compare it against the milestones you set in the PRD document. The document should include:
* Name of the project and its members
* Summary of what has been achieved so far
* Evaluation of the milestones, with a revised plan for any that fall outside the set scope
* Perform a risk evaluation of your project (i.e. what could go wrong?)
* Brief reflection on your progress so far, stating the executive decisions that will lead the project forward
* Feel free to add: GUI concepts/screenshots, dataset sample, ...
Expected length of the document is **1 A4 page**. Submit your progress reports to [[https://cw.felk.cvut.cz/brute/|BRUTE]] as a ''.pdf'' file.