====== Lab 04: Pandas & scikit-learn Foundations ======

In this lab you will practice a compact, reproducible workflow for working with **tabular data** and building **baseline ML models**:

  * **Part 01 - Pandas:** reading and writing CSV files, dtypes, indexing, sorting, grouping, correlations, simple feature engineering, handling missing values, and a small example of preparing data for ML.
  * **Part 02 - scikit-learn:** training a baseline **RandomForest** classifier on Iris reduced to a binary target, generating predictions and **classification reports**, computing **confusion matrices** and the **true positive rate (TPR)**, working with predicted **probabilities** and **custom decision thresholds**, plotting the **ROC curve**, computing the area under it (**AUROC**), and comparing against a **DummyClassifier** baseline.
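The Part 01 operations can be sketched on a tiny DataFrame as follows. This is a minimal illustration, not the notebook's code: the columns ''species'', ''sepal_len'', ''petal_len'', the derived ''petal_to_sepal'' feature, and the file name ''demo.csv'' are hypothetical, and median imputation is just one possible way to handle missing values.

<code python>
import numpy as np
import pandas as pd

# Hypothetical toy data; the lab notebooks may use different columns and values.
df = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica"],
    "sepal_len": [5.1, 4.9, 7.0, np.nan, 6.3],
    "petal_len": [1.4, 1.4, 4.7, 4.5, 6.0],
})

# CSV round trip (index=False avoids writing the row index as an extra column)
df.to_csv("demo.csv", index=False)
df = pd.read_csv("demo.csv")

# dtypes: store the text label as a categorical column
df["species"] = df["species"].astype("category")
print(df.dtypes)

# Boolean + label-based indexing: rows with long petals, two columns only
long_petals = df.loc[df["petal_len"] > 4.0, ["species", "petal_len"]]

# Sorting by a column, largest first
df_sorted = df.sort_values("petal_len", ascending=False)

# Grouping: per-species mean (NaN values are skipped by mean())
means = df.groupby("species", observed=True)[["sepal_len", "petal_len"]].mean()

# Pairwise Pearson correlation between numeric columns
corr = df[["sepal_len", "petal_len"]].corr()

# Missing values: count them, then impute with the column median
print(df.isna().sum())
df["sepal_len"] = df["sepal_len"].fillna(df["sepal_len"].median())

# Simple feature engineering: a ratio of two existing columns
df["petal_to_sepal"] = df["petal_len"] / df["sepal_len"]

# Tiny ML prep: numeric feature matrix X and integer-encoded target y
X = df[["sepal_len", "petal_len", "petal_to_sepal"]].to_numpy()
y = df["species"].cat.codes.to_numpy()
</code>

''cat.codes'' numbers the categories in sorted order (here setosa, versicolor, virginica), which gives scikit-learn the integer labels it expects.
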

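The first half of Part 02 (baseline model, classification report, confusion matrix, TPR) could look roughly like this. The sketch assumes the binary target is "virginica vs. rest" and uses a 70/30 stratified split with ''random_state=0''; the notebooks may binarize Iris or split the data differently.

<code python>
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Assumption: positive class = virginica (label 2), everything else = 0
X, y_multi = load_iris(return_X_y=True)
y = (y_multi == 2).astype(int)

# Stratified split keeps the positive/negative ratio equal in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Per-class precision, recall, F1 and support
print(classification_report(y_test, y_pred, digits=3))

# With labels=[0, 1], ravel() returns the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
tpr = tp / (tp + fn)  # true positive rate = sensitivity = recall of class 1
print(f"TP={tp} FN={fn} FP={fp} TN={tn} TPR={tpr:.3f}")
</code>

The TPR computed by hand equals the recall reported for class ''1'' in the classification report.
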

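The second half of Part 02 (probabilities, a custom threshold, ROC/AUROC, and a dummy baseline) can be sketched under the same kind of assumptions: virginica vs. rest, a 70/30 stratified split, and a hypothetical threshold of ''0.3'' picked only to show how lowering it trades false positives for a higher TPR.

<code python>
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Assumed setup: virginica (label 2) vs. rest, stratified 70/30 split
X, y_multi = load_iris(return_X_y=True)
y = (y_multi == 2).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Column 1 of predict_proba is P(y = 1) because clf.classes_ == [0, 1]
proba = clf.predict_proba(X_test)[:, 1]

# Custom decision threshold instead of the default 0.5 (hypothetical value)
threshold = 0.3
y_pred_custom = (proba >= threshold).astype(int)
print(f"TPR at threshold {threshold}:", recall_score(y_test, y_pred_custom))

# ROC curve = (FPR, TPR) pairs over all score thresholds; AUROC = area under it
fpr, tpr, thresholds = roc_curve(y_test, proba)
auc_rf = roc_auc_score(y_test, proba)

# Baseline ignoring the features: always predicts the majority class, so its
# scores are constant and its ROC curve is the diagonal (AUROC = 0.5)
dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
auc_dummy = roc_auc_score(y_test, dummy.predict_proba(X_test)[:, 1])

plt.plot(fpr, tpr, label=f"RandomForest (AUROC = {auc_rf:.3f})")
plt.plot([0, 1], [0, 1], "--", label=f"DummyClassifier (AUROC = {auc_dummy:.3f})")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate (TPR)")
plt.legend()
plt.show()
</code>

A model is only worth keeping if its AUROC clearly exceeds the dummy's 0.5; a value near 0.5 means the scores do not separate the classes better than chance.
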
===== Environment & Installation =====

Requirements:

  * **Python 3.10+**, **pip**
  * **JupyterLab**, **pandas**, **numpy**, **scikit-learn**, **matplotlib**

<code bash>
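# Create a virtual environment in the current folder
# (use python3 instead of python if your system only provides python3)
python -m venv .venv

# Activate it (run only the line for your OS)
# macOS/Linux:
source .venv/bin/activate
# Windows (PowerShell):
.\.venv\Scripts\Activate.ps1

# Upgrade pip and install dependencies
# ("python -m pip" is required on Windows, where pip cannot upgrade itself directly)
python -m pip install --upgrade pip
python -m pip install jupyterlab pandas numpy scikit-learn matplotlib

# Launch JupyterLab
jupyter lab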
</code>

<note tip>Use a fresh venv for each lab to avoid dependency conflicts. If your system Python is older than 3.10, install a newer version before creating the venv.</note>
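To confirm the setup, you can run a quick check in the first notebook cell (or in ''python'' inside the activated venv). This sketch only assumes the package list above and reads installed versions with the standard-library ''importlib.metadata''; it does not check for specific minimum package versions.

<code python>
import sys
from importlib.metadata import PackageNotFoundError, version

# The lab expects Python 3.10 or newer
python_ok = sys.version_info >= (3, 10)
print("Python", sys.version.split()[0], "OK" if python_ok else "-> too old, please upgrade")

# pip distribution names (note: scikit-learn is imported as sklearn)
versions = {}
for pkg in ["jupyterlab", "pandas", "numpy", "scikit-learn", "matplotlib"]:
    try:
        versions[pkg] = version(pkg)
    except PackageNotFoundError:
        versions[pkg] = None  # not installed in this environment
    print(f"{pkg:>12}: {versions[pkg] or 'NOT INSTALLED'}")
</code>
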

===== Getting the notebooks =====

Download the notebook archive below, extract it into a working folder (e.g., ''labs/''), and open the notebooks in JupyterLab.

  * {{ :courses:becm33mle:labs:lab_scikit_jupiter_notebooks.zip |}}

===== Running tips =====

  * Restart the kernel if libraries were upgraded mid-session (Kernel → Restart).
  * If plots don’t show, ensure you ran the cell importing matplotlib and the cell that generates the plot.
  * If you see errors about missing packages, re-check that your venv is active (the shell prompt usually shows ''(.venv)'').
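Inside a notebook there is no shell prompt to look at, so a minimal way to check which interpreter the kernel runs is shown below. It relies on the standard behaviour that in a venv ''sys.prefix'' points at the venv folder while ''sys.base_prefix'' points at the base installation; it assumes the default kernel, and a separately registered kernel may use a different Python.

<code python>
import sys

# In a venv, sys.prefix is the venv folder; sys.base_prefix is the Python
# installation the venv was created from. Outside a venv they are equal.
in_venv = sys.prefix != sys.base_prefix
print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_venv)
</code>

If the interpreter path does not lie inside your ''.venv'' folder, the kernel is not using the environment where you installed the packages.
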


===== HW04: Progress report 1 =====

Provide a factual account of the progress achieved so far and check it against the milestones you previously set in your PRD. The document should include:
  * Name of the project and its members
  * Summary of what has been achieved so far
  * Evaluation of your milestones; re-plan any that fall outside the set scope
  * Risk evaluation of your project (i.e., what could go wrong?)
  * Brief reflection on your progress so far and the executive decisions that will move the project forward
  * Optionally: GUI concepts or screenshots, a dataset sample, etc.
The expected length of the document is **1 A4 page**. Submit your progress report to [[https://cw.felk.cvut.cz/brute/|BRUTE]] as a ''.pdf'' file.