====== Lab 04: Pandas & scikit-learn Foundations ======

In this lab you will practice a compact, reproducible workflow for working with **tabular data** and building **baseline ML models**:

  * **Part 01 - Pandas:** reading/writing CSV, dtypes, indexing, sorting, grouping, correlations, simple feature engineering, handling missing values, and a tiny ML prep example.
  * **Part 02 - scikit-learn:** training a baseline **RandomForest** on Iris (binary target), generating predictions and **classification reports**, computing **confusion matrices** and the true positive rate (**TPR**), working with **probabilities** and **custom thresholds**, plotting the **ROC curve**, computing **AUROC**, and comparing against a **DummyClassifier** baseline.

===== Environment & Installation =====

Requirements:

  * **Python 3.10+**, **pip**
  * **JupyterLab**, **pandas**, **numpy**, **scikit-learn**, **matplotlib**

<code bash>
# Create and activate a virtual environment
python -m venv .venv

# macOS/Linux
source .venv/bin/activate

# Windows (PowerShell)
.\.venv\Scripts\Activate.ps1

# Upgrade pip and install dependencies
pip install --upgrade pip
pip install jupyterlab pandas numpy scikit-learn matplotlib

# Launch JupyterLab
jupyter lab
</code>

Prefer a fresh venv per lab to avoid dependency conflicts. If your system Python is older than 3.10, install a newer version before creating the venv.

===== Getting the notebooks =====

Download the notebooks, place them in a working folder (e.g., ''labs/''), and open them in JupyterLab.

  * {{ :courses:becm33mle:labs:lab_scikit_jupiter_notebooks.zip |}}

===== Running tips =====

  * Restart the kernel if libraries were upgraded mid-session (Kernel → Restart).
  * If plots don’t show, make sure you ran both the cell that imports matplotlib and the cell that generates the plot.
  * If you see errors about missing packages, check that your venv is active (the shell prompt usually shows ''(.venv)'').
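The environment checks mentioned above (Python version, active venv, installed packages) can also be run from inside a notebook cell. The snippet below is a minimal sketch, not part of the lab notebooks. The helper names ''in_virtualenv'', ''installed_versions'' and ''report'' are made up for illustration, and the package list is simply copied from the ''pip install'' line of this lab.

```python
import sys
from importlib import metadata

# Distribution names as they appear in this lab's pip install line.
REQUIRED = ["jupyterlab", "pandas", "numpy", "scikit-learn", "matplotlib"]
MIN_PYTHON = (3, 10)  # the lab asks for Python 3.10+


def in_virtualenv():
    """True when the interpreter runs inside a venv.

    Inside a venv, sys.prefix points at the venv directory while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def report():
    """Print a short summary of the current environment."""
    python_ok = sys.version_info >= MIN_PYTHON
    print("Python", sys.version.split()[0], "ok" if python_ok else "too old")
    print("Virtual environment active:", in_virtualenv())
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:>14}: {version or 'MISSING'}")


report()
```

If a package is reported as ''MISSING'' although you installed it, the notebook kernel is probably not running from the venv you activated in the shell.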
===== HW04: Progress report 1 =====

Provide a factual account of the progress achieved so far and check it against the milestones you previously set in the PRD document. The document should include:

  * Name of the project and its members
  * Summary of what has been achieved so far
  * Evaluation of the milestones; re-plan if they fall outside the set scope
  * Risk evaluation of your project (i.e. what could go wrong?)
  * Brief reflection on your progress so far, stating the executive decisions leading the project forward
  * Feel free to add: GUI concepts/screenshots, dataset sample, ...

The expected length of the document is **1 A4 page**. Submit your progress report to [[https://cw.felk.cvut.cz/brute/|BRUTE]] as a ''.pdf'' file.