===== HW 3 - CNNs =====

All information regarding the third homework assignment can be found here: [[https://github.com/urob-ctu/hw3-cnns|UROB HW3 – Fruit Image Analysis Repository]]

----

==== 🍎 Overview ====

This homework focuses on training a **Convolutional Neural Network (CNN)** to:

  * 🏷️ **Classify** images of fruits into 30 categories
  * 🎨 **Segment** fruits using pixel-level masks
  * 🔍 **Learn** meaningful image embeddings

The goal is to design, train, and evaluate a model capable of performing all three tasks efficiently.

==== ✅ Tasks ====

  * Implement your CNN model in `model.py` as class `MyModel`
  * The `forward()` method must return:
    - Class logits `[batch_size, 30]`
    - Segmentation mask `[batch_size, 1, 64, 64]`
    - Image embeddings `[batch_size, embedding_dim]`
  * Implement `get_embedding()` returning embeddings only
  * Complete the training loop in `train.py`
  * Tune hyperparameters in `confs/config.yml`

==== 📊 Evaluation ====

**Basic Evaluation (10 points)**

Each cell gives the minimum score needed to earn that many points.

| Task | Metric | 1 pt | 2 pts | 3 pts |
|------|--------|------|-------|-------|
| 🍊 Classification | Accuracy | 80% | 85% | – |
| 🎨 Segmentation | Mean IoU | 75% | 80% | 85% |
| 📈 Embeddings (ROC) | AUC | 0.80 | 0.85 | – |
| 🎯 Embeddings (TPR) | TPR @ 5% FPR | 0.75 | 0.80 | 0.85 |

**Tournament Evaluation (up to +3 bonus points)**

Models are ranked by performance across all tasks. Lower total rank = better score. Ties are broken by earlier submission.

  * 🥇 1st-3rd place → +3 pts
  * 🥈 4th-6th place → +2 pts
  * 🥉 7th-9th place → +1 pt

Maximum total = **13 points**

----

==== 📦 Submission ====

Submit a ZIP file containing:

  * `model.py` – your model
  * `train.py` – your training script
  * `weights.pth` – trained model weights

The size limit for a .zip file is 500 MB.

==== 🚫 Important Policies ====

**Pretrained Models Policy**

  * ❌ No pretrained models or transfer learning allowed.
  * ✅ Train from scratch using only the provided dataset.
**Plagiarism Policy**

  * ✅ You may discuss ideas with classmates.
  * ❌ Direct code copying = 0 points.

----

==== Hints ====

Keep the data loader similar to lab06. The images for the evaluation will be loaded and preprocessed in the same way. Feel free to add data augmentation for better results.

Do not create giant networks, for two reasons:

  * It is not needed for this task :)
  * The upload system accepts a .zip file of up to 500 MB. If your weights are larger than 500 MB, they cannot be evaluated.

Don't be shy about using the compute. The cluster is for you; use it as much as possible to test your ideas.

==== Common Architectures for reference ====

**Classification**

  * [[https://en.wikipedia.org/wiki/AlexNet|AlexNet]]
  * [[https://en.wikipedia.org/wiki/VGGNet|VGGNet]]
  * [[https://en.wikipedia.org/wiki/Residual_neural_network|ResNet]]

**Segmentation**

  * [[https://arxiv.org/pdf/1411.4038|FCN]]
  * [[https://en.wikipedia.org/wiki/U-Net|U-Net]]
  * [[https://arxiv.org/pdf/1703.06870|Mask R-CNN]]

**Good luck! 🍀**

For questions, contact: jan.skvrna@cvut.cz
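==== Example: metric sketch ====

The evaluation scripts are not reproduced on this page, so the following is only a rough sketch of how two metrics from the Evaluation table, Mean IoU and TPR @ 5% FPR, are commonly computed. The function names `mean_iou` and `tpr_at_fpr`, the 0.5 mask threshold, the convention that two empty masks score 1.0, and the reading of embedding scores as similarities of same-class (positive) and different-class (negative) image pairs are all assumptions made for illustration; the official grader may differ.

```python
from typing import Sequence


def mean_iou(pred_masks: Sequence[Sequence[float]],
             true_masks: Sequence[Sequence[float]],
             threshold: float = 0.5) -> float:
    """Average intersection-over-union across a batch of flattened masks.

    Assumption: predictions are probabilities in [0, 1] and are binarized
    at `threshold`; the official evaluation may binarize differently.
    """
    ious = []
    for pred, true in zip(pred_masks, true_masks):
        pred_bin = [p >= threshold for p in pred]
        true_bin = [t >= 0.5 for t in true]
        intersection = sum(p and t for p, t in zip(pred_bin, true_bin))
        union = sum(p or t for p, t in zip(pred_bin, true_bin))
        # Assumption: two empty masks count as a perfect match.
        ious.append(intersection / union if union else 1.0)
    return sum(ious) / len(ious)


def tpr_at_fpr(pos_scores: Sequence[float],
               neg_scores: Sequence[float],
               max_fpr: float = 0.05) -> float:
    """True-positive rate at the loosest threshold with FPR <= max_fpr.

    Assumption: a higher score means "more similar", and a pair is
    accepted when its score is strictly above the threshold.
    """
    neg_sorted = sorted(neg_scores, reverse=True)
    # How many negative pairs may be (wrongly) accepted within the budget.
    allowed = int(max_fpr * len(neg_sorted))
    if allowed >= len(neg_sorted):
        return 1.0  # the budget permits accepting every pair
    # Accepting only scores strictly above this value lets through at most
    # `allowed` negatives, which keeps the FPR within the budget.
    threshold = neg_sorted[allowed]
    accepted = sum(score > threshold for score in pos_scores)
    return accepted / len(pos_scores)
```

With 20 negative pair scores, for instance, a 5% budget allows exactly one false positive, so the threshold sits at the second-highest negative score and the TPR is the fraction of positive pairs scoring above it.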