In this final lab, you will work in small teams on a short open-ended project. The goal is to investigate whether synthetic images generated by a generative model can be used to train an image classifier that performs competitively with the same classifier trained on real images.
You are allowed to use LLMs, such as ChatGPT, Copilot, Claude, or similar tools, to help you brainstorm, write code, debug, and improve your solution. However, you are responsible for understanding and explaining your final choices.
Your goal is to train a small image classifier on synthetic data and compare it with the same classifier trained on real data.
More specifically, you should:
The important comparison is:
Use the same architecture, optimizer, training schedule, image resolution, and hyperparameters whenever possible.
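A simple way to enforce this is to put every shared setting in one immutable config object and derive the synthetic-data run from the real-data run, so that only the training source can differ. The sketch below assumes a Python training script. The class name `ExperimentConfig`, its fields, and all default values are hypothetical placeholders, not settings prescribed by the lab.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ExperimentConfig:
    # Hypothetical fields and placeholder values; adapt them to your setup.
    train_source: str           # e.g. "real" or "synthetic"
    arch: str = "small_cnn"
    image_size: int = 64
    optimizer: str = "adamw"
    lr: float = 3e-4
    weight_decay: float = 0.05
    batch_size: int = 64
    epochs: int = 30
    seed: int = 0


def config_diff(a: ExperimentConfig, b: ExperimentConfig) -> dict:
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if da[k] != db[k]}


real_cfg = ExperimentConfig(train_source="real")
# Derive the synthetic run from the real one so no other setting can drift.
synthetic_cfg = replace(real_cfg, train_source="synthetic")

print(config_diff(real_cfg, synthetic_cfg))
# {'train_source': ('real', 'synthetic')}
```

Logging `config_diff` at the start of each run is a cheap check that any accuracy difference comes from the data rather than from an accidental hyperparameter change.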
We suggest using a small subset of the Oxford Flowers-102 dataset.
For example, you may use these 5 classes:
A simple setting is:
You may use another dataset if you prefer, but keep it small enough that you can run several experiments during the lab.
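If the dataset is available as a list of `(image_path, label)` pairs (for example, built from the images and labels exposed by `torchvision.datasets.Flowers102`), selecting a few classes takes two steps: filtering the samples and remapping the kept labels to `0..K-1`, so that the classifier head has exactly `K` outputs. The helper `make_subset`, its `max_per_class` cap, and the class indices in the usage example are hypothetical; they are not the five classes suggested for the lab.

```python
import random
from collections import defaultdict


def make_subset(samples, keep_classes, max_per_class=None, seed=0):
    """Keep only `keep_classes` and remap their labels to 0..K-1.

    samples: iterable of (image_path, original_label) pairs.
    Returns (subset, remap), where remap maps original label -> new label.
    """
    remap = {orig: new for new, orig in enumerate(sorted(keep_classes))}
    by_class = defaultdict(list)
    for path, label in samples:
        if label in remap:
            by_class[label].append(path)

    rng = random.Random(seed)  # fixed seed: same subset on every run
    subset = []
    for orig in sorted(by_class):
        paths = sorted(by_class[orig])  # sort first so shuffling is reproducible
        rng.shuffle(paths)
        if max_per_class is not None:
            paths = paths[:max_per_class]
        subset.extend((p, remap[orig]) for p in paths)
    return subset, remap


# Toy data: 3 original classes with 4 images each (hypothetical paths).
samples = [(f"img_{c}_{i}.jpg", c) for c in (10, 20, 30) for i in range(4)]
subset, remap = make_subset(samples, keep_classes={30, 10}, max_per_class=2)
print(remap)        # {10: 0, 30: 1}
print(len(subset))  # 4
```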
You may use any generative model that can run on the available GPU.
For each class, start from simple prompts such as:
a photo of a daisy
a photo of a sunflower
a close-up photo of a red rose
Then try to improve the synthetic data. For example, you may experiment with:
The goal is not only to generate nice-looking images, but images that are useful for training a classifier.
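One common way to make synthetic data more varied is to combine each class name with several prompt templates and attributes, then cycle through the combinations so that every variant is used about equally often. The templates, the lighting and viewpoint phrases, and the function name `make_prompts` are hypothetical examples; which variations actually help the classifier is something to test empirically.

```python
import itertools
import random

# Hypothetical prompt templates and attributes; extend or replace them.
TEMPLATES = [
    "a photo of a {name}",
    "a close-up photo of a {name}",
    "a {name} in a garden, {light}",
    "a {name} {view}, {light}",
]
LIGHTING = ["natural daylight", "overcast light", "golden hour light"]
VIEWS = ["seen from above", "seen from the side", "in a vase"]


def make_prompts(class_name, n, seed=0):
    """Return n prompts for one class, cycling through all unique variants."""
    variants = {
        # str.format ignores keyword arguments a template does not use,
        # so templates without {light}/{view} collapse to a single prompt.
        t.format(name=class_name, light=light, view=view)
        for t, light, view in itertools.product(TEMPLATES, LIGHTING, VIEWS)
    }
    variants = sorted(variants)
    random.Random(seed).shuffle(variants)  # random order, but reproducible
    return [variants[i % len(variants)] for i in range(n)]


for prompt in make_prompts("sunflower", 4):
    print(prompt)
```

Each prompt can then be passed to your generative model, ideally with a different random seed per generated image so that repeated prompts still yield different images.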
Use a small classifier that can be trained quickly. For example:
If you use a ViT-like model, you may reduce the number of layers, embedding dimension, or number of attention heads to make training faster.
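To see which of these settings actually shrinks the model, it helps to count parameters. The function below assumes the standard ViT layout (patch-embedding convolution, class token, learned positional embeddings, blocks with two LayerNorms and an MLP of ratio 4, final LayerNorm, linear head); `vit_param_count` is a hypothetical helper, not a library function. For the ViT-Tiny configuration it gives 5,717,416 parameters, consistent with the roughly 5.7M usually quoted for that model.

```python
def vit_param_count(image_size, patch_size, dim, depth, num_classes,
                    mlp_ratio=4, in_chans=3):
    """Parameter count of a standard ViT (class token, learned pos. embedding).

    The number of attention heads does not appear here: with `dim` fixed,
    heads only split `dim` into chunks, so they do not change the count.
    """
    num_patches = (image_size // patch_size) ** 2
    hidden = mlp_ratio * dim

    patch_embed = in_chans * patch_size * patch_size * dim + dim  # conv weight + bias
    cls_token = dim
    pos_embed = (num_patches + 1) * dim

    attn = 3 * dim * dim + 3 * dim + dim * dim + dim   # qkv + output projection
    mlp = dim * hidden + hidden + hidden * dim + dim   # two linear layers
    norms = 2 * 2 * dim                                # two LayerNorms (weight + bias)
    blocks = depth * (attn + mlp + norms)

    final_norm = 2 * dim
    head = dim * num_classes + num_classes
    return patch_embed + cls_token + pos_embed + blocks + final_norm + head


# ViT-Tiny at 224x224 vs. a reduced model for 5 classes at 64x64.
print(vit_param_count(224, 16, dim=192, depth=12, num_classes=1000))  # 5717416
print(vit_param_count(64, 8, dim=128, depth=4, num_classes=5))        # 827141
```

Block parameters grow with `dim**2` and linearly with `depth`, so these two are the main levers on model size; changing the number of heads alone changes how attention is split, not the parameter count.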
Try to keep the architecture fixed across the real-data and synthetic-data experiments.
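As a concrete starting point, a few convolutional blocks followed by global average pooling and a linear head train quickly on small images. The sketch below assumes PyTorch; `SmallCNN`, `build_model`, the `width` parameter, and the layer sizes are hypothetical choices, not a prescribed architecture. Building the model through one function with a fixed seed gives the real-data and synthetic-data runs the same architecture and the same initial weights.

```python
import torch
from torch import nn


class SmallCNN(nn.Module):
    """Four conv blocks, global average pooling, and a linear head."""

    def __init__(self, num_classes: int, width: int = 32):
        super().__init__()
        layers, in_ch = [], 3
        for mult in (1, 2, 4, 8):
            out_ch = width * mult
            layers += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),  # halves the spatial resolution
            ]
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)  # works for any input size >= 16x16
        self.head = nn.Linear(in_ch, num_classes)

    def forward(self, x):
        x = self.pool(self.features(x))
        return self.head(torch.flatten(x, 1))


def build_model(num_classes: int, seed: int = 0) -> SmallCNN:
    torch.manual_seed(seed)  # same initial weights for real and synthetic runs
    return SmallCNN(num_classes)


model = build_model(num_classes=5)
logits = model(torch.randn(2, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 5])
```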
Evaluate all models on the same real test set.
Report at least:
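Whatever metrics you report, computing them with one shared function on predictions for the same test images keeps the real and synthetic results comparable. The helper below (hypothetical name `evaluate`) returns overall accuracy, per-class accuracy, and a confusion matrix from integer labels. Per-class numbers are worth looking at because synthetic data may help some classes much more than others. The predictions in the usage example are toy values, not results.

```python
def evaluate(y_true, y_pred, num_classes):
    """Accuracy, per-class accuracy, and confusion matrix for integer labels.

    confusion[t][p] counts test images of true class t predicted as class p.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    correct = sum(confusion[c][c] for c in range(num_classes))
    per_class = []
    for c in range(num_classes):
        support = sum(confusion[c])  # number of test images of class c
        per_class.append(confusion[c][c] / support if support else float("nan"))
    return {
        "accuracy": correct / len(y_true),
        "per_class_accuracy": per_class,
        "confusion": confusion,
    }


# Toy predictions from two models on the same 6 test images.
y_true = [0, 0, 1, 1, 2, 2]
real = evaluate(y_true, [0, 0, 1, 1, 2, 1], num_classes=3)
synthetic = evaluate(y_true, [0, 1, 1, 1, 2, 0], num_classes=3)
print(real["accuracy"], synthetic["accuracy"])
print(synthetic["per_class_accuracy"])  # [0.5, 1.0, 0.5]
```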
Submit a short report containing:
Also submit your source code.