Lab 7: Synthetic Data Hackathon: ViT + Diffusion

In this final lab, you will work in small teams on a short open-ended project. The goal is to investigate whether synthetic images generated by a generative model can be used to train an image classifier that performs competitively with the same classifier trained on real images.

You may use LLM-based tools such as ChatGPT, Copilot, or Claude to help you brainstorm, write code, debug, and improve your solution. However, you are responsible for understanding and explaining your final choices.

Task

Your goal is to train a small image classifier on synthetic data and compare it with the same classifier trained on real data.

More specifically, you should:

Choose a small image classification dataset or use the suggested dataset below.
Select a small set of classes or use the suggested classes below.
Train a small network or classifier, for example a ViT-Tiny or an even smaller ViT-like model, on real images.
Generate synthetic images for the same classes using Stable Diffusion or another generative model that fits into the available GPU memory.
Train the same model on the synthetic images.
Evaluate both models on the same real test set.
Iterate and try to reduce the performance gap between training on synthetic data and training on real data.

The important comparison is:

Real train → real test
Synthetic train → real test

Use the same architecture, optimizer, training schedule, image resolution, and hyperparameters for both runs whenever possible, so that any difference in test accuracy can be attributed to the training data.
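One way to keep the two runs comparable is to derive the synthetic-data configuration from the real-data one, so that only the data source changes. The sketch below is a minimal illustration: the `ExperimentConfig` class, its field names, and all default values are assumptions made for this example, not settings prescribed by the lab.

```python
from dataclasses import dataclass, replace, asdict


@dataclass(frozen=True)
class ExperimentConfig:
    """Settings shared by both runs. Names and defaults are illustrative."""
    train_source: str            # "real" or "synthetic"
    arch: str = "vit_tiny"
    optimizer: str = "adamw"
    learning_rate: float = 3e-4
    epochs: int = 30
    batch_size: int = 32
    image_size: int = 224
    seed: int = 0


def differing_fields(a, b):
    """Return the names of the fields whose values differ between two configs."""
    da, db = asdict(a), asdict(b)
    return sorted(key for key in da if da[key] != db[key])


real_cfg = ExperimentConfig(train_source="real")
# Copy the real-data config and change only the training data source.
synth_cfg = replace(real_cfg, train_source="synthetic")

# Sanity check before training: the two runs should differ in one field only.
print(differing_fields(real_cfg, synth_cfg))
```

If you later change a hyperparameter for one run, the same check shows at a glance that the comparison is no longer like-for-like.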

Suggested dataset

We suggest using a small subset of the Oxford Flowers-102 dataset.

For example, you may use these 5 classes:

oxeye_daisy
common_dandelion
rose
sunflower
siam_tulip

A simple setting is:

50 real training images per class
50 synthetic training images per class
held-out real images (not used for training) for validation and testing
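Drawing a fixed number of images per class can be sketched as follows. This is a minimal example that assumes you already have a list of `(path, label)` pairs. How that list is built depends on your dataset loader and is not shown. The function name `sample_per_class` is hypothetical.

```python
import random
from collections import defaultdict


def sample_per_class(items, n_per_class, seed=0):
    """Pick n_per_class (path, label) pairs for every label, reproducibly.

    `items` is a list of (path, label) tuples.
    """
    by_label = defaultdict(list)
    for path, label in items:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed so every run sees the same subset
    subset = []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        if len(paths) < n_per_class:
            raise ValueError(f"class {label!r} has only {len(paths)} images")
        for path in rng.sample(paths, n_per_class):
            subset.append((path, label))
    return subset
```

Fixing the seed makes the real-data baseline reproducible, which matters when the training set is as small as 50 images per class.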

You may use another dataset if you prefer, but keep it small enough so that you can run several experiments during the lab.

Synthetic data generation

You may use any generative model that can run on the available GPU.

For each class, start from simple prompts such as:

a photo of a daisy
a photo of a sunflower
a close-up photo of a red rose

Then try to improve the synthetic data. For example, you may experiment with:

different prompt templates,
different backgrounds,
different viewpoints,
different lighting conditions,
data augmentation,
removing bad generated images,
generating more images per class,
mixing real and synthetic data.
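Prompt variation along several of these axes can be generated programmatically rather than written by hand. The sketch below combines viewpoint, background, and lighting phrases into prompts for one class. The phrase lists and the `build_prompts` helper are illustrative assumptions, so replace them with wording that suits your dataset.

```python
import itertools
import random

# Hypothetical phrases for each axis of variation.
VIEWPOINTS = ["a photo", "a close-up photo", "a top-down photo"]
BACKGROUNDS = ["in a garden", "in a field", "in a vase indoors"]
LIGHTING = ["in bright sunlight", "on an overcast day", "at sunset"]


def build_prompts(class_name, n, seed=0):
    """Return n prompts for one class, cycling through all phrase combinations."""
    readable = class_name.replace("_", " ")
    article = "an" if readable[0].lower() in "aeiou" else "a"

    combos = list(itertools.product(VIEWPOINTS, BACKGROUNDS, LIGHTING))
    random.Random(seed).shuffle(combos)  # avoid always starting with the same phrases

    prompts = []
    for view, background, light in itertools.islice(itertools.cycle(combos), n):
        prompts.append(f"{view} of {article} {readable} {background}, {light}")
    return prompts


for prompt in build_prompts("oxeye_daisy", 3):
    print(prompt)
```

With three phrases per axis there are 27 distinct prompts per class. Requesting more than 27 repeats prompts, which still yields different images if the generator's random seed changes between calls.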

The goal is not just to generate nice-looking images, but to generate images that are useful for training a classifier.

Classifier

Use a small classifier that can be trained quickly. For example:

ViT-Tiny,
a smaller custom ViT,
a shallow CNN,
or another compact architecture.

If you use a ViT-like model, you may reduce the number of layers, embedding dimension, or number of attention heads to make training faster.
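To get a feel for how much these choices shrink the model, you can estimate the parameter count before training anything. The sketch below assumes a standard ViT layout (patch embedding, class token, learned position embeddings, pre-norm Transformer blocks with biases, and a linear head). The function `vit_param_count` is a hypothetical helper, and a specific library implementation may differ slightly.

```python
def vit_param_count(depth, embed_dim, num_classes,
                    image_size=224, patch_size=16, mlp_ratio=4, in_channels=3):
    """Estimate the number of trainable parameters of a standard ViT."""
    d = embed_dim
    hidden = mlp_ratio * d
    num_patches = (image_size // patch_size) ** 2

    # Linear projection of flattened patches, plus bias.
    patch_embed = patch_size * patch_size * in_channels * d + d
    # Class token plus one position embedding per token.
    tokens = d + (num_patches + 1) * d

    # One Transformer block.
    attention = (d * 3 * d + 3 * d) + (d * d + d)   # QKV projection + output projection
    mlp = (d * hidden + hidden) + (hidden * d + d)  # two linear layers
    norms = 2 * (2 * d)                             # two LayerNorms (scale and shift)
    block = attention + mlp + norms

    final_norm = 2 * d
    head = d * num_classes + num_classes
    return patch_embed + tokens + depth * block + final_norm + head


# Usual ViT-Tiny settings (depth 12, width 192) versus a much smaller variant.
print(vit_param_count(depth=12, embed_dim=192, num_classes=5))
print(vit_param_count(depth=4, embed_dim=96, num_classes=5))
```

In this estimate the number of attention heads does not appear, because the heads split the embedding dimension rather than adding weights. Depth and embedding dimension are therefore the settings that change the model size.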

Try to keep the architecture fixed across the real-data and synthetic-data experiments.

Evaluation

Evaluate all models on the same real test set.

Report at least:

accuracy of the model trained on real data,
accuracy of the model trained on synthetic data,
the performance gap,
examples of generated synthetic images,
a short discussion of what improved or hurt performance.
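The two accuracies and the gap can be computed from predicted and true labels as sketched below. The helper names are hypothetical, the gap is taken here as real-data accuracy minus synthetic-data accuracy, and the label lists are toy values for illustration only, not results.

```python
def accuracy(predictions, labels):
    """Fraction of test examples whose predicted label matches the true label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("the test set is empty")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def performance_gap(real_acc, synth_acc):
    """Accuracy lost by training on synthetic instead of real data."""
    return real_acc - synth_acc


# Toy labels and predictions on the same real test set (illustration only).
test_labels = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
real_model_preds = [0, 1, 2, 3, 4, 0, 1, 2, 3, 0]
synth_model_preds = [0, 1, 2, 3, 4, 0, 2, 2, 0, 0]

real_acc = accuracy(real_model_preds, test_labels)
synth_acc = accuracy(synth_model_preds, test_labels)
print(f"real: {real_acc:.2f}  synthetic: {synth_acc:.2f}  "
      f"gap: {performance_gap(real_acc, synth_acc):.2f}")
```

A positive gap means the real-data model is better. With a small test set, a difference of a few images can move the gap noticeably, so it helps to repeat runs with different seeds before drawing conclusions.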

Questions to think about

Are visually realistic synthetic images always useful for training?
What kinds of generated images help the classifier generalize to real images?
Does increasing the number of synthetic images always improve accuracy?
Are prompt diversity and visual diversity important?
Which mistakes does the synthetic-trained classifier make?
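For the last question, counting which class pairs are confused most often is a simple starting point. The sketch below assumes integer labels that index into a list of class names. `top_confusions` is a hypothetical helper, not part of any library.

```python
from collections import Counter


def top_confusions(predictions, labels, class_names, k=3):
    """Return the k most frequent (true class, predicted class) error pairs."""
    errors = Counter(
        (class_names[y], class_names[p])
        for p, y in zip(predictions, labels)
        if p != y
    )
    return errors.most_common(k)
```

Running it on the predictions of both models and comparing the lists shows whether the synthetic-trained classifier makes the same kinds of errors as the real-trained one or fails on different classes.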

Submit

Submit a short report containing:

the dataset and classes you used,
the generative model and prompts you used,
representative generated images,
classifier architecture and training settings,
real-data accuracy,
synthetic-data accuracy,
your main observations.

Also submit your source code.
