In this assignment, your task is to train a neural network with a multi-loss objective that covers hierarchical classification and semantic segmentation. The dataset consists of images of pets; each image is labeled with a species (cat or dog) and a breed (25 dog breeds and 12 cat breeds). Each image also comes with a semantic segmentation map with three classes: foreground, background and boundary. The goal is to train a model that predicts the species p(species|image), the breed p(breed|image) and the segmentation mask p(mask|image). hw03.zip
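One common way to set up such a multi-loss objective is a weighted sum with one term per task. The formulation below is only a sketch: the weights $\lambda$ and the use of cross-entropy for every term are assumptions, not something the assignment prescribes.

```latex
\mathcal{L}
  = \lambda_{\text{species}} \, \mathcal{L}_{\text{CE}}\big(p(\text{species} \mid \text{image})\big)
  + \lambda_{\text{breed}} \, \mathcal{L}_{\text{CE}}\big(p(\text{breed} \mid \text{image})\big)
  + \lambda_{\text{mask}} \, \frac{1}{HW} \sum_{i=1}^{HW} \mathcal{L}_{\text{CE}}\big(p(\text{mask}_i \mid \text{image})\big)

p(\text{species} = s \mid \text{image})
  = \sum_{b \,\in\, \text{breeds}(s)} p(\text{breed} = b \mid \text{image})
```

Here $H \times W$ is the mask resolution, so the last term averages a per-pixel cross-entropy. The second line expresses the hierarchy: every breed belongs to exactly one species, so a distribution over all 37 breeds implies a distribution over the two species. A model with independent species and breed heads need not satisfy this relation exactly.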
UPDATE: make sure that your image and mask transforms use transforms.Resize(128) and not transforms.Resize((128,128)), which was originally in the homework template!
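The two calls differ in how they treat the aspect ratio: with an integer argument, torchvision's Resize scales the shorter side to that length and keeps the aspect ratio, whereas a (height, width) tuple forces exactly that shape and can distort the image. The helper below is a plain-Python illustration of the resulting shapes; `resized_shape` is a hypothetical name, not part of torchvision, and the 375 x 500 input is just an example size.

```python
def resized_shape(height, width, size):
    """Shape (H, W) after a torchvision-style Resize.

    An int `size` rescales the shorter side to `size` and keeps the aspect
    ratio; a (h, w) tuple forces exactly that shape.
    """
    if isinstance(size, tuple):
        return size
    short, long = min(height, width), max(height, width)
    new_short, new_long = size, int(size * long / short)
    return (new_short, new_long) if height <= width else (new_long, new_short)


# A landscape image, 375 pixels high and 500 pixels wide.
h, w = 375, 500
keep_ratio = resized_shape(h, w, 128)         # shorter side becomes 128
squashed = resized_shape(h, w, (128, 128))    # aspect ratio is lost
print(keep_ratio)
print(squashed)
```

With Resize(128), a subsequent CenterCrop(128) trims the longer side down to 128 without stretching the content, so image and mask stay geometrically consistent.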
Task 1 - Species classification (1 point)
dog
cat
Task 2 - Breed classification (3 points)
'american_bulldog', 'american_pit_bull_terrier', 'basset_hound', 'beagle', 'boxer', 'chihuahua', 'english_cocker_spaniel', 'english_setter', 'german_shorthaired', 'great_pyrenees', 'havanese', 'japanese_chin', 'keeshond', 'leonberger', 'miniature_pinscher', 'newfoundland', 'pomeranian', 'pug', 'saint_bernard', 'samoyed', 'scottish_terrier', 'shiba_inu', 'staffordshire_bull_terrier', 'wheaten_terrier', 'yorkshire_terrier'
'Abyssinian', 'Bengal', 'Birman', 'Bombay', 'British_Shorthair', 'Egyptian_Mau', 'Maine_Coon', 'Persian', 'Ragdoll', 'Russian_Blue', 'Siamese', 'Sphynx'
Task 3 - Semantic segmentation (6 points)
Submit a .zip file containing all your training & inference code. There needs to be a model.py file, containing a Net class which has a method predict.
There also needs to be a weights.pth file, which will be loaded in BRUTE with:
model.load_state_dict(torch.load(weights_path, map_location=torch.device('cpu')))
You can save the model with:
torch.save(model.state_dict(), "weights.pth")
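Put together, the save and load calls form a round trip like the one sketched below. The tiny `Net` here is only a stand-in with a single convolution so that the example runs; its layer is a placeholder, not the architecture expected in the assignment, and the temporary directory replaces the real `weights_path`.

```python
import os
import tempfile

import torch
import torch.nn as nn


class Net(nn.Module):
    """Placeholder model: one conv layer (NOT the assignment architecture)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)


# Save only the parameters (the state dict), as in the assignment text.
model = Net()
weights_path = os.path.join(tempfile.mkdtemp(), "weights.pth")
torch.save(model.state_dict(), weights_path)

# Load them into a freshly constructed model on the CPU.
restored = Net()
restored.load_state_dict(torch.load(weights_path, map_location=torch.device("cpu")))
restored.eval()  # switch Dropout / BatchNorm layers to inference mode

# Check that every parameter tensor survived the round trip unchanged.
same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)
```

Because only the state dict is stored, the `Net` class in model.py has to define the same layers (names and shapes) as the model that was trained; otherwise `load_state_dict` raises an error.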
The predict method takes a single 3 x 128 x 128 image as input, processed with the same transform as in the template (Resize, CenterCrop, ToTensor, Normalize).
After computing the predictions, it outputs them in the following format:
128 x 128 tensor
0, 1 or 2
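For the segmentation part of the output, per-pixel class scores have to be reduced to one label per pixel. A minimal sketch, assuming the network produces mask logits of shape 3 x 128 x 128; this intermediate shape and the name `logits_to_mask` are assumptions, and only the 128 x 128 result with values 0, 1 or 2 comes from the text above.

```python
import torch


def logits_to_mask(mask_logits):
    """Turn per-class scores (3 x H x W) into a label map (H x W) of 0, 1, 2."""
    return mask_logits.argmax(dim=0)


# Random scores stand in for a real network output.
torch.manual_seed(0)
mask_logits = torch.randn(3, 128, 128)
mask = logits_to_mask(mask_logits)
print(mask.shape, mask.dtype)
```

Taking the argmax over the class dimension is equivalent to picking the most probable class under a softmax, so no explicit softmax is needed for the label map.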
Accompanying the assignment, there will be a tournament in which the models are ranked by their performance. Each task has a separate ranking: species accuracy, top-3 breed accuracy and mean IoU. The final ranking is determined by the sum of the ranks in all three tasks, and the scoring will be based on this ranking.
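Two of the three tournament metrics named above are less self-explanatory than plain accuracy. The sketch below shows one common definition of each in plain Python. How the evaluation actually computes them (for example, whether IoU is averaged per image or over the whole test set, and how classes absent from an image are treated) is not stated, so treat these choices as assumptions. The function names are hypothetical.

```python
def top3_accuracy(scores, targets):
    """Fraction of samples whose true class is among the 3 highest scores."""
    hits = 0
    for row, target in zip(scores, targets):
        top3 = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:3]
        hits += target in top3
    return hits / len(targets)


def mean_iou(pred, truth, num_classes=3):
    """Mean intersection-over-union over classes for two flat label lists.

    Classes that occur in neither prediction nor ground truth are skipped
    (an assumption; other conventions exist).
    """
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and t == c for p, t in zip(pred, truth))
        union = sum(p == c or t == c for p, t in zip(pred, truth))
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy breed scores for 2 samples and 4 classes.
scores = [[0.05, 0.5, 0.3, 0.15], [0.7, 0.15, 0.1, 0.05]]
print(top3_accuracy(scores, targets=[2, 3]))

# Toy 2 x 3 masks, flattened row by row.
pred = [0, 0, 1, 1, 2, 2]
truth = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, truth))
```

In the toy data, the first sample counts as a hit because class 2 has the second-highest score, while the second sample is a miss because class 3 has the lowest score of the four.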
Dropout
nn.BatchNorm2d
nn.BatchNorm1d
torch.flatten
128*16*16 = 32 768
32768×256
128x4x4
2048×256
nn.CrossEntropyLoss
weight
(#classes,)
nn.Dropout(prob)
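The feature-map sizes listed above (128*16*16 = 32 768 versus 128x4x4) can be checked with a little arithmetic. The sketch reads them as the shape of a feature map that is flattened and fed into a fully connected layer with 256 outputs; that reading is an interpretation of the listed numbers, and the helper names are hypothetical.

```python
def flattened_size(channels, height, width):
    """Number of features after flattening a C x H x W feature map."""
    return channels * height * width


def linear_weight_count(in_features, out_features):
    """Number of weights (without biases) in a fully connected layer."""
    return in_features * out_features


large = flattened_size(128, 16, 16)   # 128*16*16
small = flattened_size(128, 4, 4)     # 128*4*4
print(large, linear_weight_count(large, 256))
print(small, linear_weight_count(small, 256))
print(linear_weight_count(large, 256) // linear_weight_count(small, 256))
```

Under this reading, shrinking the spatial size from 16 x 16 to 4 x 4 before flattening reduces the weight matrix of the first fully connected layer by a factor of 16.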
Good luck! If you get stuck, feel free to consult the web or various chatbots; just make sure to acquire true understanding in the process rather than simply copying things. In case of any questions or concerns, please contact siproman@fel.cvut.cz.