This page is located in archive.

HW3: Image classification

Your task

In your third homework you will train an image classifier from scratch and implement the training and validation loops, a dataset, a data loader and a couple of helper functions. Points are awarded based on the accuracy on a hidden test part of the dataset (which comes from the same distribution as the training and validation parts).

The homework will be introduced in the labs in the 6th week.


We will be using a subset of the popular and well-known ImageNet dataset, available on the taylor and cantor servers in the directory /local/temporary/vir/hw03.

Training will be performed on one of the two GPU servers available for the courses of the Department of Cybernetics.

  • Video “How to work with servers” from previous courses → link.
  • How to edit code on a remote server → link

The dataset consists of 50 classes, with 1000 training, 50 testing and 50 validation images for each class. Each image has a different resolution and three color channels (R, G, B). The dataset has the following structure:

├─── train 
│  ├ n01751748 (directory of first class contains 1000 jpeg images)
│  ├ ... 
│  └ n03018349 
└─── val 
   ├ n01751748 (directory of first class contains 50 jpeg images)
   ├ ...
   └ n03018349

The dataset contains the following classes:

  • matchstick
  • band aid
  • horse cart
  • eel
  • backpack
  • pill bottle
  • dishwasher
  • black and gold garden spider
  • acoustic guitar
  • lumbermill
  • wall clock
  • partridge
  • scuba diver
  • cassette player
  • comic book
  • brambling
  • coffeepot
  • oboe
  • nail
  • crayfish
  • bathtub
  • corkscrew
  • boxer
  • sax
  • hand blower
  • web site
  • cannon
  • book jacket
  • ballplayer
  • vine snake
  • dowitcher
  • miniature poodle
  • feather boa
  • long-horned beetle
  • broccoli
  • spatula
  • washbasin
  • fountain pen
  • joystick
  • assault rifle
  • white stork
  • waffle iron
  • triumphal arch
  • carpenter's kit
  • green mamba
  • pickup
  • three-toed sloth
  • Old English sheepdog
  • tennis ball
  • dial telephone

Firstly, you will need to create a dataset (an object that contains the training samples, or can read/generate them on the fly, together with all necessary information). The torchvision module offers several classes for the most common use cases, or you can write your own dataset from scratch (ideally a class inheriting from torch.utils.data.Dataset). We will use the ImageFolder class from torchvision. It assumes that the data is already structured like our dataset: 'datasetroot/class1/images'.
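
For readers who want to write their own dataset instead, the following is a minimal sketch of a class inheriting from torch.utils.data.Dataset. The name FolderDataset and its attributes are hypothetical and chosen for illustration only; it assumes the 'datasetroot/class1/images' layout described above and is not the torchvision implementation.

```python
import os

from PIL import Image
from torch.utils.data import Dataset


class FolderDataset(Dataset):
    """Hypothetical minimal dataset for a root/class_name/image layout."""

    def __init__(self, root, transform=None):
        self.transform = transform
        # Class names are the sub-directory names, sorted for a stable order.
        self.classes = sorted(
            d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
        )
        self.class_to_idx = {name: i for i, name in enumerate(self.classes)}
        # Collect one (path, label) pair per file in every class directory.
        self.samples = []
        for name in self.classes:
            class_dir = os.path.join(root, name)
            for fname in sorted(os.listdir(class_dir)):
                path = os.path.join(class_dir, fname)
                self.samples.append((path, self.class_to_idx[name]))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        image = Image.open(path).convert('RGB')  # force three color channels
        if self.transform is not None:
            image = self.transform(image)
        return image, label
```

Images are opened lazily in __getitem__, so only the file paths are kept in memory. Any torchvision transform can be passed as transform.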

The dataset returns PIL.Image objects, so it is important to add tfms.ToTensor() to the dataset's transform. It converts a PIL.Image (values ranging from 0 to 255) to a tensor (a PyTorch FloatTensor of shape (C, H, W) with values in the range [0.0, 1.0]). The images were not captured by the same device and differ in resolution, so we must also add tfms.Resize(), which resizes each image to the shape (256, 256).

We will also add tfms.Normalize(), which brings the data into a common range and reduces skewness, which helps the network learn faster and better. Normalization can also help mitigate the vanishing and exploding gradient problems. Its inputs are the mean and the standard deviation, one value per color channel, which need to be calculated over the pixels of the training dataset. We provide the calculated values, which will also be used in the evaluation.

import torchvision as tv
import torchvision.transforms as tfms
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
train_transform = tfms.Compose([tfms.Resize((256, 256)), tfms.ToTensor(), tfms.Normalize(mean, std)])
val_transform = tfms.Compose([tfms.Resize((256, 256)), tfms.ToTensor(), tfms.Normalize(mean, std)])
train_dataset = tv.datasets.ImageFolder('/local/temporary/vir/hw03/train', transform=train_transform)
val_dataset = tv.datasets.ImageFolder('/local/temporary/vir/hw03/val', transform=val_transform)
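
The mean and std passed to Normalize hold one value per color channel. As an illustration of how such statistics could be obtained, here is a minimal sketch that computes them for a batch of already resized image tensors. The helper name channel_stats is hypothetical, and this is not necessarily how the provided values were produced.

```python
import torch


def channel_stats(images):
    # images: float tensor of shape [N, 3, H, W] with values in [0, 1].
    # Reduce over the batch and both spatial dimensions, keep the channel axis.
    mean = images.mean(dim=(0, 2, 3))
    std = images.std(dim=(0, 2, 3))
    return mean, std
```

For the full training set the images would not fit into a single tensor, so in practice the sums would have to be accumulated batch by batch.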

Secondly, we need to create a data loader (an object that takes samples from the dataset and assembles them into batches in an efficient way). The class takes as input batch_size (the number of samples propagated through the model at once), shuffle (bool, whether to randomize the order of samples) and num_workers (the number of subprocesses used for loading the data, which should not exceed the number of CPU cores).

from torch.utils.data import DataLoader
# BATCH_SIZE is a hyperparameter that you choose yourself
train_dataloader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=4)
val_dataloader = DataLoader(dataset=val_dataset, batch_size=1, shuffle=False)
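
To see what a data loader produces without access to the servers, the batching can be tried on synthetic data. The sketch below uses a random TensorDataset as a stand-in for the real dataset; the names toy_dataset and toy_loader and the batch size of 4 are arbitrary choices for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the real dataset: 10 random "images" with labels 0..49.
images = torch.rand(10, 3, 256, 256)
labels = torch.randint(0, 50, (10,))
toy_dataset = TensorDataset(images, labels)

# num_workers=0 loads the data in the main process, which is enough here.
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=True, num_workers=0)

# 10 samples with batch_size=4 give batches of 4, 4 and 2 samples.
batch_shapes = [(x.shape, y.shape) for x, y in toy_loader]
```

The last batch is smaller than the others because 10 is not divisible by 4; a model therefore should not hard-code the batch size.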

For more information, visit link

Model and loops

The dataset and dataloader are already provided. Your main task is to write a model and train it.

import torch
class Model(torch.nn.Module):
    '''This is my super cool, but super dumb module'''
    def __init__(self):
        super().__init__()
    def forward(self, x):
        batch_size = x.shape[0]
        # placeholder: random scores for the 50 classes, shape [B, 50]
        return torch.rand(batch_size, 50, device=x.device)
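
As a possible starting point for a real model, here is a sketch of a small convolutional network that accepts [B, 3, 256, 256] inputs and returns [B, 50] scores. The class name SmallCNN, the layer sizes and the overall layout are assumptions made for illustration; this is not the course's reference architecture and there is no claim that it reaches the required accuracy.

```python
import torch
import torch.nn as nn


class SmallCNN(nn.Module):
    """Hypothetical baseline, not the course's reference architecture."""

    def __init__(self, num_classes=50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),   # 256 -> 128
            nn.BatchNorm2d(16),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 128 -> 64
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # 64 -> 32
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),                                # 32x32 -> 1x1
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)      # [B, 64, 1, 1] -> [B, 64]
        return self.classifier(x)    # raw class scores (logits), shape [B, 50]
```

The global average pooling keeps the classifier small and makes the network independent of the exact input resolution.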

The core of the training and validation loops is as follows:

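for inputs, labels in train_dataloader:
    output = model(inputs)           # forward pass
    loss = loss_fn(output, labels)   # compare the predictions with the labels
    # in training, continue with optimizer.zero_grad(), loss.backward() and optimizer.step()
validate(model, val_dataloader, loss_fn)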

  • Write a function weight_init to fill the weights of the model with proper random values.
  • In validation, switch the model to eval mode with model.eval() (it tells Dropout, BatchNorm and similar layers to use their inference behavior).
  • The validation loop does not require gradients, therefore run it inside with torch.no_grad():
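
The three points above can be sketched as follows. This is one possible arrangement under stated assumptions, not the expected solution: Kaiming initialization is just one common choice for ReLU networks, the helper train_one_epoch is hypothetical, and validate takes an extra device argument that the skeleton above does not have.

```python
import torch
import torch.nn as nn


def weight_init(module):
    # Kaiming (He) initialization; the assignment does not prescribe a scheme.
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight, nonlinearity='relu')
        if module.bias is not None:
            nn.init.zeros_(module.bias)


def train_one_epoch(model, loader, loss_fn, optimizer, device):
    model.train()                      # training behavior for Dropout / BatchNorm
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()          # clear gradients from the previous step
        loss = loss_fn(model(inputs), labels)
        loss.backward()                # backpropagation
        optimizer.step()               # weight update


def validate(model, loader, loss_fn, device):
    model.eval()                       # inference behavior for Dropout / BatchNorm
    total_loss, correct, count = 0.0, 0, 0
    with torch.no_grad():              # no gradients needed for validation
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            output = model(inputs)
            total_loss += loss_fn(output, labels).item() * labels.size(0)
            correct += (output.argmax(dim=1) == labels).sum().item()
            count += labels.size(0)
    return total_loss / count, correct / count
```

The initialization is applied once before training with model.apply(weight_init), which calls the function on every submodule. validate returns the mean loss and the top-1 accuracy over the whole loader.
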
The GPU used in the evaluation is the GPU with ID 7 on the taylor student server. Because of this, it is strictly forbidden to use this GPU during training, as it would negatively impact the running time of the evaluation for everyone!

Useful lines of code for computing the accuracy and for finding a free device on the server:

import os
import numpy as np
import torch
def get_accuracy(prediction, labels_batch, dim=1):
    # fraction of samples whose highest-scoring class matches the label
    pred_index = prediction.argmax(dim)
    return (pred_index == labels_batch).float().mean()
def get_free_gpu():
    # write the free memory of every GPU into the file 'tmp' and parse it
    os.system('nvidia-smi -q -d Memory |grep -A5 GPU|grep Free >tmp')
    memory_available = [int(x.split()[2]) for x in open('tmp', 'r').readlines()]
    index = np.argmax(memory_available[:-1])  # Skip the last card (ID 7), it is reserved for evaluation!!!
    return int(index)
def get_device():
    if torch.cuda.is_available():
        gpu = get_free_gpu()
        device = torch.device(f'cuda:{gpu}')
    else:
        # no GPU available, fall back to the CPU
        device = 'cpu'
    return device
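
Note that get_accuracy measures top-1 accuracy, while the homework is scored by top-3 accuracy. A minimal sketch of a top-k variant follows; the function name get_topk_accuracy is hypothetical, and the actual evaluation script may compute the metric differently.

```python
import torch


def get_topk_accuracy(prediction, labels_batch, k=3):
    # Indices of the k highest scores for every sample, shape [B, k].
    topk_index = prediction.topk(k, dim=1).indices
    # A sample counts as correct if its label appears among those k indices.
    hits = (topk_index == labels_batch.unsqueeze(1)).any(dim=1)
    return hits.float().mean()
```

With k=1 this reduces to the same value as get_accuracy.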


Submit a Python module/package that is importable by the name hw_3 and has a function load_model(). The function load_model() needs to return an instance of torch.nn.Module (or of a subclass) which is

  • capable of living either on the CPU or on a GPU. The evaluation will be run on a GPU server.
  • capable of accepting a tensor of size [Bx3x256x256] (B is the batch size), dtype torch.float32, normalised by torchvision.transforms.Normalize(mean, std), where mean, std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
  • returning a tensor of size [Bx50] with dtype torch.float32, living on the same device as the input data
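
Before submitting, these requirements can be checked locally. The sketch below is a hypothetical self-check, not the course's evaluation script; the function name check_model and its parameters are assumptions made for illustration.

```python
import torch


def check_model(model, device='cpu', batch_size=2):
    # Hypothetical self-check of the interface described above.
    assert isinstance(model, torch.nn.Module)
    model = model.to(device).eval()
    x = torch.rand(batch_size, 3, 256, 256, dtype=torch.float32, device=device)
    with torch.no_grad():
        y = model(x)
    assert y.shape == (batch_size, 50), f'unexpected output shape {tuple(y.shape)}'
    assert y.dtype == torch.float32, f'unexpected dtype {y.dtype}'
    assert y.device == x.device, 'output must live on the same device as the input'
    return True
```

A typical use would be check_model(load_model()) on the CPU, followed by check_model(load_model(), device='cuda') on a machine with a GPU.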

This is the only portion of your code that will be checked automatically. However, please also submit all the other code that you used for the training. This way, if you achieve an impossibly high score, we can be amazed and see how you did it.

When using torch.load, always use map_location="cpu". It is the safest option in case your model cannot live on the GPU.

# Get the weights and biases
# and store them on the hard disk
torch.save(model.state_dict(), 'weights.pth')
# load the weights from the hard disk into a fresh model
model = Model()
model.load_state_dict(torch.load('weights.pth', map_location="cpu"))
# ^ should print "<All keys matched successfully>"

The simplest submission (which will not earn any points) can look along these lines. Name it hw_3.py and submit it together with the model's weights, stored through model.state_dict() in a file named weights.pth.

import torch
import os
class Model(torch.nn.Module):
    '''This is my super cool, but super dumb module'''
    def __init__(self):
        super().__init__()
    def forward(self, x):
        batch_size = x.shape[0]
        return torch.rand(batch_size, 50, device=x.device)
def load_model():
    # This is the function to be filled. Your returned model needs to be an instance of subclass of torch.nn.Module
    # Model needs to be accepting tensors of shape [B, 3, 256, 256], where B is batch_size, type float32
    # It should be possible to pass in cuda tensors (in that case, model.cuda() will be called first).
    # The model will return scores (or probabilities) for each of the 50 classes, i.e a tensor of shape [B, 50]
    # The resulting tensor should have same device and dtype as incoming tensor
    directory = os.path.abspath(os.path.dirname(__file__))
    # The model should be trained in advance and in this function, you should instantiate model and load the weights into it:
    model = Model()
    model.load_state_dict(torch.load(directory + '/weights.pth', map_location='cpu'))
    # For more info on storing and loading weights, see https://pytorch.org/tutorials/beginner/saving_loading_models.html
    return model


The dataset is rather difficult, therefore, in order to get any points, you only need a top-3 accuracy above 50 %. In order to get the full amount of points for the individual part of the assignment, you need a top-3 accuracy of 75 %. Anything in between is interpolated linearly. The maximum amount of points from the assignment is 13. The equation is:

$$\text{pts}_{\text{individual}} = 13 \times \text{clip}\left(\frac{\text{acc}_{\text{top-3}} - 0.5}{0.75 - 0.5},\ 0,\ 1\right)$$

Numpy clip function link.
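
The scoring formula can be written directly with numpy.clip. The function name individual_points and its keyword parameters are hypothetical; only the constants 13, 0.5 and 0.75 come from the assignment.

```python
import numpy as np


def individual_points(top3_acc, max_points=13, low=0.5, high=0.75):
    # Linear ramp from 0 points at `low` to `max_points` at `high`,
    # clipped so that the result stays within [0, max_points].
    return max_points * np.clip((top3_acc - low) / (high - low), 0, 1)
```

For example, a top-3 accuracy of 0.625 lies halfway between the two thresholds and gives 6.5 points.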


For every 24 hours after the deadline, you will lose 1/3 of the points. However, you cannot end up with a negative number of points, so the minimum is 0.

Keep in mind that training a neural network takes a non-trivial amount of time. Do not start working on the homework at the last moment.

Helpful resources

courses/b3b33vir/tutorials/lab_06.txt · Last modified: 2022/11/12 10:55 by pokorsi1