
HW3: Image classification

Your task

In your third homework you will train an image classifier from scratch: implement a training and validation loop, a dataset, a data loader and a couple of helper functions. Points will be awarded based on accuracy on an unknown test part of the dataset (which comes from the same distribution as the training and validation parts).

The homework will be introduced in the labs in the 6th week.

Dataset

We will be using a subset of the popular and well-known ImageNet dataset, available on the taylor and cantor servers in the directory /local/temporary/vir/hw03.

Training will be performed on one of the two GPU servers available for the courses of the Department of Cybernetics.

  • Video “How to work with servers” from previous courses → link.
  • How to edit code on a remote server → link

The dataset consists of 50 classes with 1000 training, 50 test and 50 validation images per class. The images vary in resolution and all have three color channels (R, G, B). The dataset has the following structure:

. 
├─── train 
│  ├ n01751748 (directory of the first class, containing 1000 JPEG images)
│  ├ ... 
│  └ n03018349 
└─── val 
   ├ n01751748 (directory of the first class, containing 50 JPEG images)
   ├ ...
   └ n03018349

The dataset contains the following classes:

  • matchstick
  • band aid
  • horse cart
  • eel
  • backpack
  • pill bottle
  • dishwasher
  • black and gold garden spider
  • acoustic guitar
  • lumbermill
  • wall clock
  • partridge
  • scuba diver
  • cassette player
  • comic book
  • brambling
  • coffeepot
  • oboe
  • nail
  • crayfish
  • bathtub
  • corkscrew
  • boxer
  • sax
  • hand blower
  • web site
  • cannon
  • book jacket
  • ballplayer
  • vine snake
  • dowitcher
  • miniature poodle
  • feather boa
  • long-horned beetle
  • broccoli
  • spatula
  • washbasin
  • fountain pen
  • joystick
  • assault rifle
  • white stork
  • waffle iron
  • triumphal arch
  • carpenter's kit
  • green mamba
  • pickup
  • three-toed sloth
  • Old English sheepdog
  • tennis ball
  • dial telephone

Firstly, you will need to create a dataset (an object which holds the training samples, or can read/generate them on the fly, together with all necessary information). There are several classes for the most common use-cases in the torchvision module, or you can write your own dataset from scratch (ideally a class inheriting from torch.utils.data.Dataset). We will use the ImageFolder class from torchvision. It assumes that the data is already structured like our dataset: datasetroot/class1/images.

The dataset returns a PIL.Image, so it is important to add tfms.ToTensor() to the transform pipeline; it converts a PIL.Image (values in the range 0 to 255) to a PyTorch FloatTensor of shape (C, H, W) with values in the range [0.0, 1.0]. The images were not captured by the same device, so we must also add tfms.Resize(), which resizes each image to shape (256, 256).

We will also add tfms.Normalize(), which brings the data into a common range and reduces skewness, helping the network learn faster and better. Normalization can also mitigate the vanishing and exploding gradient problems. Its inputs are the mean and standard deviation, which need to be calculated (per channel) on the training dataset. We provide precalculated values, which will also be used in the evaluation.

import torchvision as tv
import torchvision.transforms as tfms
 
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
 
train_transform = tfms.Compose([tfms.Resize((256, 256)), tfms.ToTensor(), tfms.Normalize(mean, std)])
val_transform = tfms.Compose([tfms.Resize((256, 256)), tfms.ToTensor(), tfms.Normalize(mean, std)])
 
train_dataset = tv.datasets.ImageFolder('/local/temporary/vir/hw03/train', transform=train_transform)
val_dataset = tv.datasets.ImageFolder('/local/temporary/vir/hw03/val', transform=val_transform)
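
For reference, the statistics could be computed along these lines (a minimal sketch; it assumes a second ImageFolder built with only Resize and ToTensor, i.e. without normalization, and accumulates per-channel sums over the whole training set):

import torch
from torch.utils.data import DataLoader
 
plain_transform = tfms.Compose([tfms.Resize((256, 256)), tfms.ToTensor()])
plain_dataset = tv.datasets.ImageFolder('/local/temporary/vir/hw03/train', transform=plain_transform)
loader = DataLoader(plain_dataset, batch_size=64, num_workers=4)
 
channel_sum = torch.zeros(3)
channel_sq_sum = torch.zeros(3)
n_pixels = 0
for images, _ in loader:
    # images: [B, 3, 256, 256]; sum over the batch and spatial dimensions
    channel_sum += images.sum(dim=(0, 2, 3))
    channel_sq_sum += (images ** 2).sum(dim=(0, 2, 3))
    n_pixels += images.shape[0] * images.shape[2] * images.shape[3]
 
mean = channel_sum / n_pixels
std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()
print(mean, std)  # should come out close to the values above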

Secondly, we need to create a dataloader (a data loader is an object which takes samples from the dataset and assembles batches in an efficient way). The class takes as input batch_size (the number of samples that will be propagated through the model at once), shuffle (bool, whether to randomize the order) and num_workers (the number of subprocesses for loading the data; must be less than or equal to the number of CPU cores).

from torch.utils.data import DataLoader
 
BATCH_SIZE = 64  # example value; choose according to your GPU memory
 
train_dataloader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=4)
val_dataloader = DataLoader(dataset=val_dataset, batch_size=1, shuffle=False)
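
A quick sanity check of the loader (a small example; the expected shapes follow from the Resize((256, 256)) transform above):

# Fetch a single batch and verify its shapes
images, labels = next(iter(train_dataloader))
print(images.shape)  # torch.Size([BATCH_SIZE, 3, 256, 256])
print(labels.shape)  # torch.Size([BATCH_SIZE])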

For more information, visit link


Model and loops

The dataset and dataloader are already provided. Your main task is to write a model and train it.

class Model(torch.nn.Module):
    '''This is my super cool, but super dumb module'''
    def __init__(self):
        super().__init__()
 
    def forward(self, x):
        batch_size = x.shape[0]
        return torch.rand(batch_size, 50, device=x.device)
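
A real model needs actual layers. The following is a minimal convolutional baseline, intended purely as a hedged starting point (the channel counts and layer choices are illustrative assumptions, not a recommended architecture):

import torch
 
class SimpleCNN(torch.nn.Module):
    '''A small convolutional baseline: three conv blocks followed by a linear classifier.'''
    def __init__(self, num_classes=50):
        super().__init__()
        self.features = torch.nn.Sequential(
            torch.nn.Conv2d(3, 32, kernel_size=3, padding=1),
            torch.nn.BatchNorm2d(32),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2),          # 256 -> 128
            torch.nn.Conv2d(32, 64, kernel_size=3, padding=1),
            torch.nn.BatchNorm2d(64),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2),          # 128 -> 64
            torch.nn.Conv2d(64, 128, kernel_size=3, padding=1),
            torch.nn.BatchNorm2d(128),
            torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d(1),  # -> [B, 128, 1, 1]
        )
        self.classifier = torch.nn.Linear(128, num_classes)
 
    def forward(self, x):
        x = self.features(x)        # [B, 128, 1, 1]
        x = x.flatten(1)            # [B, 128]
        return self.classifier(x)   # [B, num_classes]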

The core of the train and validation loop looks as follows:

model.train()
for inputs, labels in dataloader:
    output = model(inputs)          # forward pass
    loss = loss_fn(output, labels)  # e.g. torch.nn.CrossEntropyLoss
    loss.backward()                 # compute gradients
    optimizer.step()                # update the weights
    optimizer.zero_grad()           # reset gradients for the next batch
validate(model, val_loader, loss_fn)

  • write a function weight_init to fill the weights of the model with proper random values.
  • turn on eval mode with model.eval() during validation (this tells Dropout, BatchNorm and other layers to switch to their evaluation behaviour)
  • the validation loop does not require gradients, therefore run it inside a with torch.no_grad(): block (see the sketch below)
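One possible shape of both helpers (a hedged sketch; the Kaiming initialization scheme and the exact signature of validate are assumptions to adapt to your code, and get_accuracy is the helper shown in the next snippet):

import torch
 
def weight_init(module):
    # One common scheme: Kaiming init for conv/linear weights, zeros for biases
    if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
        torch.nn.init.kaiming_normal_(module.weight)
        if module.bias is not None:
            torch.nn.init.zeros_(module.bias)
 
# model.apply(weight_init) applies it recursively to every submodule
 
def validate(model, val_loader, loss_fn):
    model.eval()           # freeze Dropout/BatchNorm behaviour
    total_loss, total_acc, n_batches = 0.0, 0.0, 0
    with torch.no_grad():  # gradients are not needed for validation
        for inputs, labels in val_loader:
            output = model(inputs)
            total_loss += loss_fn(output, labels).item()
            total_acc += get_accuracy(output, labels).item()
            n_batches += 1
    model.train()          # switch back to training mode
    return total_loss / n_batches, total_acc / n_batches
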
The GPU used in the evaluation is the GPU with ID 7 on the taylor student server. Because of this, it is strictly forbidden to use this GPU during training, as it would negatively impact the evaluation running time for everyone!

Useful lines of code for finding a free device on the server:

 
import os
 
import numpy as np
import torch
 
 
def get_accuracy(prediction, labels_batch, dim=1):
    # Top-1 accuracy: fraction of samples whose highest score matches the label
    pred_index = prediction.argmax(dim)
    return (pred_index == labels_batch).float().mean()
 
 
def get_free_gpu():
    # Parse the free memory of each GPU from nvidia-smi
    os.system('nvidia-smi -q -d Memory |grep -A5 GPU|grep Free >tmp')
    memory_available = [int(x.split()[2]) for x in open('tmp', 'r').readlines()]
    index = np.argmax(memory_available[:-1])  # Skip the last card (ID 7) --- it is reserved for evaluation!!!
    return int(index)
 
 
def get_device():
    if torch.cuda.is_available():
        gpu = get_free_gpu()
        device = torch.device(f'cuda:{gpu}')
    else:
        device = torch.device('cpu')
    return device
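
Note that get_accuracy above measures top-1 accuracy, while grading is based on top-3 accuracy (see Points below). A hedged sketch of a top-k variant, together with typical device usage:

def get_topk_accuracy(prediction, labels_batch, k=3):
    # prediction: [B, 50]; a sample counts as correct if the true label
    # is among the k highest-scoring classes
    topk_indices = prediction.topk(k, dim=1).indices  # [B, k]
    hits = (topk_indices == labels_batch.unsqueeze(1)).any(dim=1)
    return hits.float().mean()
 
# Typical usage: move the model (and, inside the loop, every batch) to the device
device = get_device()
model = Model().to(device)
# in the training loop: inputs, labels = inputs.to(device), labels.to(device)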

Submission

Submit a Python module/package that is importable by the name hw_3 and has a function load_model(). The function load_model() needs to return an instance of torch.nn.Module (or a subclass) which is

  • capable of living either on CPU or GPU. Evaluation will be run on a GPU server.
  • capable of accepting a tensor of size [Bx3x256x256] (B is the batch size), dtype torch.float32, normalised by torchvision.transforms.Normalize(mean, std), where mean, std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
  • returning a tensor of size [Bx50] with dtype torch.float32, living on the same device as the input data

This is the only portion of your code that will be automatically checked. However, in addition, submit also all other code that you used for the training. This is for us, in case you achieve an impossibly high score, to be amazed and to learn how you did it.

When using torch.load, always use map_location="cpu". It is the safest option for the case when your model cannot live on the GPU.

# Get the weights and biases
model.state_dict()
# Store them on the hard disk
torch.save(model.state_dict(), 'weights.pth')
# Load the weights from the hard disk back into the model
model = My_Net()
model.load_state_dict(torch.load('weights.pth', map_location="cpu"))
# ^ should print "<All keys matched successfully>"

The simplest submitted code (which won't achieve any points) can be along these lines. Name it hw_3.py; it should be submitted together with the model's weights, stored through model.state_dict() in a file named weights.pth.

import torch
import os
 
class Model(torch.nn.Module):
    '''This is my super cool, but super dumb module'''
    def __init__(self):
        super().__init__()
 
    def forward(self, x):
        batch_size = x.shape[0]
        return torch.rand(batch_size, 50, device=x.device)
 
 
def load_model():
    # This is the function to be filled. Your returned model needs to be an instance of (a subclass of) torch.nn.Module
    # The model needs to accept tensors of shape [B, 3, 256, 256], where B is the batch size, dtype float32
    # It should be possible to pass in cuda tensors (in that case, model.cuda() will be called first).
    # The model will return scores (or probabilities) for each of the 50 classes, i.e. a tensor of shape [B, 50]
    # The resulting tensor should have the same device and dtype as the incoming tensor
 
 
    directory = os.path.abspath(os.path.dirname(__file__))
 
    # The model should be trained in advance; in this function, you instantiate the model and load the trained weights into it:
    model = Model()
    model.load_state_dict(torch.load(directory + '/weights.pth', map_location='cpu'))
 
 
    # For more info on storing and loading weights, see https://pytorch.org/tutorials/beginner/saving_loading_models.html
    return model


Points

The dataset is rather difficult; therefore, to get any points at all, you only need a top-3 accuracy of 50 %. To get the full number of points for the individual part of the assignment, you need a top-3 accuracy of 75 %. Anything in between is linearly interpolated. The maximum number of points for the assignment is 13. The equation is:

$$pts_{individual} = 13 \times \text{clip}\left(\frac{\text{top-3 acc} - 0.5}{0.75 - 0.5},\ 0,\ 1\right)$$

Numpy clip function link.
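
As a worked example, a top-3 accuracy of 0.62 would yield 13 * clip((0.62 - 0.5) / 0.25, 0, 1) = 13 * 0.48 = 6.24 points:

import numpy as np
 
top3_acc = 0.62  # example value
pts = 13 * np.clip((top3_acc - 0.5) / (0.75 - 0.5), 0, 1)
print(pts)  # 6.24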

Deadline

Every 24 hours after the deadline, you will lose 1/3 of the points. However, you will not receive a negative number of points, so the minimum is 0.

Take into account that training a neural network takes some non-trivial time. Do not start working on the homework at the last moment.

Helpful resources
