
HW 02 - Image recognition

Your second homework will be Image recognition. For this task, we have created our own dataset, which is based on ImageNet.

The homework will be introduced in the labs in the 5th week; we will try to clear up any doubts in the video.

Dataset

The dataset consists of 10 classes, with 500 training images, 50 validation images, and 50 testing images per class. Each image has a resolution of 128×128 pixels and three color channels (R, G, B).

The dataset is available on the taylor and cantor servers in the directory /local/temporary/vir/hw02. It can also be downloaded in two formats. The first is a pickle file containing a standard Python dict with three keys:

  • data - NumPy array of size [Nx128x128x3] and dtype np.uint8, containing the image data, with values ranging from 0 to 255, where N is the number of images.
  • labels - NumPy array of size [N] and dtype np.uint8, containing the corresponding labels, with values ranging from 0 to 9.
  • filenames - Python list (of length N) containing the filenames of the corresponding images.

The second format is raw images.
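
A minimal sketch of loading the pickle format (the filename below is only a placeholder; use the actual pickle file you obtained):

import pickle

import numpy as np

# Placeholder filename -- substitute the actual pickle file you downloaded.
with open('trn_data.pkl', 'rb') as f:
    dataset = pickle.load(f)

images = dataset['data']          # np.uint8 array of shape [N, 128, 128, 3], values 0-255
labels = dataset['labels']        # np.uint8 array of shape [N], values 0-9
filenames = dataset['filenames']  # list of N filenames

print(images.shape, images.dtype)  # e.g. (5000, 128, 128, 3) uint8 for the training split
print(np.bincount(labels))         # number of images per class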

The class mapping is as follows:

  • 0: bird
  • 1: lizard
  • 2: snake
  • 3: spider
  • 4: dog
  • 5: cat
  • 6: butterfly
  • 7: monkey
  • 8: fish
  • 9: fruit

Your task

Design and train a neural network that achieves high accuracy on the unknown test part of the dataset (which comes from the same distribution as the training and validation parts).

The GPU used for evaluation is the GPU with ID 7 on the taylor student server. Because of this, it is strictly forbidden to use this GPU for training, as doing so would negatively impact the evaluation running time for everyone!
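
One way to make sure your training never touches that GPU is to restrict which devices are visible to your process; a minimal sketch (the chosen ID 0 is only an example, pick any free GPU other than 7):

import os

# Make only one GPU (any ID other than 7) visible to this training process.
# Set this before the first CUDA call, or before launching: CUDA_VISIBLE_DEVICES=0 python train.py
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')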

Submission

Submit a Python module/package that is importable by the name hw_2 and has a function load_model(). The function load_model() must return an instance of torch.nn.Module (or a subclass) which is

  • capable of living either on CPU or GPU. Evaluation will be run on a GPU server.
  • capable of accepting a tensor of size [Bx3x128x128] (B is the batch size), dtype torch.float32, with values ranging from 0 to 1
  • returning a tensor of size [Bx10] with dtype torch.float32, living on the same device as the input data

This is the only portion of your code that will be automatically checked. In addition, however, also submit all other code that you used for training. This way, if you achieve an impossibly high score, we can be amazed and see how you did it.
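
A quick self-check along these lines (just a sketch, not the actual evaluation script) can catch most interface problems before you submit:

import torch

import hw_2

model = hw_2.load_model().eval()
dummy = torch.rand(4, 3, 128, 128, dtype=torch.float32)  # fake batch, values in [0, 1]
with torch.no_grad():
    out = model(dummy)
assert out.shape == (4, 10) and out.dtype == torch.float32 and out.device == dummy.device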

When using torch.load, always use map_location="cpu". It is the safest option in case your model cannot live on a GPU.

# Get the weights and biases
state_dict = model.state_dict()
# store them on the hard disk
torch.save(state_dict, 'weights.pth')
# load the weights from the hard disk back into a freshly created model
model = My_Net()
print(model.load_state_dict(torch.load('weights.pth', map_location="cpu")))
# ^ should print "<All keys matched successfully>"

The simplest submitted code (which won't achieve any points) can be along these lines. Name it hw_2.py; it should be submitted together with the model's weights, stored via model.state_dict(), in a file named weights.pth.

import torch
import os
 
 
class Model(torch.nn.Module):
    '''This is my super cool, but super dumb module'''
    def __init__(self):
        super().__init__()
 
    def forward(self, x):
        batch_size = x.shape[0]
        return torch.rand(batch_size, 10, device=x.device)
 
def load_model():
    # This is the function to be filled. Your returned model needs to be an instance of subclass of torch.nn.Module
    # Model needs to be accepting tensors of shape [B, 3, 128, 128], where B is batch_size, which are in a range of [0-1] and type float32
    # It should be possible to pass in cuda tensors (in that case, model.cuda() will be called first).
    # The model will return scores (or probabilities) for each of the 10 classes, i.e. a tensor of shape [B, 10]
    # The resulting tensor should have same device and dtype as incoming tensor
 
    directory = os.path.abspath(os.path.dirname(__file__))
 
    # The model should be trained in advance and in this function, you should instantiate model and load the weights into it:
    model = Model()
    model.load_state_dict(torch.load(os.path.join(directory, 'weights.pth'), map_location='cpu'))
 
 
    # For more info on storing and loading weights, see https://pytorch.org/tutorials/beginner/saving_loading_models.html
    return model
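
A slightly less trivial Model (purely a hypothetical sketch; the layer sizes are illustrative, and the network still has to be trained and its state_dict saved as weights.pth) could replace the random one above:

import torch


class SmallCNN(torch.nn.Module):
    '''Hypothetical baseline: three conv blocks followed by a linear classifier.'''

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = torch.nn.Sequential(
            torch.nn.Conv2d(3, 32, 3, padding=1), torch.nn.ReLU(), torch.nn.MaxPool2d(2),   # 128 -> 64
            torch.nn.Conv2d(32, 64, 3, padding=1), torch.nn.ReLU(), torch.nn.MaxPool2d(2),  # 64 -> 32
            torch.nn.Conv2d(64, 128, 3, padding=1), torch.nn.ReLU(), torch.nn.MaxPool2d(2), # 32 -> 16
        )
        self.classifier = torch.nn.Linear(128 * 16 * 16, num_classes)

    def forward(self, x):
        # x: [B, 3, 128, 128], float32, values in [0, 1]
        x = self.features(x)
        return self.classifier(x.flatten(1))  # [B, 10], same device and dtype as the input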


In order to participate in the tournament part, your model also has to have a non-empty docstring briefly summarizing your model. We do not want you to spill your secrets; however, the main idea should be evident from the docstring. The only thing checked about the docstring is whether it is present and non-empty. It will be visible to other students. Please do not use diacritics in the docstring.

Points

The dataset is rather difficult, therefore to get any points you only need a top-1 accuracy of 40 %. To get the full number of points for the individual part of the assignment, you need a top-1 accuracy of 60 %. Anything in between is interpolated linearly. The maximum number of points for the individual part of the assignment is 12. The equation is:

$$\text{pts}_{\text{individual}} = 12 \times \text{clip}\left(\frac{acc - 0.4}{0.6 - 0.4},\ 0,\ 1\right)$$
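
For example, a minimal computation of the individual points from a top-1 accuracy (a direct sketch of the formula above):

def individual_points(acc):
    # 0 points below 40 % top-1 accuracy, 12 points at 60 % and above, linear in between
    return 12 * min(max((acc - 0.4) / (0.6 - 0.4), 0.0), 1.0)

print(individual_points(0.50))  # 6.0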

Deadline

  • Individual part: D + 21 days, 23:59 CEST, where D is the day of your tutorial in which the homework is introduced (in the 5th week)
  • Tournament part: 11.11.2021, 23:59 CET

Every 24 hours after the deadline, you will lose 1 point. However, you will not go into negative points, so the minimum is 0.

Take into account that training a neural network takes non-trivial time. Do not start working on the homework at the last moment. We recommend allowing at least a full day of work for this homework.

Because there are two separate deadlines for this task, there are also two homeworks in BRUTE. You need to submit your work to both of them. The evaluation script is the same in both. However, in the tournament part, BRUTE will always report 0 points for the Automatic Evaluation, and you will only gain points in the tournament. Do not be alarmed by this behavior; it is expected.

Additional training data

For the individual part, you may not use additional training data. For the tournament part, the use of additional data is allowed; however, you must follow a couple of rules:

  • The training of the network must be performed only by you and from scratch. That means you may not use already pre-trained networks.
  • The usage of additional training data must be mentioned in the docstring, so that it is clearly visible to everyone; see the example below this list. (It also has to be within the first 70 characters of the docstring, since we truncate longer docstrings.)
  • The source of the additional data has to be specified in the comments in your code. This will be read only by the teachers. We want to know what data you used to obtain such a great score, but it is not necessary to put it in the docstring for everyone else to see.
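
For example, a docstring along these lines (the wording is only illustrative) satisfies the docstring requirements above:

import torch


class Model(torch.nn.Module):
    '''Uses additional training data. Small CNN trained from scratch with augmentation.'''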

Helpful resources

Code template

Code along these lines is used for evaluation in BRUTE. Feel free to use it.

#!/usr/bin/env python3
 
import argparse
import pickle
 
import numpy as np
import torch
import torch.utils.data as tdata
 
import hw_2
 
CLASSES = {
    0: 'bird',
    1: 'lizard',
    2: 'snake',
    3: 'spider',
    4: 'dog',
    5: 'cat',
    6: 'butterfly',
    7: 'monkey',
    8: 'fish',
    9: 'fruit',
}
 
 
class Dataset(tdata.Dataset):
    def __init__(self, pkl_name):
        self.pkl_name = pkl_name
        with open(self.pkl_name, 'rb') as f:
            loaded_data = pickle.load(f)
        self.labels = loaded_data['labels']
        self.data = loaded_data['data']
 
    def __getitem__(self, i):
        return {
            'labels': self.labels[i].astype(
                'i8'
            ),  # torch wants labels to be of type LongTensor, in order to compute losses
            'data': self.data[i].astype('f4').transpose((2, 0, 1)),
            # First retype to float32 (default dtype for torch)
            # then permute axes (torch expects data in CHW order) # Scale input data in your model's forward pass!!!
        }
 
    def __len__(self):
        return self.labels.shape[0]
 
 
def get_prediction_order(prediction, label):
    # prediction has shape [B, 10] (where B is batch size, 10 is number of classes)
    # label has shape [B]
 
    # both are torch tensors, prediction represents either score or probability of each class.
    # probability is torch.softmax(score, dim=1)
 
    # either way, the higher the value for each class, the more probable it is according to your model
    # therefore we can sort it according to given probability - and check on which place is the correct label.
 
    # ideally you want it to be at first place, but for example ImageNet is also evaluated on top-5 error
    # take 5 most confident predictions and only if your label is not in those best predictions, count it as error
 
    # Since ImageNet dataset has 1000 classes, if your predictions were random, top-5 error should be around 99.5 %
 
    prediction = prediction.detach()  # detach from computational graph (no grad)
    label = label.detach()
 
    prediction_sorted = torch.argsort(prediction, 1, True)
    finder = (
        label[:, None] == prediction_sorted
    )  # None as an index creates new dimension of size 1, so that broadcasting works as expected
    order = torch.nonzero(finder)[:, 1]  # returns a tensor of indices, where finder is True.
 
    return order
 
 
def create_confusion_matrix(num_classes, prediction, label):
    prediction = prediction.detach()
    label = label.detach()
    prediction = torch.argmax(prediction, 1)
    cm = torch.zeros(
        (num_classes, num_classes), dtype=torch.long, device=label.device
    )  # empty confusion matrix
    indices = torch.stack((label, prediction))  # stack labels and predictions
    new_indices, counts = torch.unique(
        indices, return_counts=True, dim=1
    )  # Find, how many cases are for each combination of (pred, label)
    cm[new_indices[0], new_indices[1]] += counts
 
    return cm
 
 
def print_stats(conf_matrix, orders):
    num_classes = conf_matrix.shape[0]
    print('Confusion matrix:')
    print(conf_matrix)
    print('\n---\n')
    print('Precision and recalls:')
    for c in range(num_classes):
        precision = conf_matrix[c, c] / conf_matrix[:, c].sum()
        recall = conf_matrix[c, c] / conf_matrix[c].sum()
        f1 = (2 * precision * recall) / (precision + recall)
        print(
            'Class {cls:10s} ({c}):\tPrecision: {prec:0.5f}\tRecall: {rec:0.5f}\tF1: {f1:0.5f}'.format(
                cls=CLASSES[c], c=c, prec=precision, rec=recall, f1=f1
            )
        )
 
    print('\n---\n')
    print('Top-n accuracy and error:')
    order_len = len(orders)
    for n in range(num_classes):
        topn = (orders <= n).sum()
        acc = topn / order_len
        err = 1 - acc
        print(
            'Top-{n}:\tAccuracy: {acc:0.5f}\tError: {err:0.5f}'.format(n=(n + 1), acc=acc, err=err)
        )
 
 
def evaluate(num_classes, dataset_file, batch_size=32, model=None):
    if model is None:
        model = hw_2.load_model()  # load model, your hw
    if torch.cuda.is_available():
        device = torch.device('cuda')
    else:
        device = torch.device('cpu')
    model = model.to(device)
    model = model.eval()  # switch to eval mode, so that some special layers behave nicely
    dataset = Dataset(dataset_file)
    loader = tdata.DataLoader(dataset, batch_size=batch_size)
 
    confusion_matrix = torch.zeros(
        (num_classes, num_classes), dtype=torch.long, device=device
    )  # empty confusion matrix
    orders = []
 
    with torch.no_grad():  # disable gradient computation
        for i, batch in enumerate(loader):
            data = batch['data'].to(device)
            labels = batch['labels'].to(device)
 
            prediction = model(data)
            confusion_matrix += create_confusion_matrix(num_classes, prediction, labels)
            order = get_prediction_order(prediction, labels).cpu().numpy()
            orders.append(order)
            print('Processed {i:02d}th batch'.format(i=(i + 1)))
 
    print('\n---\n')
    orders = np.concatenate(orders, 0)
    confusion_matrix = confusion_matrix.cpu().numpy()
 
    print_stats(confusion_matrix, orders)
    return (orders == 0).mean()  # Return top-1 accuracy
 
 
if __name__ == '__main__':
    parser = argparse.ArgumentParser('Evaluation demo for HW02')
    parser.add_argument('dataset', type=str)
    parser.add_argument('--batch_size', '-bs', default=32, type=int)
    parser.add_argument('--num_classes', '-nc', default=10, type=int)
 
    args = parser.parse_args()
    evaluate(args.num_classes, args.dataset, args.batch_size)
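
If you save the template above as, for example, evaluate_template.py (the module name and the pickle path below are placeholders), you can estimate your score locally:

import evaluate_template  # hypothetical name, assuming the template was saved as evaluate_template.py

# Point the path at the validation pickle you downloaded.
top1 = evaluate_template.evaluate(10, 'path/to/validation.pkl', batch_size=32)
print('Top-1 accuracy: {:.3f}'.format(top1))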
