Your first homework will be image recognition. For this task, we have created our own dataset based on ImageNet.
The homework will be introduced in the labs in the 3rd week, where we will try to clear up any doubts.
The dataset consists of 10 classes, with 500 training images, 50 validation images and 50 testing images per class. Each image has a resolution of 128×128 pixels and three color channels (R, G, B).
The dataset is available on the taylor and cantor servers in the directory /local/temporary/vir/hw01. It can also be downloaded in two formats. The first is a pickle file containing a standard Python dict with three keys:
- `data` - NumPy array containing the image data, of size [Nx128x128x3] and dtype `np.uint8`, with values ranging from 0 to 255, where N is the number of files.
- `labels` - NumPy array containing the corresponding labels, of size [N] and dtype `np.uint8`, with values ranging from 0 to 9.
- `filenames` - Python list (of length N) containing the filenames of the corresponding images.
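Below is a minimal sketch of reading the pickle format described above and converting it into the layout the model is expected to accept: float32, channels first, values in 0 to 1. The helper name `load_split` and the path you pass to it are hypothetical. Only the three dict keys come from this page.

```python
import pickle

import numpy as np
import torch


def load_split(pkl_path):
    """Hypothetical helper: load one split stored as a pickled dict."""
    with open(pkl_path, "rb") as f:
        d = pickle.load(f)
    data = d["data"]            # uint8, shape [N, 128, 128, 3], values 0..255
    labels = d["labels"]        # uint8, shape [N], values 0..9
    filenames = d["filenames"]  # Python list of N filenames
    assert data.shape[0] == labels.shape[0] == len(filenames)
    # NHWC uint8 -> NCHW float32 scaled to [0, 1], the input format load_model() must accept
    x = torch.from_numpy(data).permute(0, 3, 1, 2).float().div(255.0)
    # PyTorch classification losses expect int64 (long) class indices
    y = torch.from_numpy(labels.astype(np.int64))
    return x, y, filenames
```

Converting a whole split at once is simple but costly for the training part. As float32, 5000 images × 128 × 128 × 3 × 4 bytes is roughly 1 GB. If memory is tight, keep the `uint8` array and convert per batch instead.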
The second format is raw images.
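The class mapping is as follows: 0 = bird, 1 = lizard, 2 = snake, 3 = spider, 4 = dog, 5 = cat, 6 = butterfly, 7 = monkey, 8 = fish, 9 = fruit.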
Design and train a neural network that achieves high accuracy on the unseen test part of the dataset (which comes from the same distribution as the training and validation parts).
Submit a Python module or package that is importable under the name `hw_1` and has a function `load_model()`. The function `load_model()` needs to return an instance of `torch.nn.Module` (or a subclass) which:

- accepts a tensor of shape [Bx3x128x128] (B is the batch size) with dtype `torch.float32` and values ranging from 0 to 1,
- returns a tensor of shape [Bx10] with dtype `torch.float32`, containing scores (or probabilities) for each of the 10 classes and living on the same device as the input data.
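Before submitting, it may be worth checking locally that the returned model satisfies this contract. The helper `check_interface` below is a hypothetical sketch, not the BRUTE checker. It feeds a random float32 batch in [0, 1) through the model and asserts the output shape, dtype and device.

```python
import torch


def check_interface(model, batch_size=2, device="cpu"):
    """Hypothetical self-check of the load_model() contract."""
    assert isinstance(model, torch.nn.Module), "load_model() must return a torch.nn.Module"
    model = model.to(device).eval()
    # Random input with the required shape, dtype and value range
    x = torch.rand(batch_size, 3, 128, 128, dtype=torch.float32, device=device)
    with torch.no_grad():
        out = model(x)
    assert out.shape == (batch_size, 10), "expected [{}, 10], got {}".format(batch_size, list(out.shape))
    assert out.dtype == torch.float32, "expected torch.float32, got {}".format(out.dtype)
    assert out.device == x.device, "output on {}, input on {}".format(out.device, x.device)
    return out
```

Running it once with `device="cuda"` (when `torch.cuda.is_available()`) also catches tensors created inside `forward` without `device=x.device`.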
This is the only portion of your code that will be checked automatically. In addition, however, submit all the other code you used for training. This is for us: if you achieve an impossibly high score, we will be amazed and want to know how you did it.
If you load your weights with `torch.load`, always use `map_location="cpu"`. It is the safest option in case your model will not be able to run on a GPU.
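Here is a minimal sketch of an `hw_1` module whose `load_model()` restores trained weights with `map_location="cpu"`. The architecture is an illustrative assumption: a small CNN with four conv/BatchNorm/ReLU/max-pool stages. The weights file name `weights.pth` is also an assumption. This is not a recommended or tested solution, and nothing here claims any particular accuracy.

```python
import os

import torch


class Model(torch.nn.Module):
    """Small CNN: four conv-BN-ReLU-maxpool stages, global average pooling, linear classifier."""

    def __init__(self, num_classes=10):
        super().__init__()
        channels = [3, 32, 64, 128, 256]
        layers = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [
                torch.nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                torch.nn.BatchNorm2d(c_out),
                torch.nn.ReLU(inplace=True),
                torch.nn.MaxPool2d(2),  # spatial size: 128 -> 64 -> 32 -> 16 -> 8
            ]
        self.features = torch.nn.Sequential(*layers)
        self.pool = torch.nn.AdaptiveAvgPool2d(1)
        self.classifier = torch.nn.Linear(channels[-1], num_classes)

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)  # [B, 256]
        return self.classifier(x)  # [B, 10] class scores (logits)


def load_model(weights_path=None):
    if weights_path is None:
        # Hypothetical file name: weights stored next to this module
        weights_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "weights.pth")
    model = Model()
    # map_location="cpu" lets weights saved on a GPU machine load on a CPU-only one
    state_dict = torch.load(weights_path, map_location="cpu")
    model.load_state_dict(state_dict)
    return model
```

On the training side, the matching call would be `torch.save(model.state_dict(), "weights.pth")`. Saving the `state_dict` rather than pickling the whole model keeps loading independent of the module path the class had at training time.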
In order to participate in the tournament part, your model must also have a non-empty docstring briefly summarizing it. We do not want you to spill your secrets, but the main idea should be evident from the docstring. The only thing checked about the docstring is whether it is present and non-empty. It will be visible to other students. Please do not use diacritics in the docstring.
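The sketch below is a hypothetical local check of these docstring rules. It tests that the docstring is present and non-empty, and, as an extra precaution for the no-diacritics request, ASCII-only. It reads `model.__doc__`. Whether the tournament reads the docstring exactly this way is an assumption.

```python
import torch


def check_docstring(model: torch.nn.Module) -> str:
    """Hypothetical check: return the stripped docstring or raise ValueError."""
    doc = model.__doc__  # resolves to the docstring of the model's class
    if doc is None or not doc.strip():
        raise ValueError("model docstring is missing or empty")
    if not doc.isascii():
        raise ValueError("docstring contains non-ASCII characters (e.g. diacritics)")
    return doc.strip()
```

Python does not inherit class docstrings, so a `torch.nn.Module` subclass without its own docstring has `__doc__` set to `None`.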
The dataset is rather difficult, so to get any points you only need a top-1 accuracy of 40 %. To get the full amount of points for the individual part of the assignment, you need a top-1 accuracy of 55 %. Points for accuracies in between are interpolated linearly. The maximum number of points from the individual part is 8. The equation is:
$$pts_{individual} = 8\times\text{clip}(\frac{acc - 0.4}{0.55 - 0.4}, 0, 1)$$
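Here is the individual-part formula transcribed into Python. It is a sketch for checking what a given validation accuracy would translate to, and the function name is hypothetical.

```python
def individual_points(acc):
    """Points for the individual part; acc is the top-1 accuracy as a fraction in [0, 1]."""
    ratio = (acc - 0.4) / (0.55 - 0.4)
    clipped = min(max(ratio, 0.0), 1.0)  # clip(ratio, 0, 1)
    return 8.0 * clipped


for acc in (0.35, 0.40, 0.475, 0.55, 0.70):
    # prints 0.00, 0.00, 4.00, 8.00 and 8.00 points respectively
    print("acc={:.3f} -> {:.2f} pts".format(acc, individual_points(acc)))
```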
Any submission with a top-1 accuracy over 55 % is eligible for the tournament part of the assignment. In the tournament part, the maximum achievable number of points is 4, and anybody who has a top-1 accuracy over 55 % gets some points. In the equations below, acc is your top-1 accuracy and max(acc) is the highest top-1 accuracy among all submissions. The precise equation for calculating the points is:

$$ c = \begin{cases} \text{clip}(\frac{acc - 0.55}{\max(acc) - 0.55}, 0, 1)& \text{if}\ \max(acc) > 0.55\\ 1 & \text{if}\ \max(acc) = 0.55 \land \max(acc) = acc\\ 0 & \text{otherwise} \end{cases} $$

$$pts_{tournament} = 4\times \sqrt{c(2 - c)}$$
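Here is the tournament formula transcribed the same way. `max_acc` stands for the best top-1 accuracy among all submissions, and the function and parameter names are hypothetical.

```python
import math


def tournament_points(acc, max_acc):
    """Tournament points for top-1 accuracy acc, given the best accuracy max_acc."""
    if max_acc > 0.55:
        c = min(max((acc - 0.55) / (max_acc - 0.55), 0.0), 1.0)  # clip(..., 0, 1)
    elif max_acc == 0.55 and acc == max_acc:
        c = 1.0
    else:
        c = 0.0
    return 4.0 * math.sqrt(c * (2.0 - c))
```

Because √(c(2 − c)) is concave, being halfway between 55 % and the best submission (c = 0.5) already yields 4 × √0.75 ≈ 3.46 points.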
For every 24 hours after the deadline, you will lose 1 point. However, your score cannot become negative, so the minimum is 0.
Take into account that training a neural network takes a non-trivial amount of time. Do not start working on the homework at the last moment. We recommend allowing at least a full day for this homework.
Because there are two separate deadlines for this task, there are also two homeworks in BRUTE, and you need to submit your work to both of them. The evaluation script is the same in both. However, for the tournament part, BRUTE will always report 0 points for Automatic Evaluation, and you will only gain points in the tournament. Do not be alarmed by this behavior; it is expected.
For the individual part, you may not use additional training data. For the tournament part, the use of additional data is allowed; however, you must follow a couple of rules:
The simplest possible submission (which will not earn any points) could look like the following. We strongly recommend submitting substantially more work.
```python
import torch


class Model(torch.nn.Module):
    '''This is my super cool, but super dumb module'''

    def __init__(self):
        super().__init__()

    def forward(self, x):
        batch_size = x.shape[0]
        return torch.rand(batch_size, 10, device=x.device)


def load_model():
    # This is the function to be filled. Your returned model needs to be an instance of subclass of torch.nn.Module
    # We strongly recommend to have your model more complicated than this.
    # Model needs to be accepting tensors of shape [B, 3, 128, 128], where B is batch_size, which are in a range of [0-1] and type float32
    # It should be possible to pass in cuda tensors (in that case, model.cuda() will be called first).
    # The model will return scores (or probabilities) for each of the 10 classes, i.e a tensor of shape [B, 10]
    # The resulting tensor should have same device and dtype as incoming tensor
    # The model should be trained in advance and in this function, you should instantiate model and load the weights into it.
    # For more info on storing and loading weights, see https://pytorch.org/tutorials/beginner/saving_loading_models.html
    return Model()
```
Code along these lines is used for evaluation in BRUTE. Feel free to use it.
```python
#!/usr/bin/env python3
import argparse
import pickle

import numpy as np
import torch
import torch.utils.data as tdata

import hw_1

CLASSES = {
    0: 'bird',
    1: 'lizard',
    2: 'snake',
    3: 'spider',
    4: 'dog',
    5: 'cat',
    6: 'butterfly',
    7: 'monkey',
    8: 'fish',
    9: 'fruit',
}


class Dataset(tdata.Dataset):
    def __init__(self, pkl_name):
        self.pkl_name = pkl_name
        with open(self.pkl_name, 'rb') as f:
            loaded_data = pickle.load(f)
        self.labels = loaded_data['labels']
        self.data = loaded_data['data']

    def __getitem__(self, i):
        return {
            # torch wants labels to be of type LongTensor, in order to compute losses
            'labels': self.labels[i].astype('i8'),
            # First retype to float32 (default dtype for torch),
            # then permute axes (torch expects data in CHW order) and divide by 255 to scale into range 0-1
            'data': self.data[i].astype('f4').transpose((2, 0, 1)) / 255,
        }

    def __len__(self):
        return self.labels.shape[0]


def get_prediction_order(prediction, label):
    # prediction has shape [B, 10] (where B is batch size, 10 is number of classes)
    # label has shape [B]
    # both are torch tensors, prediction represents either score or probability of each class.
    # probability is torch.softmax(score, dim=1)
    # either way, the higher the value for each class, the more probable it is according to your model
    # therefore we can sort it according to given probability - and check on which place is the correct label.
    # ideally you want it to be at first place, but for example ImageNet is also evaluated on top-5 error
    # take 5 most confident predictions and only if your label is not in those best predictions, count it as error
    # Since ImageNet dataset has 1000 classes, if your predictions were random, top-5 error should be around 99.5 %
    prediction = prediction.detach()  # detach from computational graph (no grad)
    label = label.detach()
    prediction_sorted = torch.argsort(prediction, 1, True)  # class indices, most confident first
    # None as an index creates new dimension of size 1, so that broadcasting works as expected
    finder = label[:, None] == prediction_sorted
    order = torch.nonzero(finder)[:, 1]  # position of the correct label in each row
    return order


def create_confusion_matrix(num_classes, prediction, label):
    prediction = prediction.detach()
    label = label.detach()
    prediction = torch.argmax(prediction, 1)
    # empty confusion matrix: rows are true labels, columns are predictions
    cm = torch.zeros((num_classes, num_classes), dtype=torch.long, device=label.device)
    indices = torch.stack((label, prediction))  # stack labels and predictions
    # Count how many cases there are for each (label, prediction) combination
    new_indices, counts = torch.unique(indices, return_counts=True, dim=1)
    cm[new_indices[0], new_indices[1]] += counts
    return cm


def print_stats(conf_matrix, orders):
    num_classes = conf_matrix.shape[0]
    print('Confusion matrix:')
    print(conf_matrix)
    print('\n---\n')
    print('Precision and recalls:')
    for c in range(num_classes):
        precision = conf_matrix[c, c] / conf_matrix[:, c].sum()
        recall = conf_matrix[c, c] / conf_matrix[c].sum()
        f1 = (2 * precision * recall) / (precision + recall)
        print(
            'Class {cls:10s} ({c}):\tPrecision: {prec:0.5f}\tRecall: {rec:0.5f}\tF1: {f1:0.5f}'.format(
                cls=CLASSES[c], c=c, prec=precision, rec=recall, f1=f1
            )
        )
    print('\n---\n')
    print('Top-n accuracy and error:')
    order_len = len(orders)
    for n in range(num_classes):
        topn = (orders <= n).sum()
        acc = topn / order_len
        err = 1 - acc
        print('Top-{n}:\tAccuracy: {acc:0.5f}\tError: {err:0.5f}'.format(n=(n + 1), acc=acc, err=err))


def evaluate(num_classes, dataset_file, batch_size=32, model=None):
    if model is None:
        model = hw_1.load_model()  # load model, your hw
    if torch.cuda.is_available():
        device = torch.device('cuda')
    else:
        device = torch.device('cpu')
    model = model.to(device)
    model = model.eval()  # switch to eval mode, so that some special layers behave nicely
    dataset = Dataset(dataset_file)
    loader = tdata.DataLoader(dataset, batch_size=batch_size)
    # empty confusion matrix
    confusion_matrix = torch.zeros((num_classes, num_classes), dtype=torch.long, device=device)
    orders = []
    with torch.no_grad():  # disable gradient computation
        for i, batch in enumerate(loader):
            data = batch['data'].to(device)
            labels = batch['labels'].to(device)
            prediction = model(data)
            confusion_matrix += create_confusion_matrix(num_classes, prediction, labels)
            order = get_prediction_order(prediction, labels).cpu().numpy()
            orders.append(order)
            print('Processed {i:02d}th batch'.format(i=(i + 1)))
    print('\n---\n')
    orders = np.concatenate(orders, 0)
    confusion_matrix = confusion_matrix.cpu().numpy()
    print_stats(confusion_matrix, orders)
    return (orders == 0).mean()  # Return top-1 accuracy


if __name__ == '__main__':
    parser = argparse.ArgumentParser('Evaluation demo for HW01')
    parser.add_argument('dataset', type=str)
    parser.add_argument('--batch_size', '-bs', default=32, type=int)
    parser.add_argument('--num_classes', '-nc', default=10, type=int)
    args = parser.parse_args()
    evaluate(args.num_classes, args.dataset, args.batch_size)
```
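In the script above, a sample counts toward top-k accuracy when the position of its true label in the sorted predictions (`order`) is below k. A shorter sketch of the same metric uses `torch.topk`; the tiny worked example follows. This is an alternative formulation, not the code BRUTE runs, and `topk_accuracy` is a hypothetical name. Results can differ from the script only when two classes have exactly equal scores.

```python
import torch


def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    topk = scores.topk(k, dim=1).indices         # [B, k] class indices, best first
    hits = (topk == labels[:, None]).any(dim=1)  # [B] True where the label is in the top k
    return hits.float().mean().item()


scores = torch.tensor([[0.1, 0.7, 0.2],   # true label 2 is ranked second
                       [0.5, 0.3, 0.2]])  # true label 0 is ranked first
labels = torch.tensor([2, 0])
print(topk_accuracy(scores, labels, 1))  # 0.5
print(topk_accuracy(scores, labels, 2))  # 1.0
```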