====== HW 02 - Image recognition ======

Your second homework will be image recognition. For this task, we have created our own dataset based on [[http://image-net.org/|ImageNet]]. The homework will be introduced in the labs in the 5th week, and we will try to clear up any doubts in a video.

==== Dataset ====

The dataset consists of 10 classes, with 500 training images, 50 testing images and 50 validation images for each class. Each image has a resolution of 128x128 pixels and three color channels (R, G, B).

The dataset is available on the ''taylor'' and ''cantor'' [[https://cyber.felk.cvut.cz/cs/study/gpu-servers/|servers]] in the directory ''/local/temporary/vir/hw02''. It can also be downloaded in two formats. The first is a pickle [[http://cmp.felk.cvut.cz/~jasekota/vir/hw01/pkl_data.tgz|file]] containing a standard Python dict with three keys:

  * ''data'' - NumPy array of image data with shape ''[Nx128x128x3]'' and dtype ''np.uint8'', with values ranging from 0 to 255, where ''N'' is the number of images.
  * ''labels'' - NumPy array of the corresponding labels with shape ''[N]'' and dtype ''np.uint8'', with values ranging from 0 to 9.
  * ''filenames'' - Python list (of length ''N'') containing the filenames of the corresponding images.

The second format is raw [[http://cmp.felk.cvut.cz/~jasekota/vir/hw01/image_data.tgz|images]].

{{ :courses:b3b33vir:tutorials:canvas.png|}}

The class mapping is as follows:

  * 0: bird
  * 1: lizard
  * 2: snake
  * 3: spider
  * 4: dog
  * 5: cat
  * 6: butterfly
  * 7: monkey
  * 8: fish
  * 9: fruit

==== Your task ====

Design and train a neural network that achieves high accuracy on the unseen test part of the dataset (which comes from the same distribution as the training and validation parts).

The GPU used for evaluation is the GPU with ID 7 on the ''taylor'' student server. It is therefore **strictly** forbidden to use this GPU during training, as doing so slows down the evaluation for everyone!
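As a starting point for training, the pickle format from the Dataset section can be turned into tensors in the input format required by the Submission section (''[Bx3x128x128]'', ''float32'', values in [0, 1]). The following is a minimal sketch assuming the dict layout listed above; the helper names ''pickle_dict_to_tensors'' and ''load_pickle'' are hypothetical and not part of the course code.

```python
import pickle

import numpy as np
import torch


def pickle_dict_to_tensors(loaded_data):
    """Hypothetical helper: convert the HW02 pickle dict into (images, labels) tensors.

    Assumes 'data' has shape [N, 128, 128, 3] with dtype uint8 and
    'labels' has shape [N] with dtype uint8, as described in the Dataset section.
    """
    images = torch.from_numpy(loaded_data['data'])     # [N, H, W, C], uint8
    images = images.permute(0, 3, 1, 2).contiguous()    # [N, C, H, W], the layout torch expects
    images = images.to(torch.float32) / 255.0           # float32 values in [0, 1]
    labels = torch.from_numpy(loaded_data['labels'].astype(np.int64))  # int64, as losses expect
    return images, labels


def load_pickle(path):
    """Hypothetical helper: load a pickle file with the layout above and convert it."""
    with open(path, 'rb') as f:
        return pickle_dict_to_tensors(pickle.load(f))


if __name__ == '__main__':
    # Synthetic stand-in for the real file: 4 random images with labels 0..3
    rng = np.random.default_rng(0)
    fake = {
        'data': rng.integers(0, 256, size=(4, 128, 128, 3), dtype=np.uint8),
        'labels': np.arange(4, dtype=np.uint8),
        'filenames': ['a.png', 'b.png', 'c.png', 'd.png'],
    }
    x, y = pickle_dict_to_tensors(fake)
    print(x.shape, x.dtype, y.dtype)
```

Converting the whole training set at once takes roughly 1 GB of memory (5000 x 3 x 128 x 128 x 4 bytes); converting per sample or per batch inside a ''torch.utils.data.Dataset'', as the code template at the end of this page does, avoids that.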
==== Submission ====

Submit a Python module/package that is importable under the name ''hw_2'' and has a function ''load_model()''. The function ''load_model()'' needs to return an instance of ''torch.nn.Module'' (or a subclass) which is

  * capable of living either on the CPU or on a GPU (evaluation will be run on a GPU server),
  * capable of accepting a tensor of size ''[Bx3x128x128]'' (''B'' is the batch size) with dtype ''torch.float32'' and values ranging from 0 to 1,
  * returning a tensor of size ''[Bx10]'' with dtype ''torch.float32'', living on the same device as the input data.

This is the only portion of your code that will be checked automatically. In addition, however, submit **all** other code that you used for training. This is for us: if you achieve an impossibly high score, we will be amazed and want to know how you did it.

When using ''torch.load'', always pass ''map_location="cpu"''. It is the safest option in case your model ends up on a machine where it cannot live on a GPU.

<code python>
import torch

# Get the weights and biases
model.state_dict()

# Store them on the hard disk
torch.save(model.state_dict(), 'weights.pth')

# Load the weights from the hard disk back into a model (My_Net stands for your model class)
model = My_Net()
model.load_state_dict(torch.load('weights.pth', map_location="cpu"))
# ^ in an interactive session, this should print <All keys matched successfully>
</code>

The simplest submitted code (which won't achieve any points) can be along these lines. Name it **hw_2.py**; it should be submitted together with the model's weights, stored via //model.state_dict()// in a file named **weights.pth**.

<code python>
import os

import torch


class Model(torch.nn.Module):
    '''This is my super cool, but super dumb module'''

    def __init__(self):
        super().__init__()

    def forward(self, x):
        batch_size = x.shape[0]
        return torch.rand(batch_size, 10, device=x.device)


def load_model():
    # This is the function to be filled.
    # Your returned model needs to be an instance of a subclass of torch.nn.Module.
    # The model needs to accept tensors of shape [B, 3, 128, 128], where B is the batch size,
    # with values in the range [0, 1] and dtype float32.
    # It should be possible to pass in CUDA tensors (in that case, model.cuda() will be called first).
    # The model will return scores (or probabilities) for each of the 10 classes, i.e. a tensor of shape [B, 10].
    # The resulting tensor should have the same device and dtype as the incoming tensor.
    directory = os.path.abspath(os.path.dirname(__file__))

    # The model should be trained in advance; in this function, you should instantiate
    # the model and load the weights into it:
    model = Model()
    model.load_state_dict(torch.load(os.path.join(directory, 'weights.pth'), map_location='cpu'))

    # For more info on storing and loading weights, see https://pytorch.org/tutorials/beginner/saving_loading_models.html
    return model
</code>

----

In order to participate in the tournament part, your model also has to have a **non-empty docstring** briefly summarizing your model. We do not want you to spill your secrets; however, the main idea should be evident from the docstring. The only thing checked about the docstring is whether it is present and non-empty. It will be visible to other students. **Please do not use diacritics in the docstring**.

==== Points ====

The dataset is rather difficult, so to get any points you only need a top-1 accuracy of 40 %. To get the full amount of points for the individual part of the assignment, you need a top-1 accuracy of 60 %. Points are interpolated linearly in between. The maximum number of points for the individual part of the assignment is 12.
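The linear rule above can be written as a small function, for example to estimate your points from a validation accuracy. This is a hypothetical helper for illustration only (''individual_points'' is not part of any course code); BRUTE computes the points on the hidden test set itself.

```python
def individual_points(acc, low=0.4, high=0.6, max_points=12.0):
    """Hypothetical helper: points for the individual part given top-1 accuracy.

    0 points at or below `low`, `max_points` at or above `high`,
    linear interpolation in between.
    """
    ratio = (acc - low) / (high - low)
    ratio = min(max(ratio, 0.0), 1.0)  # clip to [0, 1]
    return max_points * ratio


for acc in (0.35, 0.40, 0.50, 0.60, 0.75):
    print(f'{acc:.2f} -> {individual_points(acc):.1f} pts')
```

For instance, a top-1 accuracy of 50 % lies halfway between the two thresholds and therefore corresponds to 6 points.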
The equation is:

$$\text{pts}_{\text{individual}} = 12 \times \operatorname{clip}\left(\frac{\text{acc} - 0.4}{0.6 - 0.4},\ 0,\ 1\right)$$

where ''acc'' is the top-1 accuracy.

==== Deadline ====

  * Individual part: D + 21 days, 23:59 CEST, where D is the day of your tutorial in which the homework is introduced (in the 5th week)
  * Tournament part: 11.11.2021, 23:59 CET

For every 24 hours after the deadline, you will lose 1 point. However, you cannot receive a negative number of points, so the minimum is 0.

Take into account that training a neural network takes a non-trivial amount of time. Do not start working on the homework at the last moment. We recommend allowing at least a full day of work for this homework.

Because there are two separate deadlines for this task, there are also two homeworks in BRUTE. You need to submit your work to **both** of them. The evaluation script is the same in both. However, in the tournament part, BRUTE will always report 0 points for Automatic Evaluation, and you will gain points only in the tournament. Do **not** be alarmed by this behavior; it is expected.

==== Additional training data ====

For the individual part, you **may not** use additional training data. For the tournament part, the use of additional data is allowed; however, you must follow a couple of rules:

  * The network must be trained **only by you and from scratch**. That means you **may not** use already pre-trained networks.
  * The use of additional training data must be mentioned in the docstring, so that it is clearly visible to everyone. (It also has to be within the first 70 characters of the docstring, since we truncate longer docstrings.)
  * The source of the additional data has to be specified in the comments in your code. These will be read only by the teachers. We want to know what data you used to obtain such a great score; however, it is not necessary to put it in the docstring for everyone else to see.
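Since longer docstrings are truncated to their first 70 characters and should not contain diacritics, it may be worth checking locally roughly what others will see. A minimal sketch, assuming the relevant docstring is the one of the model class returned by ''load_model()'' (as with ''Model'' in the submission example); ''docstring_preview'' is a hypothetical helper, and the ASCII check is stricter than the automatic check, which only tests for a non-empty docstring.

```python
def docstring_preview(model, limit=70):
    """Hypothetical helper: return the part of the model's docstring left after truncation.

    Raises ValueError if the docstring is missing, empty, or contains
    non-ASCII characters (such as diacritics).
    """
    # A class's __doc__ is not inherited, so a class without a docstring gives None here
    doc = type(model).__doc__
    if doc is None or not doc.strip():
        raise ValueError('model class has no (or an empty) docstring')
    doc = doc.strip()
    if not doc.isascii():
        raise ValueError('docstring contains non-ASCII characters such as diacritics')
    return doc[:limit]
```

Usage would be along the lines of ''print(docstring_preview(hw_2.load_model()))''; if you use additional data, that mention should appear in the printed preview.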
==== Helpful resources ====

  * [[https://pytorch.org/tutorials/beginner/saving_loading_models.html|Saving and loading Modules]]
  * [[https://pytorch.org/tutorials/beginner/data_loading_tutorial.html|Loading data using Datasets and DataLoaders]]

==== Code template ====

Code along these lines is used for evaluation in BRUTE. Feel free to use it.

<code python>
#!/usr/bin/env python3
import argparse
import pickle

import numpy as np
import torch
import torch.utils.data as tdata

import hw_2

CLASSES = {
    0: 'bird',
    1: 'lizard',
    2: 'snake',
    3: 'spider',
    4: 'dog',
    5: 'cat',
    6: 'butterfly',
    7: 'monkey',
    8: 'fish',
    9: 'fruit',
}


class Dataset(tdata.Dataset):
    def __init__(self, pkl_name):
        self.pkl_name = pkl_name
        with open(self.pkl_name, 'rb') as f:
            loaded_data = pickle.load(f)
        self.labels = loaded_data['labels']
        self.data = loaded_data['data']

    def __getitem__(self, i):
        return {
            # torch wants labels of type LongTensor in order to compute losses
            'labels': self.labels[i].astype('i8'),
            # First retype to float32 (the default dtype for torch),
            # then permute axes (torch expects data in CHW order).
            # Scale input data in your model's forward pass!!!
            'data': self.data[i].astype('f4').transpose((2, 0, 1)),
        }

    def __len__(self):
        return self.labels.shape[0]


def get_prediction_order(prediction, label):
    # prediction has shape [B, 10] (where B is the batch size and 10 is the number of classes)
    # label has shape [B]
    # Both are torch tensors; prediction represents either the score or the probability of each class
    # (the probability is torch.softmax(score, dim=1)).
    # Either way, the higher the value for a class, the more probable it is according to your model,
    # so we can sort the classes by it and check at which position the correct label ends up.
    # Ideally, you want it to be in first place, but ImageNet, for example, is also evaluated on top-5 error:
    # take the 5 most confident predictions and count an error only if the label is not among them.
    # Since ImageNet has 1000 classes, random predictions would give a top-5 error of around 99.5 %.
    prediction = prediction.detach()  # detach from the computational graph (no grad)
    label = label.detach()

    prediction_sorted = torch.argsort(prediction, dim=1, descending=True)
    # None as an index creates a new dimension of size 1, so that broadcasting works as expected
    finder = label[:, None] == prediction_sorted
    order = torch.nonzero(finder)[:, 1]  # column index (rank) of the True entry in each row
    return order


def create_confusion_matrix(num_classes, prediction, label):
    prediction = prediction.detach()
    label = label.detach()
    prediction = torch.argmax(prediction, 1)
    cm = torch.zeros(
        (num_classes, num_classes), dtype=torch.long, device=label.device
    )  # empty confusion matrix
    indices = torch.stack((label, prediction))  # stack labels and predictions
    new_indices, counts = torch.unique(
        indices, return_counts=True, dim=1
    )  # find how many cases there are for each combination of (label, pred)
    cm[new_indices[0], new_indices[1]] += counts
    return cm


def print_stats(conf_matrix, orders):
    # conf_matrix[true_label, predicted_label]: rows are true classes, columns are predictions
    num_classes = conf_matrix.shape[0]
    print('Confusion matrix:')
    print(conf_matrix)
    print('\n---\n')
    print('Precision and recalls:')
    for c in range(num_classes):
        precision = conf_matrix[c, c] / conf_matrix[:, c].sum()
        recall = conf_matrix[c, c] / conf_matrix[c].sum()
        f1 = (2 * precision * recall) / (precision + recall)
        print(
            'Class {cls:10s} ({c}):\tPrecision: {prec:0.5f}\tRecall: {rec:0.5f}\tF1: {f1:0.5f}'.format(
                cls=CLASSES[c], c=c, prec=precision, rec=recall, f1=f1
            )
        )
    print('\n---\n')
    print('Top-n accuracy and error:')
    order_len = len(orders)
    for n in range(num_classes):
        topn = (orders <= n).sum()  # ranks are 0-based, so rank <= n means "within the top n + 1"
        acc = topn / order_len
        err = 1 - acc
        print('Top-{n}:\tAccuracy: {acc:0.5f}\tError: {err:0.5f}'.format(n=(n + 1), acc=acc, err=err))


def evaluate(num_classes, dataset_file, batch_size=32, model=None):
    if model is None:
        model = hw_2.load_model()  # load the model from your homework module

    if torch.cuda.is_available():
        device = torch.device('cuda')
    else:
        device = torch.device('cpu')

    model = model.to(device)
    model = model.eval()  # switch to eval mode, so that layers such as dropout and batch norm behave as at test time

    dataset = Dataset(dataset_file)
    loader = tdata.DataLoader(dataset, batch_size=batch_size)
    confusion_matrix = torch.zeros(
        (num_classes, num_classes), dtype=torch.long, device=device
    )  # empty confusion matrix
    orders = []
    with torch.no_grad():  # disable gradient computation
        for i, batch in enumerate(loader):
            data = batch['data'].to(device)
            labels = batch['labels'].to(device)
            prediction = model(data)
            confusion_matrix += create_confusion_matrix(num_classes, prediction, labels)
            order = get_prediction_order(prediction, labels).cpu().numpy()
            orders.append(order)
            print('Processed batch {i:02d}'.format(i=(i + 1)))

    print('\n---\n')
    orders = np.concatenate(orders, 0)
    confusion_matrix = confusion_matrix.cpu().numpy()
    print_stats(confusion_matrix, orders)

    return (orders == 0).mean()  # return top-1 accuracy


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Evaluation demo for HW02')
    parser.add_argument('dataset', type=str)
    parser.add_argument('--batch_size', '-bs', default=32, type=int)
    parser.add_argument('--num_classes', '-nc', default=10, type=int)
    args = parser.parse_args()
    evaluate(args.num_classes, args.dataset, args.batch_size)
</code>
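Before submitting, it may help to check the ''load_model()'' contract from the Submission section locally, on the CPU and (if available) on a GPU. The sketch below is hypothetical (''check_submission'' is not part of the BRUTE script) and only tests the return type, output shape, dtype and device on random input in [0, 1]; it says nothing about accuracy.

```python
import torch


def check_submission(model, batch_size=4):
    """Hypothetical sanity check of the load_model() contract; raises AssertionError on failure."""
    assert isinstance(model, torch.nn.Module), 'load_model() must return a torch.nn.Module'
    devices = [torch.device('cpu')]
    if torch.cuda.is_available():
        devices.append(torch.device('cuda'))
    model = model.eval()
    for device in devices:
        model = model.to(device)
        # random float32 input with values in [0, 1), shape [B, 3, 128, 128]
        x = torch.rand(batch_size, 3, 128, 128, dtype=torch.float32, device=device)
        with torch.no_grad():
            out = model(x)
        assert out.shape == (batch_size, 10), f'expected shape [{batch_size}, 10], got {list(out.shape)}'
        assert out.dtype == torch.float32, f'expected float32 output, got {out.dtype}'
        assert out.device == x.device, f'output lives on {out.device}, input on {x.device}'
    return True
```

Usage would be ''import hw_2'' followed by ''check_submission(hw_2.load_model())'', run from the directory containing **hw_2.py** and **weights.pth**, and on a GPU other than GPU 7 on ''taylor''.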