In your third homework you will train an image classifier from scratch and implement the training and validation loops, a dataset, a data loader and a couple of helper functions. Points are awarded based on the accuracy on a hidden test part of the dataset (which comes from the same distribution as the training and validation parts).
The homework will be introduced in the labs in the 6th week.
We will be using a subset of the popular and well-known ImageNet dataset, available on the taylor and cantor servers in the directory /local/temporary/vir/hw03.
Training will be performed on one of the two GPU servers available for the courses of the Department of Cybernetics.
The dataset consists of 50 classes, with 1000 training, 50 testing and 50 validation images for each class. Each image has a different resolution and three color channels (R, G, B). The dataset has the following structure:
.
├─── train
│    ├ n01751748 (directory of the first class, contains 1000 JPEG images)
│    ├ ...
│    └ n03018349
└─── val
     ├ n01751748 (directory of the first class, contains 50 JPEG images)
     ├ ...
     └ n03018349
The dataset contains the following classes:
First, you need to create a dataset (an object that contains the training samples, or can read/generate them on the fly, together with all necessary information). The torchvision module provides several classes for the most common use cases, or you can write your own dataset from scratch (ideally as a class inheriting from torch.utils.data.Dataset). We will use the ImageFolder class from torchvision. It assumes that the data is already structured like our dataset: 'datasetroot/class1/images'.
The dataset returns PIL.Image objects, so it is important to add tfms.ToTensor() to the dataset transforms. It converts a PIL.Image (values ranging from 0 to 255) into a tensor (a PyTorch FloatTensor of shape (C, H, W) with values in the range [0.0, 1.0]). The images were not captured by the same device and differ in resolution, so we must also add tfms.Resize(), which resizes each image to shape (256, 256).
We will also add tfms.Normalize(), which brings the data into a common range and reduces skewness, which helps the network learn faster and better. Normalization can also mitigate the vanishing and exploding gradient problems. The inputs to the normalization are the mean and standard deviation, which need to be calculated per color channel (over all pixels) on the training dataset. We provide the calculated values, which will also be used in the evaluation.
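For illustration, here is a minimal sketch of how such per-channel statistics could be accumulated over batches of images. The function name channel_stats and the random batches are hypothetical stand-ins; the values provided in the assignment were not necessarily computed this way.

```python
import torch


def channel_stats(batches):
    """Accumulate per-channel mean and std over an iterable of (B, 3, H, W) tensors."""
    count = 0
    total = torch.zeros(3, dtype=torch.float64)
    total_sq = torch.zeros(3, dtype=torch.float64)
    for images in batches:
        images = images.to(torch.float64)
        # Number of pixels per channel in this batch
        count += images.shape[0] * images.shape[2] * images.shape[3]
        total += images.sum(dim=(0, 2, 3))
        total_sq += (images ** 2).sum(dim=(0, 2, 3))
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2 (population variance)
    std = (total_sq / count - mean ** 2).sqrt()
    return mean.float(), std.float()


# Hypothetical data: three batches of random "images" with values in [0, 1]
batches = [torch.rand(4, 3, 8, 8) for _ in range(3)]
mean, std = channel_stats(batches)
```

Accumulating sums batch by batch avoids loading the whole training set into memory at once.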
import torchvision as tv
import torchvision.transforms as tfms

mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

train_transform = tfms.Compose([tfms.Resize((256, 256)), tfms.ToTensor(), tfms.Normalize(mean, std)])
val_transform = tfms.Compose([tfms.Resize((256, 256)), tfms.ToTensor(), tfms.Normalize(mean, std)])

train_dataset = tv.datasets.ImageFolder('/local/temporary/vir/hw03/train', transform=train_transform)
val_dataset = tv.datasets.ImageFolder('/local/temporary/vir/hw03/val', transform=val_transform)
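If you would rather write your own dataset than use ImageFolder, a minimal sketch of a class inheriting from torch.utils.data.Dataset could look as follows. The class name FolderDataset is hypothetical. It assigns labels by sorted directory name, which to our knowledge mirrors what ImageFolder does, but treat that as an assumption and verify it on your data.

```python
import os

from PIL import Image
from torch.utils.data import Dataset


class FolderDataset(Dataset):
    """Reads images from root/<class_name>/<image> and returns (image, label) pairs."""

    def __init__(self, root, transform=None):
        self.transform = transform
        # One sub-directory per class; labels follow the sorted directory names
        self.classes = sorted(
            d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
        )
        self.class_to_idx = {name: i for i, name in enumerate(self.classes)}
        self.samples = []
        for name in self.classes:
            class_dir = os.path.join(root, name)
            for file_name in sorted(os.listdir(class_dir)):
                self.samples.append((os.path.join(class_dir, file_name), self.class_to_idx[name]))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        # Images are read lazily, only when a sample is requested
        with Image.open(path) as img:
            image = img.convert('RGB')
        if self.transform is not None:
            image = self.transform(image)
        return image, label
```

It can be used in place of ImageFolder, e.g. FolderDataset('/local/temporary/vir/hw03/train', transform=train_transform).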
Second, we need to create a data loader (an object that takes samples from the dataset and assembles them into batches in an efficient way). The class takes as input batch_size (the number of samples propagated through the model at once), shuffle (bool, whether to randomize the order of samples) and num_workers (the number of subprocesses used for loading the data, which should not exceed the number of CPU cores).
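from torch.utils.data import DataLoader

BATCH_SIZE = 32  # example value only; choose one that fits into the GPU memory

train_dataloader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=4)
val_dataloader = DataLoader(dataset=val_dataset, batch_size=1, shuffle=False)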
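To see what a data loader produces without touching the real data, the following sketch iterates over a small stand-in dataset of random tensors (hypothetical data, built with TensorDataset) and records the shape of each batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the real dataset: 10 random "images" with random labels (hypothetical data)
images = torch.rand(10, 3, 256, 256)
labels = torch.randint(0, 50, (10,))
fake_dataset = TensorDataset(images, labels)

# num_workers=0 loads the data in the main process, which is convenient for debugging
loader = DataLoader(fake_dataset, batch_size=4, shuffle=True, num_workers=0)

batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in loader]
for x_shape, y_shape in batch_shapes:
    print(x_shape, y_shape)
```

With 10 samples and batch_size=4, the last batch contains only the 2 remaining samples, so code that processes batches should not assume a fixed batch size.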
For more information, visit link
The dataset and dataloader are already provided. Your main task is to write a model and train it.
class Model(torch.nn.Module):
    '''This is my super cool, but super dumb module'''

    def __init__(self):
        super().__init__()

    def forward(self, x):
        batch_size = x.shape[0]
        return torch.rand(batch_size, 50, device=x.device)
The core of the training and validation loop is as follows:
model.train()
for inputs, labels in dataloader:
    output = model(inputs)          # forward pass
    loss = loss_fn(output, labels)
    loss.backward()                 # compute gradients
    optimizer.step()                # update the weights
    optimizer.zero_grad()           # reset gradients for the next batch
validate(model, val_loader, loss_fn)
weight_init
model.eval
with torch.no_grad():
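The keywords above can be put together roughly as in the following sketch. The function names train_one_epoch and validate, their signatures and the device argument are assumptions made for illustration, not a prescribed solution.

```python
import torch


def train_one_epoch(model, loader, loss_fn, optimizer, device="cpu"):
    """Run one pass over the training data and return the mean loss per sample."""
    model.train()  # enable training behaviour of layers such as dropout or batch norm
    total_loss, count = 0.0, 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * labels.shape[0]
        count += labels.shape[0]
    return total_loss / count


def validate(model, loader, loss_fn, device="cpu"):
    """Return (mean loss, top-1 accuracy) on the given data without updating the model."""
    model.eval()  # switch layers such as dropout or batch norm to evaluation behaviour
    total_loss, correct, count = 0.0, 0, 0
    with torch.no_grad():  # no gradients are needed, which saves memory and time
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            total_loss += loss_fn(outputs, labels).item() * labels.shape[0]
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            count += labels.shape[0]
    return total_loss / count, correct / count
```

Note that validate leaves the model in evaluation mode, so train_one_epoch calls model.train() again at the start of every epoch.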
Useful code for computing the accuracy and for selecting a device on the server:
import os

import numpy as np
import torch


def get_accuracy(prediction, labels_batch, dim=1):
    # Fraction of samples whose highest-scoring class equals the label
    pred_index = prediction.argmax(dim)
    return (pred_index == labels_batch).float().mean()


def get_free_gpu():
    # Dump the free memory of each GPU into a temporary file and parse it
    os.system('nvidia-smi -q -d Memory |grep -A5 GPU|grep Free >tmp')
    with open('tmp', 'r') as f:
        memory_available = [int(x.split()[2]) for x in f.readlines()]
    index = np.argmax(memory_available[:-1])  # Skip the 7th card --- it is reserved for evaluation!!!
    return int(index)


def get_device():
    if torch.cuda.is_available():
        gpu = get_free_gpu()
        device = torch.device(f'cuda:{gpu}')
    else:
        device = torch.device('cpu')
    return device
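The get_accuracy helper measures top-1 accuracy, whereas the scoring described later on this page uses top-3 accuracy. A minimal sketch of a top-k variant follows; the name top_k_accuracy is hypothetical and the official evaluation script may compute the metric differently.

```python
import torch


def top_k_accuracy(scores, labels, k=3):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    top_k = scores.topk(k, dim=1).indices            # shape (B, k)
    hits = (top_k == labels.unsqueeze(1)).any(dim=1)  # shape (B,)
    return hits.float().mean().item()


# Two samples, four classes (hypothetical scores)
scores = torch.tensor([[0.1, 0.2, 0.3, 0.4],
                       [0.4, 0.3, 0.2, 0.1]])
labels = torch.tensor([1, 3])
top3 = top_k_accuracy(scores, labels, k=3)
top1 = top_k_accuracy(scores, labels, k=1)
```

In this example the label of the first sample is among its three best classes and the label of the second is not, so top3 is 0.5 while top1 is 0.0.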
Submit a Python module/package that is importable by the name hw_3 and has a function load_model(). The function load_model() needs to return an instance of torch.nn.Module (or a subclass) which accepts an input tensor of shape [Bx3x256x256], where B is the batch size, of type torch.float32, normalized by torchvision.transforms.Normalize(mean, std) with mean, std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225], and returns a tensor of shape [Bx50].
This is the only portion of your code that will be checked automatically. In addition, however, submit all other code that you used for the training. If you achieve an impossibly high score, this lets us be amazed and see how you did it.
When loading the weights with torch.load, pass map_location="cpu":
# Get the weights and biases
model.state_dict()
# Store them on disk
torch.save(model.state_dict(), 'weights.pth')
# Load the weights from disk into a new instance of your model class
model = My_Net()
model.load_state_dict(torch.load('weights.pth', map_location="cpu"))
# ^ should print "<All keys matched successfully>"
The simplest submission (which will not earn any points) can be along these lines. Name it hw_3.py and submit it together with the model's weights, stored via model.state_dict() in a file named weights.pth.
import torch
import os


class Model(torch.nn.Module):
    '''This is my super cool, but super dumb module'''

    def __init__(self):
        super().__init__()

    def forward(self, x):
        batch_size = x.shape[0]
        return torch.rand(batch_size, 50, device=x.device)


def load_model():
    # This is the function to be filled. Your returned model needs to be an instance of subclass of torch.nn.Module
    # Model needs to be accepting tensors of shape [B, 3, 256, 256], where B is batch_size, type float32
    # It should be possible to pass in cuda tensors (in that case, model.cuda() will be called first).
    # The model will return scores (or probabilities) for each of the 50 classes, i.e a tensor of shape [B, 50]
    # The resulting tensor should have same device and dtype as incoming tensor
    directory = os.path.abspath(os.path.dirname(__file__))

    # The model should be trained in advance and in this function, you should instantiate model and load the weights into it:
    model = Model()
    model.load_state_dict(torch.load(directory + '/weights.pth', map_location='cpu'))

    # For more info on storing and loading weights, see https://pytorch.org/tutorials/beginner/saving_loading_models.html
    return model
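Before submitting, it may be worth checking locally that the model returned by load_model() satisfies the interface described above. The following is a minimal sketch of such a check; the helper name check_model_interface is hypothetical and the automatic evaluation may test more than this.

```python
import torch


def check_model_interface(model, batch_size=2, num_classes=50):
    """Feed a random batch through the model and verify the output shape, dtype and device."""
    assert isinstance(model, torch.nn.Module), "load_model() must return a torch.nn.Module"
    model.eval()
    x = torch.rand(batch_size, 3, 256, 256, dtype=torch.float32)
    with torch.no_grad():
        y = model(x)
    assert y.shape == (batch_size, num_classes), f"unexpected output shape {tuple(y.shape)}"
    assert y.dtype == x.dtype, "output dtype should match the input dtype"
    assert y.device == x.device, "output device should match the input device"
    return True
```

A typical use would be check_model_interface(hw_3.load_model()) after importing your hw_3 module.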
The dataset is rather difficult; therefore, to get any points you only need a top-3 accuracy above 50 %. To get the full number of points for the individual part of the assignment, you need a top-3 accuracy of 75 %. Anything in between is interpolated linearly. The maximum number of points from the assignment is 13. The equation is:
$$\text{pts}_{\text{individual}} = 13 \times \text{clip}\left(\frac{\text{acc}_{\text{top-3}} - 0.5}{0.75 - 0.5},\ 0,\ 1\right)$$
Here, clip is the NumPy clip function (numpy.clip).
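As a worked example, the scoring rule can be written in a few lines of plain Python. The function name individual_points and its parameter names are hypothetical; the clipping is done with min/max, which behaves like numpy.clip for a single number.

```python
def individual_points(top3_acc, max_points=13.0, lower=0.5, upper=0.75):
    """Map a top-3 accuracy in [0, 1] to points, linearly between the two thresholds."""
    scaled = (top3_acc - lower) / (upper - lower)
    clipped = min(max(scaled, 0.0), 1.0)  # same effect as numpy.clip(scaled, 0, 1)
    return max_points * clipped


for acc in (0.40, 0.50, 0.625, 0.75, 0.90):
    print(acc, individual_points(acc))
```

A top-3 accuracy of 62.5 %, halfway between the two thresholds, yields half of the points (6.5), while anything at or below 50 % yields 0 and anything at or above 75 % yields the full 13.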
Every 24 hours after the deadline, you will lose 1/3 of the point. However, you will not gain a negative number of points, so the minimum is 0.
Take into account that training a neural network takes a non-trivial amount of time. Do not leave the homework to the last moment.