
Lab 4: Finetuning

Deep Learning (SS2020) computer lab (10p)

Introduction

In this lab we start from a model already pretrained on the ImageNet classification dataset (1000 categories and 1.2 million images) and try to adjust it for solving a small-scale but otherwise challenging classification problem.

  • This will allow us to work with a large-scale model at moderate computational expense, since our fine-tuning dataset is small.
  • We will see that pretrained networks have already learned powerful visual features, which will greatly simplify our task.
  • We will consider several fine-tuning variants, adjusting either a part of the network or all of its layers.

Pytorch has a tutorial closely related to this assignment: Transfer Learning For Computer Vision Tutorial

Setup

Model

Fortunately, many excellent pretrained architectures are available in pytorch. You will use one of the following models:

  1. VGG11 https://pytorch.org/hub/pytorch_vision_vgg/, which was the model considered in the CNN lecture.
  2. Squeezenet1_0 https://pytorch.org/hub/pytorch_vision_squeezenet/, which has far fewer parameters and uses ‘fire’ modules similar to the example on CNN lecture slide 18. It is about 4 times faster to train but achieves somewhat lower accuracy on ImageNet.

import torch
model = torch.hub.load('pytorch/vision:v0.6.0', 'vgg11', pretrained=True)

You might get a 'CERTIFICATE_VERIFY_FAILED' error, meaning that the library cannot establish a secure connection to download the model. In this case, use the non-secure workaround:

import torchvision.models
from torchvision.models.vgg import model_urls
# from torchvision.models.squeezenet import model_urls
 
# Rewrite the download URLs to plain HTTP so that certificate
# verification is bypassed.
for k in model_urls.keys():
    model_urls[k] = model_urls[k].replace('https://', 'http://')
 
model = torchvision.models.vgg11(pretrained=True)
# model = torchvision.models.squeezenet1_0(pretrained=True)

You can see the structure of the loaded model by calling print(model). You can also open the source defining the network architecture https://github.com/pytorch/vision/blob/master/torchvision/models/vgg.py. Usually it is defined as a hierarchy of Modules, where each Module is either an elementary layer (e.g. Conv2d, Linear, ReLU) or a container (e.g. Sequential).
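For example, with the VGG11 model loaded above you can drill down into this hierarchy (model.classifier and the index 6 refer to torchvision's VGG definition):

print(model)                # the full hierarchy of modules
print(model.classifier)     # the fully connected part of VGG
print(model.classifier[6])  # the last Linear layer: Linear(4096, 1000)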

Data

Download one of the datasets we offer for this task:

  1. Butterflies (35Mb)

Each dataset contains 224×224-pixel color images of 10 categories.

Assignment 1 (2p)

Here we will practice some data loading and preprocessing techniques.

– Create a dataset and a loader for the training images. We can use the existing dataset interface that loads images from the disk:

from torchvision import datasets, transforms
train_data = datasets.ImageFolder('../data/butterflies/train', transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_data, batch_size=4, shuffle=True, num_workers=0)

– Perform standardization of the data: on the training set, compute the mean and standard deviation per color channel over all pixels and all images. We will put these constant values into the code as a preprocessing step, so that they are not recomputed over and over (see the sketch below).

– Add transforms.Normalize with the statistics you found to your dataset constructor. This standardizes (whitens) the input for better-conditioned training and also better matches what the pretrained model expects. Apply this transform to the test dataset as well.
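A minimal sketch of how these statistics could be computed, assuming the butterflies dataset path from above (the batch size is arbitrary):

import torch
from torchvision import datasets, transforms

raw_data = datasets.ImageFolder('../data/butterflies/train', transforms.ToTensor())
loader = torch.utils.data.DataLoader(raw_data, batch_size=64, num_workers=0)

# Accumulate per-channel sums over all pixels of all training images.
n_pixels = 0
channel_sum = torch.zeros(3)
channel_sq_sum = torch.zeros(3)
for images, _ in loader:
    # images has shape [B, 3, H, W]; sum over batch and spatial dimensions
    n_pixels += images.shape[0] * images.shape[2] * images.shape[3]
    channel_sum += images.sum(dim=(0, 2, 3))
    channel_sq_sum += (images ** 2).sum(dim=(0, 2, 3))

mean = channel_sum / n_pixels
std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()  # std = sqrt(E[x^2] - E[x]^2)
print(mean, std)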

– From the train dataset create two loaders: the loader used for optimizing hyperparameters (train_loader) and the loader used for validation (val_loader). This is similar to Lab3, using SubsetRandomSampler.
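A possible sketch of the split, assuming an 80/20 ratio and placeholder normalization statistics (substitute the values you computed):

import numpy as np
import torch
from torchvision import datasets, transforms

# Placeholder statistics; use the mean and std you computed above.
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.25, 0.25, 0.25]),
])
train_data = datasets.ImageFolder('../data/butterflies/train', preprocess)

# Randomly split the indices into a training part and a validation part.
indices = np.random.permutation(len(train_data))
split = int(0.8 * len(train_data))
train_sampler = torch.utils.data.SubsetRandomSampler(indices[:split])
val_sampler = torch.utils.data.SubsetRandomSampler(indices[split:])

train_loader = torch.utils.data.DataLoader(train_data, batch_size=4, sampler=train_sampler)
val_loader = torch.utils.data.DataLoader(train_data, batch_size=4, sampler=val_sampler)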

Assignment 2 (4p)

We will first try learning only the last layer of the network on the new data. We will use the network as a feature extractor and learn a linear classifier on top of it, as if it were a logistic regression model on fixed features. We need to do the following:

  1. Load the vgg11 model
  2. Freeze all parameters of the model, so that they will not be trained, by

for param in model.parameters():
    param.requires_grad = False

  3. In your model architecture, identify the last linear layer that maps the features to the output scores.
  4. Replace that last linear layer by a fresh Linear layer with random weights, sized to match the number of classes we fine-tune for. A newly constructed layer has 'requires_grad = True' on its parameters by default.
  5. Train the network in this configuration (see the sketch below). Adjust learning rates etc. by observing the training and validation performance. Check the validation accuracy after each epoch and keep track of the parameter vector that achieves the best validation accuracy (saving it whenever it improves on the best so far).
  6. Report the learning parameters you used, a plot of the training metrics (loss, accuracy), and the final test classification accuracy.
  7. For several selected error cases on the test data (not more than 10), display the test image together with images of the 3 highest-scoring classes, to see what the network confuses these test images for.
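A minimal sketch of steps 4–6 for VGG11, reusing train_loader and val_loader from Assignment 1 (for Squeezenet the final layer is a Conv2d inside model.classifier instead; the optimizer settings, epoch count and file name are placeholder assumptions):

import torch
import torch.nn as nn

model = torch.hub.load('pytorch/vision:v0.6.0', 'vgg11', pretrained=True)
for param in model.parameters():
    param.requires_grad = False

# Replace the last linear layer (classifier[6] in torchvision's VGG11)
# by a fresh one with 10 outputs; its parameters require gradients.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 10)

# Optimize only the new layer; all other parameters are frozen.
optimizer = torch.optim.SGD(model.classifier[6].parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

best_acc = 0.0
for epoch in range(10):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

    # Validation: count correct predictions without tracking gradients.
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in val_loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    acc = correct / total
    if acc > best_acc:  # keep the best parameters seen so far
        best_acc = acc
        torch.save(model.state_dict(), 'best_model.pth')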

Assignment 3 (4p)

Depending on the size of the dataset and how different it is from what is represented in ImageNet, one of the following options may give better results. With larger datasets, we expect them to improve over training only the last layer.

  1. Fine-tune the whole model. For this, load the pretrained model, do not freeze any parameters, and reinitialize the output layer with random weights of the proper size, as in the previous assignment. A smaller learning rate is recommended for fine-tuning.
  2. This time, take the model you saved in Assignment 2, with the last layer already learned for our task. Fine-tune the whole model on the training + validation data (i.e. the whole initial training set) with a small learning rate; you need to restore requires_grad = True on all parameters (see the sketch below).
  3. Report the parameters chosen, the training loss, and the training and test accuracies achieved. Which of the fine-tuning approaches obtained the best result?
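A minimal sketch of option 2, assuming the checkpoint file name from the previous sketch (the learning rate is a placeholder):

import torch
import torch.nn as nn

model = torch.hub.load('pytorch/vision:v0.6.0', 'vgg11', pretrained=True)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 10)
model.load_state_dict(torch.load('best_model.pth'))  # weights from Assignment 2

# Unfreeze everything so that all layers are fine-tuned.
for param in model.parameters():
    param.requires_grad = True

# A small learning rate, so the pretrained weights are only gently adjusted.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)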

Important Technicalities

  • Some models have batch normalization or dropout layers. The training and test modes for such models differ, and it is important to control the state by calling model.train() during training and model.eval() during validation or testing.
  • Do not forget to run your training on a GPU by moving the model and the input data there.
  • Please post on the forum if you discover more important guidelines (see the device-handling sketch below).
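A minimal sketch of the device handling (train_loader as in Assignment 1):

import torch

# Select the GPU if one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

for x, y in train_loader:
    x, y = x.to(device), y.to(device)  # batches must live on the same device as the model
    ...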