
Lab 3: Finetuning

Fine-tuning a pretrained CNN for a new task.

Skills: creating a dataset from an image folder, data preprocessing, loading pretrained models, working remotely with a GPU server, training part of a model, hyper-parameter search.

Introduction

In this lab we start from a model already pretrained on the ImageNet classification dataset (1000 categories and 1.2 million images) and adapt it to a small-scale but still challenging classification problem.

PyTorch has a tutorial closely related to this assignment: Transfer Learning For Computer Vision Tutorial

Setup

Model

Fortunately, many excellent pretrained architectures are available in PyTorch. You can use one of the following models:

  1. VGG11 https://pytorch.org/hub/pytorch_vision_vgg/, which was the model considered in the CNN lecture.
  2. Squeezenet1_0 https://pytorch.org/hub/pytorch_vision_squeezenet/, which has far fewer parameters and uses ‘fire’ modules similar to the example in CNN lecture slide 18. It will be about 4 times faster to train but achieves somewhat lower accuracy on ImageNet.

import torch
model = torch.hub.load('pytorch/vision:v0.9.0', 'vgg11', pretrained=True)

You might get the 'CERTIFICATE_VERIFY_FAILED' error, meaning that the model weights cannot be downloaded over a secure (HTTPS) connection. In this case use the non-secure workaround:

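import torchvision.models
# model_urls maps each architecture name to the download URL of its weights.
# It exists in the torchvision release used above; newer releases may not expose it.
from torchvision.models.vgg import model_urls
# from torchvision.models.squeezenet import model_urls
 
# Rewrite every download URL to plain HTTP to bypass the certificate check.
for k in model_urls.keys():
    model_urls[k] = model_urls[k].replace('https://', 'http://')
 
model = torchvision.models.vgg11(pretrained=True)
# model = torchvision.models.squeezenet1_0(pretrained=True)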

You can see the structure of the loaded model by calling print(model). You can also open the source defining the network architecture https://github.com/pytorch/vision/blob/master/torchvision/models/vgg.py. Usually it is defined as a hierarchy of Modules, where each Module is either an elementary layer (e.g. Conv2d, Linear, ReLU) or a container (e.g. Sequential).
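
To locate the part of the network that maps features to class scores, it can help to walk the module hierarchy programmatically instead of reading the printout. The sketch below is only an illustration: `TinyNet` and `describe` are hypothetical names, and `TinyNet` is a small stand-in that mimics the `features` / `classifier` layout of VGG so that the example runs without downloading weights. The same `named_children()` and `parameters()` calls should apply to the loaded `model`.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in that mimics the features/classifier split of VGG."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Sequential(nn.Linear(8, 1000))

    def forward(self, x):
        x = self.avgpool(self.features(x))
        return self.classifier(torch.flatten(x, 1))


def describe(model):
    """Return (name, class name, parameter count) for each top-level child."""
    rows = []
    for name, child in model.named_children():
        n_params = sum(p.numel() for p in child.parameters())
        rows.append((name, type(child).__name__, n_params))
    return rows


model = TinyNet()
summary = describe(model)
for name, kind, n_params in summary:
    print(f"{name:12s} {kind:20s} {n_params:>8d} parameters")
```

Calling `describe` on a child (for example `describe(model.features)`) descends one level further into a container.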

Data

Download one of the datasets we offer for this task:

  1. Butterflies (35Mb)

All of the datasets contain color images of 224×224 pixels from 10 categories.

Part 1 (2p)

Here we will practice data loading and preprocessing techniques.
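
A minimal sketch of these steps, under stated assumptions: random tensors stand in for the images (in the lab the dataset would typically be built from the image folder, e.g. with `torchvision.datasets.ImageFolder`), and the 40/10 split, the batch size and the helper names `channel_stats` and `standardize` are illustrative choices, not prescribed by the assignment. It holds out a validation split, computes per-channel statistics on the training split only, and standardizes a batch with them.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Stand-in data: 50 RGB "images" of 224x224 pixels with labels from 10 classes.
images = torch.rand(50, 3, 224, 224)
labels = torch.randint(0, 10, (50,))
dataset = TensorDataset(images, labels)

# Hold out part of the training data for validation (split sizes are an assumption).
train_set, val_set = random_split(dataset, [40, 10])


def channel_stats(subset):
    """Per-channel mean and std over all images of a dataset split."""
    loader = DataLoader(subset, batch_size=len(subset))
    x, _ = next(iter(loader))
    return x.mean(dim=(0, 2, 3)), x.std(dim=(0, 2, 3))


# Statistics come from the training split only, never from validation or test data.
mean, std = channel_stats(train_set)


def standardize(x):
    """Shift and scale a batch of shape (N, 3, H, W) channel-wise."""
    return (x - mean[None, :, None, None]) / std[None, :, None, None]


train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
val_loader = DataLoader(val_set, batch_size=8, shuffle=False)

x, y = next(iter(train_loader))
x = standardize(x)
print(x.shape, y.shape)
```

For a pretrained network, the statistics it was trained with are usually the appropriate choice; for the torchvision ImageNet models these are documented as mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].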

Part 2 (4p)

We will first try learning only the last layer of the network on the new data. That is, we will use the network as a feature extractor and learn a linear classifier on top of it, as if it were a logistic regression model on fixed features. We need to do the following:

  1. Load the vgg11 model
  2. Freeze all parameters of the model, so that they will not be trained, by

for param in model.parameters():
    param.requires_grad = False

  3. In your model architecture, identify and delete the “classifier” part that maps “features” to scores of the 1000 ImageNet classes.
  4. Add a new “classifier” module that consists of one or more linear layers with randomly initialized weights and outputs scores for 10 classes (our datasets). If we construct Linear layers anew, their parameters are automatically randomly initialized and have the attribute requires_grad = True by default, i.e. they will be trainable. Consider using torch.nn.BatchNorm1d (after linear layers) or torch.nn.Dropout (after activations) inside your classifier block.
  5. Train the network and choose the best hyper-parameters by cross-validation. Find a suitable learning rate as follows. First roughly determine the order of magnitude of the learning rate by trying learning rates $1, 0.1, 0.01, 0.001, 0.0001$ and measuring the training loss. Then select a grid of 5 learning rate values with which to perform cross-validation. Evaluate validation accuracy after each epoch (as in lab2) and keep track of the parameter vector that achieves the best validation accuracy (saving it whenever it is better than the best so far). This way we automatically select the epoch at which it was best to stop. Choose the learning rate (and dropout rate, if applicable) that achieves the best validation accuracy.
  6. Report the full setup of learning that you used: base network, classifier architecture, optimizer, learning rate and other hyper-parameters. Report plots of training and validation metrics (loss, accuracy) versus epochs for the selected hyper-parameters. Report the final test classification accuracy.
  7. The network will likely make some errors on the test data (we expect a few). For these cases display and report: 1) the input test image, 2) its correct class label, 3) the class labels and network confidence (predictive probabilities) of the top 3 network predictions (classes with highest predictive probability).
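
The procedure above can be sketched end to end as follows. This is an illustration under assumptions, not the reference solution: `TinyNet` is a hypothetical stand-in for the pretrained network (same `features` / `classifier` split as VGG, random weights), the data are random tensors, training uses a single full-batch step per epoch, and the classifier layout, SGD with momentum and the 5 epochs are arbitrary choices. It shows freezing, replacing the classifier, tracking the best validation epoch for each learning rate, and extracting the top 3 predictions.

```python
import copy

import torch
from torch import nn

torch.manual_seed(0)


class TinyNet(nn.Module):
    """Hypothetical stand-in with the same features/classifier split as VGG."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.classifier = nn.Linear(4, 1000)  # original 1000-class head

    def forward(self, x):
        return self.classifier(self.features(x))


base = TinyNet()  # in the lab: the pretrained network


def make_feature_extractor():
    model = copy.deepcopy(base)
    for param in model.parameters():
        param.requires_grad = False  # freeze everything loaded so far
    # New head: freshly constructed layers are trainable by default.
    model.classifier = nn.Sequential(
        nn.Linear(4, 16), nn.BatchNorm1d(16), nn.ReLU(),
        nn.Dropout(0.5), nn.Linear(16, 10))
    return model


# Random stand-in data; in the lab these come from the data loaders.
x_train, y_train = torch.randn(32, 3, 8, 8), torch.randint(0, 10, (32,))
x_val, y_val = torch.randn(16, 3, 8, 8), torch.randint(0, 10, (16,))


def accuracy(model, x, y):
    model.eval()
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()


def train(lr, epochs=5):
    """Train the new head only; return best validation accuracy and its weights."""
    model = make_feature_extractor()
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(trainable, lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    best_acc, best_state = -1.0, None
    for _ in range(epochs):
        model.train()
        optimizer.zero_grad()
        loss_fn(model(x_train), y_train).backward()
        optimizer.step()
        val_acc = accuracy(model, x_val, y_val)
        if val_acc > best_acc:  # keep the parameters of the best epoch so far
            best_acc, best_state = val_acc, copy.deepcopy(model.state_dict())
    return best_acc, best_state


# Coarse search over learning-rate orders of magnitude.
results = {lr: train(lr) for lr in (1.0, 0.1, 0.01, 0.001, 0.0001)}
best_lr = max(results, key=lambda lr: results[lr][0])

# Restore the best parameters and report the three most probable classes.
model = make_feature_extractor()
model.load_state_dict(results[best_lr][1])
model.eval()
with torch.no_grad():
    probs = torch.softmax(model(x_val), dim=1)
top_probs, top_classes = probs.topk(3, dim=1)
print(best_lr, results[best_lr][0], top_classes[0].tolist(), top_probs[0].tolist())
```

Copying the `state_dict` with `copy.deepcopy` matters here: `state_dict()` returns references to the live tensors, so without the copy the "best" parameters would silently follow later updates.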

Part 3 (4p)

Depending on the size of the dataset and how much it differs from ImageNet, one of the following options may give better results than training the last layer only.

  1. Fine-tune the whole model. For this, load the pretrained model, do not freeze any parameters, and replace the output layer with a new one (of the appropriate size and randomly initialized). A smaller learning rate is recommended for fine-tuning.
  2. This time take the model you saved in Part 2 with the last layer already learned for our task. Fine-tune the whole model on the training + validation data (i.e. the whole initial training data set) with a small learning rate (you need to restore requires_grad = True on all parameters).
  3. Report the chosen hyper-parameters, the training loss, and the training and test accuracies achieved. Which of the fine-tuning approaches obtained the best result?
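
A minimal sketch of the second option, assuming a stand-in model: `features` and `classifier` below are hypothetical small modules with random weights that play the role of the Part 2 network, and the data are random tensors. Using a separate, smaller learning rate for the pretrained layers via optimizer parameter groups is an optional design choice shown for illustration; the assignment only asks for a small learning rate.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the Part 2 model: frozen features, trained 10-class head.
features = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())
classifier = nn.Linear(4, 10)
model = nn.Sequential(features, classifier)
for param in features.parameters():
    param.requires_grad = False

# Restore trainability of every parameter before fine-tuning the whole model.
for param in model.parameters():
    param.requires_grad = True

# Default learning rate 1e-3; the pretrained layers get a smaller one (assumption).
optimizer = torch.optim.SGD(
    [
        {"params": features.parameters(), "lr": 1e-4},
        {"params": classifier.parameters()},
    ],
    lr=1e-3,
    momentum=0.9,
)

# One optimization step on random stand-in data.
x, y = torch.randn(16, 3, 8, 8), torch.randint(0, 10, (16,))
before = features[0].weight.detach().clone()
loss = nn.CrossEntropyLoss()(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# The formerly frozen convolution now receives gradients and is updated.
changed = not torch.equal(before, features[0].weight)
print("feature weights updated:", changed)
```

If the optimizer from Part 2 was created with only the trainable parameters, it has to be recreated after unfreezing, as above; otherwise the newly unfrozen parameters are never updated.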

Important Technicalities