Quick links: [[https://intranet.fel.cvut.cz/cz/education/rozvrhy-ng.B222/public/html/predmety/61/70/p6170206.html | Schedule]] | [[https://cw.felk.cvut.cz/forum/forum-1830.html|Forum]] | [[https://cw.felk.cvut.cz/brute/teacher/course/1461| BRUTE]] | [[https://cw.fel.cvut.cz/b222/courses/bev033dle/lectures | Lectures]] | [[https://cw.fel.cvut.cz/b222/courses/bev033dle/labs | Labs]]

====== Lab 3: Finetuning (Transfer Learning) ======

The main task is to fine-tune a pretrained CNN for a new classification task (transfer learning).

Skills: data loader from an image folder, data preprocessing, loading pretrained models, remote GPU servers, training part of a model.
Insights: convolutional filters, error case analysis.

==== Introduction ====

In this lab we start from a model pretrained on the ImageNet classification dataset (1000 categories, 1.2 million images) and adjust it to solve a small-scale but otherwise challenging classification problem.

  * This allows us to work with a large-scale model at moderate computational expense, since our fine-tuning dataset is small.
  * We will see that the pretrained network has already learned powerful visual features, which greatly simplifies our task.
  * We will consider two fine-tuning variants: adjusting only the last layer, or all layers.

==== Setup ====

=== GPU Servers ===

Now is a good time to start working with the GPU servers. Check the [[courses:bev033dle:labs:0_howto:start| How To]] page. The recommended setup is as follows:

  - SSH authentication with pre-shared keys
  - VS Code "Remote - SSH" extension
  - Lmod configuration loaded via the "Python Wrapper" method

**Beware:** VS Code tends to keep the connection alive even after you turn off your computer. As GPU memory is expensive, log in to the server regularly and check whether your processes still occupy any GPUs. You may call ''pkill -f ipykernel'' to kill such processes.

=== Model ===

State-of-the-art pretrained architectures are available in PyTorch. We will use the following models:

  - VGG11 [[https://pytorch.org/hub/pytorch_vision_vgg/]], the model considered in the CNN lecture.
  - SqueezeNet 1.0 [[https://pytorch.org/hub/pytorch_vision_squeezenet/]], which has far fewer parameters and uses 'fire' modules similar to the example on CNN lecture slide 18. It is about 4 times faster to train but achieves somewhat lower accuracy on ImageNet.

<code python>
import torchvision.models

model1 = torchvision.models.vgg11(weights=torchvision.models.VGG11_Weights.DEFAULT)
model2 = torchvision.models.squeezenet1_0(weights=torchvision.models.SqueezeNet1_0_Weights.DEFAULT)
</code>

/* On older torchvision versions (before the ''weights='' API) you might get a 'CERTIFICATE_VERIFY_FAILED' error, meaning that the model cannot be downloaded over a secure connection. In this case use the non-secure workaround:

import torchvision.models
from torchvision.models.vgg import model_urls
# from torchvision.models.squeezenet import model_urls
for k in model_urls.keys():
    model_urls[k] = model_urls[k].replace('https://', 'http://')
model = torchvision.models.vgg11(pretrained=True)
# model = torchvision.models.squeezenet1_0(pretrained=True)
*/

You can see the structure of the loaded model by calling ''%%print(model)%%''. You can also open the source defining the network architecture: [[https://github.com/pytorch/vision/blob/master/torchvision/models/vgg.py]]. Usually the network is defined as a hierarchy of Modules, where each Module is either an elementary layer (e.g. ''Conv2d'', ''Linear'', ''ReLU'') or a container (e.g. ''Sequential''); a short sketch of navigating this hierarchy follows below.
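As a small illustration (not part of the template), the following sketch shows how to navigate this hierarchy for SqueezeNet; the indices and shapes are taken from the torchvision definition of ''squeezenet1_0'':

<code python>
import torchvision.models

# Load SqueezeNet 1.0 with ImageNet weights (as above).
model = torchvision.models.squeezenet1_0(
    weights=torchvision.models.SqueezeNet1_0_Weights.DEFAULT)
model.train(False)  # deterministic inference behaviour

# Print the hierarchy of Modules: 'features' and 'classifier' containers.
print(model)

# Containers support indexing: the first layer of SqueezeNet is a
# Conv2d(3, 96, kernel_size=7, stride=2), so its filters are 7x7 RGB patches.
first_conv = model.features[0]
print(first_conv)
print(first_conv.weight.shape)  # torch.Size([96, 3, 7, 7])

# Named access to all parameters (useful later, e.g. for freezing):
for name, p in model.named_parameters():
    print(name, tuple(p.shape))
</code>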
=== Data ===

The data will be placed in ''/local/temporary/butterflies/'' on both servers (for faster access and to avoid multiple copies). You can also download the dataset (e.g. to use on your own computer):

  - {{ :courses:bev033dle:butterflies.zip | Butterflies}} (35 MB)

The dataset contains 224x224 color images of 10 categories. The scientific (Latin) names of the butterfly categories are:

  01: Danaus plexippus
  02: Heliconius charitonius
  03: Heliconius erato
  04: Junonia coenia
  05: Lycaena phlaeas
  06: Nymphalis antiopa
  07: Papilio cresphontes
  08: Pieris rapae
  09: Vanessa atalanta
  10: Vanessa cardui

=== Template ===

{{ :courses:bev033dle:labs:lab2_finetune:template.zip}}

This lab has been substantially renewed this year; please let us know of any problems you encounter with the template or the task.

==== Part 1: Visualization of First Layer Filters and Features (1p) ====

The first task is simply to load the pretrained network, apply it to a test image, and visualize the convolution filters and activations in the first layer. For this task SqueezeNet is more suitable, as it has 7x7 convolution filters in the first layer. There are a couple of technicalities, prepared in the template.

  * Load the test image and transform it to the expected input of the network (type, shape, scaling).
  * Use the network to compute class predictive probabilities and report the top 5 classes and their probabilities.
  * Display the weights of the first convolutional layer as images, in an 8 x 12 grid (SqueezeNet has 96 channels in the first layer).
  * Apply the first convolutional layer of the network to the input image and display the resulting activation maps for the first 16 channels (e.g. as a 4 x 4 grid of images). Observe the result before and after the non-linearity.

Hint: the ''Sequential'' container supports slicing, so that ''model.features[0:2]'' is a small neural network consisting of the first few layers.

==== Part 2: Data Preprocessing and Loaders (2p) ====

To address the classification task, we first need to load in the data: create a dataset, split it into training and validation parts, and create loaders. Fortunately, there are convenient tools for all these steps. The respective technicalities are prepared in the template; a sketch of the whole pipeline follows after this list.

  * Create a dataset of all training images. We use the existing tool ''datasets.ImageFolder'':

<code python>
import torch
from torchvision import datasets, transforms

train_data = datasets.ImageFolder('/local/temporary/butterflies/train', transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_data, batch_size=1, shuffle=True, num_workers=0)
</code>

  * Let us verify that the statistics of the data match those used to standardize the inputs for ImageNet. Over the whole training set, compute the mean and standard deviation per color channel, over all pixels and all images. Think how to do it incrementally with mini-batches, without loading the whole dataset into memory at once. You should obtain values similar to those of ImageNet: mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225].
  * Recreate the dataset using the transform ''transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean=mean, std=std)])'' with your computed standardization.
  * From the training dataset create two loaders: the loader used for optimizing hyperparameters (''train_loader'') and the loader used for validation (''val_loader''). Use the ''sampler'' argument of ''DataLoader'' with [[https://pytorch.org/docs/stable/data.html?highlight=subsetrandomsampler#torch.utils.data.SubsetRandomSampler|SubsetRandomSampler]].
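A minimal sketch of the pipeline described above, assuming the dataset location from the Data section; the batch size, the 80/20 split and the fixed seed are our illustrative choices:

<code python>
import torch
from torchvision import datasets, transforms

# Pass 1: per-channel mean/std computed incrementally over mini-batches.
data = datasets.ImageFolder('/local/temporary/butterflies/train', transforms.ToTensor())
loader = torch.utils.data.DataLoader(data, batch_size=8, num_workers=0)

n_pix = 0
s = torch.zeros(3)   # running sum of pixel values per channel
s2 = torch.zeros(3)  # running sum of squared pixel values per channel
for x, _ in loader:  # x has shape [B, 3, H, W]
    n_pix += x.shape[0] * x.shape[2] * x.shape[3]
    s += x.sum(dim=(0, 2, 3))
    s2 += (x ** 2).sum(dim=(0, 2, 3))
mean = s / n_pix
std = (s2 / n_pix - mean ** 2).sqrt()  # std via E[x^2] - E[x]^2
print(mean, std)  # expect values close to the ImageNet statistics

# Recreate the dataset with the computed standardization.
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize(mean=mean.tolist(), std=std.tolist())])
train_data = datasets.ImageFolder('/local/temporary/butterflies/train', transform)

# Random (but reproducible) train/validation split via SubsetRandomSampler.
g = torch.Generator().manual_seed(0)
perm = torch.randperm(len(train_data), generator=g)
n_val = len(train_data) // 5  # e.g. 20% of the data for validation
train_sampler = torch.utils.data.SubsetRandomSampler(perm[n_val:])
val_sampler = torch.utils.data.SubsetRandomSampler(perm[:n_val])
train_loader = torch.utils.data.DataLoader(train_data, batch_size=8, sampler=train_sampler)
val_loader = torch.utils.data.DataLoader(train_data, batch_size=8, sampler=val_sampler)
</code>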
Use random subsets instead of just slicing the dataset: you should not assume that the dataset is randomly shuffled (and in this task it really is not).

==== Part 3: Finetuning (4p) ====

We will first try to learn only the last layer of the network on the new data, i.e. we will use the network as a feature extractor and learn a linear classifier on top of it, as if it were a logistic regression model on fixed features. This task is somewhat simpler with the VGG architecture (SqueezeNet uses a fully convolutional architecture with global pooling at the end). A sketch of the whole setup is given after this list.

  - Load the **vgg11** model.
  - Move the model to the GPU: ''model.to(dev)''.
  - Freeze all parameters of the model, so that they will not be trained:
<code python>
for param in model.parameters():
    param.requires_grad = False
</code>
  - Setting ''model.train(False)'' will fix the behaviour of batch norm and dropout layers (if present) to be deterministic and input-independent.
  - In your model architecture, identify and delete the "classifier" part that maps "features" to the scores of the 1000 ImageNet classes.
  - Add a new "classifier" module that consists of one or more linear layers with randomly initialized weights and outputs scores for 10 classes (our dataset). If we construct ''Linear'' layers anew, their parameters are automatically randomly initialized and have the attribute ''requires_grad = True'' by default, i.e. they will be trainable.
  - Train the network for 10+ epochs. Use the higher-level tools ''optimizer'' and ''nll_loss'' as proposed in the template. When loading the data, move the data to the GPU as well; note that ''to(dev)'' is not an in-place operation for Tensors, unlike for Modules.
  - Choose the best learning rate and the stopping epoch by cross-validation. Select the learning rate from $0.03, 0.01, 0.003, 0.001, 0.0001$. In order to apply cross-validation, use the ''val_loader'' to evaluate the validation accuracy at the end of each training epoch. Select the model that achieves the best validation accuracy over all of the learning rates and training epochs. Save the best network using ''torch.save''. See the [[ https://pytorch.org/tutorials/beginner/saving_loading_models.html | Saving / Loading Tutorial ]].
  - Report the full learning setup that you used: base network, classifier architecture, optimizer, learning rate, the grid for the hyper-parameter search and the selected hyper-parameters. Report logs (or plots) of training and validation metrics (loss, accuracy) versus epochs for the selected hyper-parameters (learning rate).
  - Repeat the learning experiment, but this time allow all parameters of the neural network to be updated (no freezing). Take care that the train-validation split stays the same. Report the same metrics for this approach.

Report the final test classification accuracy of the best model (selected on the validation set). The test set is given as a separate folder:

<code python>
test_data = datasets.ImageFolder('/local/temporary/butterflies/test', transform)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=8, shuffle=False, num_workers=0)
</code>

Use the same input transform as for training. Do not re-tune the hyperparameters to achieve a better test set performance!

The network will probably make a few errors on the test set. For these cases display and report: 1) the input test image, 2) its correct class label, 3) the class labels and network confidences (predictive probabilities) of the top 3 network predictions (the classes with the highest predictive probability). A sketch of this error-case analysis is given below.
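A minimal sketch of the frozen-feature variant, assuming ''train_loader'' and ''val_loader'' from Part 2; the single-layer head, plain SGD, the learning rate and the checkpoint file name are our illustrative choices, not prescribed by the template:

<code python>
import torch
import torch.nn.functional as F
import torchvision.models

dev = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Pretrained VGG11 used as a fixed feature extractor.
model = torchvision.models.vgg11(weights=torchvision.models.VGG11_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False
model.train(False)  # deterministic behaviour of dropout/batch norm

# Replace the 1000-class ImageNet classifier with a new 10-class head.
# VGG's classifier sees 512*7*7 flattened features; a single Linear layer
# makes this exactly a logistic-regression model on frozen features.
model.classifier = torch.nn.Linear(512 * 7 * 7, 10)
model.to(dev)

# Only the new head has requires_grad=True, so only it gets optimized.
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.001)

best_acc = 0.0
for epoch in range(10):
    for x, y in train_loader:
        x, y = x.to(dev), y.to(dev)  # to(dev) is not in-place for Tensors
        loss = F.nll_loss(F.log_softmax(model(x), dim=1), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Validation accuracy at the end of each epoch.
    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            x, y = x.to(dev), y.to(dev)
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    acc = correct / total
    print(f'epoch {epoch}: val accuracy {acc:.3f}')
    if acc > best_acc:  # keep the checkpoint with the best validation accuracy
        best_acc = acc
        torch.save(model.state_dict(), 'best_model.pth')
</code>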
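And a sketch of the error-case analysis on the test set, reusing ''model'', ''dev'', ''test_loader'', ''test_data'' and the ''mean''/''std'' tensors from the sketches above; ''test_data.classes'' lists the folder names in label order:

<code python>
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

model.train(False)
with torch.no_grad():
    for x, y in test_loader:
        probs = F.softmax(model(x.to(dev)), dim=1).cpu()
        top_p, top_c = probs.topk(3, dim=1)  # top-3 predictions per image
        for i in range(x.shape[0]):
            if top_c[i, 0] != y[i]:  # a misclassified test image
                # Undo the standardization for display (mean/std from Part 2).
                img = (x[i] * std.view(3, 1, 1) + mean.view(3, 1, 1)).clamp(0, 1)
                plt.imshow(img.permute(1, 2, 0).numpy())
                plt.title(f'true: {test_data.classes[y[i]]}')
                plt.show()
                for p, c in zip(top_p[i], top_c[i]):
                    print(f'{test_data.classes[c]}: {p:.3f}')
</code>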
==== Part 4: Data Augmentation (3p) ====

Because we have very limited training / testing data available, it is a good idea to also use data augmentation. Let us select some transforms which can be expected to produce realistic images of the same class. A possible set is:

  * RandomHorizontalFlip
  * RandomAffine
  * RandomAdjustSharpness

See the [[ https://pytorch.org/vision/main/auto_examples/plot_transforms.html#sphx-glr-auto-examples-plot-transforms-py | Torchvision transform examples]]. Note that transforms inherit from ''torch.nn.Module'' and can therefore be used the same way as layers, or as functions applied to data Tensors (however, not batched). They can also be built into the Dataset by setting the transform argument. They can process either a PIL.Image or a Tensor. For efficiency reasons it is better to use them as functions on Tensors.

  - Create a composite transform with a small random effect strength (e.g. rotation up to 10 degrees, etc.) of each kind from our list or your own list; see the sketch below.
  - Apply this transform at all stages: training, validation, testing. If it is incorporated in the Dataset, be sure to keep the input standardization transform as used previously. Adjust the validation / testing procedure to account for the extra randomness, e.g. average the results over several draws of the random transform for each data point.
  - Evaluate the test performance of the best previously trained and saved model.
  - Train the linear classifier with frozen features again, this time with data augmentation. A somewhat longer training duration would be appropriate, e.g. 30+ epochs. Apply the same cross-validation and testing protocol as before.
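A minimal sketch of such a composite transform and of test-time averaging over several random draws, reusing ''model'', ''dev'' and ''test_loader'' from Part 3; the effect strengths and the number of draws are our illustrative choices:

<code python>
import torch
import torch.nn.functional as F
from torchvision import transforms

# Small random perturbations that should keep the class recognizable.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1)),
    transforms.RandomAdjustSharpness(sharpness_factor=2, p=0.5),
])

# Test-time evaluation averaged over several draws of the random transform.
# Transforms are Modules, so `augment` is applied as a function to each
# (un-batched) image Tensor; x is assumed already standardized.
n_draws = 5
correct = total = 0
with torch.no_grad():
    for x, y in test_loader:
        probs = torch.zeros(x.shape[0], 10)
        for _ in range(n_draws):
            xa = torch.stack([augment(img) for img in x])  # fresh draw per image
            probs += F.softmax(model(xa.to(dev)), dim=1).cpu()
        correct += (probs.argmax(dim=1) == y).sum().item()
        total += y.numel()
print(f'test accuracy (averaged over {n_draws} draws): {correct / total:.3f}')
</code>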