The main task is to fine-tune a pretrained CNN for a new classification task (transfer learning).
Skills: data loader from an image folder, data preprocessing, loading pretrained models, remote GPU servers, training part of the model. Insights: convolutional filters, error case analysis
In this lab we start from a model already pretrained on the ImageNet classification dataset (1000 categories and 1.2 million images) and try to adjust it for solving a small-scale but otherwise challenging classification problem.
Now is a good time to start working with the GPU servers. Check the How To page. The recommended setup is as follows:
Beware: VS Code tends to keep the server daemon active even after you turn off your computer. As GPU memory is expensive, log in to the server regularly and check whether your processes still occupy some GPUs. You may call pkill -f ipykernel to kill these processes.
SOTA pretrained architectures are available in PyTorch. We will use the following models:
import torchvision.models
model1 = torchvision.models.squeezenet1_0(weights=torchvision.models.SqueezeNet1_0_Weights.DEFAULT)
model2 = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
You can see the structure of the loaded model by calling print(model). You can also open the source defining the network architecture: https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py. Usually it is defined as a hierarchy of Modules, where each Module is either an elementary layer (e.g. Conv2d, Linear, ReLU) or a container (e.g. Sequential).
The data will be placed in /local/temporary/Datasets/PACS_cartoon and in /local/temporary/Datasets/PACS_cartoon_few_shot on both servers (for faster access and to avoid multiple copies).
You can also download the dataset (e.g. to use on your computer):
The PACS_cartoon dataset contains color images of cartoons of size 227×227 pixels in 7 categories:
01: Dog
02: Elephant
03: Giraffe
04: Guitar
05: Horse
06: House
07: Person
This lab has been substantially revised this year. Please let us know of any problems you encounter with the template or the task.
The first task is simply to load the pretrained network, apply it to a test image, and visualize the convolution filters and activations in the first layer. For this task, SqueezeNet is more suitable as it has 7×7 convolution filters in the first layer. There are a couple of technicalities, which are prepared in the template.
The Sequential container supports slicing, so that model.features[0:2] is a small neural network consisting of the first few layers.
To address the classification task, we first need to load in the data: create dataset, split into training and validation, create loaders. Fortunately, there are convenient tools for all the steps. The respective technicalities are prepared in the template.
Create the dataset using datasets.ImageFolder:
import torch
from torchvision import datasets, transforms
train_data = datasets.ImageFolder('/local/temporary/Datasets/PACS_cartoon/train', transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_data, batch_size=1, shuffle=True, num_workers=0)
mean=[0.485, 0.456, 0.406] std=[0.229, 0.224, 0.225]
Split the data between the loader used for training (train_loader) and the loader used for validation (val_loader). Use the sampler argument of DataLoader with SubsetRandomSampler. Use random subsets instead of just slicing the dataset: you should not assume that the dataset is randomly shuffled (and in this task it is not).
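A random train/validation split with SubsetRandomSampler might look like the sketch below. The TensorDataset is a synthetic stand-in for the ImageFolder dataset, sorted by class on purpose; the 80/20 ratio, the batch size and the seed are assumptions, not values from the template.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.sampler import SubsetRandomSampler

# Synthetic stand-in dataset, ordered by class like a folder-based dataset
n = 100
data = TensorDataset(torch.rand(n, 3, 8, 8), torch.arange(n) // 50)

# Random permutation of indices, so that both subsets contain all classes
generator = torch.Generator().manual_seed(0)
perm = torch.randperm(n, generator=generator).tolist()
n_val = n // 5  # assumed 20% validation share
val_idx, train_idx = perm[:n_val], perm[n_val:]

# A sampler replaces shuffle=True; the two options cannot be combined
train_loader = DataLoader(data, batch_size=8, sampler=SubsetRandomSampler(train_idx))
val_loader = DataLoader(data, batch_size=8, sampler=SubsetRandomSampler(val_idx))
```

Both loaders wrap the same dataset object; only the index subsets differ, so no data are copied.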
We will investigate the benefits of using a pre-trained network, even if the distribution of our task is different from the pre-training (e.g. train on cartoons with a network pre-trained on photographs). First let's check the performance of a model trained on cartoons from scratch:
model = torchvision.models.resnet18(weights=None)
model.to(dev)
Use the optimizer and nll_loss as proposed in the template. When loading the data, move the data to the GPU as well. Note that to(dev) is not an in-place operation for Tensors, unlike for Modules.
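The difference between Module.to and Tensor.to can be illustrated with a small sketch. The names layer, returned, t and t64 are hypothetical; the dtype conversion at the end is included only so that the behaviour is visible on a machine without a GPU.

```python
import torch

dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Module.to modifies the module in place and returns the same object
layer = torch.nn.Linear(4, 2)
returned = layer.to(dev)

# Tensor.to returns a new tensor, so the result must be assigned
x = torch.rand(3, 4)
x = x.to(dev)
y = layer(x)

# Same semantics for dtype conversion: the original tensor is unchanged
t = torch.zeros(2)
t64 = t.to(torch.float64)
print(t.dtype, t64.dtype)
```

Calling x.to(dev) without assigning the result leaves x on the CPU, which leads to a device mismatch error when the model is on the GPU.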
Use val_loader to evaluate the validation accuracy at the end of each training epoch. Select the model that achieves the best validation accuracy over all learning rates and training epochs. Save the best network using torch.save. See the Saving / Loading Tutorial.
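The selection procedure can be sketched on a toy problem. Everything specific here is an assumption for illustration: the synthetic data, the linear model, the SGD optimizer, the two learning rates, the number of epochs and the file name best_model.pt; the template may prescribe different choices.

```python
import copy
import os
import tempfile

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic two-class data standing in for the image dataset
x = torch.randn(200, 10)
y = (x[:, 0] > 0).long()
train_loader = DataLoader(TensorDataset(x[:160], y[:160]), batch_size=16, shuffle=True)
val_loader = DataLoader(TensorDataset(x[160:], y[160:]), batch_size=16)

def accuracy(model, loader):
    """Fraction of correctly classified samples in the loader."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for xb, yb in loader:
            xb, yb = xb.to(dev), yb.to(dev)
            correct += (model(xb).argmax(dim=1) == yb).sum().item()
            total += yb.numel()
    return correct / total

best_acc, best_state, best_lr = -1.0, None, None
for lr in (0.1, 0.01):  # assumed learning-rate grid
    model = torch.nn.Linear(10, 2).to(dev)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for epoch in range(5):
        model.train()
        for xb, yb in train_loader:
            xb, yb = xb.to(dev), yb.to(dev)
            optimizer.zero_grad()
            loss = F.nll_loss(F.log_softmax(model(xb), dim=1), yb)
            loss.backward()
            optimizer.step()
        # Validate after every epoch and keep a copy of the best weights
        acc = accuracy(model, val_loader)
        if acc > best_acc:
            best_acc, best_lr = acc, lr
            best_state = copy.deepcopy(model.state_dict())

# Save the best weights and check that they can be loaded again
path = os.path.join(tempfile.mkdtemp(), "best_model.pt")
torch.save(best_state, path)
restored = torch.nn.Linear(10, 2)
restored.load_state_dict(torch.load(path, map_location="cpu"))
```

The state dictionary is copied with copy.deepcopy because state_dict() returns references to the live parameters, which keep changing as training continues.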
test_data = datasets.ImageFolder('/local/temporary/Datasets/PACS_cartoon/test', transform)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=8, shuffle=False, num_workers=0)
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
model.to(dev)
for param in model.parameters():
    param.requires_grad = False
model.train(False) will fix the behaviour of batchnorm and dropout layers (if present) to be deterministic and input-independent.
Parameters of newly constructed modules have requires_grad = True by default, i.e. they will be trainable.
The few-shot dataset is located in /local/temporary/Datasets/PACS_cartoon_few_shot and has significantly less training data. Report your findings and discuss the difference in performance between the three trainings on PACS_cartoon and the three trainings on PACS_cartoon_few_shot.
In PACS_cartoon_few_shot the training data are very limited. A good practice is to use data augmentations during training. Select some transforms, which can be expected to result in a more diverse dataset. A possible set is
See the PyTorch transform examples.
Note that transforms inherit from torch.nn.Module and can therefore be used the same way as layers, or as functions applied to data Tensors (however, not batched). They can also be built into the Dataset by setting the transform argument. They can process a PIL.Image or a Tensor. For efficiency reasons it is better to use them as functions on Tensors.