In order to learn on your own, you should be able to program answers to the questions below every topic. Some of the video materials are taken from the Deep Learning course by Andrew Ng from Stanford.
Starting files and tutorials: how to work with basic PyTorch
This tutorial provides insight into the DataLoader class. In the next homeworks, you will design the data-loading procedure yourself. You can use it to produce tensors of the shape your architecture expects.
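As a small illustration (not part of the provided template), a minimal custom Dataset wrapped in a DataLoader might look as follows; the class name, tensor shapes and batch size are only examples:

import torch
from torch.utils.data import Dataset, DataLoader

class MyImageDataset(Dataset):
    """Illustrative dataset wrapping a tensor of images and a tensor of labels."""
    def __init__(self, images, labels):
        self.images = images  # e.g. shape [N, C, H, W], dtype float32
        self.labels = labels  # e.g. shape [N], dtype int64

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Return one (input, target) pair; the DataLoader stacks them into a batch.
        return self.images[idx], self.labels[idx]

# Dummy data just to show the shapes produced by the loader.
images = torch.randn(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
loader = DataLoader(MyImageDataset(images, labels), batch_size=32, shuffle=True)

for x, y in loader:
    print(x.shape, y.shape)  # torch.Size([32, 1, 28, 28]) torch.Size([32])
    break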
We started with the MNIST classification template and the MNIST data downloader. The completed files, which apply a linear classifier, provide predictions of the digits.
Application of the torch.nn library with pre-implemented machine learning functions. The usage is explained on the same problem (MNIST classification). The following videos explain the procedure: Mnist to nn library, Explanation of nn.Module.
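For illustration, a minimal nn.Module linear classifier for MNIST could be sketched like this; the class name and shapes are assumptions, not the template's own code:

import torch
import torch.nn as nn

class LinearClassifier(nn.Module):
    """Single linear layer mapping a flattened 28x28 image to 10 digit scores."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)   # flatten [B, 1, 28, 28] -> [B, 784]
        return self.fc(x)           # logits, shape [B, 10]

model = LinearClassifier()
logits = model(torch.randn(32, 1, 28, 28))
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 10, (32,)))
loss.backward()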
The next task is to apply a convolutional model to the more complicated CIFAR10 Dataset download, which can be downloaded and loaded with the following function (see the cifar classification script if you have problems with loading):
import pickle

def load_pickle_file(path):
    with open(path, 'rb') as f:
        data = pickle.load(f)
    return data
The data consist of a tuple of RGB images and the corresponding labels.
CLASSES = {0: 'airplane', 1: 'automobile', 2: 'bird', 3: 'cat', 4: 'deer', 5: 'dog', 6: 'frog', 7: 'horse', 8: 'ship', 9: 'truck'}
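As a quick sanity check, loading could look roughly like this; the file name is only a placeholder for wherever you saved the dataset, and the exact array layout may differ from the assumption of images stored as N x H x W x C:

# Illustrative usage; file name and array layout are assumptions.
images, labels = load_pickle_file('cifar10_train.pkl')
print(len(images), len(labels))       # number of images and labels
print(CLASSES[int(labels[0])])        # human-readable class of the first sample

# Convert to a float tensor in [N, C, H, W] layout for convolutional layers.
import numpy as np
import torch
x = torch.from_numpy(np.asarray(images)).permute(0, 3, 1, 2).float() / 255.0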
We provided the script Cifar Classification, where we explain the mechanism and the ideas behind training the neural network. The explanation is in the Cifar Classification Video.
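To make the mechanism concrete, here is a generic training-loop sketch; it is not the provided Cifar Classification script and assumes a model and a loader for CIFAR10 are already defined:

import torch
import torch.nn as nn

# Generic training-loop sketch; `model` and `loader` are assumed to exist.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(10):
    for inputs, targets in loader:
        optimizer.zero_grad()                     # clear gradients from the previous step
        loss = criterion(model(inputs), targets)  # forward pass and loss
        loss.backward()                           # backpropagate through the network
        optimizer.step()                          # update the weights
    print(f'epoch {epoch}: last-batch loss {loss.item():.3f}')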
To speed up the computation with this much data, you will want to utilize the GPU servers. Computing on a GPU can speed up training by up to 30 times, and it will also be necessary in HW3. You may debug and code on your local machine. If you want to code on the server itself, you may use the guide Setting remote Pycharm. The following site explains the useful properties of the servers: Server info. We created a video guide on how to connect to the Server and GPUs.
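A minimal device-handling sketch (assuming the model and loader from above) looks like this:

import torch

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

for inputs, targets in loader:
    inputs, targets = inputs.to(device), targets.to(device)  # data must live on the same device as the model
    # forward, backward and optimizer step continue as in the loop above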
Practical examples of convolutional layers are explained in concise form by Andrew Ng here: Pooling layers, Simple Convolutional Network, Parameters and Computation. We advise you to learn how to reproduce these computations to get the intuition behind convolutional neural networks. Something similar can occur in the test and will be exercised in the labs.
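As an example of the kind of computation meant here (the layer sizes are illustrative), you can verify the parameter count and output shape of a convolution directly in PyTorch:

import torch
import torch.nn as nn

# A 3x3 convolution with 16 input and 32 output channels:
# parameters = 32 * (16 * 3 * 3) weights + 32 biases = 4640.
conv = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=0)
print(sum(p.numel() for p in conv.parameters()))   # 4640

# Output spatial size: (H - kernel + 2*padding) / stride + 1 = (32 - 3 + 0)/1 + 1 = 30.
x = torch.randn(1, 16, 32, 32)
print(conv(x).shape)                                # torch.Size([1, 32, 30, 30])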
Test your knowledge on the problem sets in Lab 5
Important factors for learning are input normalization and regularization. The weights, as well as the data, should be kept within similar bounds so that the model does not overfit and learns useful features. Check the videos on Input Normalization and batch norm: Why does batch norm work.
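A small sketch of input normalization, assuming x_train is a float tensor of shape [N, C, H, W]:

import torch.nn as nn

# Per-channel normalization to zero mean and unit variance.
# The statistics must be computed on the training set only and reused at test time.
mean = x_train.mean(dim=(0, 2, 3), keepdim=True)
std = x_train.std(dim=(0, 2, 3), keepdim=True)
x_train = (x_train - mean) / std

# Batch norm performs a similar re-centering per mini-batch inside the network.
bn = nn.BatchNorm2d(num_features=32)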
Here, we show how to use regularization techniques such as weight decay, dropout, and the regularizing side-effect of batch norm. This is usually another necessity for getting a reasonable generalization ability from the model; a short code sketch follows the update formula below.
Addition of weight decay in learning: $\text{Loss} = \text{Training Loss} + \lambda \sum_i w_i^2$
Updating the weights after gradient calculation: $w_i^{\text{iter}+1} = w_i^{\text{iter}} - \alpha \left( \text{grad}_i + 2 \lambda w_i^{\text{iter}} \right)$
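In PyTorch the same weight decay term is usually passed to the optimizer; the sketch below assumes the model from above, and the concrete values of lr, momentum and weight_decay are placeholders to be searched over:

import torch
import torch.nn as nn

# weight_decay plays the role of lambda above; PyTorch adds weight_decay * w
# to the gradient, so the factor of 2 is absorbed into the chosen value.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)

# Dropout randomly zeroes activations during training, another common regularizer;
# batch norm layers (nn.BatchNorm2d) add a further regularizing side-effect.
drop = nn.Dropout(p=0.5)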
After designing the architecture, you need to find a set of working hyperparameters. An extensive search might not be possible; however, you can get techniques and explanations from the video about Hyperparameter Tuning. We recommend sampling the learning rate non-linearly from <0.001; 0.03> and the weight decay from <0.0001; 0.003>. Momentum can be added in SGD, or keep the default if you are using the Adam optimizer. Use a batch size as large as possible, up to 512 samples (to preserve some of the regularization effect of batch normalization). When training with weight decay, you can observe a performance increase even after more than 250 epochs. Be reasonable with the time spent on training, though.
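One possible way to sample these ranges non-linearly (log-uniformly) is sketched below; the intervals come from the text above, the sampling scheme itself is just one common choice:

import numpy as np

def sample_log_uniform(low, high):
    # Uniform in log-space, so small and large values are explored evenly.
    return float(np.exp(np.random.uniform(np.log(low), np.log(high))))

lr = sample_log_uniform(0.001, 0.03)
weight_decay = sample_log_uniform(0.0001, 0.003)
print(lr, weight_decay)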
The ideas behind constructing convolutional architectures are based on mathematical and technical constraints. The options for suitable architectures are practically unlimited; therefore, it is logical to build on the work of others who have come up with functional models. You need to keep in mind the general mechanisms of deep learning, otherwise the networks might fail to train properly. A list of well-known and systematic architectures is explained in the following Video. Feel free to skip LeNet-5.
The following is the VGG Net architecture. It does not contain normalization layers, but it was successful in the classification task. You can achieve good performance with it; however, it is harder to train - you need to search more of the parameter space to find a good configuration.
The next one is our design; let's call it corona-VIR-Net. It is based on the same idea of 3×3-kernel-only convolutions, maxpools and ReLUs. It has additional batch normalization layers, so it should be easier to train thanks to similar input values to the convolutions. It also converges faster than VGGNet due to its number of parameters. We recommend training it with weight decay regularization and SGD with the default momentum. Search for the learning rate and weight decay parameters.
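Below is only a rough sketch of the kind of block such a network is built from (3×3 convolution, batch norm, ReLU, max pooling); it is not the exact corona-VIR-Net definition:

import torch.nn as nn

def conv_block(in_ch, out_ch):
    # One building block: 3x3 convolution, batch norm, ReLU, then 2x2 max pooling.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

# Stacking a few such blocks plus a classifier gives a small CIFAR10 network.
net = nn.Sequential(
    conv_block(3, 32),
    conv_block(32, 64),
    conv_block(64, 128),
    nn.Flatten(),
    nn.Linear(128 * 4 * 4, 10),   # 32x32 input halved three times -> 4x4
)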