In order to learn on your own, you should be able to program answers to the questions below every topic. Some of the video materials are taken from the Deep Learning course by Andrew Ng from Stanford.
Starting files and tutorials: how to work with basic PyTorch
This tutorial provides insight into the DataLoader class. In the next homeworks, you will design the data-loading procedure yourself. You can use it to produce tensors of the shape your architecture expects.
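As a small illustration (not part of the provided template), a minimal custom Dataset wrapped in a DataLoader might look as follows; the class name, tensor shapes and batch size are only examples:

import torch
from torch.utils.data import Dataset, DataLoader

class MyImageDataset(Dataset):
    """Illustrative dataset wrapping a tensor of images and a tensor of labels."""
    def __init__(self, images, labels):
        self.images = images  # e.g. shape [N, C, H, W], dtype float32
        self.labels = labels  # e.g. shape [N], dtype int64

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Return one (input, target) pair; the DataLoader stacks them into a batch.
        return self.images[idx], self.labels[idx]

# Dummy data just to show the shapes produced by the loader.
images = torch.randn(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
loader = DataLoader(MyImageDataset(images, labels), batch_size=32, shuffle=True)

for x, y in loader:
    print(x.shape, y.shape)  # torch.Size([32, 1, 28, 28]) torch.Size([32])
    break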
We started with the MNIST classification template and the MNIST data downloader. The completed files, which apply a linear classifier, provide predictions of the digits.
Application of the torch.nn library with pre-implemented machine learning functions. The usage is explained on the same problem (MNIST classification). The following videos explain the procedure: Mnist to nn library, Explanation of nn.Module.
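For illustration, a minimal nn.Module linear classifier for MNIST could be sketched like this; the class name and shapes are assumptions, not the template's own code:

import torch
import torch.nn as nn

class LinearClassifier(nn.Module):
    """Single linear layer mapping a flattened 28x28 image to 10 digit scores."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)   # flatten [B, 1, 28, 28] -> [B, 784]
        return self.fc(x)           # logits, shape [B, 10]

model = LinearClassifier()
logits = model(torch.randn(32, 1, 28, 28))
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 10, (32,)))
loss.backward()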
The next task is to apply a convolutional model to the more complicated CIFAR10 Dataset download, which can be downloaded and loaded with the following function (see the cifar classification script if you have problems with loading):
import pickle

def load_pickle_file(path):
    with open(path, 'rb') as f:
        data = pickle.load(f)
    return data
The data consist of a tuple of RGB images and the corresponding labels.
CLASSES = {0: 'airplane', 1: 'automobile', 2: 'bird', 3: 'cat', 4: 'deer', 5: 'dog', 6: 'frog', 7: 'horse', 8: 'ship', 9: 'truck'}
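As a quick sanity check, loading could look roughly like this; the file name is only a placeholder for wherever you saved the dataset, and the exact array layout may differ from the assumption of images stored as N x H x W x C:

# Illustrative usage; file name and array layout are assumptions.
images, labels = load_pickle_file('cifar10_train.pkl')
print(len(images), len(labels))       # number of images and labels
print(CLASSES[int(labels[0])])        # human-readable class of the first sample

# Convert to a float tensor in [N, C, H, W] layout for convolutional layers.
import numpy as np
import torch
x = torch.from_numpy(np.asarray(images)).permute(0, 3, 1, 2).float() / 255.0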
We provided the script Cifar Classification, where we explain the mechanism and the ideas behind training the neural network. The explanation is in the Cifar Classification Video.
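To make the mechanism concrete, here is a generic training-loop sketch; it is not the provided Cifar Classification script and assumes a model and a loader for CIFAR10 are already defined:

import torch
import torch.nn as nn

# Generic training-loop sketch; `model` and `loader` are assumed to exist.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(10):
    for inputs, targets in loader:
        optimizer.zero_grad()                     # clear gradients from the previous step
        loss = criterion(model(inputs), targets)  # forward pass and loss
        loss.backward()                           # backpropagate through the network
        optimizer.step()                          # update the weights
    print(f'epoch {epoch}: last-batch loss {loss.item():.3f}')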
To speed up the computation with this much data, you will want to utilize the GPU servers. Computing on a GPU can speed up training by up to 30 times, and it will also be necessary in HW3. You may debug and code on your local machine. If you want to code on the server itself, you may use the guide Setting remote Pycharm. The following site explains the useful properties of the servers: Server info. We created a video guide on how to connect to the Server and GPUs.
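A minimal device-handling sketch (assuming the model and loader from above) looks like this:

import torch

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

for inputs, targets in loader:
    inputs, targets = inputs.to(device), targets.to(device)  # data must live on the same device as the model
    # forward, backward and optimizer step continue as in the loop above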
Practical examples of convolutional layers are explained in concise form by Andrew Ng here: Pooling layers, Simple Convolutional Network, Parameters and Computation. We advise you to learn how to reproduce these computations to get the intuition behind convolutional neural networks. Something similar can occur in the test and will be exercised in the labs.
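As an example of the kind of computation meant here (the layer sizes are illustrative), you can verify the parameter count and output shape of a convolution directly in PyTorch:

import torch
import torch.nn as nn

# A 3x3 convolution with 16 input and 32 output channels:
# parameters = 32 * (16 * 3 * 3) weights + 32 biases = 4640.
conv = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=0)
print(sum(p.numel() for p in conv.parameters()))   # 4640

# Output spatial size: (H - kernel + 2*padding) / stride + 1 = (32 - 3 + 0)/1 + 1 = 30.
x = torch.randn(1, 16, 32, 32)
print(conv(x).shape)                                # torch.Size([1, 32, 30, 30])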
Test your knowledge on the problem sets in Lab 5
Important factors for learning are input normalization and regularization. The weights, as well as the data, should be kept within similar bounds so that the model does not overfit and learns useful features. Check the videos on Input Normalization and batch norm: Why does batch norm work.
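A small sketch of input normalization, assuming x_train is a float tensor of shape [N, C, H, W]:

import torch.nn as nn

# Per-channel normalization to zero mean and unit variance.
# The statistics must be computed on the training set only and reused at test time.
mean = x_train.mean(dim=(0, 2, 3), keepdim=True)
std = x_train.std(dim=(0, 2, 3), keepdim=True)
x_train = (x_train - mean) / std

# Batch norm performs a similar re-centering per mini-batch inside the network.
bn = nn.BatchNorm2d(num_features=32)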
Here, we show how to use regularization techniques such as weight decay, dropout, and the regularizing side-effect of batch norm. This is usually another necessity for getting a reasonable generalization ability from the model; a short code sketch follows the update formula below.
Addition of weight decay in learning: $\text{Loss} = \text{Training Loss} + \lambda \sum_i w_i^2$
Updating the weights after gradient calculation: $w_i^{\text{iter}+1} = w_i^{\text{iter}} - \alpha \left( \text{grad}_i + 2 \lambda w_i^{\text{iter}} \right)$
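In PyTorch the same weight decay term is usually passed to the optimizer; the sketch below assumes the model from above, and the concrete values of lr, momentum and weight_decay are placeholders to be searched over:

import torch
import torch.nn as nn

# weight_decay plays the role of lambda above; PyTorch adds weight_decay * w
# to the gradient, so the factor of 2 is absorbed into the chosen value.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)

# Dropout randomly zeroes activations during training, another common regularizer;
# batch norm layers (nn.BatchNorm2d) add a further regularizing side-effect.
drop = nn.Dropout(p=0.5)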
After designing the architecture, you need to find a set of working hyperparameters. An extensive search might not be possible; however, you can get techniques and explanations from the video about Hyperparameter Tuning. We recommend sampling the learning rate non-linearly from <0.001; 0.03> and the weight decay from <0.0001; 0.003>. Momentum can be added in SGD, or keep the default if you are using the Adam optimizer. Use a batch size as large as possible, up to 512 samples (to preserve some of the regularization effect of batch normalization). When training with weight decay, you can observe a performance increase even after more than 250 epochs. Be reasonable with the time spent on training, though.
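One possible way to sample these ranges non-linearly (log-uniformly) is sketched below; the intervals come from the text above, the sampling scheme itself is just one common choice:

import numpy as np

def sample_log_uniform(low, high):
    # Uniform in log-space, so small and large values are explored evenly.
    return float(np.exp(np.random.uniform(np.log(low), np.log(high))))

lr = sample_log_uniform(0.001, 0.03)
weight_decay = sample_log_uniform(0.0001, 0.003)
print(lr, weight_decay)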
The ideas behind constructing convolutional architectures are based on mathematical and technical constraints. The options for suitable architectures are practically unlimited; therefore, it is logical to build on the work of others who have come up with functional models. You need to keep in mind the general mechanisms of deep learning, otherwise the networks might fail to train properly. A list of well-known and systematic architectures is explained in the following Video. Feel free to skip LeNet-5.
The following is the VGG Net architecture. It does not contain normalization layers, but it was successful in the classification task. You can achieve good performance with it; however, it is harder to train - you need to search more of the parameter space to find a good configuration.
The next one is our design; let's call it corona-VIR-Net. It is based on the same idea of 3×3-kernel-only convolutions, maxpools and ReLUs. It has additional batch normalization layers, so it should be easier to train thanks to similar input values to the convolutions. It also converges faster than VGGNet due to its number of parameters. We recommend training it with weight decay regularization and SGD with the default momentum. Search for the learning rate and weight decay parameters.
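Below is only a rough sketch of the kind of block such a network is built from (3×3 convolution, batch norm, ReLU, max pooling); it is not the exact corona-VIR-Net definition:

import torch.nn as nn

def conv_block(in_ch, out_ch):
    # One building block: 3x3 convolution, batch norm, ReLU, then 2x2 max pooling.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

# Stacking a few such blocks plus a classifier gives a small CIFAR10 network.
net = nn.Sequential(
    conv_block(3, 32),
    conv_block(32, 64),
    conv_block(64, 128),
    nn.Flatten(),
    nn.Linear(128 * 4 * 4, 10),   # 32x32 input halved three times -> 4x4
)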