This page is located in the archive.

Convolutional Neural Networks

Deep Convolutional Neural Networks (CNNs) returned to the forefront of computer vision, especially after the breakthrough paper of Krizhevsky et al. [1], which demonstrated large-scale image category recognition with remarkable success. In 2012, their CNN-based algorithm won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) with a top-5 test error of 15.3%, compared to 26.2% for the second-best entry, outperforming competing teams from many renowned institutions by a significant margin. This success sparked enormous interest in neural networks within computer vision, to the extent that most successful methods nowadays use neural networks.

The convolutional network is an extremely flexible classifier, capable of fitting very complex recognition and regression problems while generalizing well. The network is a composition of nested non-linear functions. It is usually deep, i.e. it has many layers, and it typically has more parameters than there are samples in the training set. Several mechanisms prevent overfitting. One of the basic ones is the use of convolutional layers: the network learns shift-invariant filters that are applied at every image position, instead of individual weights for every input pixel. Because the filter weights are shared across positions, far fewer parameters are required.
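To make the weight-sharing argument concrete, here is a small sketch (assuming PyTorch and an illustrative 3x160x160 RGB input mapped to 16 output channels) that compares the parameter count of a 3x3 convolutional layer with that of a fully connected layer producing an output of the same size. The fully connected count is computed analytically rather than by building the layer, which would need more than 100 GB of memory in float32.

```python
import torch.nn as nn

# Illustrative sizes (assumptions): RGB 160x160 input, 16 output channels, 3x3 kernels.
in_ch, out_ch, h, w, k = 3, 16, 160, 160, 3

# padding=1 keeps the spatial size; the same 16 filters (3*3*3 weights + 1 bias each)
# are applied at every pixel position, so the count does not depend on h or w.
conv = nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=1)


def count_params(module: nn.Module) -> int:
    """Total number of trainable parameters of a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


conv_params = count_params(conv)

# A fully connected layer with the same input and output sizes needs one weight
# per (input value, output value) pair plus one bias per output value.
n_in, n_out = in_ch * h * w, out_ch * h * w
fc_params = n_in * n_out + n_out

print(f"conv: {conv_params:,} parameters")            # conv: 448 parameters
print(f"fully connected: {fc_params:,} parameters")   # fully connected: 31,457,689,600 parameters
```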

Fig. 1: Architecture of a Deep Convolutional Neural Network. Figure adapted from [1].

Usually, the architecture of an image classification CNN is composed of several convolutional layers (which are meant to learn a representation) followed by a few fully connected layers (which implement the non-linear classification stage on top of the invariant representation), see figure 1.
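A minimal PyTorch sketch of this two-stage layout: convolutional layers that learn the representation, followed by fully connected layers that classify it. The class name TinyConvNet and all layer sizes are illustrative assumptions, not the lab's SimpleCNN; the 10 output classes match Imagenette, the dataset used in this lab.

```python
import torch
import torch.nn as nn


class TinyConvNet(nn.Module):
    """Hypothetical minimal CNN: convolutional feature extractor + FC classifier."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Convolutional part: learns the image representation.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # halves height and width
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            # Global average pooling: (N, 32, H/4, W/4) -> (N, 32, 1, 1),
            # independent of the input resolution.
            nn.AdaptiveAvgPool2d(1),
        )
        # Fully connected part: non-linear classifier on top of the features.
        self.classifier = nn.Sequential(
            nn.Flatten(),  # (N, 32, 1, 1) -> (N, 32)
            nn.Linear(32, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = TinyConvNet(num_classes=10)
logits = model(torch.randn(2, 3, 160, 160))  # batch of 2 random RGB images
print(logits.shape)  # torch.Size([2, 10])
```

Global average pooling is one design choice that makes the classifier independent of the input resolution; the network in [1] instead flattens its last feature map directly into the fully connected layers.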

In this lab, you will train your own image classification network from scratch using the PyTorch library.

Training a CNN for image classification

Training a CNN from scratch is typically done on GPUs, often several of them, because the process is very computationally intensive. This assignment, on the other hand, is designed to run on a CPU: a single training epoch takes 1-5 minutes, depending on the CNN architecture and the hardware.

For example, one training epoch takes 90 s on a mobile Intel i7 CPU and 26 s on a mobile GT940M GPU.
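Where the computation runs is decided by the device the model and tensors are placed on. A minimal sketch, assuming PyTorch, of selecting a GPU when one is available and timing a computation fairly on either device; the small model and the random batch are placeholders. The torch.cuda.synchronize() calls matter because CUDA kernels run asynchronously, so without them the timer could stop before the GPU has finished.

```python
import time

import torch
import torch.nn as nn

# Use a CUDA GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder model and batch, only to illustrate device placement and timing.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
).to(device)
images = torch.randn(16, 3, 160, 160, device=device)


def timed_forward(model: nn.Module, x: torch.Tensor) -> float:
    """Return the wall-clock time in seconds of one forward pass."""
    if x.is_cuda:
        torch.cuda.synchronize()  # wait for previously queued GPU work
    start = time.perf_counter()
    with torch.no_grad():
        model(x)
    if x.is_cuda:
        torch.cuda.synchronize()  # wait until this forward pass has finished
    return time.perf_counter() - start


elapsed = timed_forward(model, images)
print(f"device={device}, forward pass took {elapsed:.4f} s")
```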

You will implement a CNN, the training and validation loops, a custom dataset loader, and a couple of helper functions.
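The training and validation loops usually follow a standard PyTorch pattern. The sketch below shows one possible version; the function names train_one_epoch and evaluate, their signatures, and the toy random data are assumptions for illustration, and the template's train_and_val_single_epoch and validate may be structured differently.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, optimizer, loss_fn, device):
    """One pass over the training loader; returns the mean training loss."""
    model.train()  # training mode (affects layers such as dropout and batch norm)
    total_loss, n = 0.0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()                  # clear gradients from the previous step
        loss = loss_fn(model(images), labels)
        loss.backward()                        # backpropagate
        optimizer.step()                       # update the weights
        total_loss += loss.item() * images.size(0)
        n += images.size(0)
    return total_loss / n


@torch.no_grad()  # no gradient bookkeeping during validation
def evaluate(model, loader, loss_fn, device):
    """Returns (mean loss, accuracy) without updating the weights."""
    model.eval()  # evaluation mode
    total_loss, correct, n = 0.0, 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)
        total_loss += loss_fn(logits, labels).item() * images.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        n += images.size(0)
    return total_loss / n, correct / n


# Toy usage on random data (placeholders for the real Imagenette loaders).
torch.manual_seed(0)
data = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))
loader = DataLoader(data, batch_size=16, shuffle=True)
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()
device = torch.device("cpu")

train_loss = train_one_epoch(model, loader, optimizer, loss_fn, device)
val_loss, val_acc = evaluate(model, loader, loss_fn, device)
print(f"train loss {train_loss:.3f}, val loss {val_loss:.3f}, val acc {val_acc:.2f}")
```

The two loops differ mainly in model.train() versus model.eval() and in whether gradients are computed and applied.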

Download and installation

First, make sure that you have the conda virtual environment set up. If you do not, create it from the environment files at https://gitlab.fel.cvut.cz/mishkdmy/mpv-python-assignment-templates/-/tree/master/conda_env_yaml

Second, get the assignment template notebook from https://gitlab.fel.cvut.cz/mishkdmy/mpv-python-assignment-templates/-/blob/master/assignment_6_7_cnn_template/training-imagenette-CNN.ipynb

You can download the data via the corresponding section of the notebook or directly from https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz
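If you prefer to fetch the data outside the notebook, a minimal standard-library sketch follows. The helper names download and extract are hypothetical, not part of the template; download skips the transfer when the archive is already present, and extract uses the safer "data" extraction filter on Python versions that provide it.

```python
import tarfile
import urllib.request
from pathlib import Path

IMAGENETTE_URL = "https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz"


def download(url: str, archive_path: Path) -> Path:
    """Download url to archive_path unless the file already exists."""
    archive_path.parent.mkdir(parents=True, exist_ok=True)
    if not archive_path.exists():
        urllib.request.urlretrieve(url, archive_path)
    return archive_path


def extract(archive_path: Path, dest_dir: Path) -> Path:
    """Extract a gzipped tar archive into dest_dir and return dest_dir."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, links outside dest_dir, etc.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return dest_dir
```

For example, extract(download(IMAGENETTE_URL, Path("data/imagenette2-160.tgz")), Path("data")) unpacks the dataset under data/.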

The task

For this lab, all explanations are in the corresponding notebook; please refer to it.

Introduction to PyTorch Image Processing

To complete this assignment, submit the following files, packed into a single .zip file, to the upload system:

  • results.html - Converted notebook. Convert your notebook training-imagenette-CNN.ipynb to results.html using: jupyter nbconvert --to html training-imagenette-CNN.ipynb --EmbedImagesPreprocessor.resize=small --output results.html
  • submission.csv - File with predictions on the test set.
  • cnn_training.py - File with the following functions and classes implemented:
    • get_dataset_statistics - function that calculates the dataset mean and standard deviation (per pixel)
    • SimpleCNN - class that implements the CNN
    • weight_init - function that initializes the CNN weights
    • validate - function that performs validation
    • train_and_val_single_epoch - function that trains the CNN for a single pass through the training data loader
    • lr_find - function that runs a short training to find a suitable learning rate *
    • TestFolderDataset - class that reads images from a folder and serves as the test dataset
    • get_predictions - function that predicts class indices for the images in a data loader
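Normalization statistics can be accumulated with running sums, so that images of different sizes never have to be stacked into one tensor. A minimal sketch, assuming the statistics are the mean and standard deviation of each colour channel pooled over all pixels of all images (the lab's get_dataset_statistics may define them differently; follow the notebook) and that the dataset returns (image, label) pairs with CxHxW tensors. The function name channel_stats is hypothetical.

```python
import torch
from torch.utils.data import Dataset, TensorDataset


def channel_stats(dataset: Dataset) -> tuple[list[float], list[float]]:
    """Per-channel mean and (population) std over every pixel of every image."""
    total = None      # per-channel sum of pixel values
    total_sq = None   # per-channel sum of squared pixel values
    count = 0         # number of pixels per channel seen so far
    for image, _ in dataset:
        x = image.double().flatten(start_dim=1)  # (C, H*W) in float64 for accuracy
        s, sq = x.sum(dim=1), (x * x).sum(dim=1)
        total = s if total is None else total + s
        total_sq = sq if total_sq is None else total_sq + sq
        count += x.shape[1]
    mean = total / count
    # Var = E[x^2] - E[x]^2, clamped to remove tiny negative rounding errors.
    var = (total_sq / count - mean * mean).clamp_min(0.0)
    return mean.tolist(), var.sqrt().tolist()


# Toy check: channel 0 is constant 0.5, channel 1 alternates between 0 and 1.
imgs = torch.stack([torch.full((4, 4), 0.5), torch.tensor([[0.0, 1.0] * 2] * 4)])
ds = TensorDataset(imgs.unsqueeze(0), torch.tensor([0]))  # one 2x4x4 "image"
mean, std = channel_stats(ds)
print(mean, std)  # [0.5, 0.5] [0.0, 0.5]
```

The resulting lists can then be passed as the mean and std arguments of torchvision.transforms.Normalize.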

Use the assignment template. When preparing the zip file for the upload system, do not include any directories: the files must be in the root of the zip file. Please upload the same zip file to both tasks: 01_cnnclf and 02_tourn.
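One way to satisfy the flat-archive requirement is to set each file's name inside the zip explicitly. A minimal sketch with Python's zipfile module; the function name make_submission_zip and the default archive name submission.zip are assumptions, not requirements of the upload system.

```python
import zipfile
from pathlib import Path


def make_submission_zip(files: list[str], zip_path: str = "submission.zip") -> list[str]:
    """Pack files into zip_path with no directories inside the archive.

    arcname=Path(f).name stores each file at the archive root, even if it
    lives in a subfolder on disk. Returns the names stored in the archive.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, arcname=Path(f).name)
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()
```

For example, make_submission_zip(["results.html", "submission.csv", "cnn_training.py"]) stores the three files at the root of submission.zip. Because only the base name is kept, two files with the same name from different folders would collide.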

Your code and notebook will be checked manually. The submission.csv file is used for two things. First, to evaluate the quality of your trained network (task 08_cnnclf). Second, to award bonus points based on its performance, 1 point per 20% quantile of the ranking: the top 20% of submissions get 5 points, the top 40% get 4 points, and so on. To get any bonus points, your classification accuracy must be above 70%.
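Under one reading of the bonus rule (an assumption: the ranking is split into five 20% bands, the best band earns 5 points and each following band one point less), the arithmetic can be sketched as follows. bonus_points is a hypothetical helper for illustration, not part of the grading system.

```python
def bonus_points(rank: int, n_participants: int, accuracy: float) -> int:
    """Bonus points under the assumed reading of the rules.

    rank is 1-based (1 = best); accuracy is a fraction in [0, 1].
    """
    if accuracy <= 0.70:  # accuracy must be strictly above 70%
        return 0
    # 20% band the rank falls into: 1 (best) .. 5 (worst),
    # i.e. ceil(5 * rank / n) computed with integer arithmetic.
    band = (5 * rank + n_participants - 1) // n_participants
    return 6 - band


print(bonus_points(rank=3, n_participants=50, accuracy=0.82))   # 5
print(bonus_points(rank=30, n_participants=50, accuracy=0.82))  # 3
print(bonus_points(rank=1, n_participants=50, accuracy=0.65))   # 0
```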

Debugging PyTorch training code

This lab does not have an assignment. The slides are here.


  1. A. Krizhevsky, I. Sutskever, G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS, 2012.
  2. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11):2278-2324, November 1998.

Jan Čech 2016/04/26 17:07

Last modified: 2023/04/26 15:14 by mishkdmy