
Lab 3 - Backpropagation

In this lab, we are going to test your knowledge of the math behind neural networks. These simple exercises apply the theory from the Lecture (slides 10-62). A tutorial solution to the first two exercises is here. We recommend consulting the corresponding lecture parts for a better overview and intuition behind these exercises. Another (more graphical) source on learning mechanisms is https://hmkcode.com/ai/backpropagation-step-by-step/, which provides a step-by-step description of the feed-forward and backward pass.

You are asked to write Python code to validate your results. It takes only a few lines and tests your understanding. You can also change the initial values and dimensions of the parameters to verify that you can solve different problems. This part is important because it confirms that you can apply the theory and makes you aware of what you are doing in the following parts of the course. If you are not sure about the PyTorch syntax, look it up in the PyTorch documentation and search for the relevant modules.


Simple Neural Network

You are given the following neural network model, parametrized by a weight vector w. The model takes an input vector x and outputs y: $$ y = \sin(\textbf{w}^T \textbf{x}) - b $$

Where: $$\textbf{x} = [2, 1] , ~\textbf{w} = [\pi/2, \pi] ,~ b = 0 ,~ \tilde{y} = 2$$
1) Draw the computational graph of the forward pass of this small neural network.
2) Compute the feed-forward pass with the initial weights w and the input feature x.
3) Calculate the gradient of the output y with respect to w, i.e. $\frac{\partial y}{\partial \textbf{w}}$.
4) Use the $L_2$ loss (mean squared error) to compute the loss between the prediction y and the label $\tilde{y}$. Add the loss to the computational graph.
5) Use the chain rule to compute the gradient $\frac{\partial L}{\partial \textbf{w}}$ and update the weights with learning rate $\alpha = 0.5$.

import torch
import numpy as np
 
### Define initial parameters
# w =
# x =
# b =
# y_label =
 
""" Note: Think about dimensions of initial parameters and order of operations """
 
# model forward pass y = sin(w.T @ x) - b        ---> dot product @
# y =
 
# calculate loss and make backward pass
# L2 =
 
""" Note: Beware of backward passes when calculating it for both y and L. You need to do it separately """
 
# Update weights with learning rate alpha
# alpha =
 
print(f"2): Feed-forward pass result is : {''}")
print(f"3): Weight gradients are : {''}")
print(f"4): L2 loss result is : {''}")
print(f"5): Updated weights are : {''}")

Convolutional Layer

You are given an input feature map x and a kernel w: $$ \textbf{x} = \left(\begin{array}{ccc} 1 & 0 & 2 \\ 2 & 1 & -1 \\ 0 & 0 & 2 \\ \end{array}\right) ,~ \textbf{w} = \left(\begin{array}{cc} 1 & -1 \\ 0 & 2 \end{array}\right) $$
Stride denotes the length of the convolutional stride, and padding denotes symmetric zero-padding.
Compute the outputs of the following layers:

$$1)~ conv(~\textbf{x}, \textbf{w}, ~stride=1, ~padding=0)~=$$

$$2)~ conv(~\textbf{x}, \textbf{w}, ~stride=3, ~padding=1)~=$$

$$3)~ max(~\textbf{x},~ 2~ x~ 2)~=$$


import torch
 
### convolutions
x = torch.tensor(((1,0,2), (2,1,-1), (0,0,2)), dtype=torch.float)
w = torch.tensor(((1,-1),(0,2)), dtype=torch.float)
 
### 1)
conv1 = torch.nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2, stride=1, padding=0, bias=False)
print(conv1)
print('old weights:', conv1.weight, 'weight shape:', conv1.weight.shape)
# insert weights into convolution
 
print('new weights:', conv1.weight)
 
output_1 = conv1(x.unsqueeze(0).unsqueeze(0))
 
""" Hint: input into 2d convolution is 4 dimensional tensor. First dim denotes batch size of input data, 
second dim are channels for corresponding weight forward pass, third and fourth are dimension of the data.
More on this in next lab, this is just brief explanation, why we have to unsqueeze input for convolution operation"""
 
### 2)
# conv2 = 
# Here insert weights into convolution
 
output_2 = conv2(x.unsqueeze(0).unsqueeze(0))
 
### 3)
# maxpool =
 
output_3 = maxpool(x.unsqueeze(0).unsqueeze(0))
 
print(f'\n\noutput_1: {output_1}')
print(f'output_2: {output_2}')
print(f'output_3: {output_3}')
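
For the "insert weights into convolution" steps, one possible approach (a sketch, assuming conv1 is defined as above with bias=False) is to overwrite the randomly initialized kernel with w, reshaped to the Conv2d weight layout (out_channels × in_channels × kH × kW):

# copy the hand-specified kernel into the layer without tracking gradients
with torch.no_grad():
    conv1.weight.copy_(w.view(1, 1, 2, 2))

The same idea applies to conv2; for 3), a torch.nn.MaxPool2d layer needs no weights at all.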

Task 1

You are given a network which consists of a convolutional layer followed by max pooling. The structure is defined as follows:
$$f(\textbf{x},\textbf{w}) = max(~conv(\textbf{x},\textbf{w}, ~stride=1,~ padding=0),~ 1 ~x~ 2)$$

Where:

$$\textbf{x} = \left(\begin{array}{ccc} 2 & 1 & 2 \end{array}\right) ,~ \textbf{w} = \left(\begin{array}{cc} 1 & 0 \end{array}\right)$$ x is an input feature map (a 1×3 image) and w is a convolutional kernel of size 1×2.


1) Draw the computational graph and compute the feed-forward pass for the input data x and kernel w.
2) Compute the gradient with respect to w, i.e. $\frac{\partial f(\textbf{x}, \textbf{w})}{\partial \textbf{w}}$.
3) Update the weights using learning rate $\alpha = 0.5$. You can check your results against the sketch below.
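
A minimal sketch for such a check (it maps the 1×3 input and 1×2 kernel onto torch.nn.functional.conv1d and max_pool1d and applies one plain gradient-descent step; the shapes and the use of the functional API are assumptions, not prescribed by the task):

import torch
 
# 1x3 input row and 1x2 kernel; only w needs gradients
x = torch.tensor([[2., 1., 2.]])
w = torch.tensor([[1., 0.]], requires_grad=True)
 
# conv(x, w, stride=1, padding=0) followed by 1x2 max pooling
conv_out = torch.nn.functional.conv1d(x.unsqueeze(0), w.unsqueeze(0), stride=1, padding=0)
f = torch.nn.functional.max_pool1d(conv_out, kernel_size=2)
 
# backward pass gives df/dw, then one gradient-descent step with alpha = 0.5
f.backward()
print("forward pass:", f.item())
print("df/dw:", w.grad)
 
alpha = 0.5
with torch.no_grad():
    w -= alpha * w.grad
print("updated w:", w)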

Task 2

You are given the following computational graph, where W $\in R^{4×6}$, u $\in R^{4×1}$, y $\in R^{17×1}$, x denotes homogeneous coordinates, and w = vec(W).

1) Fill in the dimensionality of the variables x, w, z, q, $\mathcal{L}$ in the computational graph.
2) Fill in the dimensionality of the following edge gradients: $\frac{\partial \mathcal{L}}{\partial q}, \frac{\partial q}{\partial z}, \frac{\partial z}{\partial w}$.
3) What is the dimensionality of $\frac{\partial \mathcal{L}}{\partial \textbf{y}}$?
4) What is the dimensionality of $\frac{\partial \mathcal{L}}{\partial \textbf{w}}$?
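
As a reminder (assuming the Jacobian convention from the lecture, with one row per output and one column per input), for a mapping $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ the Jacobian satisfies $$\frac{\partial f}{\partial \textbf{v}} \in \mathbb{R}^{m \times n},$$ so a scalar loss differentiated with respect to a vector of length n gives a $1 \times n$ gradient. Note also that $\textbf{w} = vec(\textbf{W}) \in R^{24×1}$, since W has $4 \cdot 6 = 24$ entries.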



MNIST classification with torch.nn

PyTorch offers powerful tools in the form of the nn library. It provides template classes for deep learning applications and makes Python code easier to develop and more compact. You will be using this library in the following parts of the course for every deep learning application. You can follow our prepared tutorial overview (split into several sequences because of a video-processing malfunction) or go at your own pace.

The most effective way to learn these tools is to work through This tutorial by yourself. It introduces a few important Python classes that take care of the math in this course. These classes are quite readable and scale to your needs.

# The most important classes are:
torch.nn.Module
torch.optim
torch.utils.data.Dataset 
torch.utils.data.DataLoader 

Your task here is to go through This tutorial and train a classification neural network (the default one, or your own architecture) on the MNIST dataset. The MNIST dataset contains labelled hand-written digits (0-9). First, follow the tutorial without using the nn library. Be sure to understand each individual step and why it is important. Then use the classes from nn to refactor your code for better readability and configurability. Try to achieve better than the default accuracy (~83%) on the validation set by adding a few additional layers to your nn.Module or by changing hyperparameters such as the learning rate; a possible skeleton of the refactored code is sketched below.
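
For orientation only, a minimal sketch of what a refactored model and training loop might look like (the architecture, layer sizes, hyperparameters, and the dummy data are illustrative assumptions, not the tutorial's exact code; swap the dummy tensors for the real MNIST DataLoader from the tutorial):

import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
 
# a small fully connected classifier for flattened 28x28 MNIST images (10 classes)
class MnistNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(784, 128),
            nn.ReLU(),
            nn.Linear(128, 10),
        )
 
    def forward(self, x):
        return self.net(x)
 
model = MnistNet()
opt = optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
 
# dummy data with MNIST shapes; replace with the tutorial's real DataLoader
x_train = torch.randn(256, 784)
y_train = torch.randint(0, 10, (256,))
train_dl = DataLoader(TensorDataset(x_train, y_train), batch_size=64, shuffle=True)
 
for epoch in range(2):
    for xb, yb in train_dl:
        loss = loss_fn(model(xb), yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")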
