====== Lab 5 ======

==== 1) Convolutional Layer ====

You are given an input feature map **x** and a kernel **w**:

$$ \textbf{x} = \left(\begin{array}{ccc} 1 & 0 & 2 \\ 2 & 1 & -1 \\ 0 & 0 & 2 \\ \end{array}\right) ,~ \textbf{w} = \left(\begin{array}{cc} 1 & -1 \\ 0 & 2 \end{array}\right) $$

Here //stride// denotes the length of the convolutional stride, //padding// denotes symmetric zero-padding, and //maxpool// denotes a max-pooling layer (it takes the maximum over the kernel window).

Compute the outputs of the following layers:

$$1)~ conv(~\textbf{x}, \textbf{w}, ~stride=1, ~padding=0)~=$$
$$2)~ conv(~\textbf{x}, \textbf{w}, ~stride=3, ~padding=1)~=$$
$$3)~ maxpool2d(~\textbf{x},~ 1 \times 2)~=$$

<code python>
import torch

### convolutions
x = torch.tensor(((1, 0, 2), (2, 1, -1), (0, 0, 2)), dtype=torch.float)
w = torch.tensor(((1, -1), (0, 2)), dtype=torch.float)

### 1)
# Init the convolutional layer class. The task specifies stride=1 and
# padding=0 (default dilation); bias is disabled so that the output
# matches the plain conv(x, w).
conv1 = torch.nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2,
                        stride=1, padding=0, bias=False)
print(conv1)
print('old weights:', conv1.weight, 'weight shape:', conv1.weight.shape)

# Insert the kernel w into the convolution: the weight tensor has to be
# wrapped in torch.nn.Parameter and reshaped to
# (out_channels, in_channels, kH, kW). Consult the PyTorch documentation,
# if necessary.
conv1.weight = torch.nn.Parameter(w.unsqueeze(0).unsqueeze(0))
print('new weights:', conv1.weight)

output_1 = conv1(x.unsqueeze(0).unsqueeze(0))
"""Hint: the input of a 2d convolution is a 4-dimensional tensor.
The first dim is the batch size of the input data, the second dim are the
channels for the corresponding weight forward pass, and the third and fourth
are the spatial dimensions of the data. More on this in the next lab; this
is just a brief explanation of why we have to unsqueeze the input of the
convolution operation."""

### 2)
conv2 = torch.nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2,
                        stride=3, padding=1, bias=False)
# Insert the kernel weights, as above
conv2.weight = torch.nn.Parameter(w.unsqueeze(0).unsqueeze(0))
output_2 = conv2(x.unsqueeze(0).unsqueeze(0))

### 3)
maxpool = torch.nn.MaxPool2d(kernel_size=(1, 2))
output_3 = maxpool(x.unsqueeze(0).unsqueeze(0))

print(f'\n\noutput_1: {output_1}')
print(f'output_2: {output_2}')
print(f'output_3: {output_3}')
</code>

==== 2) Computational graph of Convolution ====

You are given a network which consists of a convolutional layer followed by a max-pooling layer. Its structure is defined as follows:

$$f(\textbf{x},\textbf{w}) = maxpool1d(~conv(\textbf{x},\textbf{w}, ~stride=1,~ padding=0),~ 1 \times 2)$$

where:

$$\textbf{x} = \left(\begin{array}{ccc} 2 & 1 & 2 \end{array}\right) ,~ \textbf{w} = \left(\begin{array}{cc} 1 & 0 \end{array}\right)$$

Here **x** is an input feature map (a 1x3 image) and **w** is a convolutional kernel of size 1x2.

1) Draw the computational graph and compute the feed-forward pass for the input data **x** and kernel **w**. \\
2) Compute the gradient with respect to **w**, i.e. $\frac{\partial f(\textbf{x}, \textbf{w})}{\partial \textbf{w}}$. \\
3) Update the weights using the learning rate $\alpha = 0.5$. \\
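Once you have the forward pass, the gradient, and the weight update on paper, you can let PyTorch's autograd check them. The following is a minimal sketch, assuming a plain gradient-descent update $\textbf{w} \leftarrow \textbf{w} - \alpha \frac{\partial f}{\partial \textbf{w}}$:

<code python>
import torch

# input feature map (a 1x3 image) and kernel (1x2) from the task
x = torch.tensor([[2., 1., 2.]])
w = torch.tensor([[1., 0.]], requires_grad=True)

# f(x, w) = maxpool1d(conv(x, w, stride=1, padding=0), 1 x 2)
# conv1d expects input (batch, channels, width), weight (out_ch, in_ch, width)
z = torch.nn.functional.conv1d(x.unsqueeze(0), w.unsqueeze(0),
                               stride=1, padding=0)
f = torch.nn.functional.max_pool1d(z, kernel_size=2).squeeze()

f.backward()  # backpropagate through the whole graph
print('forward pass f(x, w):', f.item())
print('gradient df/dw:', w.grad)

# gradient-descent update with learning rate alpha = 0.5
alpha = 0.5
with torch.no_grad():
    w_new = w - alpha * w.grad
print('updated weights:', w_new)
</code>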
==== 3) Jacobians in computational graph ====

You are given the following computational graph, where **W** $\in R^{4 \times 6}$, **u** $\in R^{4 \times 1}$, **y** $\in R^{17 \times 1}$, **x** denotes homogeneous coordinates, and **w** = vec(**W**).

{{ :courses:b3b33vir:tutorials:dim.png?600 |}}

1) Fill in the dimensionality of the variables **x**, **w**, **z**, **q**, $\mathcal{L}$ in the computational graph. \\
2) Fill in the dimensionality of the following edge gradients: $\frac{\partial \mathcal{L}}{\partial q}, \frac{\partial q}{\partial z}, \frac{\partial z}{\partial w}$. \\
3) What is the dimensionality of $\frac{\partial \mathcal{L}}{\partial \textbf{y}}$? \\
4) What is the dimensionality of $\frac{\partial \mathcal{L}}{\partial \textbf{w}}$?

==== 4) Kernel Computation ====

Consider a convolutional neural network layer $l_1$ which maps an //RGB// image of size 128x128 to 16 feature maps with the same spatial dimensions as the image. The kernel size is 3x3 and the stride is 1.

  - What size of padding ensures the same spatial resolution of the output feature map?
  - How much memory (in bytes) do the kernel weights in layer $l_1$ take up, assuming //float32// weights? Ignore the bias weights.
  - How many mathematical operations does this layer perform in a single forward pass? A multiplication or addition of two numbers counts as one operation.
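Your answers can be sanity-checked in PyTorch: instantiate the layer, derive the padding from the kernel size, and let the weight tensor report its own size. A minimal sketch (it computes the results of all three questions, so treat it as a spoiler; the operation count assumes multiplications and additions are counted separately and the bias is ignored, as stated in the task):

<code python>
import torch

in_channels, out_channels = 3, 16   # RGB image -> 16 feature maps
kernel_size, stride = 3, 1
H = W = 128

# 1) for stride 1, "same" spatial resolution needs padding (k - 1) / 2
padding = (kernel_size - 1) // 2

l1 = torch.nn.Conv2d(in_channels, out_channels, kernel_size,
                     stride=stride, padding=padding, bias=False)

# check that the spatial resolution is preserved
out = l1(torch.zeros(1, in_channels, H, W))
print('output shape:', out.shape)   # expect (1, 16, 128, 128)

# 2) memory taken by the kernel weights (float32 -> 4 bytes per weight)
n_weights = l1.weight.numel()       # out_ch * in_ch * k * k
print('weights:', n_weights, 'bytes:', n_weights * l1.weight.element_size())

# 3) each output value needs k*k*in_ch multiplications and
#    k*k*in_ch - 1 additions; multiply by the number of output values
ops_per_output = 2 * kernel_size ** 2 * in_channels - 1
print('total operations:', ops_per_output * out_channels * H * W)
</code>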