In this lab, we are going to test your knowledge of the math behind neural networks. These simple exercises apply the theory from the lecture (slides 10-62). A tutorial solution to the first two exercises is here. We recommend consulting the corresponding lecture parts for a better overview of, and intuition behind, these exercises. Another (graphically rich) source on the learning mechanism is the step-by-step description of the forward and backward pass at https://hmkcode.com/ai/backpropagation-step-by-step/.
You are given the following neural network model, parametrized by a weight vector w. The model takes a vector x as input and outputs y: $$ y = \sin(\mathbf{w}^T\mathbf{x}) - b $$
Where: $$\mathbf{x} = [2, 1], \quad \mathbf{w} = [\pi/2, \pi], \quad b = 0, \quad \tilde{y} = 2$$

1) Draw a computational graph of the forward pass of this small neural network.
2) Compute the feed-forward pass with the initial weights w and the input feature vector x.
3) Calculate the gradient of the output y with respect to w, i.e. $\frac{\partial y}{\partial \mathbf{w}}$.
4) Use the $L_2$ loss (mean squared error) to compute the loss value between the forward prediction y and the label $\tilde{y}$. Add the loss to the computational graph.
5) Use the chain rule to compute the gradient $\frac{\partial L}{\partial \mathbf{w}}$ and update the weights with the learning rate $\alpha = 0.5$ (a reference expression is given after this list).
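As a reference for steps 3)-5): the following expression is not part of the assignment text, but it follows directly from $L = (y-\tilde{y})^2$ and $y = \sin(\mathbf{w}^T\mathbf{x}) - b$, so you can use it to check your own derivation: $$\frac{\partial L}{\partial \mathbf{w}} = \frac{\partial L}{\partial y}\cdot\frac{\partial y}{\partial \mathbf{w}} = 2\,(y-\tilde{y})\cdot\cos(\mathbf{w}^T\mathbf{x})\cdot\mathbf{x},$$ and the gradient-descent update is $\mathbf{w} \leftarrow \mathbf{w} - \alpha\,\frac{\partial L}{\partial \mathbf{w}}$.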
```python
import torch
import numpy as np

### Define initial parameters
# w =
# x =
# b =
# y_label =

"""
Note: Think about the dimensions of the initial parameters and the order of operations.
"""

# model forward pass: use torch.sin() and w @ x  (@ is the dot product)
# y =

# calculate loss and make backward pass
# L2 =

"""
Note: Beware of the backward passes when calculating them for both y and L.
You need to do them separately.
"""

# Visualize the gradient of L2 w.r.t. w: call L2.backward(), then the gradient is available in w.grad
```
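If you want to check your intermediate results, here is one possible way to fill in the template above: a minimal sketch, not the official tutorial solution, using the values given in the exercise ($\mathbf{w} = [\pi/2, \pi]$, $\mathbf{x} = [2, 1]$, $b = 0$, $\tilde{y} = 2$, $\alpha = 0.5$). The expected analytic values are noted in the comments.

```python
import math
import torch

# initial parameters from the exercise
w = torch.tensor([math.pi / 2, math.pi], requires_grad=True, dtype=torch.double)
x = torch.tensor([2.0, 1.0], dtype=torch.double)
b = 0.0
y_label = 2.0

# forward pass: y = sin(w . x) - b
y = torch.sin(w @ x) - b

# gradient dy/dw (retain the graph so the loss can be backpropagated afterwards)
y.backward(retain_graph=True)
print('y      =', y.item())     # sin(2*pi) - 0 = 0
print('dy/dw  =', w.grad)       # cos(2*pi) * x = [2, 1]

# L2 loss and its gradient dL/dw (reset the accumulated gradient first)
w.grad.zero_()
L2 = (y - y_label) ** 2
L2.backward()
print('L2     =', L2.item())    # (0 - 2)^2 = 4
print('dL/dw  =', w.grad)       # 2*(y - y_label)*cos(w.x)*x = [-8, -4]

# gradient-descent update with alpha = 0.5
alpha = 0.5
with torch.no_grad():
    w -= alpha * w.grad
print('updated w =', w)         # [pi/2 + 4, pi + 2]
```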
You are given a Gaussian probability distribution model $$p(y|\mathbf{x},\mathbf{w}) = K\cdot\exp\big(-(y-f(\mathbf{x},\mathbf{w}))^2\big),$$ which models the probability of observing the variable $y\in\mathbb{R}$ given the measurement $\mathbf{x}\in\mathbb{R}$. The shape of the probability distribution is determined by the (unknown) parameters $\mathbf{w}_0, \mathbf{w}_1\in\mathbb{R}$ of the non-linear function $$f(\mathbf{x},\mathbf{w}) = \frac{1}{1+e^{-(\mathbf{w}_0\mathbf{x} + \mathbf{w}_1)}}.$$ You are given a training set $\mathcal{D} = \{(\mathbf{x}_1, y_1),\dots,(\mathbf{x}_N, y_N)\}$.
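The task here is presumably to estimate $\mathbf{w}$ from $\mathcal{D}$ by maximum likelihood, which is what the training loop below optimizes. Assuming the samples are independent, minimizing the negative log-likelihood reduces to minimizing a sum of squared errors, because the normalization constant $K$ does not depend on $\mathbf{w}$: $$-\log \prod_{i=1}^{N} p(y_i\,|\,\mathbf{x}_i,\mathbf{w}) = \sum_{i=1}^{N}\big(y_i - f(\mathbf{x}_i,\mathbf{w})\big)^2 - N\log K.$$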
Hint: use loss.backward() to compute the gradient of the loss with respect to w; the accumulated gradient is then available in w.grad.
```python
import torch
import matplotlib.pyplot as plt
import numpy as np

# load points
N = 5
pts = np.load('pts.npy')
pts = torch.tensor(pts)

# define optimization variables
w = torch.tensor([-2, 2], requires_grad=True, dtype=torch.double)
learning_rate = ...  # TODO: choose a step size

for i in range(30):
    # OPTIMIZE WEIGHTS ...
    # (1) define loss
    # (2) compute gradient with loss.backward()
    # (3) update weights
    loss = ...
    loss.backward()
    with torch.no_grad():
        w -= learning_rate * w.grad
        w.grad.zero_()

# visualize result
PTS = pts.detach().numpy()  # convert to numpy
W = w.detach().numpy()
T = torch.linspace(-1, 1, 50).numpy()
plt.figure(1), plt.clf()
plt.plot(PTS[:, 0], PTS[:, 1], markersize=10, marker='x', color='r', linestyle='None')
plt.plot(T, 1 / (1 + np.exp(-(W[0] * T + W[1]))), color='green')  # f(x, w) from the model above
plt.xlabel('x')
plt.ylabel('y')
plt.pause(0.01)
plt.draw()
```
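One possible way to complete the optimization loop above, assuming the sum-of-squares loss derived from the negative log-likelihood. The learning rate value is an assumption (the assignment does not fix one), and the data layout pts[:, 0] = x_i, pts[:, 1] = y_i follows the plotting code.

```python
import numpy as np
import torch

pts = torch.tensor(np.load('pts.npy'))   # assumed layout: pts[:, 0] = x_i, pts[:, 1] = y_i
w = torch.tensor([-2.0, 2.0], requires_grad=True, dtype=torch.double)
learning_rate = 0.1                       # assumed value, not given in the assignment

for i in range(30):
    # (1) loss: sum of squared errors = negative log-likelihood up to a constant
    f = 1 / (1 + torch.exp(-(w[0] * pts[:, 0] + w[1])))
    loss = torch.sum((pts[:, 1] - f) ** 2)
    # (2) compute gradient
    loss.backward()
    # (3) gradient-descent step
    with torch.no_grad():
        w -= learning_rate * w.grad
        w.grad.zero_()
```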