
Lab 3 - MLE, Computational graph and Backpropagation

In this lab, we are going to test your knowledge of the math behind neural networks. These simple exercises apply the theory from the lecture (slides 10-62). A tutorial solution to the first two exercises is here. We recommend consulting the corresponding lecture parts for a better overview of, and intuition behind, these exercises. Another (graphically richer) source on the learning mechanisms is https://hmkcode.com/ai/backpropagation-step-by-step/, which provides a step-by-step description of the feed-forward and backward passes.

You are asked to write Python code to validate your results. It takes only a few lines and tests your understanding. You can also change the initial values and the dimensions of the parameters to verify that you can solve different variants of the problem. This part is important because it verifies your ability to apply the theory and makes you conscious of what you are doing in the following parts of the course. If you are not sure about the PyTorch syntax, look at the PyTorch documentation and search for the relevant modules.
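For example, a hand-computed derivative can be checked against autograd in a few lines (a minimal sketch; the values and the function are arbitrary and only illustrate the workflow):

import torch
 
# toy check: z = sin(a * b), so dz/da = cos(a * b) * b
a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0)
z = torch.sin(a * b)
z.backward()                      # fills a.grad with dz/da
print(a.grad)                     # autograd result
print(torch.cos(a * b) * b)       # manual derivative, should match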


Simple Neural Network

You are given the following neural network model parametrized by the weight vector w. The model takes an input vector x and outputs y: $$ y = \sin(\textbf{w}^T \textbf{x}) - b $$

Where: $$\textbf{x} = [2, 1],\; \textbf{w} = [\pi/2, \pi],\; b = 0,\; \tilde{y} = 2$$
1) Draw the computational graph of the forward pass of this small neural network
2) Compute the feed-forward pass with the initial weights w and the input feature vector x
3) Calculate the gradient of the output y with respect to w, i.e. $\frac{\partial y}{\partial \textbf{w}}$
4) Use the $L_2$ loss (mean squared error) to compute the loss between the forward prediction y and the label $\tilde{y}$. Add the loss to the computational graph.
5) Use the chain rule to compute the gradient $\frac{\partial L}{\partial \textbf{w}}$ and update the weights with learning rate $\alpha = 0.5$ (see the sketch below)
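For reference, the chain rule applied to this model gives the quantities you should arrive at (a sketch; $L$ denotes the squared-error loss on this single sample):

$$ \frac{\partial y}{\partial \textbf{w}} = \cos(\textbf{w}^T \textbf{x})\,\textbf{x}, \qquad L = (y - \tilde{y})^2, \qquad \frac{\partial L}{\partial \textbf{w}} = \frac{\partial L}{\partial y}\,\frac{\partial y}{\partial \textbf{w}} = 2(y - \tilde{y})\cos(\textbf{w}^T \textbf{x})\,\textbf{x}, \qquad \textbf{w} \leftarrow \textbf{w} - \alpha \frac{\partial L}{\partial \textbf{w}} $$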

import torch
import numpy as np
 
### Define initial parameters
# w =
# x =
# b =
# y_label =
 
""" Note: Think about dimensions of initial parameters and order of operations """
 
# model forward pass: use torch.sin() and w @ x        ---> dot product @
# y =
 
# calculate loss and make backward pass
# L2 =
 
""" Note: Beware of backward passes when calculating it for both y and L. You need to do it separately """
 
# Visualize gradient of L2 wrt w: use L2.backward(), then you can see gradient in w.grad
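One possible way to fill in the template above (a sketch, not the official solution; it follows the variable names suggested in the comments):

import torch
import numpy as np
 
# initial parameters
w = torch.tensor([np.pi / 2, np.pi], requires_grad=True)
x = torch.tensor([2.0, 1.0])
b = 0.0
y_label = 2.0
 
# forward pass and gradient dy/dw
y = torch.sin(w @ x) - b
y.backward()
print(w.grad)                     # dy/dw = cos(w^T x) * x
 
# separate backward pass for the loss
w.grad = None                     # clear the gradient before the second backward pass
y = torch.sin(w @ x) - b
L2 = (y - y_label) ** 2
L2.backward()
print(w.grad)                     # dL/dw
 
# weight update with learning rate alpha = 0.5
with torch.no_grad():
    w -= 0.5 * w.grad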

Maximum Likelihood Estimate

You are given the Gaussian probability distribution model $$p(y|\mathbf{x},\mathbf{w}) = K\cdot\exp\bigl(-(y-f(\mathbf{x},\mathbf{w}))^2\bigr),$$ which models the probability of observing the variable $y\in\mathbb{R}$ given the measurement $\mathbf{x}\in\mathbb{R}$. The shape of the probability distribution is determined by the (unknown) parameters $w_0, w_1\in\mathbb{R}$ of the non-linear function $$f(\mathbf{x},\mathbf{w}) = \frac{1}{1+e^{-(w_0 \mathbf{x} + w_1)}}.$$ You are given a training set $\mathcal{D} = \{(\mathbf{x}_1, y_1),\dots,(\mathbf{x}_N, y_N)\}$.

  1. Write down the optimization problem that corresponds to the maximum likelihood estimate of the unknown parameters $\mathbf{w}$ and simplify the resulting loss if possible (see the sketch after this list).
  2. Download the template and fill in the learning loop (loss function, gradient via loss.backward(), and the weight update rule).
  3. Find a reasonable learning rate. What happens when the learning rate is too big / too small?
  4. Is the least squares formulation (LSQ) equivalent to the maximum likelihood formulation (MLE)? What is not equivalent?
  5. What are the necessary assumptions that allow the MLE and LSQ formulations?
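As a reference for point 1 (one way to write it out, assuming the training samples are drawn i.i.d. from the model):

$$ \hat{\mathbf{w}} = \arg\max_{\mathbf{w}} \prod_{i=1}^{N} p(y_i|\mathbf{x}_i,\mathbf{w}) = \arg\max_{\mathbf{w}} \sum_{i=1}^{N} \log p(y_i|\mathbf{x}_i,\mathbf{w}) = \arg\min_{\mathbf{w}} \sum_{i=1}^{N} \bigl(y_i - f(\mathbf{x}_i,\mathbf{w})\bigr)^2, $$ since the normalization constant $K$ does not depend on $\mathbf{w}$ and the $\log$ cancels the $\exp$.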

import torch
import matplotlib.pyplot as plt
import numpy as np
 
 
# load points
N = 5
pts = np.load('pts.npy')
pts = torch.tensor(pts)
 
# define optimization variables
w = torch.tensor([-2, 2], requires_grad=True, dtype=torch.double)
learning_rate = ...               # (3) choose a reasonable learning rate
 
for i in range(30):
    # OPTIMIZE WEIGHTS ...
    # (1) define loss
    # (2) compute gradient loss.backward()
    # (3) update weights
 
    loss = ...
    loss.backward()
    with torch.no_grad():
        w -= learning_rate * w.grad
    w.grad.zero_()
 
 
    # visualize result
    PTS = pts.detach().numpy() # convert to numpy
    W = w.detach().numpy()
    T = torch.linspace(-1, 1, 50).numpy()
    plt.figure(1), plt.clf()
    plt.plot(PTS[:, 0], PTS[:, 1], markersize=10, marker='x', color='r', linestyle='None')
    plt.plot(T, 1/(1 + np.exp(-(W[0] * T + W[1]))), color='green')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.pause(0.01)
    plt.draw()
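A possible loss definition to plug into the loop above (a sketch; it assumes the first column of pts holds $\mathbf{x}_i$ and the second column holds $y_i$, which matches the plotting code):

# negative log-likelihood up to an additive constant = sum of squared errors
f = torch.sigmoid(w[0] * pts[:, 0] + w[1])     # f(x, w) = 1 / (1 + exp(-(w0 * x + w1)))
loss = torch.sum((pts[:, 1] - f) ** 2)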
