
Exercise 4

Program:

  • Revision of backpropagation (BP)
  • Implementation of BP for simple NN

Downloads:

You will likely not manage to finish the implementation of all the functions during the exercise in the lab. Finish them as homework.

Revision

Refresh your knowledge of

  • the steepest descent algorithm (what do we need to compute to be able to modify the model parameters?) and
  • the final form of BP algorithm (not its derivation, just how to use it).

Problem description

We have 2 classes of objects (blue circles and red crosses) described in 2D space:

We would like to train a neural network which would be able to distinguish between the two classes.

Before reading further, try to answer the following question:

  • Is one simple perceptron unit sufficient to classify the training points precisely?
  • How many linear boundaries do we need here?
  • What architecture must the network have to be suitable for this problem?
  • We say that “we need to tune the NN to the problem”. What is actually tuned?

Designing the network

Our network will look like this:

Let's choose the transfer function g to be the same for all neurons and equal to

<latex> $$g(a) = \frac{1}{1+e^{-\lambda a}}$$ </latex>

Recall that its derivative at point a can be computed as

<latex> $$g'(a) = \lambda g(a)(1-g(a))$$ </latex>
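In MATLAB, this pair of formulas can be sketched with anonymous functions (the variable name lambda and its value are our own choice here):

lambda = 1;                                   % steepness of the sigmoid (example value)
g  = @(a) 1 ./ (1 + exp(-lambda*a));          % g(a)
dg = @(a) lambda .* g(a) .* (1 - g(a));       % g'(a), expressed through g(a)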

How many weights must be tuned in our network?

Forward propagation

Given an input vector x, how do we compute the outputs of each of the neurons?

The weights related to neurons 1, 2 and 3 form weight vectors

<latex> $ \mathbf{w}_1 = (w_{11},w_{12},w_{13})^T$
$ \mathbf{w}_2 = (w_{21},w_{22},w_{23})^T$
$ \mathbf{w}_3 = (w_{31},w_{32},w_{33})^T$
</latex>

For the given inputs x1 and x2, we create a vector x in homogeneous coordinates:

<latex> $\mathbf{x} = (x_1, x_2, 1)^T.$ </latex>

Then we can compute the weighted sums a1 and a2 for neurons 1 and 2:

<latex> $$a_1 = \mathbf{w}_1^T \mathbf{x}, $$
$$a_2 = \mathbf{w}_2^T \mathbf{x}, $$
</latex>

and we can also compute the outputs z1 and z2 of neurons 1 and 2:

<latex> $$z_1 = g(a_1), $$
$$z_2 = g(a_2). $$ </latex>

Then, we can construct the input vector for the third neuron as

<latex> $$ \mathbf{x}_H = (z_1, z_2, 1)^T $$ </latex>

and pass it through the third neuron

<latex> $$a_3 = \mathbf{w}_3^T \mathbf{x}_H,\text{ and} $$
$$z_3 = o = g(a_3).$$ </latex>

This way we have obtained the values of all the <latex>$z_i$</latex> in the whole network.
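As a sanity check, the whole forward pass for a single input point can be sketched in a few lines of MATLAB. The names x1, x2, lambda and W are our own choice; we assume the rows of W hold the weight vectors w_1, w_2, w_3.

x1 = 0.5; x2 = -1;                 % an example input point
lambda = 1;                        % sigmoid steepness (assumed value)
W = 0.1 * randn(3, 3);             % rows of W hold w_1, w_2, w_3 (example weights)
x  = [x1; x2; 1];                  % input in homogeneous coordinates
a1 = W(1,:) * x;  z1 = 1 / (1 + exp(-lambda*a1));   % neuron 1
a2 = W(2,:) * x;  z2 = 1 / (1 + exp(-lambda*a2));   % neuron 2
xH = [z1; z2; 1];                  % input of the output neuron
a3 = W(3,:) * xH; o  = 1 / (1 + exp(-lambda*a3));   % neuron 3, the network output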

Steepest descent

Since our network has only one output, the error it makes is defined as

<latex> $$E = \frac{1}{2} (y-o)^2.$$ </latex>

Since the output o is actually a function of weights o(w), we can optimize the error function E by modifying the weights. We shall use the gradient descent method. The update rule of the gradient descent method is

<latex> $$\mathbf{w} \leftarrow \mathbf{w} - \eta \cdot \mathrm{grad}(E(\mathbf{w}))$$ </latex>

Error backpropagation

How do we compute the gradient? The gradient at point **w** can be computed using the backpropagation algorithm. In our case, the gradient can be written as a matrix (similarly to **w**):

<latex> $$\mathrm{grad}(E(\mathbf{w})) = \left(
\begin{array}{lll}
	\frac{\partial E}{\partial w_{11}} & \frac{\partial E}{\partial w_{12}} & \frac{\partial E}{\partial w_{13}} \\
	\frac{\partial E}{\partial w_{21}} & \frac{\partial E}{\partial w_{22}} & \frac{\partial E}{\partial w_{23}} \\
	\frac{\partial E}{\partial w_{31}} & \frac{\partial E}{\partial w_{32}} & \frac{\partial E}{\partial w_{33}} \\
\end{array}
\right)$$ </latex>

From the derivation of the backpropagation algorithm, we know that the individual derivatives can be computed as

<latex> $$\frac{\partial E}{\partial w_{ji}} = \delta_j z_i,$$ </latex>

where <latex>$\delta_j$</latex> is the error of the neuron at the output end of the edge <latex>$i\rightarrow j$</latex>, and <latex>$z_i$</latex> is the signal at the input end of the edge <latex>$i\rightarrow j$</latex>. The values of <latex>$z_i$</latex> are known from the forward propagation phase; we still need to compute all the <latex>$\delta_j$</latex>.

Again, from the derivation of the backpropagation algorithm we know that for the output layer we can write

<latex> $$\delta_3 = \underbrace{g'(a_3)}_{\lambda z_3(1-z_3)}\frac{\partial E}{\partial o} = \lambda z_3(1-z_3)\cdot(-1)(y-o)$$ </latex>

For the <latex>$\delta_j$</latex> in the hidden layer, we can write

<latex> $$\delta_j = g'(a_j)\sum_{k \in \mathrm{Dest}(j)}w_{kj}\delta_k$$ </latex>

Consider neuron 1. The set of neurons which are fed with the signal from neuron 1 is <latex>$\mathrm{Dest}(1) = \{3\}$</latex>, so the sum has only one element:

<latex> $$\delta_1 = g'(a_1)\sum_{k \in \mathrm{Dest}(1)}w_{k1}\delta_k = \lambda z_1(1-z_1)w_{31}\delta_3$$ </latex>

Similarly, for the neuron 2 we can write

<latex> $$\delta_2 = \lambda z_2(1-z_2)w_{32}\delta_3 $$ </latex>

That completes the backpropagation algorithm for our particular network. We know all the <latex>$z_i$</latex> from the forward propagation and we have computed all the <latex>$\delta_j$</latex> using backpropagation; we can thus compute the gradient of the error function <latex>$\mathrm{grad}(E(\mathbf{w}))$</latex> and make the gradient descent step <latex>$\mathbf{w} \leftarrow \mathbf{w} - \eta\cdot \mathrm{grad}(E(\mathbf{w}))$</latex>.
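Continuing the forward pass sketch above, one gradient descent step for a single training point could look roughly like this in MATLAB; the target y, the learning rate eta and its value are our own assumptions:

y   = 1;                                   % example target (classes coded as 0/1)
eta = 0.5;                                 % learning rate (assumed value)
d3 = lambda * o  * (1 - o)  * (-(y - o));  % delta of the output neuron
d1 = lambda * z1 * (1 - z1) * W(3,1) * d3; % delta of hidden neuron 1
d2 = lambda * z2 * (1 - z2) * W(3,2) * d3; % delta of hidden neuron 2
gradE = [d1 * x'; d2 * x'; d3 * xH'];      % dE/dw_ji = delta_j * z_i; row j is the gradient w.r.t. w_j
W = W - eta * gradE;                       % gradient descent step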

Implementation

Your task is to implement the above algorithm in MATLAB.

You can start with a helper function perc which will realize the forward propagation phase for 1 unit. Assume the sigmoidal transfer function. This function should be general and should take weight and input vectors of any dimensionality.

function z = perc(w, x, lambda)

Inputs:
  w       [D+1 x 1]  Vector of weights of a neuron.
  x       [D+1 x N]  Data set that should be propagated through the neuron.
  lambda  scalar     The steepness of the sigmoidal function.
Outputs:
  z       [1 x N]    The outputs of the neuron for each input point.
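One possible vectorized implementation, assuming x already contains the homogeneous coordinate (a row of ones) as its last row:

function z = perc(w, x, lambda)
% PERC Forward propagation of N points through one sigmoidal unit.
a = w' * x;                      % weighted sums, [1 x N]
z = 1 ./ (1 + exp(-lambda * a)); % sigmoidal outputs, [1 x N]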

Then, implement a function that will realize 1 step of the gradient descent algorithm for our particular network (i.e. it will be a specific function for our network):

function Wn = step(W, x, y)

Inputs:
  W   [3 x 3]  Matrix of old values of weights in our network.
  x   [2 x N]  Values of the independent variables of our training data.
  y   [1 x N]  Values of the dependent variable (class labels) of our training data.
Outputs:
  Wn  [3 x 3]  The new, updated values of weights in our network.
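A sketch of one possible implementation follows (a batch update over all N points). It assumes that row j of W holds the weight vector w_j, and it hard-codes lambda and eta only for illustration; choose these values to suit your data.

function Wn = step(W, x, y)
% STEP One batch gradient descent step for the 2-2-1 network above.
lambda = 1;                       % sigmoid steepness (assumed)
eta    = 0.5;                     % learning rate (assumed)
N  = size(x, 2);
xh = [x; ones(1, N)];             % training inputs in homogeneous coordinates, [3 x N]
z1 = perc(W(1,:)', xh, lambda);   % outputs of hidden neuron 1, [1 x N]
z2 = perc(W(2,:)', xh, lambda);   % outputs of hidden neuron 2, [1 x N]
xH = [z1; z2; ones(1, N)];        % inputs of the output neuron, [3 x N]
o  = perc(W(3,:)', xH, lambda);   % network outputs, [1 x N]
d3 = lambda .* o  .* (1 - o)  .* (-(y - o));    % deltas of the output neuron
d1 = lambda .* z1 .* (1 - z1) .* (W(3,1) * d3); % deltas of hidden neuron 1
d2 = lambda .* z2 .* (1 - z2) .* (W(3,2) * d3); % deltas of hidden neuron 2
gradE = [d1 * xh'; d2 * xh'; d3 * xH'];         % gradient summed over all points
Wn = W - eta * gradE;             % updated weights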

Due to the chosen transfer function, the two classes in vector y should be coded as 0 and 1.

Finally, by calling the step function iteratively for a few hundred epochs, train the neural network on the wedge data set.
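A minimal training loop could then look as follows; the variable names X and Y and the number of epochs are placeholders, and loading of the wedge data is assumed to happen beforehand.

% X [2 x N] training inputs, Y [1 x N] class labels coded as 0/1 (assumed to be loaded already)
W = 0.1 * randn(3, 3);            % small random initial weights
for epoch = 1:500                 % a few hundred epochs
    W = step(W, X, Y);
end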

  • What is the meaning of the weights in the weight matrix?
  • Can you plot them somehow into the data?