In this labs we will build a logistic regression model for a letter (digit) classification task. It has been shown at the lecture that for many practical problems, the *log odds* $a(x)$ can be written as a linear function of the input $x$
$$
a(x) = \ln\frac{p_{K|x}(1|x)}{p_{K|x}(2|x)} = w\cdot x\;,
$$
where the original data vector $x$ has been augmented by adding 1 as its first component (corresponding to the bias in the linear function). It follows that the aposteriori probabilities have then the form of a sigmoid function acting on a linear function of $x$
$$
p_{K|x}(k|x) = \frac{1}{1+e^{-kw\cdot x}} = \sigma(kw\cdot x)\qquad k\in\{-1, +1\}.
$$
The task of estimating the aposteriori probabilities then becomes a task of estimating their parameter $w$. This could be done using the maximum likelihood principle. Given a training set $\mathcal{T} = \{(x_1, k_1), \ldots, (x_N, k_N)\}$, ML estimation leads to the following problem:
$$
w^* = \arg\min_w E(w), \qquad \mbox{where}\\
E(w) = \frac{1}{N}\sum_{(x,k) \in \mathcal{T}}\ln(1+e^{-kw\cdot x})\;,
$$
The normalisation by the number of training samples helps to make the energy, and thus the gradient step size, independent of the training set size.

We also know that there is no close-form solution to this problem, so a numerical optimisation method has to be applied. Fortunately, the energy is a convex function, so any convex optimisation algorithm could be used. We will use the gradient descent method in this labs for its simplicity, however, note that many faster converging methods are available (e.g. the heavy-ball method or the second order methods).

General information for Python development.

To fulfil this assignment, you need to submit these files (all packed in a single `.zip`

file) into the upload system:

- a script for data initialization, calling of the implemented functions and plotting of their results (for your convenience, will not be checked)`logreg.ipynb`

- containing the following implemented methods:`logreg.py`

- a function which computes the logistic loss`logistic_loss`

- a function for computing the logistic loss gradient`logistic_loss_gradient`

- gradient descent on logistic loss`logistic_loss_gradient_descent`

- analytical decision boundary solution for the 1D case`get_threshold`

- given the log. reg. parameters, classifies the input data into {-1, +1}.`classify_images`

,`E_progress_AC.png`

,`w_progress_2d_AC.png`

,`aposteriori.png`

,`classif_AC.png`

,`E_progress_MNIST`

- images specified in the tasks`classif_MNIST.png`

** Use template of the assignment.** When preparing a zip file for the upload system, **do not include any directories**, the files have to be in the zip file root.

The first task is to classify images of letters 'A' and 'C' using the following measurement:

x = normalise( (sum of pixel values in the left half of image) -(sum of pixel values in the right half of image))as implemented in

`compute_measurements(imgs, norm_parameters)`

**Tasks**

- Load the data from
`data_logreg.npz`

. It contains`trn`

and`tst`

dictionaries containing training and test data and their labels respectively. Augment the data so that their first component is 1. - Complete the function
which computes the energy $E(w)$ given the data $X$, their labels $y$ and some solution vector $w$.`loss = logistic_loss(X, y, w)`

loss = logistic_loss(np.array([[1, 1, 1],[1, 2, 3]]), np.array([1, -1, -1]), np.array([1.5, -0.5])) print(loss) # -> 0.6601619507527583

- Implement the function
which returns the energy gradient $g$ given some vector $w$.`g = logistic_loss_gradient(X, y, w)`

**Hint:**Do not forget to normalise the gradient by 1/N.\\gradient = logistic_loss_gradient(np.array([[1, 1, 1],[1, 2, 3]]), np.array([1, -1, -1]), np.array([1.5, -0.5])) print(gradient) # -> [0.28450597 0.82532575]

- Next, implement the function
which does the gradient descent search for the optimal solution as described in the lecture slide 21 slides. As the terminal condition use $||w_\mbox{new} - w_\mbox{prev}|| \le \mbox{epsilon}$.`w, wt, Et = logistic_loss_gradient_descent(X, y, w_init, epsilon)`

**Hint 1:**Do not forget to add the bias component to the data.

**Hint 2:**Try to initialise it from different initial values of $w$ to test if it behaves as you would expect.

**Hint 3:**Save the candidate into $wt$ immediatelly after finding it. The distance between the last two elements of $wt$ will thus be $||wt_\mbox{last} - wt_\mbox{prelast}|| \le \mbox{epsilon}$!w_init = np.array([1.75, 3.4]) epsilon = 1e-2 [w, wt, Et] = logistic_loss_gradient_descent(train_X, trn['labels'], w_init, epsilon) print('w: ', w, '\nwt: ', wt, '\nEt: ', Et) # -> w: [1.8343074 3.3190428] # -> wt: [[1.75 1.75997171 1.77782062 1.80593719 1.83779008 1.84398609 1.8343074 ] # -> [3.4 3.39233974 3.37848624 3.35614129 3.32899079 3.31724604 3.3190428 ]] # -> Et: [0.25867973 0.25852963 0.25830052 0.25804515 0.25791911 0.25791612 0.25791452]

- Visualise the progress of the gradient descent:
- Plot the energy $E(w)$ in each step of the gradient descent. Save the figure into
.`E_progress_AC.png`

- Plot the progress of the value of $w$ in 2D (slope + bias parts of $w$) during the optimisation. For the progress of $w$ in 2D you may use the provided function
. Save the images as`plot_gradient_descent(X, y, loss_function, w, wt, Et)`

.`w_progress_2d_ac.png`

- Derive the analytical solution for the classification threshold on the (1D) data, i.e. find the x for which the aposteriori probabilities are both 0.5. Using the formula, complete the function
. Plot the threshold, aposteriori probabilities $p(A|x)$ and $p(C|x)$ and the training data into one plot and save the image as`thr = get_threshold(w)`

.`aposteriori.png`

thr = get_threshold(np.array([1.5, -0.7])) print(thr) # -> 2.142857142857143

- Complete the function
which uses the learned $w$ to classify the data in $X$.`y = classify_images(X, w)`

y = classify_images(np.array([[1, 1, 1, 1, 1, 1, 1],[0.5, 1, 1.5, 2, 2.5, 3, 3.5]]), np.array([1.5, -0.5])) print(y) # -> [ 1 1 1 1 1 -1 -1]

- Use the function
`classify_images`

to classify the letters in the`tst`

set. Compute the classification error and visualise the classification. Save the visualised classification results as.`classif_ac.png`

**Hint:**Use the data normalisation returned for the training data.

Next we apply the method to a similar task of digit classification. However, instead of using a one-dimensional measurement we will work directly in the 784-dimensional space (28 $\times$ 28, the size of the images) using the pixel values as individual dimensions.

**Tasks**

- Load the data from
`mnist_trn.npz`

. Beware that the structure of the data is slightly different from the previous task.

- Repeat the steps from the Task 1 and produce similar outputs:
,`E_progress_MNIST.png`

.`classif_MNIST.png`

Implement the logistic regression with dimension lifting as shown in the following figure from [1]. The figure assumes $w$ being the class label and $\phi$ the sought vector. Generate similar outputs to verify a successful implementation.

[1] Simon J. D. Prince, Computer Vision: Models, Learning and Inference, 2012

courses/be5b33rpz/labs/06_logreg/start.txt ยท Last modified: 2020/10/29 13:22 by serycjon