In this lab we will build a logistic regression model for letter and digit classification tasks. It was shown in the lecture that for many practical problems, the log odds $a(x)$ can be written as a linear function of the input $x$: $$ a(x) = \ln\frac{p_{K|x}(+1|x)}{p_{K|x}(-1|x)} = w\cdot x\;, $$ where the original data vector $x$ has been augmented by adding 1 as its first component (corresponding to the bias in the linear function). It follows that the a posteriori probabilities then have the form of a sigmoid function acting on a linear function of $x$: $$ p_{K|x}(k|x) = \frac{1}{1+e^{-kw\cdot x}} = \sigma(kw\cdot x)\,,\qquad k\in\{-1, +1\}. $$ The task of estimating the a posteriori probabilities thus becomes the task of estimating their parameter $w$. This can be done using the maximum likelihood principle. Given a training set $\mathcal{T} = \{(x_1, k_1), \ldots, (x_N, k_N)\}$, the ML estimation leads to the following problem: $$ w^* = \arg\min_w E(w), \qquad \text{where}\quad E(w) = \frac{1}{N}\sum_{(x,k) \in \mathcal{T}}\ln\left(1+e^{-kw\cdot x}\right). $$ The normalisation by the number of training samples makes the energy, and thus the gradient step size, independent of the training set size.
We also know that there is no closed-form solution to this problem, so a numerical optimisation method has to be applied. Fortunately, the energy is a convex function, so any convex optimisation algorithm could be used. In this lab we will use the gradient descent method for its simplicity; however, note that many faster-converging methods are available (e.g. the heavy-ball method or second-order methods).
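The gradient-based functions below rely on the gradient of the energy. Differentiating the per-sample term $\ln(1+e^{-kw\cdot x})$ with respect to $w$ gives $$ \nabla E(w) = -\frac{1}{N}\sum_{(x,k)\in\mathcal{T}} k\,x\,\sigma(-kw\cdot x)\;, $$ and the basic gradient descent update with a step size $\alpha$ reads $$ w^{t+1} = w^{t} - \alpha\,\nabla E(w^{t})\;. $$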
See the general information for Python development.
To fulfil this assignment, you need to submit these files (all packed in a single .zip file) into the upload system:
- logreg.ipynb
- logreg.py — containing the implemented functions logistic_loss, logistic_loss_gradient, logistic_loss_gradient_descent, get_threshold and classify_images
- E_progress_AC.png
- w_progress_2d_AC.png
- aposteriori.png
- classif_AC.png
- E_progress_MNIST.png
- classif_MNIST.png
Use the template of the assignment. When preparing the zip file for the upload system, do not include any directories; the files have to be in the zip file root.
The first task is to classify images of the letters 'A' and 'C' using the following measurement:

x = normalise((sum of pixel values in the left half of the image) − (sum of pixel values in the right half of the image))

computed by the function compute_measurements(imgs, norm_parameters); a sketch of such a measurement is given below.
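A minimal sketch of the measurement, assuming imgs is an array of shape (H, W, N) with one grayscale image per slice along the last axis, and that normalisation means zero mean and unit standard deviation (the actual compute_measurements provided with the assignment may differ):

```python
import numpy as np

def compute_measurements_sketch(imgs):
    # imgs: (H, W, N) array, one grayscale image per slice along the last axis
    width = imgs.shape[1]
    left = imgs[:, :width // 2, :].sum(axis=(0, 1), dtype=np.float64)
    right = imgs[:, width // 2:, :].sum(axis=(0, 1), dtype=np.float64)
    x = left - right
    # normalise to zero mean, unit standard deviation
    return (x - x.mean()) / x.std()
```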
Tasks
Load the data from data_logreg.npz. It contains the training set trn and the test set tst, each holding the images and their labels (accessed e.g. as trn['labels'] below).
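A minimal loading sketch, assuming the archive stores trn and tst as pickled dictionaries (hence allow_pickle=True; adjust if the archive uses flat keys instead):

```python
import numpy as np

data = np.load('data_logreg.npz', allow_pickle=True)
trn = data['trn'].item()   # dictionary with the training images and labels
tst = data['tst'].item()   # dictionary with the test images and labels
```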
Implement the function computing the energy $E(w)$:

```python
loss = logistic_loss(X, y, w)
```
```python
loss = logistic_loss(np.array([[1, 1, 1], [1, 2, 3]]), np.array([1, -1, -1]), np.array([1.5, -0.5]))
print(loss)
# -> 0.6601619507527583
```
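A minimal sketch of the loss, assuming X stores one (already bias-augmented) sample per column, consistent with the example above:

```python
import numpy as np

def logistic_loss(X, y, w):
    # E(w) = (1/N) * sum_i ln(1 + exp(-k_i * w.x_i)); samples are columns of X
    # np.logaddexp(0, a) computes ln(1 + exp(a)) in a numerically stable way
    return np.mean(np.logaddexp(0, -y * (w @ X)))
```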
Implement the function computing its gradient:

```python
g = logistic_loss_gradient(X, y, w)
```
```python
gradient = logistic_loss_gradient(np.array([[1, 1, 1], [1, 2, 3]]), np.array([1, -1, -1]), np.array([1.5, -0.5]))
print(gradient)
# -> [0.28450597 0.82532575]
```
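A minimal sketch implementing the gradient formula derived above, with the same column-wise data layout:

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def logistic_loss_gradient(X, y, w):
    # gradient of E(w): -(1/N) * sum_i k_i * x_i * sigma(-k_i * w.x_i)
    s = y * sigmoid(-y * (w @ X))   # per-sample scalar weights
    return -(X * s).mean(axis=1)
```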
Implement the gradient descent optimisation:

```python
w, wt, Et = logistic_loss_gradient_descent(X, y, w_init, epsilon)
```

It returns the found optimum w together with the sequence of weights wt and the sequence of energies Et recorded during the optimisation.
```python
w_init = np.array([1.75, 3.4])
epsilon = 1e-2
w, wt, Et = logistic_loss_gradient_descent(train_X, trn['labels'], w_init, epsilon)
print('w: ', w, '\nwt: ', wt, '\nEt: ', Et)
# -> w:  [1.8343074 3.3190428]
# -> wt: [[1.75       1.75997171 1.77782062 1.80593719 1.83779008 1.84398609 1.8343074 ]
# ->      [3.4        3.39233974 3.37848624 3.35614129 3.32899079 3.31724604 3.3190428 ]]
# -> Et: [0.25867973 0.25852963 0.25830052 0.25804515 0.25791911 0.25791612 0.25791452]
```
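A minimal fixed-step sketch using the logistic_loss and logistic_loss_gradient functions above. The reference implementation that produced the output above may use an adaptive step size and a different stopping criterion; here, epsilon is taken as the convergence tolerance and step as a hypothetical step-size parameter:

```python
import numpy as np

def logistic_loss_gradient_descent(X, y, w_init, epsilon, step=1.0, max_iter=100):
    # plain gradient descent: w <- w - step * grad E(w); stops when w changes
    # by less than epsilon (hypothetical policy; the assignment may differ)
    w = w_init.astype(np.float64)
    wt = [w]
    Et = [logistic_loss(X, y, w)]
    for _ in range(max_iter):
        w_new = w - step * logistic_loss_gradient(X, y, w)
        wt.append(w_new)
        Et.append(logistic_loss(X, y, w_new))
        if np.linalg.norm(w_new - w) < epsilon:
            break
        w = w_new
    return w_new, np.array(wt).T, np.array(Et)
```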
Plot the progress of the optimisation:

```python
plot_gradient_descent(X, y, loss_function, w, wt, Et)
```

Save the figure as w_progress_2d_AC.png.
Implement the function computing the decision threshold in the 1D measurement space:

```python
thr = get_threshold(w)
```
```python
thr = get_threshold(np.array([1.5, -0.7]))
print(thr)
# -> 2.142857142857143
```
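Since the classification flips where $w\cdot x = w_0 + w_1 x = 0$, the threshold is $-w_0/w_1$; a one-line sketch consistent with the example above:

```python
def get_threshold(w):
    # decision boundary of w0 + w1 * x = 0 in the 1D measurement space
    return -w[0] / w[1]
```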
Implement the classification function:

```python
y = classify_images(X, w)
```
```python
y = classify_images(np.array([[1, 1, 1, 1, 1, 1, 1], [0.5, 1, 1.5, 2, 2.5, 3, 3.5]]), np.array([1.5, -0.5]))
print(y)
# -> [ 1  1  1  1  1 -1 -1]
```
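A minimal sketch matching the example (note that $w\cdot x = 0$ maps to $-1$ there):

```python
import numpy as np

def classify_images(X, w):
    # label +1 where w.x > 0, otherwise -1
    return np.where(w @ X > 0, 1, -1)
```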
Save the resulting classification figure as classif_AC.png.
Next, we apply the method to the similar task of digit classification. However, instead of using a one-dimensional measurement, we will work directly in the 784-dimensional space (28 $\times$ 28 pixels, the size of the images), using the individual pixel values as dimensions.
Load the MNIST training data from mnist_trn.npz, run the gradient descent as before, and save the plot of the energy during the optimisation as E_progress_MNIST.png.
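A minimal sketch of preparing the data in this space (prepare_mnist_features is a hypothetical helper name), assuming the images arrive as a (28, 28, N) array and that, as before, a constant 1 is prepended to every sample to account for the bias:

```python
import numpy as np

def prepare_mnist_features(imgs):
    # imgs: (28, 28, N) -> X: (785, N), one bias-augmented column per image
    n = imgs.shape[-1]
    X = imgs.reshape(28 * 28, n).astype(np.float64)
    X = X / 255.0                        # assumed scaling of pixel values to [0, 1]
    return np.vstack([np.ones((1, n)), X])
```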
Implement the logistic regression with dimension lifting, as shown in the following figure from [1]. Note that the figure uses a different notation: there, $w$ denotes the class label and $\phi$ the sought weight vector. Generate similar outputs to verify a successful implementation.
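A minimal sketch of one possible lifting for the one-dimensional measurement, appending a quadratic term; the figure in [1] may use a different basis (e.g. radial basis functions):

```python
import numpy as np

def lift_quadratic(X):
    # X: (2, N) bias-augmented data [1; x] -> (3, N) lifted data [1; x; x**2];
    # the same logistic regression code then learns a 3-dimensional w
    return np.vstack([X, X[1:2, :] ** 2])
```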
[1] Simon J. D. Prince: Computer Vision: Models, Learning, and Inference. Cambridge University Press, 2012.