Search
Figure 1. Dimensionality reduction by PCA1).
So far, in this course, we have been using measurements of various dimensionality, sometimes the measurements had even dimensionality same as the number of pixels in the image. But are all those dimensions actually necessary? Isn't there a more compact representation of the data which we could use instead without sacrificing the expressiveness of the data? Have a look at Figure 1 for an example. All the data, though 3-dimensional lay actually in a 2D plane. If we were able to compute the orthogonal basis of this plane (red arrows in Figure 1, right), we could represent the data using just 2D measurements. Principal component analysis (PCA) is a method for such features space dimensionality reduction (also called feature extraction).
In today's labs, the PCA method will be used to extract a new set of “more compact” measurements from the set of images of faces. We are given a set of images of faces of size 300×250, but in the following we will represent them rather as a column vector of intensities of length $n$ = 75000 (=300*250) - a vector of image columns stacked one after another. Thus, each image can be seen as a point in $n$-dimensional space. The goal of PCA is then to represent the images in $m$-dimensional space, where $m \ll n$.
We will then use this compact representation to classify the faces of several individuals using the nearest neighbour classifier.
The input to PCA is a set $X=\{\mathbf{x}_i|\mathbf{x}_i \in \mathbb{R}^n, i=1,\ldots,k\}$ of $k$ training vectors, in our case $k$ vectorised images of faces. To be able to apply the PCA, the data should be centred in origin, i.e. having zero mean, $\bar{\mathbf{x}} = \frac{1}{k}\sum_{i=1}^{k} \mathbf{x}_i = \mathbf{0}$. This is usually not the case for real data, so the first step is always to centre the data by subtracting their mean: $\tilde{\mathbf{x}}_i = \mathbf{x}_i - \bar{\mathbf{x}}$. To simplify notation, we will still refer to $\mathbf{x}_i$ instead of $\tilde{\mathbf{x}}_i$ in the following and assuming that the data is already centered. However, be aware, that every time you are using the data this conversion is implicitly present and you have to take it into account in the code!
PCA searches for a set $Y = \{\mathbf{y}_j|\mathbf{y}_j \in \mathbb{R}^n, j=1,\ldots,m \}$ of $m$ vectors ($m<n$), and a set $W = \{\mathbf{w}_i|\mathbf{w}_i \in \mathbb{R}^m, i=1,\ldots,k\}$ such, that the approximated vectors (images) $$\mathbf{z}_i = \mathbf{Y}\mathbf{w}_i, \quad \mathbf{Y}= \begin{bmatrix} \mathbf{y}_1 & \cdots & \mathbf{y}_m \end{bmatrix}$$ are good approximation of the original vectors $\mathbf{x}_i$ in the least squares sense, i.e. their residuum $$\sum_{i=1}^{k} \| \mathbf{x}_i - \mathbf{z}_i \|^2$$ is minimised. Thus, each image $\mathbf{x}_i$ is approximated by a linear combination of small number of vectors $\mathbf{y}_i$. These vectors are called principal components and correspond to the red arrows in Figure 1. Note that their dimension is the same as the original vectors. In the case of images of faces they are also usually called eigenfaces. As $m < n$, each input vector $\mathbf{x}_i$ is actually represented by a smaller number of coefficients $\mathbf{w}_i$ – the dimensionality reduction was achieved!
The vectors $\mathbf{y}_j \in Y$ are found as first $m$ eigenvectors corresponding to $m$ largest eigenvalues of the matrix $\mathbf{S} = \mathbf{X}\mathbf{X}^T$, where $\mathbf{X} = \begin{bmatrix} \mathbf{x}_1 & \cdots & \mathbf{x}_k \end{bmatrix}$ is an $n \times k$ matrix containing all input images.
The coefficients $\mathbf{w}_i$ of the linear combination are then found as dot products $$\mathbf{w}_i = \mathbf{Y}^T\mathbf{x}_i\,.$$ This geometrically corresponds to a projection of $\mathbf{x}_i$ onto the orthogonal base given by the vectors $\mathbf{y}_j$.
As is often the case when working with images, the matrix $\mathbf{S}$ soon becomes too large to fit in the memory. In our case, $\mathbf{S}$ would need about 45GB. Therefore, we employ the following trick [1], [3].
Recall, that for the eigenvectors and eigenvalues of $\mathbf{S}$ we have $$\mathbf{Su} = \lambda \mathbf{u}$$
Instead of $\mathbf{S}$, let us construct a smaller $k \times k$ matrix $\mathbf{T} = \mathbf{X}^T\mathbf{X}$. The relation between eigenvectors of $\mathbf{S}$ and $\mathbf{T}$ is then
$$ \mathbf{Tv} = (\mathbf{X}^T\mathbf{X})\mathbf{v} = \lambda\mathbf{v} $$ $$ \mathbf{X}(\mathbf{X}^T\mathbf{X})\mathbf{v} = \mathbf{X}\lambda\mathbf{v} = \lambda(\mathbf{Xv}) $$ $$ \mathbf{S}(\mathbf{Xv}) = \lambda(\mathbf{Xv}) $$
Where $\mathbf{v}$ and $\lambda$ are the eigenvectors and eigenvalues of $\mathbf{T}$. Therefore, if we look for the first $m \leq k$ eigenvectors of $\mathbf{S}$ only, it is sufficient to compute the eigenvectors of much smaller matrix $\mathbf{T}$ and multiplying them by matrix $\mathbf{X}$ $$\mathbf{y}_j = \mathbf{X} \mathbf{v}_j\,.$$
To fulfil this assignment, you need to submit these files (all packed in a single .zip file) into the upload system:
.zip
answers.txt
assignment_12.m
pca_basis.m
compact_representation.m
reconstruct_images.m
classify_nn_pca.m
eigenfaces.png
cumulative_sum_graph.png
partial_reconstructions.png
error_vs_reduction_graph.png
Start by downloading the template of the assignment.
[Y, lambdas] = pca_basis(X)
trn_data
X
Y
lambdas
MATLAB
eig
eig()
pca_basis
W = compact_representation(X, Y, m)
m
W
Z = reconstruct_images(W, Y, x_mean)
x_mean
Z
compact_representation
reconstruct_images
tst_data
[labels_tst] = classify_nn_pca(images, Y_trn, X_trn_mean, W_trn, labels_trn, m)
images
tst_data.images
labels
labels_tst_gt
What to do next? With the knowledge gained this semester, you can
Any of the above problems is considered a bonus task. The formulations are intentionally vague and the precise task formulation and implementation is part of the task.
[1] Eigenfaces for recognition talk [2] Face alignment for increased PCA precision [3] Turk, M. and Pentland, A. (1991). Eigenfaces for recognition. The Journal of Cognitive Neuroscience, 3(1): 71-86.