In all the previous labs, we were trying to design a classifier $q(\mathbf{x}) : \mathcal{X} \rightarrow D$ by using the a priori probabilities $p_K(k)$ and the probability density functions $p_{X|k}(x|k)$. We were minimising the Bayesian risk or some more complex criterion. This kind of approach is commonly referred to as generative, because we try to estimate the full probabilistic model of all variables (we try to find the probabilistic model which “generated” the training samples).
Sometimes, however, this is not practical or even possible (e.g. we cannot obtain enough samples to estimate $p_{X|k}(\mathbf{x}|k)$ reliably), so instead we try to approximate the decision function $q(\mathbf{x})$ directly, without any knowledge of the underlying probability distribution. This approach is referred to as discriminative, because we are only interested in training the classifier to make correct decisions and do not care about the actual probabilistic model.
In the discriminative approach, we define a criterion, the empirical risk $R_{\mbox{emp}}(q)$, which estimates the risk of the classifier from the training set $$\mathcal{T} = \{(\mathbf{x}_i, y_i); \mathbf{x}_i\in \mathbb{R}^n, y_i\in\{1,2\}\}_{i=1,\ldots,N}. $$ Typically, $R_{\mbox{emp}}(q)$ is computed as the classification error of the classifier $q$ on the training set $\mathcal{T}$, and we try to find a classifier $q^*$ which minimises the empirical risk: $$R_{\mbox{emp}}(q) = \frac{1}{N}\sum_{i=1}^N W(q(\mathbf{x}_i), y_i), $$ $$q^* = \arg\min_q R_{\mbox{emp}}(q), $$ where $W$ denotes the loss function (typically the zero-one loss is used). Note that no probability densities appear in the discriminative task formulation. In this lab we will also strictly use bold letters for vectors and regular font for scalars.
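With the zero-one loss, the empirical risk is simply the training error rate. As a minimal sketch in MATLAB, assuming q is a function handle that returns a label for every column of X (hypothetical names, not part of the assignment interface):

    % zero-one empirical risk of classifier q on training samples X (columns) with labels y
    R_emp = mean(q(X) ~= y);   % fraction of misclassified training samples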
To fulfil this assignment, you need to submit these files (all packed in a single .zip file) into the upload system:
answers.txt
assignment_06.m
perceptron.m
lift_dimension.m
classif_quadrat_perc.m
perceptron_linear.png
perceptron_quadratic.png
Start by downloading the template of the assignment.
Today's task is to implement the perceptron algorithm for learning a linear classifier. In the case of a linear classifier for two classes ($|K|=2$), the discriminative function $q(\mathbf{x}): \mathcal{X} \rightarrow \{1, 2\} $ can be expressed as
$$ q(\mathbf{x}) = \left\{ \begin{array}{l l} 1 & \quad \mathbf{x} \cdot \mathbf{w} + b > 0 \\ 2 & \quad \mathbf{x} \cdot \mathbf{w} + b < 0 \end{array} \right. $$
where $\mathbf{x} = (x_1, \ldots, x_n)^T$, $\mathbf{w} = (w_1, \ldots, w_n)^T$, and $\mathbf{x} \cdot \mathbf{w} = \sum_{i=1}^n x_i w_i$ denotes the dot product and the dimensionality of $\mathcal{X}$ is $n$. In geometrical interpretation, the expression $\mathbf{x} \cdot \mathbf{w} + b = 0$ is an equation of a line (or a hyperplane in higher dimensions) which divides the space into two subspaces and the value of the discriminative function $q(\mathbf{x})$ represents in which of the two subspaces the vector $\mathbf{x}$ is located.
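As a small illustration (not part of the required interface), the decision rule can be evaluated for all samples at once in MATLAB; the values of w, b and X below are taken from the example further down in the Tasks section:

    w = [2; -3]; b = 9;              % parameters from the example below
    X = [1 1 2 2; 1 3 5 6];          % samples stored as columns
    labels = 1 + (w' * X + b < 0)    % 1 where w.x + b > 0, otherwise 2
    % labels = [1 1 2 2]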
Thus, the task of finding a linear classifier for two classes can be formulated as finding parameters $\mathbf{w}$ and $b$ so that the following system of linear inequalities holds for each item $(\mathbf{x}_i, y_i)$ in the training set: $$\mathbf{w} \cdot \mathbf{x}_i + b > 0 \quad \mbox{if}\;y_i = 1$$ $$\mathbf{w} \cdot \mathbf{x}_i + b < 0 \quad \mbox{if}\;y_i = 2$$
The solution of the system exists only if the training data are linearly separable, in which case the solution can be found using the perceptron algorithm.
The system of inequalities can be equivalently rewritten as $$\mathbf{v} \cdot \mathbf{z}_i > 0$$ if we substitute $$\mathbf{v} = [\mathbf{w}; b]$$ $$\mathbf{z}_i = [\mathbf{x}_i; 1] \quad \mbox{if}\;y_i = 1$$ $$\mathbf{z}_i = -[\mathbf{x}_i; 1] \quad \mbox{if}\;y_i = 2$$ (try it yourself on a piece of paper; all that happened here is that we made a 'clever' substitution so that we still have a system of $N$ inequalities, but now all of them have the same form).
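In MATLAB, the substitution can be written in a few lines (a sketch only, assuming the samples are the columns of X and y holds the labels):

    Z = [X; ones(1, size(X, 2))];    % append 1 to every sample: z_i = [x_i; 1]
    Z(:, y == 2) = -Z(:, y == 2);    % flip the sign of the samples from class 2
    % the task is now to find v = [w; b] such that v' * Z > 0 holds for every column of Z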
Now we can easily describe the algorithm:
The Perceptron Algorithm
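A minimal sketch of one possible implementation of the perceptron learning rule (see [1]), reusing the matrix Z built in the sketch above; maxIterations denotes an assumed upper bound on the number of iterations, and this outline is not the required interface:

    v = zeros(size(Z, 1), 1);          % initial solution vector
    for iter = 1:maxIterations
        wrong = find(v' * Z <= 0);     % indices of violated inequalities
        if isempty(wrong)
            break;                     % all inequalities hold: v is a solution
        end
        v = v + Z(:, wrong(1));        % add one misclassified z_i to v
    end
    w = v(1:end-1);                    % recover the original parameters
    b = v(end);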
Tasks
Implement the perceptron algorithm as a function

    [w b] = perceptron(X, y, maxIterations)

where X is a matrix whose columns are the training samples, y is a vector of the corresponding labels (1 or 2), and maxIterations is the maximum number of iterations to perform. If no separating hyperplane is found within maxIterations iterations, return [NaN, NaN].
    X = [1 1 2 2; 1 3 5 6]; y = [1 1 2 2];
    [w, b] = perceptron(X, y, 100)
    % w = [2; -3],  b = 9

    X = [3 1 2 2; 2 3 5 6]; y = [1 1 2 1];
    [w, b] = perceptron(X, y, 100)
    % w = NaN,  b = NaN
Test your implementation: generate training data with the STPR toolbox function createdata, train the classifier with your perceptron, plot the training samples with ppatterns and the found separating line with pline, and save the resulting figure as perceptron_linear.png.
As we have seen in the last point of the first task, for some data a linear discriminative function that separates the classes does not exist.
In order to deal with this problem, we will extend the perceptron algorithm so that it learns a quadratic discriminant function (see [2]). We define a mapping function $\phi: \mathcal{X} \rightarrow \mathcal{Z}$ which projects the training samples $\mathbf{x} \in \mathcal{X}=\mathbb{R}^2$ into a higher-dimensional space $\mathcal{Z}$.
In our task, we will consider the discriminative function in the form $$q'(\mathbf{x}) = w_1x_1 + w_2x_2 + w_3x_1^2 + w_4x_1x_2 + w_5x_2^2 + b$$
By comparing this discriminative function $q'(\mathbf{x})$ to the linear case $q(\mathbf{x})$, we can derive the formula for the feature vector $\mathbf{z} \in \mathcal{Z}$ in the lifted space:
$$\mathbf{z} = \phi(\mathbf{x})= (x_1, x_2, x_1^2, x_1x_2, x_2^2)^T$$
The entire extension of the perceptron algorithm then goes as follows:
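In outline, the extension amounts to lifting the training samples with $\phi$, running the ordinary (linear) perceptron on the lifted data, and lifting every sample to be classified in the same way. A sketch under the assumption that the samples are the columns of X:

    Z = lift_dimension(X);                     % map every sample by phi
    [w, b] = perceptron(Z, y, maxIterations);  % linear perceptron in the lifted space
    % a new sample x (a column vector) is classified by the sign of w' * lift_dimension(x) + b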
Implement the mapping $\phi$ as a function

    Z = lift_dimension(X)

where X is a matrix with the input samples stored in its columns and Z contains the corresponding lifted feature vectors $\phi(\mathbf{x})$ as its columns.
    X = [1 0 2; 3 5 1];
    Z = lift_dimension(X)

    Z =
         1     0     2
         3     5     1
         1     0     4
         3     0     2
         9    25     1
Implement the classification function

    Y = classif_quadrat_perc(tst, model)

where tst is a matrix with the test samples stored in its columns, model is a structure holding the learned parameters w and b, and Y is a vector with the predicted labels. Visualise the resulting quadratic decision boundary with the STPR toolbox function pboundary and save the figure as perceptron_quadratic.png.
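A possible sketch of this function, assuming tst holds the test samples as columns and model carries the parameters w and b learned in the lifted space:

    function Y = classif_quadrat_perc(tst, model)
        % classify test samples with the quadratic discriminative function
        Z = lift_dimension(tst);                 % lift the test samples
        Y = 1 + (model.w' * Z + model.b < 0);    % label 1 or 2 according to the sign
    end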
Fill the correct answers to your answers.txt file.
Choose one or both of the following bonus tasks.
Implement Kozinec's algorithm for learning a linear discriminant function. For a set of points $\{x^j, j \in J\}$, Kozinec's algorithm finds a vector $\alpha$ which satisfies the set of linear inequalities $$ \langle\alpha, x^j\rangle > 0, \quad j \in J,$$ i.e. the problem is the same as in the perceptron. Kozinec's algorithm creates a sequence of vectors $\alpha_1, \alpha_2, \ldots, \alpha_t, \alpha_{t+1}, \ldots$ in the following way.
The algorithm: Vector $\alpha_1$ is any vector from the convex hull of the set $\{x^j, j \in J\}$, e.g. one of the vectors $x^j$, $j \in J$. If the vector $\alpha_t$ has already been computed, $\alpha_{t+1}$ is obtained using the following rules: if all the inequalities $\langle\alpha_t, x^j\rangle > 0$, $j \in J$, are satisfied, the algorithm stops and $\alpha_t$ is a solution; otherwise a vector $x^t$ violating its inequality is selected, and $\alpha_{t+1}$ is the point of the line segment between $\alpha_t$ and $x^t$ closest to the origin, i.e. $\alpha_{t+1} = (1-k)\,\alpha_t + k\,x^t$ with $k \in [0,1]$ chosen to minimise $|\alpha_{t+1}|$.
Output: identical to the output of task 1 in the non-bonus part. Hint: Kozinec's algorithm is described in Chapter 5 of [3].
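A minimal sketch of Kozinec's algorithm in the same setting as the perceptron sketch above (samples transformed into the columns of a matrix Z, maxIterations as an assumed iteration bound):

    alpha = Z(:, 1);                              % any vector from the convex hull
    for iter = 1:maxIterations
        wrong = find(alpha' * Z <= 0);            % violated inequalities
        if isempty(wrong)
            break;                                % alpha is a solution
        end
        x = Z(:, wrong(1));                       % one violating vector
        % point of the segment between alpha and x that is closest to the origin
        k = (alpha' * (alpha - x)) / ((alpha - x)' * (alpha - x));
        k = min(max(k, 0), 1);                    % keep k in [0, 1]
        alpha = (1 - k) * alpha + k * x;
    end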
Implement the multi-class perceptron learning algorithm. The multi-class classifier is formulated as $$ f(\mathbf{x}) = \arg\max_{y\in\cal Y}\,\bigl(\mathbf{w}_y^\top\mathbf{x}+b_y\bigr), $$ where $\cal W=\bigl\{(\mathbf{w}_y, b_y)\mid y\in\cal Y\bigr\}$ is a set of classifier parameters and $\cal Y$ is a set of classes.
The training algorithm was not presented at the lectures, but it is simple and elegant as follows:
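One common variant of the update rule is sketched below (an illustration only, assuming the samples are the columns of X, the labels in y take values 1..C, and maxIterations bounds the number of iterations): whenever a training sample is misclassified, the parameters of its true class are increased and the parameters of the wrongly predicted class are decreased.

    [n, N] = size(X);
    C = max(y);
    W = zeros(n, C);                                        % one weight vector per class (columns)
    b = zeros(C, 1);                                        % one bias per class
    for iter = 1:maxIterations
        [~, ypred] = max(W' * X + repmat(b, 1, N), [], 1);  % current predictions
        wrong = find(ypred ~= y);
        if isempty(wrong)
            break;                                          % zero training error reached
        end
        i = wrong(1);                                       % one misclassified sample
        W(:, y(i)) = W(:, y(i)) + X(:, i);          b(y(i)) = b(y(i)) + 1;
        W(:, ypred(i)) = W(:, ypred(i)) - X(:, i);  b(ypred(i)) = b(ypred(i)) - 1;
    end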
Repeat steps 1 and 2 for data with 3 or more classes.
Plot the multi-class decision boundaries using the STPR toolbox pboundary function.
[1] Duda, Hart, Stork: Pattern Classification, Chapter 5.5 (algorithm 4, page 230)
[2] Nonlinear perceptron (short support text for labs)
[3] Michail I. Schlesinger, Vaclav Hlavac: Ten Lectures on Statistical and Structural Pattern Recognition. Kluwer, 2002
[4] Jiri Matas: Learning and Linear Classifiers (slides)
[5] V. Franc: Perceptron Algorithm (in Czech)
[6] R. Sara: Multi-class Perceptron