One of the most popular classifiers is the SVM (support vector machine). Consider $n$ observations $\mathbf x_1,\dots,\mathbf x_n\in\mathbb R^d$. For simplicity, assume binary classification with labels $y_1,\dots,y_n\in\{-1,1\}$, and that the bias is already embedded in the data. The SVM searches for a linear separating hyperplane $\mathbf w^\top \mathbf x = 0$ by introducing an auxiliary variable $\xi_i\ge 0$ for each observation $i$ that measures how much that observation is misclassified. It does so via the condition $$ \xi_i \ge 1 - y_i\mathbf w^\top \mathbf x_i. $$ Since we will minimize $\xi_i$, it attains its minimum value $\xi_i=0$ precisely when $y_i\mathbf w^\top \mathbf x_i\ge 1$; at the optimum, $\xi_i = \max\{0, 1 - y_i\mathbf w^\top \mathbf x_i\}$, which is the hinge loss. Thus, for the label $y_i=1$ we want the condition $\mathbf w^\top \mathbf x_i\ge 1$, and similarly for the negative label. This is a stronger condition than the classical $\mathbf w^\top \mathbf x_i\ge 0$. In other words, the SVM tries to keep well-classified observations at a certain distance from the separating hyperplane, which adds robustness.
The entire optimization problem can then be written as \begin{align*} \operatorname*{minimize}_{\mathbf w,\xi}\qquad &C\sum_{i=1}^n\xi_i + \frac12\|\mathbf w\|^2 \\ \text{subject to}\qquad &\xi_i \ge 1 - y_i\mathbf w^\top \mathbf x_i, \\ &\xi_i\ge 0. \end{align*} The objective function contains the regularization term $\frac12\|\mathbf w\|^2$, and the regularization constant $C>0$ balances the misclassification penalty against the regularization. Although the problem can be solved in this formulation, it is usually converted into its dual formulation \begin{align*} \operatorname*{maximize}_{\mathbf z}\qquad &-\frac12\mathbf z^\top Q\mathbf z + \sum_{i=1}^nz_i \\ \text{subject to}\qquad &0\le z_i\le C, \end{align*} where the matrix $Q$ has elements $q_{ij}=y_iy_j\mathbf x_i^\top \mathbf x_j$. There is a close connection between the primal and dual problems. First, their optimal values coincide. Second, once we solve the dual problem, the solution of the primal problem is obtained as $\mathbf w=\sum_{i=1}^ny_iz_i\mathbf x_i$.
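For instance, if the observations are stored in the rows of a matrix `X` (an assumed layout; adapt if your data is stored differently), both quantities have compact matrix forms in Julia:

```julia
Q = (y * y') .* (X * X')   # q_ij = y_i*y_j*x_i'*x_j: label outer product times Gram matrix
w = X' * (y .* z)          # w = sum_i y_i*z_i*x_i
```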
This problem can be solved by coordinate descent. This method updates only one component of the vector $\mathbf z$ in each iteration. Therefore, in each iteration we fix some $i$ and replace the optimization over $\mathbf z$ by the optimization over $\mathbf z+d\mathbf e_i$, where $\mathbf e_i$ is the zero vector with a one in component $i$. In each iteration we then solve \begin{align*} \operatorname*{maximize}_d\qquad &-\frac12(\mathbf z+d\mathbf e_i)^\top Q(\mathbf z+d\mathbf e_i) + \sum_{j=1}^nz_j + d \\ \text{subject to}\qquad &0\le z_i + d\le C. \end{align*} This optimization problem is simple: since $d\in\mathbb R$, it admits a closed-form solution. This procedure is repeated for a given number of epochs, where one epoch performs the above iterations for $i=1$, $i=2$, up to $i=n$.
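To see the closed form, expand the objective as a function of $d$. Using the symmetry of $Q$, all terms not containing $d$ are constant, and $$ f(d) = -\frac{q_{ii}}{2}d^2 + \big(1 - (Q\mathbf z)_i\big)d + \text{const}. $$ This is a concave quadratic (since $q_{ii}=\mathbf x_i^\top\mathbf x_i\ge 0$), so the unconstrained maximizer is $d=\frac{1-(Q\mathbf z)_i}{q_{ii}}$, and the constraint $0\le z_i+d\le C$ is then enforced by clipping $z_i+d$ to the interval $[0,C]$.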
Implement the functions `Q = computeQ(X, y)`, `w = computeW(X, y, z)`, `z = solve_SVM_dual(Q, C; max_epoch)` and `w = solve_SVM(X, y, C; kwargs...)`. Their input and output arguments are:
- `X` is the matrix of observations (one observation per row).
- `y` is the vector of labels with values in $\{-1,1\}$.
- `C` is the regularization constant.
- `Q` is the matrix of the dual problem.
- `z` is the solution of the dual problem.
- `w` is the solution of the primal problem, i.e., the separating hyperplane.
The `computeQ` function computes the matrix $Q$, and the `computeW` function computes $\mathbf w$ from the dual solution. The `solve_SVM_dual` function takes the parameters of the dual problem as inputs and performs `max_epoch` epochs (therefore `n*max_epoch` iterations) of the coordinate descent method. The solution should be initialized as $\mathbf z=\mathbf 0$. The `solve_SVM` function combines the previous functions into one; a minimal sketch of a possible implementation follows.
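The sketch below uses the closed-form step derived earlier; the function names follow the required signatures, while the default `max_epoch = 100` and the assumption that observations are stored in the rows of `X` are illustrative choices:

```julia
computeQ(X, y) = (y * y') .* (X * X')       # observations in rows of X
computeW(X, y, z) = X' * (y .* z)           # w = sum_i y_i*z_i*x_i

function solve_SVM_dual(Q, C; max_epoch = 100)
    n = size(Q, 1)
    z = zeros(n)                            # required initialization z = 0
    for _ in 1:max_epoch
        for i in 1:n                        # one epoch = n coordinate updates
            d = (1 - Q[i, :]' * z) / Q[i, i]   # unconstrained maximizer; assumes q_ii > 0
            z[i] = clamp(z[i] + d, 0, C)       # enforce 0 <= z_i <= C
        end
    end
    return z
end

function solve_SVM(X, y, C; kwargs...)
    z = solve_SVM_dual(computeQ(X, y), C; kwargs...)
    return computeW(X, y, z)
end
```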
Finally, write the function `w = iris(C; kwargs...)` which loads the iris dataset, considers versicolor as the positive class and virginica as the negative class, uses `PetalLength` and `PetalWidth` as features, normalizes these inputs, adds a bias (as the last column of $X$), and finally uses the functions written above to compute the optimal separating hyperplane $\mathbf w$.
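One possible way to write it is sketched below, assuming the `RDatasets` package for loading the data and standardization (zero mean, unit variance) as the normalization; both are assumptions, so use whatever your course template prescribes:

```julia
using RDatasets, Statistics

function iris(C; kwargs...)
    df = dataset("datasets", "iris")
    df = df[df.Species .!= "setosa", :]            # keep versicolor and virginica only
    X = Matrix(df[:, [:PetalLength, :PetalWidth]])
    y = [s == "versicolor" ? 1 : -1 for s in df.Species]
    X = (X .- mean(X; dims = 1)) ./ std(X; dims = 1)   # normalize the features
    X = hcat(X, ones(size(X, 1)))                  # bias as the last column
    return solve_SVM(X, y, C; kwargs...)
end
```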
Writing tests is not necessary; however, it can help you debug your code.
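For example (an illustrative check, not prescribed by the assignment), you can verify on random data that the primal and dual optimal values coincide, as claimed above:

```julia
X, y, C = randn(100, 3), rand([-1, 1], 100), 1.0   # toy data
Q = computeQ(X, y)
z = solve_SVM_dual(Q, C; max_epoch = 1000)
w = computeW(X, y, z)

primal = C * sum(max.(0, 1 .- y .* (X * w))) + 0.5 * w' * w
dual = -0.5 * z' * Q * z + sum(z)
@assert isapprox(primal, dual; rtol = 1e-3)        # the gap shrinks with more epochs
```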