Program:
Downloads:
Likely, you will not manage to finish the implementation of all the functions during the excercise in the lab. Finish them as a home work.
Be sure to understand the following concepts:
Linear regression is used to model continuous dependent variables. The model has the form of a linear function.
Assume we have 1D data. Each object is described only with 1 feature (<latex>$x_i \in X$</latex>) and with the value of dependent variable (<latex>y_i \in Y</latex>). The goal is to find the function (or more precisely the parameters <latex>$w_1$</latex> and <latex>$w_0$</latex> of the function)
<latex> $$ f(x) = w_1x + w_0 $$ </latex>
so that the error
<latex> $$ MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - f(x_i))^2 = \frac{1}{N} \sum_{i=1}^{N} (y_i - (w_1x_i + w_0))^2 $$ </latex>
which the model makes on the training data is minimal.
In case of linear regression and MSE minimization, there are explicit formulas to compute the weights w from training data:
<latex> \[w_1 = \frac{\sum_{i=1}^N (x_i - \bar X)(y_i - \bar Y)}{\sum_{i=1}^N (x_i - \bar X)^2} = \frac{s_{XY}}{s_X^2}\ \ \ \text{a}\ \ \ w_0 &=& \bar Y - w_1\bar X, \] </latex>
where <latex>$s_{XY}$</latex> is the covariance of X and Y and <latex>$s_{X}^2$</latex> is the variance of X.
If we introduce the so-called homogeneous coordinates (the description of objects is enriched with one feature that always have value 1), the description of input variables will look like this:
<latex> \[ X = \left( \begin{array}{cccc} x_1 & x_2 & \ldots & x_N \\ 1 & 1 & \ldots & 1 \end{array} \right). \] </latex>
Arranging all the coefficients of the linear function into a vector <latex>\mathbf{w}=(w_1, w_0)^T</latex>, we can apply the linear function on all objects at once as
<latex> \[ Y = \mathbf{w}^T X. \] </latex>
The weight vector can then be computed as
<latex> \[ \mathbf{w} = (XX^T)^{-1}XY^T \] </latex>
In D-dimensional space we can use the same formula. The weight vector will contain D+1 coefficients, the description of 1 object will be constituted by D+1 variables (one of them always equal to 1).
Learn how to compute the weight vector in MATLAB. Do not use the above mentioned formula, see
help slash
and
help regress
.
Using linear regression, we would like to fit also polynomial model to the data. This can be accomplished by basis expansion, i.e. by first mapping the data into a higher-dimensional space.
Create function with the following prototype:
function xp = mapForPolynom(x, k)
Inputs | x | [1 x N] | 1D description of objects. Values of the input for all objects. |
---|---|---|---|
k | scalar | The highest power of x to be present in the output |
|
Outputs | xp | [k+1 x N] | Matrix of powers of x. |
Output xp
: the first row will be equal to x
k, the second row will be equal to x
k-1, …, and the one to last row will be equal to x
and the last row is full of ones.
We would like to use polynomial models of various degrees to model the data.
Create function with the following prototype:
function model = trainRegrPolynom(x, y, k)
Inputs | x | [1 x N] | 1D description of objects. Values of the independent variable for all training data. |
---|---|---|---|
y | [1 x N] | The vector of values of the dependent variable for all training data. | |
k | scalar | The degree of polynom we would like to use as the model. | |
Outputs | model | [(k+1) x 1] | Vector of coefficients of the polynomial model. |
After training a polynomial model, we would also like to apply the model on new data.
Create function with the following prototype:
function yp = predRegrPolynom(model, x)
Inputs | model | [(k+1) x 1] | Vector of coefficients of the linear model. |
---|---|---|---|
x | [1 x N] | 1D description of objects. Values of the input for all test objects. | |
Outputs | yp | [1 x N] | The vector of predictions of the dependent variable for all test objects. |
Create a script that will
After this script will work perfectly, enrich it with the ability to
K-nearest neighbors (kNN) method is a simple and universal modeling method. It creates its prediction in the following way:
In case of polynomial modeling, the model was a vector of weights. What is the model in kNN method?
We have to be able to determine the k nearest neighbors of points from one set in another set.
Create function with the following prototype:
function iknn = findkNN(xtr, xtst, k)
Inputs | xtr | [D x Ntr] | Matrix with the training vectors. Prototype vectors. We search the neighbors among them. |
---|---|---|---|
xtst | [D x Ntst] | Matrix with the testing vectors. Query vectors. We search the neighbors for them. | |
k | scalar | The number of neighbors. | |
Outputs | iknn | [k x Ntst] | Matrix of indices of the nearest neighbors. |
Output iknn
: for each testing vector, it will contain indices into the matrix of training vectors.
iknn(:, 3)
will be a column vector of indices of k nearest neighbors of the 3rd testing vector. xtr(:, iknn(1,3) )
should be the nearest neighbor of the 3rd testing vector found in the training points.A small help:
distmat()
function we created last week?
min
return when called in the following way: [foo, imin] = min(dm, [], 1)
?
min
can be used to return the second, third, etc. lowest element?
Create function with the following prototype:
function model = trainRegrKNN(x, y, k)
Inputs | x | [D x N] | Values of the inputs for all objects. In this case, D = 1. |
---|---|---|---|
y | [1 x N] | The vector of values of the dependent variable for all objects. | |
k | scalar | The number of nearest neighbors we would like to use as the model. | |
Outputs | model | struct | Training data x ,y and the number of neighbors k . |
After training the kNN model, we would also like to apply the model on new data.
Create function with the following prototype:
function yp = predRegrKNN(model, x)
Inputs | model | struct | Model for the kNN method. The output of the trainRegrKNN function. |
---|---|---|---|
x | [D x N] | Values of the inputs for all test objects. In this case, D = 1. | |
Outputs | yp | [1 x N] | The vector of predictions of the dependent variable for all test objects. |
Create a script that will
After this script will work perfectly, enrich it with the ability to