You will likely not manage to finish the implementation of all the functions during the exercise in the lab. Finish them as homework.
Be sure to understand the following concepts:
Linear regression is used to model continuous dependent variables. The model has the form of a linear function.
Assume we have 1D data. Each object is described with only 1 feature (<latex>$x_i \in X$</latex>) and with the value of the dependent variable (<latex>$y_i \in Y$</latex>). The goal is to find the function (or more precisely the parameters <latex>$w_1$</latex> and <latex>$w_0$</latex> of the function)
<latex> $$ f(x) = w_1x + w_0 $$ </latex>
so that the error
<latex> $$ MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - f(x_i))^2 = \frac{1}{N} \sum_{i=1}^{N} (y_i - (w_1x_i + w_0))^2 $$ </latex>
which the model makes on the training data is minimal.
In the case of linear regression with MSE minimization, there are explicit formulas to compute the weights w from the training data:
<latex> \[ w_1 = \frac{\sum_{i=1}^N (x_i - \bar X)(y_i - \bar Y)}{\sum_{i=1}^N (x_i - \bar X)^2} = \frac{s_{XY}}{s_X^2} \quad \text{and} \quad w_0 = \bar Y - w_1\bar X, \] </latex>
where <latex>$s_{XY}$</latex> is the covariance of X and Y and <latex>$s_{X}^2$</latex> is the variance of X.
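The closed-form formulas above can be sketched directly in MATLAB (the data vectors here are made up for illustration):

```matlab
% Sketch: compute w1 and w0 from the closed-form formulas.
% Assumes x and y are row vectors of the same length (example data).
x = [1 2 3 4 5];
y = [2.1 3.9 6.2 8.1 9.8];
w1 = sum((x - mean(x)) .* (y - mean(y))) / sum((x - mean(x)) .^ 2);
w0 = mean(y) - w1 * mean(x);
```

Note that the numerator and denominator are N times the covariance of X and Y and N times the variance of X, respectively, so the factor 1/N cancels out.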
If we introduce the so-called homogeneous coordinates (the description of each object is enriched with one feature that always has the value 1), the description of the input variables will look like this:
<latex> \[ X = \left( \begin{array}{cccc} x_1 & x_2 & \ldots & x_N \\ 1 & 1 & \ldots & 1 \end{array} \right). \] </latex>
Arranging all the coefficients of the linear function into a vector <latex>$\mathbf{w}=(w_1, w_0)^T$</latex>, we can apply the linear function to all objects at once as
<latex> \[ Y = \mathbf{w}^T X. \] </latex>
The weight vector can then be computed as
<latex> \[ \mathbf{w} = (XX^T)^{-1}XY^T \] </latex>
In D-dimensional space we can use the same formula. The weight vector will then contain D+1 coefficients, and the description of one object will consist of D+1 variables (one of them always equal to 1).
Learn how to compute the weight vector in MATLAB. Do not use the above-mentioned formula; see
help slash
help regress
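As a sketch of what the `slash` hint points to: the backslash operator solves the least-squares problem directly, which is numerically preferable to forming the inverse explicitly (the example data is made up for illustration):

```matlab
% Assumes x and y are 1xN row vectors (example data).
x = [1 2 3 4 5];
y = [2.1 3.9 6.2 8.1 9.8];
X = [x; ones(1, numel(x))];   % homogeneous coordinates, 2xN
w = (X') \ (y');              % solves X'*w = y' in the least-squares sense
% w(1) is the slope w_1, w(2) is the intercept w_0
```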
Using linear regression, we would also like to fit a polynomial model to the data. This can be accomplished by basis expansion, i.e. by first mapping the data into a higher-dimensional space.
Create a function with the following prototype:
function xp = mapForPolynom(x, k)
Input x: a row vector of feature values.
Input k: the degree of the polynomial.
Output xp: the matrix of expanded features; the first row will be equal to x^k, the second row will be equal to x^(k-1), …, the next-to-last row will be equal to x, and the last row is full of ones.
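One possible sketch of this mapping (a loop over powers; vectorized variants exist as well):

```matlab
function xp = mapForPolynom(x, k)
% Map a 1xN row vector x to a (k+1)xN matrix of polynomial features.
% Row 1 holds x.^k, row 2 holds x.^(k-1), ..., row k holds x,
% and row k+1 is all ones (homogeneous coordinate).
  N = numel(x);
  xp = ones(k + 1, N);
  for i = 1:k
    xp(k + 1 - i, :) = x .^ i;   % row k+1-i holds x.^i
  end
end
```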
We would like to use polynomial models of various degrees to model the data.
function model = trainRegrPolynom(x, y, k)
Input x: a row vector of feature values.
Input y: a row vector with the values of the dependent variable.
Input k: the degree of the polynomial.
Output model: the trained model, i.e. the weight vector of the polynomial.
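A minimal sketch of the training function, assuming mapForPolynom from the previous task is on the path:

```matlab
function model = trainRegrPolynom(x, y, k)
% Fit a degree-k polynomial to the data (x, y) by least squares.
  xp = mapForPolynom(x, k);   % (k+1) x N expanded features
  model = (xp') \ (y');       % least-squares weight vector, (k+1) x 1
end
```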
After training a polynomial model, we would also like to apply the model to new data.
function yp = predRegrPolynom(model, x)
Input model: the trained model returned by trainRegrPolynom.
Input x: a row vector of feature values of the new data.
Output yp: a row vector with the predicted values of the dependent variable.
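A sketch of the prediction function; the polynomial degree can be recovered from the length of the weight vector:

```matlab
function yp = predRegrPolynom(model, x)
% Apply a trained polynomial model (weight vector) to new data x.
  k = numel(model) - 1;        % degree implied by the weight vector
  xp = mapForPolynom(x, k);
  yp = model' * xp;            % 1 x N row vector of predictions
end
```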
Create a script that will
Once this script works perfectly, enrich it with the ability to
The k-nearest neighbors (kNN) method is a simple and universal modeling method. It creates its predictions in the following way:
In the case of polynomial modeling, the model was a vector of weights. What is the model in the kNN method?
We need to be able to determine, for points from one set, their k nearest neighbors in another set.
function iknn = findkNN(xtr, xtst, k)
Input xtr: a matrix of training vectors (one vector per column).
Input xtst: a matrix of testing vectors (one vector per column).
Input k: the number of nearest neighbors to find.
Output iknn: for each testing vector, it will contain indices into the matrix of training vectors. For example, iknn(:, 3) contains the indices of the k nearest neighbors of the third testing vector, and
xtr(:, iknn(1,3) )
is the training vector nearest to the third testing vector.
A small hint: compute the matrix dm of mutual distances between training and testing vectors (e.g. with distmat()), then look at the min function:
[foo, imin] = min(dm, [], 1)
returns in imin, for each column of dm, the row index of its minimum.
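A sketch of findkNN that computes Euclidean distances directly (so it does not depend on distmat) and generalizes the min hint to k > 1 via sort:

```matlab
function iknn = findkNN(xtr, xtst, k)
% For each column of xtst, find the indices of its k nearest
% columns of xtr (Euclidean distance).
  Ntr  = size(xtr, 2);
  Ntst = size(xtst, 2);
  dm = zeros(Ntr, Ntst);                       % dm(i,j): distance of
  for j = 1:Ntst                               % training i to testing j
    diff = xtr - repmat(xtst(:, j), 1, Ntr);
    dm(:, j) = sqrt(sum(diff .^ 2, 1))';
  end
  [foo, isorted] = sort(dm, 1);                % sort each column ascending
  iknn = isorted(1:k, :);                      % keep the k nearest indices
end
```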
function model = trainRegrKNN(x, y, k)
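In kNN there is no real training phase: one possible sketch, assuming the model is represented as a struct that simply stores the training data and k:

```matlab
function model = trainRegrKNN(x, y, k)
% "Training" a kNN regressor just stores the training data
% and the number of neighbors in the model struct.
  model.x = x;
  model.y = y;
  model.k = k;
end
```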
After training the kNN model, we would also like to apply the model to new data.
function yp = predRegrKNN(model, x)
Input model: the model returned by
trainRegrKNN
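A sketch of the prediction function, assuming the struct model from trainRegrKNN above and findkNN from the previous task, and predicting by averaging the y values of the k nearest training points:

```matlab
function yp = predRegrKNN(model, x)
% Predict the dependent variable for each column of x as the mean
% of the y values of its k nearest training points.
  iknn = findkNN(model.x, x, model.k);   % k x Ntst matrix of indices
  yp = mean(model.y(iknn), 1);           % average over the k neighbors
end
```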