Warning

This page is located in archive.

At the very beginning of the recognition labs, we assumed the conditioned measurement probabilities $p_{X|k}(x|k)$ and the a priori probabilities $P_K(k)$ to be known and we used them to find the optimal Bayesian strategy. Later, we looked at problems where the a priori probability is not known or does not exist and we constructed the optimal minimax strategy. Today, we face the problem of an unknown probability density function $p_{X|k}(x|k)$ and we will be interested in its estimation.

Let's assume in this lab that **we know** the density function is from a class of functions parametrised by **unknown parameters**. In our case, the known model is going to be the normal distribution and all we do not know are its parameters: the mean and the standard deviation. To estimate the parameters we will use so-called **training examples** (a training set), a set of examples drawn randomly from some existing but unknown distribution.

The assignment has two parts. In the first one we will explore the importance of the **sufficient training set size**. We will see what happens if we do not have enough training examples for the estimate. In the second part, we will continue with the letter classification task and we will train a Bayes classifier using training sets of different sizes and we will focus on **maximising the likelihood function**.

To fulfil this assignment, you need to submit these files (all packed in a single `.zip` file) into the upload system:

- `answers.txt` - answers to the Assignment Questions
- `assignment_04.m` - a script for data initialization, calling of the implemented functions and plotting of their results (for your convenience, will not be checked)
- `mle_normal.m` - function for estimating the parameters of a normal distribution from a training set
- `mle_variance.m` - function for computing the estimate variance over 100 random training sets
- `estimate_prior.m` - function for computing the prior probability of a given class
- `loglikelihood_sigma.m` - function which computes the log-likelihood for a given class with arbitrary sigma values
- `mle_variances.png` - the graph of estimate variance as a function of the training set size
- `loglikelihood20.png`, `loglikelihood200.png`, `loglikelihood2000.png` - images of log likelihood for different training sizes
- `mle_estimatesA.png`, `mle_estimatesC.png` - images of estimated distributions for class A and C
- `mle_classif20.png`, `mle_classif200.png`, `mle_classif2000.png` - classification results for our three estimates

Start by downloading the **template of the assignment**.

First we use a synthetic experiment to demonstrate the effect of the training set size on the parameter estimate. We will generate random training sets of different sizes from the normal distribution (using the `randn` function) and for each set we compute the maximum likelihood estimate of the mean and standard deviation parameters. For each training set size we will repeat the estimate 100 times and we will measure the “stability” of the estimate by the variance over these 100 estimates.

Do the following:

- Complete the template function `[mu sigma] = mle_normal(x)` so that for a given training set it returns the maximum likelihood estimates of the mean and standard deviation of a normal distribution.

**Hint:** The closed form solution for the maximum likelihood estimator with normal distributions is explained in [2]. Do not use the MATLAB functions `mean`, `std` and `var` in this assignment at all.

    x = [0 1 2 3 4 5 6 7 8 9 10];
    [mu sigma] = mle_normal(x)
    mu = 5
    sigma = 3.1623

- Complete the template function `[var_mean var_sigma] = mle_variance(cardinality)` which for a given training set size randomly generates (using the `randn` function) 100 training sets of that size, calls `mle_normal` on each of them and returns the variance of the estimated mean and standard deviation.

**Hint 1:** Generate the data from N(0,1), as done by default by the `randn` function.

**Hint 2:** Do not specify the random seed in any way, so that we can test the code.

**Hint 3:** Do not use the MATLAB `var` or `std` functions; use the expression from the first step (`mle_normal`).

- Plot the estimate variances for various training set sizes into one graph and save it as **mle_variances.png**.
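The whole synthetic experiment above can be sketched as a single script. This is only an illustration under assumptions: the training set sizes and the variable names (`sizes`, `n_trials`, `est_mu`, `est_sigma`) are hypothetical, and the estimates are written inline rather than through the template functions. The variance of the 100 estimates uses the same 1/n expression as the estimate itself.

```matlab
% Sketch of the synthetic experiment; sizes and variable names are assumptions.
sizes = [10 100 1000];          % hypothetical training set sizes
n_trials = 100;                 % number of random training sets per size
var_mean = zeros(size(sizes));
var_sigma = zeros(size(sizes));

for s = 1:numel(sizes)
    n = sizes(s);
    est_mu = zeros(1, n_trials);
    est_sigma = zeros(1, n_trials);
    for t = 1:n_trials
        x = randn(1, n);                                   % training set drawn from N(0,1)
        est_mu(t) = sum(x) / n;                            % ML estimate of the mean
        est_sigma(t) = sqrt(sum((x - est_mu(t)).^2) / n);  % ML estimate of the std (1/n)
    end
    % variance of the 100 estimates, written out without var/std/mean
    m1 = sum(est_mu) / n_trials;
    var_mean(s) = sum((est_mu - m1).^2) / n_trials;
    m2 = sum(est_sigma) / n_trials;
    var_sigma(s) = sum((est_sigma - m2).^2) / n_trials;
end

% both variances in one graph, on logarithmic axes
loglog(sizes, var_mean, 'o-', sizes, var_sigma, 's-');
xlabel('training set size');
ylabel('variance of the estimate');
legend('mean', 'sigma');
```

With data from N(0,1), the variance of the estimated mean should shrink roughly in proportion to 1/n, which is why logarithmic axes are a convenient choice here.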

We will keep on solving the letter classification task. However, in contrast to the previous labs, both the a priori probabilities $p_{K}(k)$ and the conditional probability density functions $p_{X|k}(x|k)$ are unknown. We have a set of training images from which the probabilities can be estimated and test examples (test set) to verify our estimate.

To estimate $p_{X|k}(x|k)$ we assume that a normal distribution fits the data, $p_{X|k}(x|k) \sim N(\mu_k, \sigma_k)$, and we compute its parameters $\mu_k$, $\sigma_k$ with the maximum likelihood criterion. Here, $k$ is the class label, i.e. the letter we consider. The a priori probability of each class can be simply estimated by taking the fraction of the training images belonging to this class.
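For reference, these estimates can be written in closed form. The notation is an assumption made here for illustration: $x_1, \dots, x_{n_k}$ are the measurements of the $n_k$ training images of class $k$, and $n$ is the total number of training images. The derivation itself is the one referred to in [2].

$$\hat\mu_k = \frac{1}{n_k}\sum_{i=1}^{n_k} x_i, \qquad \hat\sigma_k = \sqrt{\frac{1}{n_k}\sum_{i=1}^{n_k} (x_i - \hat\mu_k)^2}, \qquad \hat p_K(k) = \frac{n_k}{n}$$

The $1/n_k$ normalisation (rather than $1/(n_k - 1)$) agrees with the `mle_normal` example in the first part: for x = 0, 1, ..., 10 the squared deviations sum to 110, and $\sqrt{110/11} = 3.1623$.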

Having estimated the a priori probabilities and the conditional probability density functions we can use the Bayesian classifier to solve the problem.
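A minimal sketch of such a classifier for the zero-one loss is shown below. All numbers are hypothetical placeholders, not results from the lab data. Each measurement is assigned to the class with the larger product of prior and normal density, compared in the log domain.

```matlab
% Hypothetical estimates for two classes (placeholders, not lab results)
mu = [-1000 1500];
sigma = [800 600];
prior = [0.4 0.6];
x = [-2000 0 500 2500];        % measurements to classify
true_label = [1 1 2 1];        % hypothetical ground truth for the error computation

% log(prior * normal density) for every class (rows) and sample (columns)
log_post = zeros(2, numel(x));
for k = 1:2
    log_post(k, :) = log(prior(k)) - log(sigma(k) * sqrt(2*pi)) ...
                     - (x - mu(k)).^2 / (2 * sigma(k)^2);
end

[~, label] = max(log_post, [], 1);                    % 1 = first class, 2 = second class
err = sum(label ~= true_label) / numel(true_label);   % classification error
```

For these placeholder values the labels come out as `[1 1 2 2]`, so one of the four hypothetical samples is misclassified and `err` is 0.25.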

The performance of the classifier will be measured with the classification error using a different set of pre-labelled images called the **test set**, which was not used in the training stage. We will work with three different training sets of images (20, 200 and 2000 images) so we can experiment with the performance of the classifier, i.e. observe how the classification error changes with the set size.

As before, the following simple measurement will be used during the task:

x = (sum of pixel values in the left half of the image) - (sum of pixel values in the right half of the image), as implemented in `x = compute_measurement_lr_discrete(imgs)`.
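The raw left-minus-right difference can be sketched as follows. This is not the template's implementation: it assumes the images are stored as a height x width x count array of doubles (real image data may need a `double(...)` conversion first), and it leaves out any discretisation the template function may apply.

```matlab
% Two tiny hypothetical 2x4 "images" stacked along the third dimension
imgs = zeros(2, 4, 2);
imgs(:, :, 1) = [1 2 3 4; 1 2 3 4];
imgs(:, :, 2) = [9 9 0 0; 9 9 0 0];

w = size(imgs, 2);
half = floor(w / 2);                              % for odd widths the middle column goes right
left  = sum(sum(imgs(:, 1:half, :), 1), 2);       % sum over rows and left columns
right = sum(sum(imgs(:, half+1:end, :), 1), 2);   % sum over rows and right columns
x = reshape(left - right, 1, []);                 % one measurement per image
```

For the two toy images this gives `x = [-8 36]`.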

Do the following:

- Load the data file `data_33rpz_cv04.mat` into MATLAB. Three image sets for training are available, **trn_20**, **trn_200** and **trn_2000**, with 20, 200 and 2000 pre-labelled images.

- For all the training sets do the following:
- Estimate the prior probabilities $p_{K}(A)$ and $p_{K}(C)$. For this, complete the template **prior = estimate_prior(idLabel, labelling)**.

    labelling = [1 1 1 0 0 0 0 0 0];
    prior = estimate_prior(1, labelling)
    prior = 0.3333

- Compute the parameters $\mu_k$ and $\sigma_k$ of the conditional distributions $p_{X|k}(x|A)$ and $p_{X|k}(x|C)$ with the maximum likelihood estimator, using the function `[mu sigma] = mle_normal(x)` implemented in the first part.

- Complete the template `[L maximizerSigma maxL] = loglikelihood_sigma(x, D, sigmas)` which, for the fixed $\mu_k$ computed in the previous step, computes the log-likelihood function `L` as a function of $\sigma_k$. The function should also return the $\sigma^*_k$ corresponding to the maximum of the log-likelihood function.

**Hint:** $L$ is defined in equation (2) in [2] (or [1], eq. 5, page 87).

**Hint:** Use the same function as in the last assignment to find a maximum of a function: `x_maximizer = fminbnd(@(x) -some_function(other_param1, other_param2, x), 0, 1)`

**Hint:** Derive the log-likelihood expression (using pen and paper) and use it in your code to avoid precision errors coming from first exponentiating and then taking the logarithm. (Example of such a problem: `exp(-x^2/2)` evaluates to `0` for values of `x` as low as `x=40`. Taking `log` of that produces `-Inf`. But using `-x^2/2` (the manually derived log of exp) instead avoids that. Using log of MATLAB `normpdf` can cause similar problems.)

    sigmas = 300:50:3500;
    x = compute_measurement_lr_cont(trn_2000.images);
    D.Mean = -2000;
    [L maximizerSigma maxL] = loglikelihood_sigma(x, D, sigmas);
    maximizerSigma = 2.1247e+03
    maxL = -1.6345e+04

- Use the estimates to build a Bayesian classifier (use your implementation from the previous labs) assuming a zero-one loss function. Apply the classifier to the test set and compute the classification error (note that question 2 below asks for this number).
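For the log-likelihood step in the list above, the following sketch shows one way to evaluate the log-likelihood of a normal model directly in the log domain and to locate its maximum with `fminbnd`. The synthetic data, the sigma grid and the variable names are assumptions; the lab itself works with image measurements and the `D.Mean` field of the template. The expression used is $L(\sigma) = -n \log(\sigma\sqrt{2\pi}) - \frac{1}{2\sigma^2}\sum_i (x_i - \mu)^2$, which should be checked against equation (2) in [2].

```matlab
x = 5 + 2 * randn(1, 500);      % synthetic sample; the lab uses image measurements instead
mu = 5;                         % mean kept fixed, as in the task
sigmas = 0.5:0.1:6;             % hypothetical grid of sigma values
n = numel(x);
ss = sum((x - mu).^2);          % sum of squared deviations, computed once

% log-likelihood written directly in the log domain (no exp followed by log)
loglik = @(s) -n * log(s * sqrt(2*pi)) - ss ./ (2 * s.^2);
L = loglik(sigmas);             % values on the grid, e.g. for plotting

% fminbnd minimises, so the negated log-likelihood is passed in
maximizerSigma = fminbnd(@(s) -loglik(s), sigmas(1), sigmas(end));
maxL = loglik(maximizerSigma);
```

As a sanity check, with the mean fixed the maximiser found numerically should be close to the closed-form value `sqrt(ss / n)`.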

Fill in the correct answers in your `answers.txt` file.

- Is it true that with a higher number of samples in the training set we get more accurate parameter estimates?
- a) Yes
- b) No
- c) Not necessarily but it is more likely

- What is the classification error for the test set given the maximum likelihood estimates from `trn_20`, `trn_200` and `trn_2000` (rounded to three decimal places)? Use the following syntax:

question2 : [1.001, 2.002, 3.003]

Implement ML estimation of a *bivariate* normal distribution.

- Load the data file `data_33rpz_cv04.mat`.
- Compute 2 dimensional features $X = (x, y)^T$ as in the Bayes assignment. Use the `trn_2000` training set.
- Estimate parameters of bivariate normal distributions $$f_k(\mathbf x) = \frac{1}{\sqrt{(2\pi)^2|\boldsymbol\Sigma_k|}} \exp\left(-\frac{1}{2}({\mathbf x}-{\boldsymbol\mu_k})^T{\boldsymbol\Sigma_k}^{-1}({\mathbf x}-{\boldsymbol\mu_k}) \right)$$ for classes $ k \in \{A,C\} $:
- $\boldsymbol\mu_k$ mean vectors
- $\boldsymbol\Sigma_k$ covariance matrices - do not use the **cov** function to calculate the covariance matrix, you can take advantage of matrix multiplication, see Estimation_of_covariance_matrices#Estimation_in_a_general_context.
- Build a Bayes classifier (use the already computed priors $p_{K}(A)$ and $p_{K}(C)$), classify the `tst` test data and display the results as in the main assignment above.
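The estimation step of the bonus task can be sketched with plain matrix operations. The feature matrix below is a hypothetical toy example (one column per sample), not the lab data, and the 1/n normalisation is assumed to match the univariate maximum likelihood estimate used in the main assignment.

```matlab
X = [1 2 3 4; 2 1 4 3];        % hypothetical 2xN feature matrix, one column per sample
n = size(X, 2);

mu = sum(X, 2) / n;            % 2x1 mean vector
Xc = X - repmat(mu, 1, n);     % centred data
Sigma = (Xc * Xc') / n;        % 2x2 ML covariance estimate (1/n normalisation)

% log of the bivariate normal density f_k at a column vector x
logf = @(x) -log(2*pi) - 0.5 * log(det(Sigma)) ...
            - 0.5 * (x - mu)' * (Sigma \ (x - mu));
```

For this toy matrix the mean is `[2.5; 2.5]` and the covariance estimate is `[1.25 0.75; 0.75 1.25]`. Evaluating `logf` per class and adding the log prior would then give the quantities the Bayes classifier compares.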

- [1] Richard O. Duda, Peter E. Hart, David G. Stork. Pattern Classification.
- [2] Maximum Likelihood Parameter Estimation (short support text for labs)
- [3] Maximálně věrohodný odhad (longer text in Czech, includes multi-dimensional normal distribution estimates (13,14) needed for bonus task)

courses/ae4b33rpz/labs/04_mle/start.txt · Last modified: 2017/10/27 13:53 by drbohlav