====== Maximum Likelihood Parameter Estimation ======

At the very beginning of the recognition labs, we assumed the conditioned measurement probabilities $p_{X|k}(x|k)$ and the a priori probabilities $P_K(k)$ to be known and we used them to find the optimal {{courses:be5b33rpz:labs:02_bayes:|Bayesian strategy}}. Later, we looked at problems where the a priori probability is not known or does not exist and we constructed the optimal {{courses:be5b33rpz:labs:03_minimax:|minimax strategy}}. Today, we face the problem of unknown probability density function $p_{X|k}(x|k)$ and we will be interested in its estimation.

Lets assume in this lab that **we know** the density function is from a class of functions parametrised by **unknown parameters**. In our case, the known model is going to be Normal distribution and all we do not know are its parameters - mean and standard deviation. To estimate the parameters we will use so called **training examples** (training set) - a set of examples drawn randomly from some existing but unknown distribution.

The assignment has two parts. In the first one we will explore the importance of the **sufficient training set size**. We will see what happens if we do not have enough training examples for the estimate. In the second part, we will continue with the letter classification task and we will train a /*so called "naive" - see the comment in Section 2*/ Bayes classifier using training sets of different sizes and we will focus on **maximising the likelihood function**.

<WRAP center round info>
To fulfil this assignment, you need to submit these files (all packed in a single ''.zip'' file) into the [[https://cw.felk.cvut.cz/sou/ | upload system]]:
  * **''answers.txt''** - answers to the Assignment Questions
  * **''assignment_04.m''** - a script for data initialization, calling of the implemented functions and plotting  of their results (for your convenience, will not be checked)
  * **''mle_normal.m''** - function for estimating the parameters of a normal distribution from a training set
  * **''mle_variance.m''** - function for computing the estimate variance over 100 random training sets
  * **''estimate_prior.m''** - functions for computing the prior probability of a given class
  * **''loglikelihood_sigma.m''** - functions which computes the log-likelihood for a given class with arbitrary sigma values
  * **''mle_variances.png''** - the graf of estimate variance as a function of the training set size
  * **''loglikelihood20.png''**, **''loglikelihood200.png''**, **''loglikelihood2000.png''** - images of log likelihood for different training sizes
  * **''mle_estimatesA.png''**, **''mle_estimatesC.png''** - images of estimated distributions for class A and C
  * **''mle_classif20.png''**, **''mle_classif200.png''**, **''mle_classif2000.png''** - classification results for our three estimates

Start by downloading the **[[https://cw.felk.cvut.cz/courses/RPZ/assignment_templates/assignment_mle_template.zip|template]] of the assignment**. 
</WRAP>


===== Information for development in Python =====

<WRAP center round info>

**Experimental. Use at your own risk. **

**Standard is to use Matlab. You may skip this info box if you follow the standard rules.**

[[https://cw.fel.cvut.cz/b181/courses/be5b33rpz/labs/python_development|General information for Python development]].

To fulfil this assignment, you need to submit these files (all packed in a single ''.zip'' file) into the [[https://cw.felk.cvut.cz/sou/ | upload system]]:
  * **''answers.txt''** - answers to the Assignment Questions
  * **''mle.ipynb''** - a script for data initialization, calling of the implemented functions and plotting of their results (for your convenience, will not be checked)
  * **''mle.py''** - containing the following implemented methods:
    * **''mle_normal''** - function for estimating the parameters of a normal distribution from a training set
    * **''mle_variance''** - function for computing the estimate variance over 100 random training sets
    * **''estimate_prior''** - functions for computing the prior probability of a given class
    * **''loglikelihood_sigma''** - functions which computes the log-likelihood for a given class with arbitrary sigma values
  * **''mle_variances.png''** - the graf of estimate variance as a function of the training set size
  * **''loglikelihood20.png''**, **''loglikelihood200.png''**, **''loglikelihood2000.png''** - images of log likelihood for different training sizes
  * **''mle_estimatesA.png''**, **''mle_estimatesC.png''** - images of estimated distributions for class A and C
  * **''mle_classif20.png''**, **''mle_classif200.png''**, **''mle_classif2000.png''** - classification results for our three estimates


** Use [[https://cw.fel.cvut.cz/b181/courses/be5b33rpz/labs/python_development#Assignment Templates|template]] of the assignment.** When preparing the archive file for the upload system, **do not include any directories**, the files have to be in the archive file root.
</WRAP>

===== Tasks, part 1: About the training set size =====

First we use a synthetic experiment to demonstrate the effect of the training set size on the parameter estimate. We will generate random training sets of different sizes from the normal distribution (using ''randn'' function) and for each set we do the maximum likelihood estimate of mean and standard deviation parameters. For each training set size we will repeat the estimate 100x and we will measure the "stability" of the estimate by variance over these 100 estimates.

Do the following:
  - Complete the template function **''[mu sigma] = mle_normal(x)''** so that for given training set it returns the maximum likelihood estimates of mean and standard deviation of a normal distribution.\\ \\ **Hint:** The closed form solution for the maximum likelihood estimator with normal distributions is explained in {{:courses:be5b33rpz:labs:04_mle:ml.pdf| [2]}}. Do not use MATLAB functions ''mean'', ''std'' and ''var'' in this assignment at all.\\ \\ <code matlab>x = [0 1 2 3 4 5 6 7 8 9 10];
[mu sigma] = mle_normal(x)

mu = 5
sigma = 3.1623</code>
  - Complete the template function **''[var_mean var_sigma] = mle_variance(cardinality)''** which for given training set size generates randomly (using the ''randn'' function) 100 training sets of that size, calls **''mle_normal''** on each of them and returns the variance of the estimated mean and standard deviation.\\ \\ **Hint 1:** Generate the data from N(0,1) as done by default by ''randn'' function.\\ \\ **Hint 2:** Do not specify the [[wp>random seed]] in any way, so that we can test the code.\\ \\ **Hint 3:** Do not use the MATLAB ''var'' or ''std'' functions, use the expression from point 1. \\ \\
  - Plot the estimate variances for various training set sizes into one graph and save it as ''**mle_variances.png**''.\\ \\ **Hint:** Graphs depend on the random generator seed and may differ.
{{ :courses:be5b33rpz:labs:04_mle:mle_variances.png?nolink |}}

===== Tasks, part 2: Building a classifier =====

We will keep on solving the letters classification task. However, in contrast to the previous labs, both, the a priori probabilities $p_{K}(k)$ and the conditional probability density functions $p_{Xk}(x|k)$ are unknown. We have a set of training images from which the probabilities can be estimated and test examples (test set) to verify our estimate.

To estimate $p_{X|k}(x|k)$ we assume that a normal distribution fits the data, $p_{X|k}(x|k) \sim N(\mu_k, \sigma_k)$,  and we compute its parameters $\mu_k$, $\sigma_k$ with the maximum likelihood criteria. Here, the $k$ is the class label - the letter we consider. The a priori probability of each class can be simply estimated by taking the fraction of the training images belonging to this class.

Having estimated the a priori probabilities and the conditional probability density functions we can use the Bayesian classifier to solve the problem. 


/*
This approach, where density estimates are used instead of the true densities, is called "naive Bayes" in the literature.

LukeN: I don't think the statement below is correct, "naive Bayes" is used when we assume independence of different features, ie. we assume p(x1,x2|k) = p(x1|k)p(x2|k). Moreover, an assumption that there is some 'true' density which we are after is rather problematic, but this is perhaps more of a philosophical question...

 */


The performance of the classifier will be measured with the classification error using a different set of pre-labelled images called **test set**, which was not used in the training stage. We will work with three different training sets of images (20, 200 and 2000 images) so we can experiment with the performance of the classifier, i.e. observing the classification error, when the set size changes. 

As before, the following simple measurement will be used during the task:

<code>
x = (sum of pixel values in the left half of image)
   -(sum of pixel values in the right half of image)
</code>
as implemented in **''x = compute_measurement_lr_discrete(imgs)''**.

Do the following:
  - Load the data file ''{{:courses:be5b33rpz:labs:04_mle:data_33rpz_cv04.mat|data_33rpz_cv04.mat}}'' into Matlab. Three image sets for training are available, ''**trn_20**'', ''**trn_200**'' and ''**trn_2000**'' with 20, 200 and 2000 pre-labelled images.\\ \\
  - For all the training sets do the following:
     - Estimate the prior probabilities $p_{K}(A)$ and $p_{K}(C)$. For this, complete the template **''prior = estimate_prior(idLabel, labelling)''**.\\ \\ <code matlab>labelling = [1 1 1 0 0 0 0 0 0];
prior = estimate_prior(1, labelling)

prior = 0.3333</code>
     - Compute the parameters $\mu_k$ and $\sigma_k$ of the conditional distributions $p_{Xk}(x|A)$ and $p_{Xk}(x|C)$, with the maximum likelihood estimator using the **''[mu sigma] = mle_normal(x)''** function implemented in the first part.\\ \\
     - Complete the template **''[L maximizerSigma maxL] = loglikelihood_sigma(x, D, sigmas)''** which for fixed $\mu_k$ computed in 2.II computes the log-likelihood function ''L'' as a function of $\sigma_k$. The function should also return the $\sigma^*_k$ corresponding to the maximum of the log-likelihood function. \\ \\ **Hint:** $L$ is defined in equation (2) in {{:courses:be5b33rpz:labs:04_mle:ml.pdf| [2]}} (or [1], eq 5, page 87).\\ \\ **Hint:** Use the same function as in the last assignment to find a maximum of a function: <code matlab>x_maximizer = fminbnd(@(x) -some_function(other_param1, other_param2, x), 0, 1) </code> **Hint:** Derive the log-likelihood expression (using pen and paper) and use this in your code to avoid precision errors coming from first exponentiating and then taking logarithm (Example of such problem: ''exp(-x^2/2)'' evaluates to ''0'' for values of ''x'' as low as ''x=40''. Taking ''log'' of that produces ''-Inf''. But using ''-x^2/2'' (manually derived log of exp) instead avoids that. Using log of MATLAB normpdf can cause similar problems. <code matlab>sigmas = 300:50:3500;
x = compute_measurement_lr_cont(trn_2000.images);
D.Mean = -2000;
[L maximizerSigma maxL] = loglikelihood_sigma(x, D, sigmas);

maximizerSigma = 2.1247e+03
maxL = -1.6345e+04</code>
     -  Plot the computed log-likelihood as a function of $\sigma_k$ for $k=A$ and save it into ''**loglikelihoodXXX.png**'' (XXX corresponds to the training set size, i.e. 20, 200, or 2000) \\ {{:courses:be5b33rpz:labs:04_mle:plot_loglikelihood_vs_sigma.png|}}\\ \\
  - Plot all the estimates of $p_{Xk}(x|A)$ and $p_{Xk}(x|C)$ from different training set sizes into one graph together with normalised histogram of the ''trn_2000'' images. Save this as ''**mle_estimatesA.png**'' and ''**mle_estimatesC.png**''.\\ {{:courses:be5b33rpz:labs:04_mle:mle_estimatea.png?500|}} {{:courses:be5b33rpz:labs:04_mle:mle_estimatec.png?500|}} \\ \\
  - Use the estimates to build a Bayesian classifier (use your implementation from the previous labs) assuming zero-one loss function. Apply the classifier to the test set and compute the classification error (notice, that the question 2 below asks for this number).\\ \\
  - Show the final classification in one figure and save it as ''**mle_classifXXX.png**'' (again, XXX is either 20, 200 or 2000).\\ {{:courses:be5b33rpz:labs:04_mle:mle_classif.png|}}
===== Assignment Questions =====

Fill the correct answers to your ''answers.txt'' file.
<WRAP center round help>
  - Is it true that with a higher number of samples in the training set we get more accurate parameter estimates?
    * a) Yes
    * b) No
    * c) Not necessarily but it is more likely
  - What is the classification error for the test set given the maximum likelihood estimates from ''trn_20'', ''trn_200'' and ''trn_2000'' (rounded to three decimal points)? Use the following syntax:
<code yaml>  
question2 : [1.001, 2.002, 3.003]
</code>    

</WRAP>
===== Bonus Task =====

Implement ML estimation of a //bivariate// normal distribution. 

  - Load the data file ''{{:courses:be5b33rpz:labs:04_mle:data_33rpz_cv04.mat|data_33rpz_cv04.mat}}''.   
  - Compute 2 dimensional features  <latex>X = (x, y)^T</latex> as in the [[courses:be5b33rpz:labs:02_bayes:start|Bayes]] assignment. Use the ''trn_2000'' training set.
  - Estimate parameters of bivariate normal distributions $$f_k(\mathbf x) =
\frac{1}{\sqrt{(2\pi)^2|\boldsymbol\Sigma_k|}}
\exp\left(-\frac{1}{2}({\mathbf x}-{\boldsymbol\mu_k})^T{\boldsymbol\Sigma_k}^{-1}({\mathbf x}-{\boldsymbol\mu_k})
\right)$$ for classes $ k \in \{A,C\} $:
    * $\boldsymbol\mu_k$ mean vectors
    * $\boldsymbol\Sigma_k$ covariance matrices - do not use ''**cov**'' function to calculate covariance matrix, you can take advantage of matrix multiplication, see  [[wp>Estimation_of_covariance_matrices#Estimation_in_a_general_context]]. 
  - Build Bayes classifier (use already computed priors $p_{K}(A)$ and $p_{K}(C)$), classify ''tst'' test data and display the results as in the main assignment above.
  - Display the distribution estimates (use the function ''pgauss'') and the test set (function ''ppatterns'').\\ {{:courses:be5b33rpz:labs:04_mle:2d_reseni.png|2D řešení}}

===== Recommended Literature =====

  * [1] Richard O. Duda, Peter E. Hart, David G. Stork. Pattern Classification.
  * [2] {{:courses:be5b33rpz:labs:04_mle:ml.pdf|Maximum Likelihood Parameter Estimation}} (short support text for labs)
  * [3] {{:courses:be5b33rpz:labs:04_mle:ml_cz.pdf|Maximálně věrohodný odhad}} (longer text in Czech, includes multi-dimensional normal distribution estimates (13,14) needed for bonus task)