====== Classification/recognition (last assignment) ======

The machine learning task has two parts - **symbol classification** and **determination of the optimal classifier parameter**. Check the [[http://example.com|Upload system ]] for the due date and notice that both tasks are submitted **separately**. 

Date provided by [[http://www.eyedea.cz|Eyedea Recognition]]. 

<note important>You have to implement your own methods and cannot use Matlab provided function used for Machine learning (e.g., fitcknn, predict)</note>

==== Problem ====
Your task is to classify letters from car license plates. We assume the plate is found (see Fig 3 and 4). The number pictures are normalized to the side 10x10 pixels. You can use training data that were chosen at random. The brightness values of pixels are ordered into row vectors by columns, see Fig 2. Thus a line in the input MAT-file means one image. (see [[https://download.cvut.cz/info/info.php]] if you need to download matlab).

The file ''train.mat'' contains the feature vectors, the file ''train_labels.mat'' contains the class labels. The pictures are only meant for a preview, the data contained in the MAT-files are the actually important data. The result of the classification should be in the same format as the variables in ''train_labels.mat''. The pictures in the lines of ''train.mat'' are the ones labeled in ''train_labels.mat''.

This is the usual extent of the available data given by a client of Eyedea Recognition. After the company (now you) writes his code, the client can come up with testing data in order to test your solution.

Test data will be in ''test.mat'' and ''test_labels.mat''. You can simulate this by dividing the training data for your own testing/training sets.


Use MATLAB for this assignment. The files are available here: {{:courses:b3b33kui:cviceni:strojove_uceni/kui_classification_students.zip|}}.


<note important>Avoid using the built-in function ''sumsqr''. This function is not available in Matlab, in the submission system.</note>

==== Task definition ====

Implement the following algorithms and use them to solve the above mentioned problem. Upload the files with the implemented functions ''nnLearn()'', ''nnClassify()'', ''bayesLearn()'' and ''bayesClassify()'' in the .m files of the same name into the upload system


**Nearest neighbor classifier** Implement the classifier using the nearest neighbor principle. Test its properties on the training data.

**Bayes classification** Implement the naive Bayes classification. Assume the intensity of the pixels is independent. In each class therefore it holds:

$$P(\vec{x}|s)=P(x_1|s) \cdot P(x_2|s) \dotsc P(x_n|s)$$

where $x_i$ is the intensity of the $i$th pixel.

**Note** In order to avoid zero probabilities (caused by small training sets), you need to add a small positive value to $P(x_i|s)$ for each $i$. If you miss this, then even one zero value $P(x_i|s)$ causes $P(\vec{x}|s)=0$.

==== Example solution – perceptron algorithm ====

You have one example solution at your disposal, which uses a perceptron for classification. An example of how to use it can be found in the ''main.m'' script.

In the script, it is shown how to load input files, train the perceptron, use the trained perceptron for classification, and get results (a confusion matrix) of the classification.

The basic functions are ''perceptronLearn()'', ''perceptronClassify()'' and ''confusionMatrix()''.

{{:courses:be5b33kui:labs:machine_learning:02_03_04_obr2.png?800|Examples of normalized letter images of registration plates.}}

Figure 1: //Examples of normalized letter images of registration plates.//

{{:courses:be5b33kui:labs:machine_learning:02_03_04_obr3.png?800|Pixels are represented by a row vector of concatenated columns. First comes the first column, then the second etc. It is obvious that dark columns of the letter J, which are in the extreme right part of the image, are at the end of the data vector.}}

Figure 2: //Pixels are represented by a row vector of concatenated columns. First comes the first column, then the second etc. It is obvious that dark columns of the letter J, which are in the extreme right part of the image, are at the end of the data vector.//


==== Auxiliary code description ====


== Main script ==

The script ''main'' calls learning and classification functions for all classifiers. It also loads input data and calls the function for printing results of the classifications. This script makes all needed callings of the functions. However, do not forget comment the callings of the functions that are not implemented at the beginning. 

== Perceptron classifier ==

This is the concrete implementation of the Perceptron classifier. See the implementation and the comments for more details.

**Perceptron learning**

''perceptron = perceptronLearn(train, train_labels)'' - the inputs of the function are the training data and their labels. The output of the function is the structure with the learned classifier. The function firstly makes mapping from the char class labels to the number class labels ("conversion table") and convert the char labels to the number labels. Finally there is the learning itself as per the following algorithm:
  - Set $\vec{w}_y=\vec{0}$ and $b_y=0$ for all $y \in Y$ ($Y$ - set of all possible labels).
  - Pick a random incorrectly classified input. If there is no such input, then STOP, because the learning has finished, by finding parameters for an errorfree classification of the input data.
  - Let $\vec{x}_t,y_t$ is an improperly classified input and $\hat{y}$ is a classification of $\vec{x}_t$ using the current classifier. Adapt the parameters of the classifier according to the following formulae: \\ $ \eqalign{ \vec{w}_{y_t} &= \vec{w}_{y_t} + \vec{x}_t \cr b_{y_t} &= b_{y_t} + 1 \cr \vec{w}_{\hat{y}} &= \vec{w}_{\hat{y}} - \vec{x}_t \cr b_{\hat{y}}&= b_{\hat{y}} - 1 } $
  - Continue with step 2.

**Perceptron classification**

''classLabels = perceptronClassify(perceptron, test)'' - the inputs of the function are the structure with the learned classifier and the testing data. The output of the function is an array with the class labels. Classification follows according to

$$\hat{y}=\arg \max\limits_{y \in Y} \vec{x}_t^\top\vec{w}_y+b_y$$

with the result being converted from the number labels to the char labels and then returned as an output.


== Confusion Matrix ==

''confusionMatrix()'' - the function prints result of the classification and the [[http://en.wikipedia.org/wiki/Confusion_matrix|confusion matrix]]. The inputs of the function are given labels and the labels obtained by the classification.   


=== Nearest Neighbor (NN) classifier ===

The functions ''nnLearn'' and ''nnClassify'' for the 1-nearest neighbor classifier ready for implementation.

''nn = nnLearn(train, trainLabels)'' - the inputs of the function are the training data and their labels. The output of the function is the structure with the learned classifier. 

''classLabels = nnClassify(nn,test)'' - the inputs of the function are the structure with the learned classifier and the testing data. The output of the function is an array with the class labels.

=== Naive Bayes classifier ===

The functions ''bayesLearn'' and ''bayesClassify'' for the naive Bayes classifier ready for implementation.

''bayes = bayesLearn(train, train_labels)'' - the inputs of the function are the training data and their labels. The output of the function is the structure with the learned classifier. 

''classLabels = bayesClassify(bayes, test)'' - the inputs of the function are the structure with the learned classifier and the testing data. The output of the function is an array with the class labels.

==== Examples of use ====

{{:courses:be5b33kui:labs:machine_learning:02_03_04_obr5.png?800|Automatic text localization from pictures. More information available at  [[http://cmp.felk.cvut.cz/~zimmerk/lpd/index.html|http://cmp.felk.cvut.cz/~zimmerk/lpd/index.html]].}}


Fig. 3: //Automatic text localization from pictures. More information available at  [[http://cmp.felk.cvut.cz/~zimmerk/lpd/index.html|http://cmp.felk.cvut.cz/~zimmerk/lpd/index.html]].//

{{:courses:be5b33kui:labs:machine_learning:02_03_04_obr6.png?800|Industry application for license plate recognition. Videos are available at  http://cmp.felk.cvut.cz/cmp/courses/X33KUI/Videos/RP_recognition.}}

Fig. 4: //Industry application for license plate recognition. Videos are available at  [[http://cmp.felk.cvut.cz/cmp/courses/X33KUI/Videos/RP_recognition|http://cmp.felk.cvut.cz/cmp/courses/X33KUI/Videos/RP_recognition]].//


====== Selection of optimal classifier ======

Very often, there are many classifiers that can be used for a ML task, and we have to make a decision about which is the best classifier for the task. In the zip archive {{:courses:b3b33kui:cviceni:strojove_uceni/kui_classification_students.zip|}}, there is a file ''classif_result_tables.mat'' (MATLAB) that will be used for the task. The task shall be solved in MATLAB. What is asked to be uploaded are a **pdf report** and one function related to the part **Safety first**.

We have 5 different learned binary classifiers. The result of the classification of each classifier depends on the value of the $\alpha$ parameter. Thus, the result of the classification of a given classifier can be expressed as a function $C(\bf x, \alpha) \in \{0,1\}$, where $\bf x$ is a vector which belongs to the sample we want to classify.

We tested all classifiers on a test set $X = \{{\bf x}_1, {\bf x}_2, \dots, {\bf x}_{100} \}$. At the same time, we tried all possible values of  $\alpha\in\{\alpha_1, \alpha_2, \dots, \alpha_{50}\}$. For a classifier $i\in\{1,2, \dots,5\}$ we obtain a table with values $C_i(\bf x_j,\alpha_k)\in \{0,1\}$, where $j \in \{1, 2,.., 100\}$, $k \in \{1, 2,.., 50\}$ (see //C1//, //C2//, //C3//, //C4//, //C5// in the file ''classif_result_tables.mat''). The real labels of the samples $\bf x_1, \bf x_2, \dots, \bf x_{100}$ from the test set are available (see //GT// in file ''classif_result_tables.mat'').


=== Selection of appropriate parameter ===
In this section, suppose that the classifiers are used for binary classification of images (e.g. whether a dog is on the picture or not). For the classifier 1 (table //C1//), determine the best value for parameter $\{\alpha_1, \alpha_2, \dots, \alpha_{50}\}$. Be aware that you don’t know the concrete task for which the classifier will be used. Therefore, it is necessary to use a sufficiently general approach. In other words, the classifier should not be one that is optimal for a particular task but globaly inefficient for most other tasks.


In a short (definitely shorter than one A4 page) **pdf report** explain the choice of the parameter (use terms such as sensitivity, false positive, ROC curve etc.). Inside the report, put the figure of a ROC curve with a marked point on the curve which correspond to the optimal value of the parameter.


=== Top secret! ===
Imagine, that you are an agent 00111 and you want to use your fingerprint to secure some top secret documents. The data are very sensitive, so it is better to delete them than secure them poorly. You also know that you will always have enough time to unlock the data.
Five trained classifiers (with different $\alpha$ values) are available. The input of the classifier is a fingerprint scan. For your fingerprint, desired output of the classifier is 1 (data will be unlocked), 0 otherwise (if it is not your fingerprint). All classifiers were tested using the test set $X$ for all possible values of the parameter $\alpha$. Results of the test for the classifiers are saved in tables  //C1//, //C2//, //C3//, //C4//, //C5// (see above). Ground truth values (real fingerprint affiliation) of the different scans are also available (see //GT//) 

Select the most suitable classifier and its $\alpha$ parameter.     

In the **pdf report** write your choice and explain the criterias you used for the choice.  


=== Safety first ===
This part is a continuation of the previous part **Top secret!**. A colleague, also an agent, will send you his classifier which also depends on the parameter $\alpha$. However, you are not sure about his loyalty, as he may be a double agent. Thus, it will be necessary to find if his classifier is better than the classifier you selected in the previous section.

For security reasons, you will have to make the decision about his classifier using a MATLAB function that will be created in advance. Input of the function will be table //C6// with the results of the classification on the set for different $\alpha$ parameters (same format as //C1//, //C2//, etc.) and eventually other input parameters of your choice. The output of the function should be the decision if the new classifier is better than the one that you selected yourself (''true'' if the obtained classifier is better than the previous one, ''false'' otherwise). In the **pdf report** explain the criterias that the function use. Submit also the function. 

====== References ======

Christopher M. Bishop. //Pattern Recognition and Machine Learning.// Springer Science+Bussiness Media, New York, NY, 2006.

T.M. Cover and P.E. Hart. Nearest neighbor pattern classification. //IEEE Transactions on Information Theory,// 13(1):21–27, January 1967.

Richard O. Duda, Peter E. Hart, and David G. Stork. //Pattern classification.// Wiley Interscience Publication. John Wiley, New York, 2nd edition, 2001.

Vojtěch Franc and Václav Hlaváč. //Statistical pattern recognition toolbox for Matlab.// Research Report CTU–CMP–2004–08, Center for Machine Perception, K13133 FEE.
Czech Technical University, Prague, Czech Republic, June 2004. http://cmp.felk.cvut.cz/cmp/software/stprtool/index.html.

Michail I. Schlesinger and Václav Hlaváč. //Ten Lectures on Statistical and Structural Pattern Recognition.// Kluwer Academic Publishers, Dordrecht, The Netherlands, 2002.

===== Evaluation =====


Both tasks are evaluated separately


==== Symbol classification - Evaluation ====

  * Closest neighbor classifier (1-NN) is evaluated according to the table below. [0–3 points]
  * The Naive Bayes classifier follows also the table below: [0–5 points]
  * Code quality: [0–2 points]

^1-NN^
| correctly classified | points |  
| >95% | 3 | 
| >80% | 2 | 
| >60% | 1 |
| =<60% | 0 |

^Naive Bayes classifier^
| correctly classified | points |  
| >82% | 5 | 
| >75% | 4 | 
| >70% | 3 |
| >65% | 2 | 
| >60% | 1 | 
| >55% | 0.5 |
| =<55% | 0 |

**Clean code** - you can follow rules for clean code even in Matlab. Look at the[[http://www.datatool.com/downloads/MatlabStyle2%20book.pdf| Datatool guide]].


==== Selection of optimal classifier -- Evaluation ====
  * Send a PDF report where you determine the optimal parameter. Based on the chosen parameter and the method of choosing the parameter you receive the points. [0–5 points]