Classification/recognition (last assignment)

The machine learning task has two parts - symbol classification and determination of the optimal classifier parameter. Check the Upload system for the due date and notice that both tasks are submitted separately.

Date provided by Eyedea Recognition.

You have to implement your own methods and cannot use Matlab provided function used for Machine learning (e.g., fitcknn, predict)

Problem

Your task is to classify letters from car license plates. We assume the plate is found (see Fig 3 and 4). The number pictures are normalized to the side 10×10 pixels. You can use training data that were chosen at random. The brightness values of pixels are ordered into row vectors by columns, see Fig 2. Thus a line in the input MAT-file means one image. (see https://download.cvut.cz/info/info.php if you need to download matlab).

The file train.mat contains the feature vectors, the file train_labels.mat contains the class labels. The pictures are only meant for a preview, the data contained in the MAT-files are the actually important data. The result of the classification should be in the same format as the variables in train_labels.mat. The pictures in the lines of train.mat are the ones labeled in train_labels.mat.

This is the usual extent of the available data given by a client of Eyedea Recognition. After the company (now you) writes his code, the client can come up with testing data in order to test your solution.

Test data will be in test.mat and test_labels.mat. You can simulate this by dividing the training data for your own testing/training sets.

Use MATLAB for this assignment. The files are available here: kui_classification_students.zip.

Avoid using the built-in function sumsqr. This function is not available in Matlab, in the submission system.

Task definition

Implement the following algorithms and use them to solve the above mentioned problem. Upload the files with the implemented functions nnLearn(), nnClassify(), bayesLearn() and bayesClassify() in the .m files of the same name into the upload system

Nearest neighbor classifier Implement the classifier using the nearest neighbor principle. Test its properties on the training data.

Bayes classification Implement the naive Bayes classification. Assume the intensity of the pixels is independent. In each class therefore it holds:

$$P(\vec{x}|s)=P(x_1|s) \cdot P(x_2|s) \dotsc P(x_n|s)$$

where $x_i$ is the intensity of the $i$th pixel.

Note In order to avoid zero probabilities (caused by small training sets), you need to add a small positive value to $P(x_i|s)$ for each $i$. If you miss this, then even one zero value $P(x_i|s)$ causes $P(\vec{x}|s)=0$.

Example solution – perceptron algorithm

You have one example solution at your disposal, which uses a perceptron for classification. An example of how to use it can be found in the main.m script.

In the script, it is shown how to load input files, train the perceptron, use the trained perceptron for classification, and get results (a confusion matrix) of the classification.

The basic functions are perceptronLearn(), perceptronClassify() and confusionMatrix().

Figure 1: Examples of normalized letter images of registration plates.

Figure 2: Pixels are represented by a row vector of concatenated columns. First comes the first column, then the second etc. It is obvious that dark columns of the letter J, which are in the extreme right part of the image, are at the end of the data vector.

Auxiliary code description

Main script

The script main calls learning and classification functions for all classifiers. It also loads input data and calls the function for printing results of the classifications. This script makes all needed callings of the functions. However, do not forget comment the callings of the functions that are not implemented at the beginning.

Perceptron classifier

This is the concrete implementation of the Perceptron classifier. See the implementation and the comments for more details.

Perceptron learning

perceptron = perceptronLearn(train, train_labels) - the inputs of the function are the training data and their labels. The output of the function is the structure with the learned classifier. The function firstly makes mapping from the char class labels to the number class labels (“conversion table”) and convert the char labels to the number labels. Finally there is the learning itself as per the following algorithm:

Set $\vec{w}_y=\vec{0}$ and $b_y=0$ for all $y \in Y$ ($Y$ - set of all possible labels).
Pick a random incorrectly classified input. If there is no such input, then STOP, because the learning has finished, by finding parameters for an errorfree classification of the input data.
Let $\vec{x}_t,y_t$ is an improperly classified input and $\hat{y}$ is a classification of $\vec{x}_t$ using the current classifier. Adapt the parameters of the classifier according to the following formulae:
$ \eqalign{ \vec{w}_{y_t} &= \vec{w}_{y_t} + \vec{x}_t \cr b_{y_t} &= b_{y_t} + 1 \cr \vec{w}_{\hat{y}} &= \vec{w}_{\hat{y}} - \vec{x}_t \cr b_{\hat{y}}&= b_{\hat{y}} - 1 } $
Continue with step 2.

Perceptron classification

classLabels = perceptronClassify(perceptron, test) - the inputs of the function are the structure with the learned classifier and the testing data. The output of the function is an array with the class labels. Classification follows according to

$$\hat{y}=\arg \max\limits_{y \in Y} \vec{x}_t^\top\vec{w}_y+b_y$$

with the result being converted from the number labels to the char labels and then returned as an output.

Confusion Matrix

confusionMatrix() - the function prints result of the classification and the confusion matrix. The inputs of the function are given labels and the labels obtained by the classification.

Nearest Neighbor (NN) classifier

The functions nnLearn and nnClassify for the 1-nearest neighbor classifier ready for implementation.

nn = nnLearn(train, trainLabels) - the inputs of the function are the training data and their labels. The output of the function is the structure with the learned classifier.

classLabels = nnClassify(nn,test) - the inputs of the function are the structure with the learned classifier and the testing data. The output of the function is an array with the class labels.

Naive Bayes classifier

The functions bayesLearn and bayesClassify for the naive Bayes classifier ready for implementation.

bayes = bayesLearn(train, train_labels) - the inputs of the function are the training data and their labels. The output of the function is the structure with the learned classifier.

classLabels = bayesClassify(bayes, test) - the inputs of the function are the structure with the learned classifier and the testing data. The output of the function is an array with the class labels.

Examples of use

Fig. 3: Automatic text localization from pictures. More information available at http://cmp.felk.cvut.cz/~zimmerk/lpd/index.html.

Fig. 4: Industry application for license plate recognition. Videos are available at http://cmp.felk.cvut.cz/cmp/courses/X33KUI/Videos/RP_recognition.

Selection of optimal classifier

Very often, there are many classifiers that can be used for a ML task, and we have to make a decision about which is the best classifier for the task. In the zip archive kui_classification_students.zip, there is a file classif_result_tables.mat (MATLAB) that will be used for the task. The task shall be solved in MATLAB. What is asked to be uploaded are a pdf report and one function related to the part Safety first.

We have 5 different learned binary classifiers. The result of the classification of each classifier depends on the value of the $\alpha$ parameter. Thus, the result of the classification of a given classifier can be expressed as a function $C(\bf x, \alpha) \in \{0,1\}$, where $\bf x$ is a vector which belongs to the sample we want to classify.

We tested all classifiers on a test set $X = \{{\bf x}_1, {\bf x}_2, \dots, {\bf x}_{100} \}$. At the same time, we tried all possible values of $\alpha\in\{\alpha_1, \alpha_2, \dots, \alpha_{50}\}$. For a classifier $i\in\{1,2, \dots,5\}$ we obtain a table with values $C_i(\bf x_j,\alpha_k)\in \{0,1\}$, where $j \in \{1, 2,.., 100\}$, $k \in \{1, 2,.., 50\}$ (see C1, C2, C3, C4, C5 in the file classif_result_tables.mat). The real labels of the samples $\bf x_1, \bf x_2, \dots, \bf x_{100}$ from the test set are available (see GT in file classif_result_tables.mat).

Selection of appropriate parameter

In this section, suppose that the classifiers are used for binary classification of images (e.g. whether a dog is on the picture or not). For the classifier 1 (table C1), determine the best value for parameter $\{\alpha_1, \alpha_2, \dots, \alpha_{50}\}$. Be aware that you don’t know the concrete task for which the classifier will be used. Therefore, it is necessary to use a sufficiently general approach. In other words, the classifier should not be one that is optimal for a particular task but globaly inefficient for most other tasks.

In a short (definitely shorter than one A4 page) pdf report explain the choice of the parameter (use terms such as sensitivity, false positive, ROC curve etc.). Inside the report, put the figure of a ROC curve with a marked point on the curve which correspond to the optimal value of the parameter.

Top secret!

Imagine, that you are an agent 00111 and you want to use your fingerprint to secure some top secret documents. The data are very sensitive, so it is better to delete them than secure them poorly. You also know that you will always have enough time to unlock the data. Five trained classifiers (with different $\alpha$ values) are available. The input of the classifier is a fingerprint scan. For your fingerprint, desired output of the classifier is 1 (data will be unlocked), 0 otherwise (if it is not your fingerprint). All classifiers were tested using the test set $X$ for all possible values of the parameter $\alpha$. Results of the test for the classifiers are saved in tables C1, C2, C3, C4, C5 (see above). Ground truth values (real fingerprint affiliation) of the different scans are also available (see GT)

Select the most suitable classifier and its $\alpha$ parameter.

In the pdf report write your choice and explain the criterias you used for the choice.

Safety first

This part is a continuation of the previous part Top secret!. A colleague, also an agent, will send you his classifier which also depends on the parameter $\alpha$. However, you are not sure about his loyalty, as he may be a double agent. Thus, it will be necessary to find if his classifier is better than the classifier you selected in the previous section.

For security reasons, you will have to make the decision about his classifier using a MATLAB function that will be created in advance. Input of the function will be table C6 with the results of the classification on the set for different $\alpha$ parameters (same format as C1, C2, etc.) and eventually other input parameters of your choice. The output of the function should be the decision if the new classifier is better than the one that you selected yourself (true if the obtained classifier is better than the previous one, false otherwise). In the pdf report explain the criterias that the function use. Submit also the function.

References

Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer Science+Bussiness Media, New York, NY, 2006.

T.M. Cover and P.E. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1):21–27, January 1967.

Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern classification. Wiley Interscience Publication. John Wiley, New York, 2nd edition, 2001.

Vojtěch Franc and Václav Hlaváč. Statistical pattern recognition toolbox for Matlab. Research Report CTU–CMP–2004–08, Center for Machine Perception, K13133 FEE. Czech Technical University, Prague, Czech Republic, June 2004. http://cmp.felk.cvut.cz/cmp/software/stprtool/index.html.

Michail I. Schlesinger and Václav Hlaváč. Ten Lectures on Statistical and Structural Pattern Recognition. Kluwer Academic Publishers, Dordrecht, The Netherlands, 2002.

Evaluation

Both tasks are evaluated separately

Symbol classification - Evaluation

Closest neighbor classifier (1-NN) is evaluated according to the table below. [0–3 points]
The Naive Bayes classifier follows also the table below: [0–5 points]
Code quality: [0–2 points]

1-NN
correctly classified	points
>95%	3
>80%	2
>60%	1
=<60%	0

Naive Bayes classifier
correctly classified	points
>82%	5
>75%	4
>70%	3
>65%	2
>60%	1
>55%	0.5
=<55%	0

Clean code - you can follow rules for clean code even in Matlab. Look at the Datatool guide.

Selection of optimal classifier -- Evaluation

Send a PDF report where you determine the optimal parameter. Based on the chosen parameter and the method of choosing the parameter you receive the points. [0–5 points]

Table of Contents