In this lab we will implement an OCR system. For simplicity we assume the texts contain only two letters: A and C, and that the letters are already well segmented into 10×10 pixel image chips. We will extract a simple feature from the image chips and use it in a bayesian decision framework to partition the image set. Furthermore, the relative frequency of occurrence of A and C in text is given. We will consider two cases: (1) the measurements are discrete, and (2) the measurements are continuous.
In the discrete probabilities case, we will find the optimal strategy, compute its Bayesian risk, experiment with the loss function and classify independent data samples by applying the theory from the lecture slides directly.
In the continuous measurements case, an assumption is made that the extracted feature for each letter class (A,C) is generated by a normal distribution. The 'zero-one' loss will be used to estimate risk, which simplifies the decision strategy.
To fulfil this assignment, you need to submit these files (all packed in a single .zip
file) into the upload system:
answers.txt
- answers to the Assignment Questions
assignment_bayes.m
- a script for data initialisation, calling of the implemented functions and plotting (not tested in the upload system)
find_strategy_discrete.m
- function which finds the optimal discrete strategy of their results (for your convenience, will not be checked)
bayes_risk_discrete.m
- function which computes the Bayesian risk for a given discrete strategy
classify_discrete.m
- function which classifies a given data sample using a discrete strategy
classification_error_discrete.m
- function which computes classification error on test data using discrete strategy
find_strategy_2normal.m
- function which finds the optimal continuous strategy
bayes_risk_2normal.m
- function which computes the Bayesian risk for a given continuous strategy
classify_2normal.m
- function which classifies a given data sample using a continuous strategy
classification_error_2normal.m
- function which computes classification error on test data using continuous strategy
classif_W1.png
, classif_W2.png
and decision_discrete.png
- images specified in Discrete measurements
Start by downloading the template of the assignment. It contains function templates and the data needed for the tasks. When submitting the assignment to the upload system, please upload all the files included in the template.
This is highly experimental feature. Use at your own risk.
Standard is to use Matlab. You may skip this info box if you follow the standard rules.
General information for Python development.
To fulfil this assignment, you need to submit these files (all packed in one .zip
file) into the upload system:
answers.txt
- answers to the Assignment Questions
bayes.ipynb
- a script for data initialisation, calling of the implemented functions and plotting of their results (for your convenience, will not be checked).
bayes.py
- file with implemented methods:
find_strategy_discrete
- function which finds the optimal discrete strategy (for your convenience, will not be checked)
bayes_risk_discrete
- function which computes the Bayesian risk for a given discrete strategy
classify_discrete
- function which classifies a given data sample using a discrete strategy
classification_error_discrete
- function which computes classification error on test data using discrete strategy
find_strategy_2normal
- function which finds the optimal continuous strategy
bayes_risk_2normal
- function which computes the Bayesian risk for a given continuous strategy
classify_2normal
- function which classifies a given data sample using a continuous strategy
classification_error_2normal
- function which computes classification error on test data using continuous strategy
classif_W1.png
, classif_W2.png
and decision_discrete.png
- images specified in Discrete measurements
Use template of the assignment. When preparing a zip file for the upload system, do not include any directories, the files have to be in the zip file root.
The task is to design a classifier (Bayesian strategy) $q(x)$, which distinguishes between 10×10 images of two letters (classes) A and C using only a single measurement (feature):
$$ q: \mathcal{X} \rightarrow D $$ $$ \mathcal{X} = \{-10, \ldots, 10\}, \quad D = K = \{A, C\}$$
The measurement $x$ is obtained from an image by computing
mu = -563.9 sigma = 2001.6 x_unnormalized = (sum of pixel values in the left half of image) -(sum of pixel values in the right half of image) x = (x_unnormalized - mu) / (2 * sigma) * 10 limit x to the interval <-10, 10>Note: the normalization squeezes most of the $x$ values in the interval $< -10, 10>$.
Computation of this measurement is implemented in compute_measurement_lr_discrete.m
.
The strategy $q(x)$ will be represented in Matlab by a 1×21 vector containing 1 if the classification for that value of $x$ is supposed to be A, and 2 if the classification is supposed to be C. Thus given the vector $q$ and some $x$ we can decide for one of the classes.
As required in the formulation of Bayesian task, we are given all the necessary probabilities (specified in the assignment_bayes.m
):
variable | description |
---|---|
discreteA.Prob | $p_{X|k}(x|A)$ given as a $1\times21$ vector (the size corresponds to the range of the values of $x$) |
discreteA.Prior | prior probability $p_K(A)$ |
discreteC.Prob | $p_{X|k}(x|C)$ given as a $1\times21$ vector (the size corresponds to the range of the values of $x$) |
discreteC.Prior | prior probability $p_K(C)$ |
images_test | 10x10x40, test images of size 10×10 pixels, depicting letters A and C |
labels_test | 1×40, vector with ground-truth classification (1=A, 2=C) |
bayes_risk_discrete.m
so that it returns the Bayesian risk for a given strategy. The input parameters are the distributions, the strategy $q$ and arbitrary loss function $W$ (2×2 matrix). >> q_discrete = [ones(1, 10), 2 ones(1, 10)]; W = [0 1; 1 0]; R_discrete = bayes_risk_discrete(discreteA, discreteC, W, q_discrete) R_discrete = 0.5154
find_strategy_discrete.m
so that it returns the optimal Bayesian strategy. When the bayesian risk is the same for both classes for a particular $x$, prefer the “A” class to pass the automatic checks in the upload system. >> W = [0 1; 1 0]; q_discrete = find_strategy_discrete(discreteA, discreteC, W) q_discrete = Columns 1 through 14 1 1 1 1 1 2 2 2 1 1 1 1 1 1 Columns 15 through 21 1 1 1 1 1 1 1 >> distribution1.Prior = 0.3; >> distribution2.Prior = 0.7; >> distribution1.Prob = [0.2, 0.3, 0.4, 0.1]; >> distribution2.Prob = [0.5, 0.4, 0.1, 0.0]; >> W = [0 1; 1 0]; >> q = find_strategy_discrete(distribution1, distribution2, W) q = 2 2 1 1
visualise_discrete
function for two loss functions: W1=[0 1; 1 0]
and W2=[0 5; 1 0]
and save the figures as classif_W1.png
and classif_W2.png
.classify_discrete.m
function template so that it returns N labels given 10x10xN matrix of images.classification_error_discrete.m
function so that it returns the error on the whole image set images_test
. The error is defined as number of incorrectly classified samples divided by total number of samples. To measure the error compare your classification with the ground truth in labels_test
variable. >> W = [0 1; 1 0]; q_discrete = find_strategy_discrete(discreteA, discreteC, W); error_discrete = classification_error_discrete(images_test, labels_test, q_discrete) error_discrete = 0.2250
decision_discrete.png
.
Fill the correct answers to your answers.txt
file.
discreteA.prob
vector? (you should also be able to argue why :)
W=[0 1; 2 0]
? (do you have an idea where you would make it unbalanced this way?) round to 4 decimal places
classification_error_discrete
function) and the Bayesian risk (calculated by the bayes_risk_discrete
function):
In the second part of the assignment the probabilities $p_{X|k}(x|k)$ are given as continuous normal distributions (specified in the assignment_bayes.m
): $$p_{X|k}(x|A) = \frac{1}{\sigma_k\sqrt{2\pi}}e^{-\frac{(x-\mu_k)^2}{2\sigma_k^2}}$$
variable | description |
---|---|
contA.Mean | mean value of the normal distribution $p_{X|k}(x|A)$ |
contA.Sigma | standard deviation of the normal distribution $p_{X|k}(x|A)$ |
contA.Prior | prior probability $p_K(A)$ |
contC.Mean | mean value of the normal distribution $p_{X|k}(x|C)$ |
contC.Sigma | standard deviation of the normal distribution $p_{X|k}(x|C)$ |
contC.Prior | prior probability $p_K(C)$ |
images_test | 10x10x40, test images of size 10×10 pixels, depicting letters A and C |
labels_test | 1×40, vector with ground-truth classification (1=A, 2=C) |
Further we assume zero-one loss matrix W = [0 1; 1 0]
.
In this particular case, as explained in lecture notes, slides 21-22, the optimal strategy $q$ can be found by solving the following problem:
$$q(x) = \arg\max_k p_{K|X}(k|x) = \arg\max_k p_{XK}(x,k)/p_X(x) = \arg\max_k p_{XK}(x,k) = \arg\max_k p_K(k)p_{X|K}(x|k),$$
where $p_{K|X}(k|x)$ is called posterior probability of class $k$ given the measurement $x$ (i.e. it is the probability of the data being from class $k$ if we measure feature value $x$). Symbol $\arg\max_k$ denotes finding $k$ maximising the argument.
We will also work with an unnormalised measurement here instead of the discrete one used above
x = ((sum of pixel values in the left half of image) -(sum of pixel values in the right half of image))Computation of this measurement is implemented in
compute_measurement_lr_cont.m
.
At the beginning of the labs we will together show that in the case when $p_{X|k}(x|k)$ are normal distributions and when we consider zero-one loss function, the optimal Bayesian strategy $q(x)$ corresponds to a quadratic inequality (i.e. $ax^2+bx+c \ge 0$).
Hint: $$ \arg\max_{k\in\{A,C\}} p_K(k)p_{X|k}(x|k) \quad \mbox{translates into} \quad p_K(A)p_{X|k}(x|A) \ge p_K(C)p_{X|k}(x|C) $$
Optimal strategy representation: Solving the above quadratic inequality gives up to two thresholds $t_1$ and $t_2$ (zero crossing points) and a classification decision (A or C) in each of the intervals $\mathcal{X}_1 = (-\infty, t_1>$, $\mathcal{X}_2 = (t_1, t_2>$, $\mathcal{X}_3 = (t_2, \infty)$. So similarly to the discrete case we can represent the strategy in Matlab as a structure with the following fields:
field | description |
---|---|
q.t1 | $t_1$ threshold (possibly $-\infty$) |
q.t2 | $t_2$ threshold (possibly $\infty$) |
q.decision | 1×3 decision vector containing the optimal decisions for each of the three intervals similarly to the discrete case |
find_strategy_2normal.m
.find_strategy_2normal.m
template by filling in code for the determining the decision vectors.>> q_cont = find_strategy_2normal(contA, contC) q_cont = t1: -3.5360e+03 t2: -1.2488e+03 decision: [1 2 1]
bayes_risk_2normal.m
. We can reformulate the equations from lecture slides, slide 21 as $$ R(q) = \int_X \sum_{k\in K}p_{XK}(x,k)W(k,q(x)) dx = \\ = \int_X p_X(x) \sum_{k\ne q(x)}p_{K|x}(k|x) dx = \\ = \int_X p_X(x)(1 - p_{K|x}(q(x)|x)) dx = \\ = 1 - \int_X p_X(x)p_{K|x}(q(x)|x) dx = \\ = 1 - \int_X p_K(q(x))p_{X|k}(x|q(x)) dx\,,$$ which in our special case simplifies further to $$R(q) = 1-\int_X p_K(q(x))p_{X|k}(x|q(x)) = \\ = 1 - \left( \int_{\mathcal{X}_1} p_K(\mbox{q.decision}(1))p_{X|k}(x|\mbox{q.decision}(1)) + \int_{\mathcal{X}_2} p_K(\mbox{q.decision}(2))p_{X|k}(x|\mbox{q.decision}(2)) + \\ \int_{\mathcal{X}_3} p_K(\mbox{q.decision}(3))p_{X|k}(x|\mbox{q.decision}(3))\right).$$ Thus we only need to compute integrals of the normal distribution corresponding to the strategy decision class over three intervals, multiply them by apriori probabilities and subtract their sum from one.normcdf
function from the statistics toolbox.>> R_cont = bayes_risk_2normal(contA, contC, q_cont) R_cont = 0.1352
classify_2normal.m
function template so that it returns a label given a 10×10 image.classification_error_2normal.m
function so that it returns the error on the whole image set images
. To measure the error compare your classification with the ground truth in labels
variable.>> error_cont = classification_error_2normal(images_test, labels_test, q_cont) error_cont = 0.1000
decision_2normal.png
(see assignment_bayes.m
)
Fill the correct answers to your answers.txt
file.
4. Which of the conditions allowed us to express the optimal Bayesian strategy as a quadratic discriminative function (check all that apply):
5. If we extend the number of classes from 2 to 3 and keep the zero-one loss function, how does the optimal Bayesian strategy differ (check all that apply):
This task is not compulsory. Work on bonus tasks deepens on your knowledge about the subject. Successful solution of a bonus task(s) will be rewarded by 3 points.
The data for this task can be downloaded from data_33rpz_cv02_bonus.mat. Use distributions D1
, D2
and D3
. The task is analogous to the previous one, only there are now two measurements (features) and three classes. The second measurement is
y = (sum of pixel values in the upper half of image) -(sum of pixel values in the lower half of image)The alphabet (i.e. the classes) now consists of three letters, 'A', 'C', and 'T'.
The task is to classify the input images into one of the three classes. Use both measurements $x$ and $y$ here.
The density function $p_{X}(\mathbf{x}) = p_K(A)p_{X|k}(\mathbf{x}|A)+p_K(C)p_{X|k}(\mathbf{x}|C)+p_K(T)p_{X|k}(\mathbf{x}|T)$, where $\mathbf{x} = [x, y]$ is visualised below (drawn using pgmm
function from the toolbox)
Output:
Hints:
mnvpdf
or pdfgauss
function.