By attending the lab and solving the assignment, a student should…
In this lab we will implement an OCR system. For simplicity we assume the texts contain only two letters: A and C, and that the letters are already well segmented into 10×10 pixel image chips. We will extract a simple feature from the image chips and use it in a Bayesian decision framework to partition the image set. Furthermore, the relative frequency of occurrence of A and C in text is given. We will consider two cases: (1) the measurements are discrete, and (2) the measurements are continuous.
In the discrete probabilities case, we will find the optimal strategy, compute its Bayesian risk, experiment with the loss function and classify independent data samples by applying the theory from the lecture slides directly.
In the continuous measurements case, an assumption is made that the extracted feature for each letter class (A,C) is generated by a normal distribution. The 'zero-one' loss will be used to estimate risk, which simplifies the decision strategy.
General information for Python development.
To fulfil this assignment, you need to submit these files (all packed in one .zip file) into the upload system:
bayes.ipynb
bayes.py with the implemented functions find_strategy_discrete, bayes_risk_discrete, classify_discrete, classification_error, find_strategy_2normal, bayes_risk_2normal and classify_2normal
classif_W1.png
classif_W2.png
decision_discrete.png
thresholds.png
decision_2normal.png
Use the assignment template. When preparing the zip file for the upload system, do not include any directories; the files have to be in the root of the zip file.
The task is to design a classifier (Bayesian strategy) $q(x)$, which distinguishes between 10×10 images of two letters (classes) A and C using only a single measurement (feature):
$$ q: \mathcal{X} \rightarrow D $$ $$ \mathcal{X} = \{-10, \ldots, 10\}, \quad D = K = \{A, C\}$$
The measurement $x$ is obtained from an image by computing
mu = -563.9
sigma = 2001.6
x_unnormalized = (sum of pixel values in the left half of image)
                 - (sum of pixel values in the right half of image)
x_norm = (x_unnormalized - mu) / (2 * sigma) * 10
x = x_norm limited to the interval <-10, 10> and discretized
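A possible NumPy implementation of this pseudocode is sketched below; the axis order of the image array and the exact discretization are assumptions and may differ from the provided bayes.compute_measurement_lr_discrete:

```python
import numpy as np

def compute_measurement_lr_discrete_sketch(images):
    # images: assumed to be a (10, 10, N) array of grayscale letter chips
    mu, sigma = -563.9, 2001.6
    left = np.sum(images[:, :5, :].astype(np.float64), axis=(0, 1))
    right = np.sum(images[:, 5:, :].astype(np.float64), axis=(0, 1))
    x_norm = (left - right - mu) / (2 * sigma) * 10
    # limit to <-10, 10> and discretize to integer bins
    return np.clip(np.round(x_norm), -10, 10).astype(int)
```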
Computation of this measurement is implemented in bayes.compute_measurement_lr_discrete.
The strategy $q(x)$ will be represented in NumPy by a (21,) vector containing 0 if the classification for that value of $x$ is supposed to be A, and 1 if it is supposed to be C. Thus, given the vector $q$ and a measurement $x$, we can decide for one of the classes.
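For illustration, the decision for a batch of discretized measurements can be read out of $q$ by a simple index shift (a sketch; the template's classify_discrete is expected to do essentially this):

```python
import numpy as np

def classify_discrete_sketch(measurements, q):
    # measurements: integer values in <-10, 10>; q: (21,) vector of 0/1 decisions
    # shift x from the range -10..10 to the index range 0..20
    return q[np.asarray(measurements) + 10]
```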
As required in the formulation of the Bayesian task, we are given all the necessary probabilities (specified in bayes.ipynb):
discreteA['Prob']
discreteA['Prior']
discreteC['Prob']
discreteC['Prior']
images_test
labels_test
q_discrete = np.array([0]*10 + [1] + [0]*10)
W = np.array([[0, 1], [1, 0]])
discreteA = {'Prior': 0.6153846153846154,
             'Prob': np.array([0.0125, 0., 0., 0.0125, 0.025, 0.0125, 0.025,
                               0.0375, 0.075, 0.1, 0.2125, 0.1375, 0.15, 0.1,
                               0.0875, 0.0125, 0., 0., 0., 0., 0.])}
discreteC = {'Prior': 0.38461538461538464,
             'Prob': np.array([0., 0., 0., 0.02, 0.02, 0.22, 0.46, 0.16, 0.1,
                               0.02, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])}

R_discrete = bayes_risk_discrete(discreteA, discreteC, W, q_discrete)
# -> 0.5153846153846154
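The Bayesian risk of a strategy under a loss matrix W is the expected loss over the joint distribution, $\sum_x \sum_k p_{XK}(x,k)\,W(k, q(x))$. A minimal sketch of this computation (illustrative names, assuming W is indexed as W[true class, decision]; not necessarily the template's implementation):

```python
import numpy as np

def bayes_risk_discrete_sketch(distribution_A, distribution_C, W, q):
    # joint probabilities p(x, k) = p(k) * p(x | k), with class indices A = 0, C = 1
    p_xA = distribution_A['Prior'] * distribution_A['Prob']
    p_xC = distribution_C['Prior'] * distribution_C['Prob']
    # for each x we pay W[true class, decision q(x)]; sum the expected loss over x
    return float(np.sum(p_xA * W[0, q] + p_xC * W[1, q]))
```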
W = np.array([[0, 1], [1, 0]])
q_discrete = find_strategy_discrete(discreteA, discreteC, W)
# -> array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

distribution1 = {}
distribution2 = {}
distribution1['Prior'] = 0.3
distribution2['Prior'] = 0.7
distribution1['Prob'] = np.array([0.2, 0.3, 0.4, 0.1])
distribution2['Prob'] = np.array([0.5, 0.4, 0.1, 0.0])
W = np.array([[0, 1], [1, 0]])
q = find_strategy_discrete(distribution1, distribution2, W)
# -> array([1, 1, 0, 0])
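A sketch of the underlying idea (not necessarily the template's exact implementation): for each value of x, compare the partial risks of the two possible decisions and keep the cheaper one:

```python
import numpy as np

def find_strategy_discrete_sketch(distribution_A, distribution_C, W):
    p_xA = distribution_A['Prior'] * distribution_A['Prob']
    p_xC = distribution_C['Prior'] * distribution_C['Prob']
    # partial risk of each decision for every x, assuming W[true class, decision]
    risk_decide_A = p_xA * W[0, 0] + p_xC * W[1, 0]
    risk_decide_C = p_xA * W[0, 1] + p_xC * W[1, 1]
    # choose the decision with the smaller partial risk; ties go to A (0)
    return (risk_decide_C < risk_decide_A).astype(int)
```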
visualise_discrete
W1 = np.array([[0, 1], [1, 0]])
W2 = np.array([[0, 5], [1, 0]])
W = np.array([[0, 1], [1, 0]])
q_discrete = find_strategy_discrete(discreteA, discreteC, W)
measurements_discrete = compute_measurement_lr_discrete(images_test)
labels_estimated_discrete = classify_discrete(measurements_discrete, q_discrete)
error_discrete = classification_error(labels_estimated_discrete, labels_test)
# -> 0.225
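The classification error itself is simply the fraction of misclassified samples; a minimal sketch (names are illustrative):

```python
import numpy as np

def classification_error_sketch(labels_estimated, labels_true):
    # fraction of samples where the estimated label differs from the ground truth
    return float(np.mean(np.asarray(labels_estimated) != np.asarray(labels_true)))
```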
1. What is the sum of the discreteA['Prob'] array?
Solution
The probabilities have to sum to 1. (Just like when you have a coin with P(heads) = XYZ, P(tails) has to be 1 - XYZ.)
2. What is the relation between the final classification error (calculated by the classification_error function) and the Bayesian risk (calculated by the bayes_risk_discrete function):
The correct answer is c). (Bayesian risk is an expectation of the cost over the whole distribution, but the classification error could have been computed on a particularly easy or a particularly hard sample).
In the second part of the assignment the probabilities $p_{X|k}(x|k)$ are given as continuous normal distributions (specified in bayes.ipynb): $$p_{X|k}(x|k) = \frac{1}{\sigma_k\sqrt{2\pi}}e^{-\frac{(x-\mu_k)^2}{2\sigma_k^2}}, \quad k \in \{A, C\}$$
contA['Mean']
contA['Sigma']
contA['Prior']
contC['Mean']
contC['Sigma']
contC['Prior']
Example distribution:
contA = {'Mean': 124.2625, 'Sigma': 1434.45420083, 'Prior': 0.61538462}
contC = {'Mean': -2010.98, 'Sigma': 558.42857106, 'Prior': 0.38461538}
W = np.array([ [0, 1], [1, 0] ])
In this particular case, as explained in the lecture notes (slides 21-22), the optimal strategy $q$ can be found by solving the following problem:
$$q(x) = \arg\max_k p_{K|X}(k|x) = \arg\max_k p_{XK}(x,k)/p_X(x) = \arg\max_k p_{XK}(x,k) = \arg\max_k p_K(k)p_{X|K}(x|k),$$
where $p_{K|X}(k|x)$ is called the posterior probability of class $k$ given the measurement $x$ (i.e. the probability of the data being from class $k$ when we measure feature value $x$). The symbol $\arg\max_k$ denotes finding the $k$ that maximises the argument.
We will also work with an unnormalised measurement here instead of the discrete one used above:

x = (sum of pixel values in the left half of image) - (sum of pixel values in the right half of image)

Its computation is implemented in compute_measurement_lr_cont.
At the beginning of the lab, we will show together that when $p_{X|k}(x|k)$ are normal distributions and the zero-one loss function is used, the optimal Bayesian strategy $q(x)$ corresponds to a quadratic inequality (i.e. $ax^2+bx+c \ge 0$).
Hint: $$ \arg\max_{k\in\{A,C\}} p_K(k)p_{X|k}(x|k) \quad \mbox{translates into} \quad p_K(A)p_{X|k}(x|A) \ge p_K(C)p_{X|k}(x|C) $$
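For reference, one way to carry out the algebra (a sketch, not copied verbatim from the lecture slides): take the logarithm of both sides of the hinted inequality (the common $\ln\sqrt{2\pi}$ term cancels),

$$\ln p_K(A) - \ln\sigma_A - \frac{(x-\mu_A)^2}{2\sigma_A^2} \;\ge\; \ln p_K(C) - \ln\sigma_C - \frac{(x-\mu_C)^2}{2\sigma_C^2},$$

and multiply by $2\sigma_A^2\sigma_C^2 > 0$; collecting powers of $x$ gives $ax^2 + bx + c \ge 0$ (decide A where it holds) with

$$a = \sigma_A^2 - \sigma_C^2, \qquad b = 2\left(\mu_A\sigma_C^2 - \mu_C\sigma_A^2\right), \qquad c = \mu_C^2\sigma_A^2 - \mu_A^2\sigma_C^2 + 2\sigma_A^2\sigma_C^2\ln\frac{p_K(A)\,\sigma_C}{p_K(C)\,\sigma_A}.$$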
Optimal strategy representation: Solving the above quadratic inequality gives up to two thresholds $t_1$ and $t_2$ (zero crossing points) and a classification decision (A or C) in each of the intervals $\mathcal{X}_1 = (-\infty, t_1\rangle$, $\mathcal{X}_2 = (t_1, t_2\rangle$, $\mathcal{X}_3 = (t_2, \infty)$. So, similarly to the discrete case, we can represent the strategy in Python as a dict with the following fields:
q['t1']
q['t2']
q['decision']
q_cont = find_strategy_2normal(contA, contC) # -> {'t2': -1248.7684903033442, 't1': -3535.997150276402, 'decision': array([0, 1, 0], dtype=int32)}
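A possible sketch of how such a strategy could be assembled from the quadratic coefficients derived above (it assumes the generic case of two distinct real roots; degenerate cases such as $\sigma_A = \sigma_C$, a single root, or no real root need separate handling and are omitted here):

```python
import numpy as np

def find_strategy_2normal_sketch(distribution_A, distribution_C):
    mA, sA, pA = distribution_A['Mean'], distribution_A['Sigma'], distribution_A['Prior']
    mC, sC, pC = distribution_C['Mean'], distribution_C['Sigma'], distribution_C['Prior']
    # coefficients of a*x^2 + b*x + c >= 0  <=>  decide A (see the derivation above)
    a = sA**2 - sC**2
    b = 2 * (mA * sC**2 - mC * sA**2)
    c = mC**2 * sA**2 - mA**2 * sC**2 + 2 * sA**2 * sC**2 * np.log(pA * sC / (pC * sA))
    t1, t2 = np.sort(np.roots([a, b, c]))  # assumes two distinct real roots
    # the sign of the quadratic in the middle interval determines its decision
    mid = (t1 + t2) / 2
    middle_is_A = a * mid**2 + b * mid + c >= 0
    decision = np.array([1, 0, 1]) if middle_is_A else np.array([0, 1, 0])
    return {'t1': float(t1), 't2': float(t2), 'decision': decision}
```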
visualize_2norm
scipy.stats.norm.cdf
R_cont = bayes_risk_2normal(contA, contC, q_cont) # -> 0.13519281686757106
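With the zero-one loss, the risk equals one minus the probability of a correct decision, which can be accumulated interval by interval using the Gaussian CDF (scipy.stats.norm.cdf); a hedged sketch with illustrative names:

```python
import numpy as np
from scipy.stats import norm

def bayes_risk_2normal_sketch(distribution_A, distribution_C, q):
    dists = (distribution_A, distribution_C)      # index 0 = A, 1 = C
    edges = [-np.inf, q['t1'], q['t2'], np.inf]   # the three decision intervals
    prob_correct = 0.0
    for i, k in enumerate(q['decision']):
        d = dists[k]
        # probability mass of the decided class that falls into interval i
        prob_correct += d['Prior'] * (norm.cdf(edges[i + 1], d['Mean'], d['Sigma'])
                                      - norm.cdf(edges[i], d['Mean'], d['Sigma']))
    return 1.0 - prob_correct
```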
measurements_cont = compute_measurement_lr_cont(images_test)
labels_estimated_cont = classify_2normal(measurements_cont, q_cont)
error_cont = classification_error(labels_estimated_cont, labels_test)
# -> 0.1
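For completeness, applying the three-interval strategy to raw measurements can be done with two threshold comparisons (a sketch; the template's classify_2normal should behave equivalently):

```python
import numpy as np

def classify_2normal_sketch(measurements, q):
    x = np.asarray(measurements)
    # interval index: 0 for x <= t1, 1 for t1 < x <= t2, 2 for x > t2
    idx = (x > q['t1']).astype(int) + (x > q['t2']).astype(int)
    return np.asarray(q['decision'])[idx]
```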
montage
1. Which of the conditions allowed us to express the optimal Bayesian strategy as a quadratic discriminative function (check all that apply):
The correct answers are a), c), and d).
a) Yes. It works for other distributions, but not all of them.
b) No. It is enough that they are known.
c) Yes.
d) Yes. That allows us to transform the argmax into a single inequality.
e) No.
f) No. Different misclassification costs for different classes would change the derivation a bit, but the result would still be a quadratic inequality.
2. If we extend the number of classes from 2 to 3 and keep the zero-one loss function, check all that apply in this situation:
The correct answer is c).
a) No. The measurement dimensionality is independent of the number of classes.
b) No. Even a cubic function would still lead to only two outcomes (>= 0, < 0).
c) Yes. This is always true in Bayesian classification.
d) No. We just compute the risk for all classes and pick the class with the lowest risk. Doable for any number of classes.
This task is not compulsory. Working on bonus tasks deepens your knowledge of the subject. A successful solution of the bonus task will be rewarded with up to 4 points.
The data for this task is located in data_33rpz_bayes_bonus.npz. Use the following distributions D1, D2 and D3:
D1 = {'Mean': np.array([151.61, 1154.01]),
      'Prior': 0.4166666666666667,
      'Cov': np.array([[2048960.78575758, 169829.10494949],
                       [169829.10494949, 482516.97969697]])}
D2 = {'Mean': np.array([-2055.76666667, 54.3]),
      'Prior': 0.25,
      'Cov': np.array([[307539.97853107, -2813.69830508],
                       [-2813.69830508, 485574.58644068]])}
D3 = {'Mean': np.array([492.7125, -2353.1125]),
      'Prior': 0.3333333333333333,
      'Cov': np.array([[968935.42262658, 12248.03053797],
                       [12248.03053797, 407017.16439873]])}
y = (sum of pixel values in the upper half of image) - (sum of pixel values in the lower half of image)
The task is to classify the input images into one of the three classes. Use both measurements $x$ and $y$ here. The density function $p_{X}(\mathbf{x}) = p_K(A)p_{X|k}(\mathbf{x}|A)+p_K(C)p_{X|k}(\mathbf{x}|C)+p_K(T)p_{X|k}(\mathbf{x}|T)$, where $\mathbf{x} = [x, y]$ is visualised below.
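Under the zero-one loss, the same argmax rule as in the two-class case applies, now with three classes and a two-dimensional measurement; a hedged sketch using scipy.stats.multivariate_normal (function names and array shapes are assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

def classify_3class_2d_sketch(X, dists):
    # X: (N, 2) array of [x, y] measurements; dists: sequence of the D1, D2, D3 dicts
    # zero-one loss -> decide the class maximising p(k) * p(x | k)
    joint = np.stack([d['Prior'] * multivariate_normal.pdf(X, mean=d['Mean'], cov=d['Cov'])
                      for d in dists], axis=1)
    return np.argmax(joint, axis=1)  # 0, 1, 2 index D1, D2, D3
```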
Output:
Hints: