By attending the lab and solving the assignment, a student should…
The Bayesian formulation has some limits, discussed in the lecture, and in certain situations one may need to formulate the problem e.g. as a Neyman-Pearson task, a Wald task or a Minimax task. Out of this spectrum of problem formulations we will focus on the Minimax task here. You should be able to apply parts of the proposed solution to the other problem formulations on your own.
We will continue with our simple OCR motivation from the previous assignment. Let's consider a situation where we are trying to recognise licence plates. For simplicity we again stick to the simple two-letter alphabet and 10×10 already segmented letters. The situation in the Bayes task then corresponds to designing a recognition system for a particular crossroad, where the a priori probabilities of the two letters are known or can be measured.
In today's assignment we aim at designing a recognition system which could be deployed anywhere in the Czech Republic. The licence plates in different cities use different combinations of letters, so the a priori probabilities differ as well. Obviously, it is not practical to find the optimal Bayesian strategy for every camera position.
As you might have already guessed, the situation is ideal for the Minimax strategy. Here, the a priori probability exists, but is not known.
We will consider two cases: (a) discrete measurements, and (b) continuous measurements. For the first case we follow the likelihood ratio based search for the optimal strategy as suggested in the lecture. For the continuous case, the solution is usually found analytically or numerically. Instead, we deliberately take a small detour and show some interesting relations to the Bayes task while deriving the optimal strategy.
See the general information for Python development.
To fulfil this assignment, you need to submit these files (all packed in one .zip file) into the upload system:
- minimax.ipynb
- minimax.py, with the following functions implemented:
  - minimax_strategy_discrete
  - classify_discrete
  - risk_fix_q_cont
  - worst_risk_cont
  - minmax_strategy_cont
- minimax_lr_search_discrete.png
- minimax_strategy_discrete.png
- minimax_classif_discrete.png
- plots_cont.png
- minimax_classif_cont.png
Use the template of the assignment. When preparing the zip file for the upload system, do not include any directories; the files have to be in the zip file root. Do not forget to git pull before starting the work. You will also need your bayes.py from the previous assignment.
In this section we will be using $x\in X$ and $y\in Y$ measurements computed by compute_measurement_lr_discrete and compute_measurement_ul_discrete respectively. Each of them produces integer values in the range $[-10, 10]$.
The probability functions are stored in the discrete variable. For instance, for the letter “A”, discrete['A'] contains $p_{X,Y|k}(x,y|A)$ given as a (21, 21) numpy array (the size corresponds to the cardinality of $X \times Y$).
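A small illustration of working with the table (the shift-by-10 indexing below is an assumption to be checked against the template; it follows from $X = Y = \{-10, \dots, 10\}$ mapping naturally onto indices $0, \dots, 20$):

```python
import numpy as np

# discrete['A'] is a (21, 21) table of p(x, y | A); x and y take values in [-10, 10].
# Assumed index convention (verify against the template): row x + 10, column y + 10.
x, y = -3, 7
p_xy_given_A = discrete['A'][x + 10, y + 10]

# sanity check: a probability function over X x Y sums to 1
assert np.isclose(discrete['A'].sum(), 1.0)
```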
We will select only two letters from the alphabet and consider only the two-class problem here.
Recall from the lecture slides that the Minimax task can be solved by finding a strategy corresponding to a single threshold $\mu$ on the likelihood ratio (LR) $r = \frac{p(x, y|1)}{p(x, y|2)}$. Since the measurements are discrete and their domain is finite, we have only a limited number of LR values $r_i$, one for every $(x,y)$ combination. Let us assume that the indexing $i$ of the discrete features $(x,y)$ corresponds to the likelihood ratios sorted in ascending order: $r_i \leq r_{i+1}$ (we can start from an arbitrary indexing of the discrete pairs $(x,y)$ and then sort to obtain the sorting permutation). Then any strategy based on thresholding the likelihood ratio can be represented in the form \begin{align}\label{q1} q(i) = \begin{cases} 1\ \ & \text{if}\ \ \ i>i^*,\\ 2\ \ & \text{if}\ \ \ i\leq i^*, \end{cases} \end{align} for some integer $i^* \in \{-1, 0, \ldots, N-1\}$, where $N=|X||Y|$.
In the Minimax task we are looking for the index $i^*$ for which $\max(\epsilon_1(i^*), \epsilon_2(i^*))$ is minimal. The values $\epsilon_1$ and $\epsilon_2$ are defined by equation (28) in the lecture slides, with the strategy $q$ in the form \eqref{q1} substituted.
This representation differs slightly from the lecture slides in that distinct strategies exist even for equal likelihood ratios, i.e. when $r_i = r_{i+1}$. Non-unique $r_i$ values are a common case with histogram-based estimates of probabilities: many ratios are exactly $0$, $\infty$ or ${\tt nan}$, and equality of other values is not excluded. The representation \eqref{q1} handles these cases automatically.
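To make the search concrete, here is a minimal sketch using cumulative sums over the sorted (flattened) tables. The function name and return values are illustrative assumptions; your minimax_strategy_discrete has to follow the template's interface:

```python
import numpy as np

def minimax_search_discrete_sketch(p1, p2):
    """Find i* minimising max(eps1(i*), eps2(i*)) over LR-threshold strategies.

    p1, p2 -- (21, 21) arrays with p(x, y | 1) and p(x, y | 2).
    """
    p1 = p1.ravel()
    p2 = p2.ravel()
    with np.errstate(divide='ignore', invalid='ignore'):
        lr = p1 / p2                   # likelihood ratios r_i (may contain 0, inf, nan)
    order = np.argsort(lr)             # permutation so that r_i <= r_{i+1}; nans go last
    p1s, p2s = p1[order], p2[order]
    # For a threshold i*, q decides 2 for i <= i* and 1 for i > i*, hence
    #   eps1(i*) = sum_{i <= i*} p1s[i]   (class 1 misclassified as 2)
    #   eps2(i*) = sum_{i >  i*} p2s[i]   (class 2 misclassified as 1)
    eps1 = np.concatenate(([0.0], np.cumsum(p1s)))              # i* = -1, 0, ..., N-1
    eps2 = np.concatenate((np.cumsum(p2s[::-1])[::-1], [0.0]))
    i_star = int(np.argmin(np.maximum(eps1, eps2))) - 1         # shift back to i* = -1, ...
    return order, i_star, eps1, eps2
```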
Select two letters of your choice (they could be your initials, but everything should work for any pair of letters). Here we will assume your name is Chuck Norris, so we pick the letters “C” and “N”.
Hints: np.sort, np.argsort, plot_lr_threshold, plot_discrete_strategy
In this section we will be using a continuous measurement $x$ computed by compute_measurement_lr_cont.
The class probability density functions are assumed to be normal distributions, and for every letter they are specified in the cont[letter] variable. So, for instance, for the letter “A” the mean is in cont['A']['Mean'] and the standard deviation in cont['A']['Sigma'].
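For instance, the class-conditional density can be evaluated with scipy.stats.norm; the numeric values below are hypothetical placeholders for cont['A']['Mean'] and cont['A']['Sigma']:

```python
import numpy as np
from scipy.stats import norm

mean_A, sigma_A = 0.0, 1.0     # hypothetical; use cont['A']['Mean'], cont['A']['Sigma']
x = np.linspace(mean_A - 3 * sigma_A, mean_A + 3 * sigma_A, 7)
p_x_given_A = norm.pdf(x, loc=mean_A, scale=sigma_A)   # p(x | A)
```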
Again, we will select only two letters (e.g. your initials) from the alphabet and consider only the two-class problem with labels 1 and 2.
Before finding the optimal Minimax strategy for the continuous case we will spend some time drawing various Bayes risks to demonstrate the relation of the Minimax and the Bayes problems.
Step by step, we will build the graph saved as plots_cont.png, and then we will find the optimal Minimax strategy based on our understanding of the relation between Minimax and Bayes.
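The key relation we will use: for a fixed strategy the risk is linear in the unknown prior, and for our two-normal case the Minimax strategy coincides with the Bayes strategy for the least favourable prior, i.e. the prior maximising the optimal Bayes risk. A self-contained numerical sketch of finding that prior (the helper and the parameter values are illustrative assumptions, not the template's interface):

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.optimize import fminbound
from scipy.stats import norm

def bayes_risk_two_normals(p1_prior, m1, s1, m2, s2):
    """Risk of the OPTIMAL Bayes strategy for prior P(1) = p1_prior,
    obtained by numerically integrating min(P(1) f1(x), P(2) f2(x))."""
    lo = min(m1 - 6 * s1, m2 - 6 * s2)
    hi = max(m1 + 6 * s1, m2 + 6 * s2)
    x = np.linspace(lo, hi, 20001)
    integrand = np.minimum(p1_prior * norm.pdf(x, m1, s1),
                           (1.0 - p1_prior) * norm.pdf(x, m2, s2))
    return trapezoid(integrand, x)

# hypothetical class parameters (stand-ins for e.g. cont['C'], cont['N'])
m1, s1 = -100.0, 900.0
m2, s2 = 1500.0, 800.0

# The least favourable prior MAXIMISES the Bayes risk, so minimise its negative:
p_star, neg_risk, ierr, nfunc = fminbound(
    lambda p: -bayes_risk_two_normals(p, m1, s1, m2, s2),
    0.0, 1.0, full_output=True)
print(p_star, -neg_risk)   # least favourable prior and the corresponding Bayes risk
```

Note that fminbound is applied to the negated risk because it is a minimiser; see the hint below.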
Hint: scipy.optimize.fminbound minimises a function of a single variable over an interval; arguments that should stay fixed can be bound with a lambda:

```python
from scipy.optimize import fminbound

def orig_function(w, x, y, z):
    return w * x * y * z

w_val = 10
y_val = 11
z_val = 12

# fix everything except the variable being optimised over
helper_function = lambda prior: orig_function(w_val, prior, y_val, z_val)

# with full_output=True, fminbound returns (xopt, fval, ierr, numfunc)
xopt, fval, ierr, numfunc = fminbound(helper_function, -42.0, 42.0, full_output=True)
```

Expected results of minmax_strategy_cont, here for the letters “M” and “D”:

```python
D1 = cont['M']
D2 = cont['D']
q, risk = minmax_strategy_cont(D1, D2)
# q['t1'] -> -230.44601288127865
# q['t2'] -> 3705.0796592073866
# q['decision'] -> array([1, 0, 1], dtype=int32)
# risk -> 0.2774251103401184
```
Finally, generate a test set using create_test_set, classify it and estimate the classification error; save the visualisation of the classification as minmax_classif_cont.png:

```python
measurements_cont = compute_measurement_lr_cont(images_test_cont)
labels_estimated_cont = classify_2normal(measurements_cont, q)
error_cont = classification_error(labels_estimated_cont, labels_test_cont)
# error_cont -> 0.3167
```
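For reference, a hypothetical sketch of what a classify_2normal-style decision rule does with the two thresholds (the real function comes from your bayes.py; the region-to-label mapping here is an assumption based on the structure of q shown above):

```python
import numpy as np

def classify_2normal_sketch(x, q):
    # Assumed convention: q['t1'] <= q['t2'] split the measurement axis into
    # three regions (x <= t1, t1 < x <= t2, x > t2) and q['decision'][i]
    # holds the label assigned to region i.
    x = np.asarray(x)
    region = np.zeros(x.shape, dtype=int)
    region[x > q['t1']] = 1
    region[x > q['t2']] = 2
    return np.asarray(q['decision'])[region]
```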
1. What is the maximal risk of the optimal Bayesian strategy trained for $P_K(A)=0.3$ and $P_K(B)=0.7$ using the continuous measurement, when the a priori probabilities are allowed to change? Round it to two decimal places.
Solution: 0.41
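The reasoning behind such questions: for a fixed strategy $q$ the risk is linear in the prior $p = P_K(A)$,
\begin{align}
R(q, p) = p\,\epsilon_1(q) + (1-p)\,\epsilon_2(q),
\end{align}
so its maximum over $p \in [0,1]$ is attained at an endpoint and equals $\max(\epsilon_1(q), \epsilon_2(q))$; this is also the idea behind the worst_risk_cont function.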
2. What is the relation between the worst case risk and the best case risk for the minimax strategy in the continuous case: a) the worst case risk is larger, b) the worst case risk is smaller, or c) they are equal?
The correct answer is c).
a) No. We could then find a strategy that would make the worst case risk smaller (by increasing the best case risk).
b) No. Nonsense, the worst case risk can never be smaller than the best case one.
c) Yes. The optimal Minimax strategy equalises $\epsilon_1 = \epsilon_2$, so its risk $p\,\epsilon_1 + (1-p)\,\epsilon_2$ does not depend on the prior, and the worst and best case risks coincide. (See also the explanations for a) and b).)
3. In the above we assumed the a priori probability exists but is not known. What would change if it didn't exist?
This task is not compulsory.
Go through the chapter on non-Bayesian tasks in the SH10 book [2], especially the parts discussing the solution of the minimax task by linear programming (pages 25, 31, 35-38, 40-41, 46-47). Solve the above classification task using linear programming as described on page 47.
Hints: scipy.optimize.linprog
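A minimal self-contained sketch of one possible LP formulation, using a randomised strategy over the discrete measurements (the variable names and constraint arrangement are illustrative assumptions and need not match the book's page-47 formulation exactly):

```python
import numpy as np
from scipy.optimize import linprog

def minimax_lp_sketch(p1, p2):
    """Minimax via LP: alpha[i] is the probability of deciding class 1
    for the i-th discrete measurement; minimise t subject to eps1 <= t
    and eps2 <= t, where eps1 = p1 @ (1 - alpha) and eps2 = p2 @ alpha."""
    p1 = np.asarray(p1, dtype=float).ravel()
    p2 = np.asarray(p2, dtype=float).ravel()
    n = p1.size
    # variables z = (alpha_0, ..., alpha_{n-1}, t); objective: minimise t
    c = np.zeros(n + 1)
    c[-1] = 1.0
    # eps1 <= t  <=>  p1.sum() - p1 @ alpha - t <= 0  <=>  -p1 @ alpha - t <= -p1.sum()
    # eps2 <= t  <=>  p2 @ alpha - t <= 0
    A_ub = np.vstack([np.append(-p1, -1.0), np.append(p2, -1.0)])
    b_ub = np.array([-p1.sum(), 0.0])
    bounds = [(0.0, 1.0)] * n + [(0.0, 1.0)]   # the alphas and t are probabilities
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    alpha, t = res.x[:n], res.x[-1]
    return alpha, t   # randomised strategy and the minimax error

# example usage with the discrete tables from the first part:
# alpha, t = minimax_lp_sketch(discrete['C'], discrete['N'])
```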