====== The Minimax Task ======

The Bayesian formulation has some limits discussed at the {{:courses:be5b33rpz:lessons:p2rod-nonbayes.pdf|lecture}} and for certain situations one may need to formulate the problem e.g. as a Neyman-Pearson task, Wald task or a Minimax task. Out of this spectrum of problem formulations we will focus on the Minimax task here. You should be able to apply parts of the proposed solution to the other problem formulations on your own.

We will continue with our simple OCR motivation from the previous assignment. Lets consider a situation when we are trying to recognise licence plates. For simplicity we again stick to the simple two-letter alphabet and 10x10 already segmented letters. The situation in the Bayes task then corresponds to a problem of designing a recognition system for a particular crossroad where the a priori probability of the two letters is known or could be measured.

In today's assignment we aim at designing a recognition system which could be deployed //anywhere// in the Czech republic. The licence plates in different cities use different combination of the letters, so the a priori probabilities differ as well. Obviously, it is not practical to find the optimal Bayesian strategy for every camera position.

As you might have already guessed, the situation is ideal for the Minimax strategy. Here, the **a priori probability exists, but is not known**.

We will consider two cases: (a) discrete measurements, and (b) continuous measurements. For the first case we follow the likelihood ratio based search for the optimal strategy as suggested in the lecture. For the continuous case, the solution is usually found analytically or numerically. Instead, we deliberately take a small detour and show some interesting relations to the Bayes task while deriving the optimal strategy.

<WRAP center round info>

[[https://cw.fel.cvut.cz/wiki/courses/be5b33rpz/labs/python_development|General information for Python development]].

To fulfil this assignment, you need to submit these files (all packed in one ''.zip'' file) into the [[https://cw.felk.cvut.cz/brute/|upload system]]:
  * **''answers.txt''** - answers to the Assignment Questions
  * **''minimax.ipynb''** - a notebook for data initialisation, calling of the implemented functions and plotting of their results (for your convenience, will not be checked).
  * **''minimax.py''** - file with the following methods implemented:
    * **''minimax_strategy_discrete''** - function which finds the optimal Minimax strategy for discrete measurements
    * **''classify_discrete''** - function for using the discrete strategy for image classification
    * **''risk_fix_q_cont''** - function for computing risk for a fixed strategy but changing a priori probability
    * **''worst_risk_cont''** - worst case risk for a Bayesian strategy trained with a particular a priori probability but evaluated on all a priori probabilities
    * **''minmax_strategy_cont''** - function which finds the optimal Minimax strategy for continuous measurements
    * all **functions from the previous assignment (bayes)** - Unfortunately, including bayes.py and import from it is not supported. You have to copy your functions from bayes.py to minimax.py.
  * **''minimax_lr_search_discrete.png''**, **''minimax_strategy_discrete.png''**, **''minimax_classif_discrete.png''**, **''plots_cont.png''** and **''minimax_classif_cont.png''** - images specified in the tasks

** Use [[https://cw.fel.cvut.cz/wiki/courses/be5b33rpz/labs/python_development#Assignment Templates|template]] of the assignment.** When preparing a zip file for the upload system, **do not include any directories**, the files have to be in the zip file root. **Do not forget to ''git pull'' before starting the work**.
<note warning>It is not possible to import your previously implemented functions from ''bayes.py'' in BRUTE, please copy all the necessary functions directly into ''minimax.py''. Sorry for the inconvenience.</note>
</WRAP>

===== Discrete measurements =====

In this section we will be using $x\in X$ and $y\in Y$ measurements computed by ''compute_measurement_lr_discrete'' and ''compute_measurement_ul_discrete'' respectively. Each of them produces integer values in the range $[-10, 10]$.

The probability functions are stored in the ''discrete'' variable. For instance for letter "A", ''discrete['A']'' contains $p_{X,Y|k}(x,y|A)$ given as a (21, 21) numpy array (the size corresponds to the cardinality of $X \times Y$).

We will select only two letters from the alphabet and consider only the two-class problem here.

Recall from the lecture slides, that the Minimax task could be solved by finding a strategy corresponding to a single threshold $\mu$ on the likelihood ratio (LR) $r = \frac{p(x, y|1)}{p(x, y|2)}$. Since the measurements are discrete and their domain finite, we have only a limited number of LR values $r_i$, one for every $x,y$ combination. Let us assume that the indexing $i$ of discrete features $(x,y)$ corresponds to likelihood ratios sorted by ascent: $r_i \leq r_{i+1}$ (we can indeed start from any indexing of discrete pairs $(x,y)$, then apply the sorting to get the sorting permutation). Then any strategy based on thresholding of the likelihood ratio can be represented in the form 
\begin{align}\label{q1}
q(i) = \begin{cases}
 1\ \ & \text{if}\ \ \ i>i^*,\\
 2\ \ & \text{if}\ \ \ i\leq i^*;
\end{cases}
\end{align}
for some $i^* \in [-1,N)$, where $N=|X||Y|$. 

In the Minimax task we are looking for an index $i^*$, for which $\max(\epsilon_1(i), \epsilon_2(i))$ is minimal. The values $\epsilon_1$ and $\epsilon_2$ are defined by equation (28) in the lecture slides, where the strategy $q$ in the form \eqref{q1} is substituted.

This representation slightly differs from the lecture slides in that even for likelihood ratios that are the same, i.e. $r_i = r_{i+1}$, there still exist //distinct// strategies. The non-uniqueness of $r_i$ is a common case with histogram-based estimates of probabilities: many ratios are exactly $0$, $\infty$ or ${\tt nan}$ and equality of other values is not excluded. By using representation \eqref{q1} these cases will be automatically handled correctly.


==== Tasks ====
Select two letters of your choice (could be your initials, but everything should work on any couple of letters). Here we will assume your name was Chuck Norris, so we pick the letters "C" and "N".

  - Start by displaying the distributions to get an idea of the problem. \\ \\ {{ :courses:be5b33rpz:labs:03_minimax:minimax_distributions_discrete.png?nolink&600 |}}

  - Complete the template of the function ''minimax_strategy_discrete''. It should return the optimal Minimax strategy, optimal index $i^*$, and $\epsilon_1$ and $\epsilon_2$ arrays. Where the LR is $0/0 = \mbox{nan}$, prefer the first class (this is needed for automatic evaluation). \\ \\ **Hint:** ''np.sort'' and ''np.argsort'' handle $\mbox{nan}$ reasonably for the search. \\ \\

  - Plot $\epsilon_1$ and $\epsilon_2$ for varying index $i$ together with the found optimal index $i^*$. Use the provided function ''plot_lr_threshold''. Save the figure to **''minimax_lr_search_discrete.png''**. \\ \\ {{ :courses:be5b33rpz:labs:03_minimax:minimax_lr_search_discrete.png?nolink&800 |}}

  - Plot the found optimal strategy using ''plot_discrete_strategy'' and save the figure as **''minimax_strategy_discrete.png''**. \\ \\ {{ :courses:be5b33rpz:labs:03_minimax:minimax_strategy_discrete.png?nolink&400 |}}

  - Complete the function ''classify_discrete'' which takes a strategy and a set of images (from two classes) and produces the classification labels. \\ \\


  - Show the classification and save the figure as **''minimax_classif_discrete.png''**. \\ \\ {{ :courses:be5b33rpz:labs:03_minimax:minimax_classif_discrete.png?nolink&400 |}}


  - Complete the function ''classification_error_discrete'' which computes the error of a given strategy on a provided test images.


===== Continuous measurements =====

In this section we will be using a continuous measurement $x$ computed by ''compute_measurement_lr_cont''.

The class probability density functions are assumed to be normal distributions and they are for every letter specified in the ''cont[letter]'' variable. So, for instance for the letter "A", the mean is in ''cont['A']['Mean']'' and the standard deviation in ''cont['A']['Sigma']''.

Again, we will select only two letters (e.g. your initials) from the alphabet and consider only the two-class problem with labels 1 and 2.

Before finding the optimal Minimax strategy for the continuous case we will spend some time drawing various Bayes risks to demonstrate the relation of the Minimax and the Bayes problems.

==== Tasks ====
Step by step, we will build the following graph and then we find the optimal Minimax strategy based on our understanding of the relation between Minimax and Bayes. \\ \\
{{ :courses:be5b33rpz:labs:03_minimax:plots_cont.png?nolink&600 |}}
  - For $p_K(1) \in \{0, 0.01, \ldots, 0.99, 1\}$ compute and plot into one figure the following:
      - Risk of the Bayesian strategy optimal for each $p_K(1)$ (the blue curve).\\ \\ **Hint 1:** Use your code from the last labs.\\ \\ **Hint 2:** In the Minimax problem formulation, each error is penalised equally independent of the class which corresponds to the 0-1 loss function.\\ \\
      - Find the optimal strategy for $p_K(1) = 0.25$. Then for this, now fixed strategy, alter the a priori probability and compute its Bayesian risk (black line). For this complete the template function ''risk_fix_q_cont''.\\ \\ **Hint 1:** This is what happens, when we have assumed wrong a priori probabilities different from the real one. \\ **Hint 2:** Equation (3) from {{:courses:be5b33rpz:labs:03_minimax:minimax_task.pdf|[1]}} explains the shape of the observed curve.\\ \\
      - For each iterated $p_K(1)$ derive the optimal Bayesian strategy and compute its worst-case risk in case the true a priori probability is different (red curve). For this complete the template function ''worst_risk_cont''.\\ \\ **Hint:** You can again use the equation (3) from {{:courses:be5b33rpz:labs:03_minimax:minimax_task.pdf|[1]}} and the fact that the a priori probability is bounded to the <0,1> interval.\\ \\
  - Save the figure to **''plots_cont.png''**. \\ \\
  - Complete the template function ''minmax_strategy_cont'', so that it finds the Minimax strategy and its risk. \\ \\ **Hint:** The Minimax strategy is the Bayesian strategy with minimal maximal Bayesian risk over all $p_K(C)$.\\ \\ **Hint**: use ''scipy.optimize.fminbound''\\ \\ <code python>
D1 = cont['M']
D2 = cont['D']
q, risk = minmax_strategy_cont(D1, D2)

# q['t1'] -> -230.44601288127865
# q['t2'] -> 3705.0796592073866 
# q['decision'] -> array([1, 0, 1], dtype=int32)
# risk -> 0.2774251103401184
</code>
  - Plot Bayesian risk for all values of $p_K(1)$ for the minimax strategy.\\ \\
  - Generate a test set for your two letters using **''create_test_set''**. \\ \\
  - Compute the error of the strategy on the generated test images, show the final classification in one figure and save it as **''minmax_classif_cont.png''**. \\ \\ **Hint**: use the ''**classification_error_2normal**'' function from the Bayes assignment. \\ \\ <code python>
error_cont = classification_error_2normal( images_test_set, labels_test_set, q_minimax_cont)
# error_cont -> 0.3167
</code> \\ \\ {{ :courses:be5b33rpz:labs:03_minimax:minimax_classif_cont.png?nolink&600 |}}

===== Assignment Questions =====

Fill the correct answers to your ''answers.txt'' file.
<WRAP center round help>
  - What is the maximal risk of the optimal Bayesian strategy trained for $P_K(A)=0.3$ and $P_K(B)=0.7$ using continuous measurement when the a priori probabilities are allowed to change? Round it to two decimal points.
  - The relation between the worst case risk and the best case risk for the minimax strategy in the continuous case is
    * a) The worst case risk is always greater than the best case risk
    * b) The worst case risk could be smaller than the best case risk
    * c) The worst case risk is strictly equal to the best case risk
  - In the above we assumed the a priori probability exists but is not known. What would change if it didn't exist?
    * a) We can't compute the risk and have to use the likelihood ratio directly
    * b) We would need to use different loss function W
    * c) We would proceed the same way as we did above
</WRAP>

===== Bonus Task =====

**This task is not compulsory.** 

Go through the chapter on non-Bayesian tasks in SH10 book [2], especially the parts discussing solution of the minimax task by linear programming (pages 25, 31, 35-38, 40-41, 46-47). Solve the above classification task using linear programming as described on page 47.

**Hints:**
  * Work with the discrete measurements $x$.
  * Represent the sought strategy (classification function) as a table α, with α(i,k) corresponding to the probability of classification of bin 'i' to class 'k' such that α(i,k) >= 0 and sum_k α(i,k) = 1.
  * Reformulate the task in a matrix form according to equation 2.53 in SH10, page 47
  * Solve the task numerically using ''scipy.optimize.linprog''.
  * Compare the obtained results with the results of the classification above.

===== References =====

  * [1] {{:courses:be5b33rpz:labs:03_minimax:minimax_task.pdf| Minimax Task}} (short support text for labs)
  * [2] Michail I. Schlesinger, Vaclav Hlavac. Ten Lectures on Statistical and Structural Pattern Recognition. Kluwer Academic Publishers, 2002.
  * [3] [[http://cmp.felk.cvut.cz/cmp/courses/recognition/slides/non-bayesian_recognition/thumb.html|slides of prof. Matas]]