Correction to lecture 12 in SS 2022/2023

During the lecture on Bayesian decision making, we solved the following task:

Your physician calls you: “I am sorry, your HIV test is positive, 999/1000 you will die within 10 years.”
What the doctor really meant by 999/1000: The HIV test used is falsely positive in 1 case out of 1000 positive cases.
Assume a prior probability of HIV to be 1/10000.

Chosen notation:

Positive vs negative HIV test result: $\oplus$ vs $\ominus$
Healthy vs not healthy (infected) person: $H$ vs $\bar H$

What went wrong

First (time 16:57), I wrongly interpreted the 999/1000 as $P(\bar H | \oplus) = \frac{999}{1000}$, i.e., the probability of not being healthy given that the test was positive. In fact, this is one of the probabilities we are interested in (we want to compute it), and if it were 0.999, there would indeed be only a very small chance for us to survive. But the physician actually meant something else.

Second, after a few moments (time 18:50) I made a “correction” on the blackboard and denoted that probability as $P(\oplus | \bar H) = \frac{999}{1000}$, i.e., the probability of having a positive test if you are actually infected (not healthy). But that does not correspond to the column for $\bar H$ on the blackboard. There, it is actually assumed that the test is perfect for infected people, i.e., the probability that the test will be positive given that you are not healthy is 1: $P(\oplus | \bar H) = 1$. The test makes errors only in the group of healthy people.

The sentence “your HIV test is positive; 999/1000 you will die…” is actually very unclear. I overlooked that the task itself clarifies how it should be interpreted: “the test is falsely positive in 1 of 1000 positive cases”, i.e., the probability of a positive result given that you are healthy is $P(\oplus | H) = \frac{1}{1000}$.

When I then computed $P(\bar H | \oplus)$ using Bayes' theorem, I used the wrong conditional probability $P(\oplus | \bar H)$. The correct computation using Bayes' rule is as follows:

$$P(\bar H | \oplus) = \frac{P(\oplus | \bar H) P(\bar H)}{P(\oplus)} = \frac{P(\oplus | \bar H) P(\bar H)}{P(\oplus | \bar H) P(\bar H) + P(\oplus | H) P(H)}$$

After substitution of numeric values:

$$ P(\bar H | \oplus) = \frac{1 \cdot \frac{1}{10000}}{1 \cdot \frac{1}{10000} + \frac{1}{1000}\cdot\frac{9999}{10000}} \doteq \frac{1}{1+10} = \frac{1}{11} $$
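The substitution above can be checked numerically. The following is only a minimal sketch: the function name `posterior_infected` and its parameter names are made up for this illustration and were not used in the lecture. Exact fractions are used so that the rounding step from the exact value to $\frac{1}{11}$ is visible.

```python
from fractions import Fraction


def posterior_infected(prior_infected, p_pos_given_infected, p_pos_given_healthy):
    """Bayes' rule for P(not healthy | positive test).

    All names here are hypothetical, chosen only for this sketch.
    """
    prior_healthy = 1 - prior_infected
    # Total probability of a positive test, P(+), summed over both groups.
    evidence = (p_pos_given_infected * prior_infected
                + p_pos_given_healthy * prior_healthy)
    return p_pos_given_infected * prior_infected / evidence


# Values from the task: prior 1/10000, P(+ | infected) = 1, P(+ | healthy) = 1/1000.
p = posterior_infected(Fraction(1, 10000), Fraction(1), Fraction(1, 1000))
print(p)         # exact value as a fraction
print(float(p))  # decimal value, close to 1/11
```

The exact result is $\frac{1000}{10999}$, which differs from $\frac{1}{11}$ only because $\frac{9999}{10000}$ was rounded to 1 in the last step.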

And this is also the result that we got from the frequency table. The conclusion is that despite the high-quality test, the probability of being infected is still quite low if the disease itself is rare. The prior probability matters a lot!
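The frequency-table view can be reproduced in a few lines as well. This is a sketch under stated assumptions: the population size of 10,000,000 is chosen here only so that all counts come out as whole numbers (it is not necessarily the size used on the blackboard), and `frequency_table` is a hypothetical helper name.

```python
from fractions import Fraction


def frequency_table(population, prior_infected, p_pos_given_infected, p_pos_given_healthy):
    """Split a population into the four cells of the test/health table.

    Hypothetical helper for illustration; the population size is an assumption.
    """
    infected = population * prior_infected
    healthy = population - infected
    return {
        "infected_positive": infected * p_pos_given_infected,
        "infected_negative": infected * (1 - p_pos_given_infected),
        "healthy_positive": healthy * p_pos_given_healthy,
        "healthy_negative": healthy * (1 - p_pos_given_healthy),
    }


table = frequency_table(10_000_000, Fraction(1, 10000), Fraction(1), Fraction(1, 1000))
for cell, count in table.items():
    print(f"{cell}: {count}")

# Among all people with a positive test, which share is actually infected?
positives = table["infected_positive"] + table["healthy_positive"]
share_infected = table["infected_positive"] / positives
print(share_infected, float(share_infected))
```

With these numbers, 1000 infected people test positive while 9999 healthy people also test positive, so the false positives outnumber the true positives roughly ten to one. That is where the $\frac{1}{11}$ comes from.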