Spam filter - step 2

Create a function compute_confusion_matrix() that computes and returns a confusion matrix based on the real classes of emails and on the classes predicted by a filter.

Preparation

Specifications

Task:

Why do we need it?

The function can be used as follows. The first example passes two empty input dictionaries, i.e. we have no information about any email.

>>> cm1 = compute_confusion_matrix({}, {})
>>> print(cm1)
ConfMat(tp=0, tn=0, fp=0, fn=0)

In the following code, each of the TP, TN, FP, and FN cases occurs exactly once:

>>> truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4':'OK'}
>>> pred_dict = {'em1': 'SPAM', 'em2': 'OK', 'em3': 'OK', 'em4':'SPAM'}
>>> cm2 = compute_confusion_matrix(truth_dict, pred_dict, pos_tag='SPAM', neg_tag='OK')
>>> print(cm2)
ConfMat(tp=1, tn=1, fp=1, fn=1)
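
To see why each counter ends up at 1, the four emails can be labelled one by one. The sketch below is only an illustration: the helper classify_case() is a hypothetical name that is not part of the assignment, and it assumes that every email carries either the positive or the negative tag.

```python
def classify_case(truth, pred, pos_tag='SPAM', neg_tag='OK'):
    """Return 'TP', 'TN', 'FP' or 'FN' for one (truth, prediction) pair.

    Hypothetical helper for illustration only; assumes both arguments
    are equal to either pos_tag or neg_tag.
    """
    if truth == pos_tag and pred == pos_tag:
        return 'TP'   # spam correctly flagged as spam
    if truth == neg_tag and pred == neg_tag:
        return 'TN'   # legitimate email correctly let through
    if truth == neg_tag and pred == pos_tag:
        return 'FP'   # legitimate email wrongly flagged as spam
    if truth == pos_tag and pred == neg_tag:
        return 'FN'   # spam wrongly let through
    raise ValueError('unexpected tag: %r / %r' % (truth, pred))


truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4': 'OK'}
pred_dict = {'em1': 'SPAM', 'em2': 'OK', 'em3': 'OK', 'em4': 'SPAM'}

# Label every email by comparing its real class with the predicted one.
cases = {name: classify_case(truth_dict[name], pred_dict[name])
         for name in truth_dict}
print(cases)
# {'em1': 'TP', 'em2': 'FN', 'em3': 'TN', 'em4': 'FP'}
```

Each of the four labels appears exactly once, which matches the counts in the confusion matrix above.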

In the last example, the predictions match the real classes perfectly, so only TP and TN are nonzero:

>>> truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4':'OK'}
>>> pred_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4':'OK'}
>>> cm3 = compute_confusion_matrix(truth_dict, pred_dict, pos_tag='SPAM', neg_tag='OK')
>>> print(cm3)
ConfMat(tp=2, tn=2, fp=0, fn=0)

Of course, the input dictionaries may contain any number of items, not just 4.
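
One possible shape of such a function is sketched below. It is a minimal sketch under several assumptions that the examples only hint at: ConfMat is modelled as a namedtuple (its printed form matches the outputs shown above), the defaults pos_tag='SPAM' and neg_tag='OK' are guessed from the first call, which passes no tags, and every key of truth_dict is assumed to be present in pred_dict.

```python
from collections import namedtuple

# Assumption: ConfMat is a simple namedtuple with four counters.
ConfMat = namedtuple('ConfMat', 'tp tn fp fn')


def compute_confusion_matrix(truth_dict, pred_dict,
                             pos_tag='SPAM', neg_tag='OK'):
    """Count TP, TN, FP and FN over all emails listed in truth_dict.

    The default tags are an assumption, not a confirmed specification.
    """
    tp = tn = fp = fn = 0
    for name, truth in truth_dict.items():
        pred = pred_dict[name]  # assumes a prediction exists for every email
        if truth == pos_tag and pred == pos_tag:
            tp += 1
        elif truth == neg_tag and pred == neg_tag:
            tn += 1
        elif truth == neg_tag and pred == pos_tag:
            fp += 1
        elif truth == pos_tag and pred == neg_tag:
            fn += 1
    return ConfMat(tp=tp, tn=tn, fp=fp, fn=fn)


# Works for any number of emails, including none at all.
print(compute_confusion_matrix({}, {}))
# ConfMat(tp=0, tn=0, fp=0, fn=0)
```

Looping over the items of truth_dict means the number of emails never has to be known in advance, so the same code handles the empty case and dictionaries of any size.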