Create a function compute_confusion_matrix() that computes and returns a confusion matrix from the real classes of emails and the classes predicted by a filter.
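Task:
In module quality.py, create function compute_confusion_matrix() with the following arguments:
truth_dict, a dictionary with the real classes of emails;
pred_dict, a dictionary with the classes predicted by a filter;
pos_tag, the tag of the positive class (default True);
neg_tag, the tag of the negative class (default False).
For the spam filter, use pos_tag="SPAM" and neg_tag="OK".
The function returns the confusion matrix as a namedtuple, defined as follows: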
from collections import namedtuple
ConfMat = namedtuple('ConfMat', 'tp tn fp fn')
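For readers who have not used namedtuple before, the following short sketch shows how a ConfMat value behaves. The counts are made up purely for illustration; only the ConfMat definition comes from the task.

```python
from collections import namedtuple

ConfMat = namedtuple('ConfMat', 'tp tn fp fn')

# Made-up counts, for illustration only.
cm = ConfMat(tp=3, tn=5, fp=1, fn=2)

print(cm.tp)   # fields can be read by name
print(cm[1])   # or by position (index 1 is tn)

# A namedtuple is still a tuple, so it can be unpacked and summed.
tp, tn, fp, fn = cm
total = sum(cm)
print(total)
```

Because the fields are named, code that consumes the confusion matrix stays readable without having to remember the order of the four counts.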
Why do we need it?
The function can be used as follows. In the first example, both input dictionaries are empty, i.e. we have no information about any email.
>>> cm1 = compute_confusion_matrix({}, {})
>>> print(cm1)
ConfMat(tp=0, tn=0, fp=0, fn=0)
In the following code, each of the TP, TN, FP, and FN cases occurs exactly once:
>>> truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4': 'OK'}
>>> pred_dict = {'em1': 'SPAM', 'em2': 'OK', 'em3': 'OK', 'em4': 'SPAM'}
>>> cm2 = compute_confusion_matrix(truth_dict, pred_dict, pos_tag='SPAM', neg_tag='OK')
>>> print(cm2)
ConfMat(tp=1, tn=1, fp=1, fn=1)
In the last example, the predictions match the real classes perfectly, so only TP and TN are nonzero:
>>> truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4': 'OK'}
>>> pred_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4': 'OK'}
>>> cm2 = compute_confusion_matrix(truth_dict, pred_dict, pos_tag='SPAM', neg_tag='OK')
>>> print(cm2)
ConfMat(tp=2, tn=2, fp=0, fn=0)
Of course, the input dictionaries may contain any number of items, not just 4.
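One possible way to meet the behaviour shown in the examples is sketched below. It is not a reference solution. It assumes that pos_tag and neg_tag default to True and False, that every key of truth_dict also appears in pred_dict, and that emails tagged with neither pos_tag nor neg_tag are simply skipped. The six-email data set at the end is made up for illustration.

```python
from collections import namedtuple

ConfMat = namedtuple('ConfMat', 'tp tn fp fn')


def compute_confusion_matrix(truth_dict, pred_dict, pos_tag=True, neg_tag=False):
    """Count TP, TN, FP and FN over the emails listed in truth_dict."""
    tp = tn = fp = fn = 0
    for name, truth in truth_dict.items():
        # Assumption: every email in truth_dict has a prediction.
        pred = pred_dict[name]
        if truth == pos_tag and pred == pos_tag:
            tp += 1
        elif truth == neg_tag and pred == neg_tag:
            tn += 1
        elif truth == neg_tag and pred == pos_tag:
            fp += 1
        elif truth == pos_tag and pred == neg_tag:
            fn += 1
        # Any other tag combination is ignored in this sketch.
    return ConfMat(tp=tp, tn=tn, fp=fp, fn=fn)


# Made-up data with six emails instead of four.
truth = {'a': 'SPAM', 'b': 'OK', 'c': 'OK', 'd': 'SPAM', 'e': 'OK', 'f': 'OK'}
pred = {'a': 'SPAM', 'b': 'OK', 'c': 'SPAM', 'd': 'OK', 'e': 'OK', 'f': 'OK'}
cm = compute_confusion_matrix(truth, pred, pos_tag='SPAM', neg_tag='OK')
print(cm)  # ConfMat(tp=1, tn=3, fp=1, fn=1)
```

The four counts always add up to the number of emails whose real and predicted tags are both either pos_tag or neg_tag, which makes a handy sanity check when testing your own implementation.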