Differences

This shows you the differences between two versions of the page.

--- courses:be5b33prg:homeworks:spam:step2 [2015/11/25 16:30]
xposik [Specifications]
+++ courses:be5b33prg:homeworks:spam:step2 [2015/12/04 14:22]
svobodat [Specifications]
@@ Line 8: / Line 8: @@
     * what these abbreviations mean for the spam filtering problem, and
     * what we need to know to be able to compute them.
+  * See the documentation for ''[[https://docs.python.org/3.4/library/collections.html#collections.namedtuple|namedtuple]]''.
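As a quick illustration of what ''namedtuple'' offers, here is a minimal sketch using the ''ConfMat'' type from the specification. The variable names ''cm'', ''total'', ''as_dict'' and ''ConfMatFromList'' are only illustrative and are not part of the assignment.

```python
from collections import namedtuple

# Same definition as in the specification: four whitespace-separated field names.
ConfMat = namedtuple('ConfMat', 'tp tn fp fn')

# Fields can be set by keyword and read by name or by position.
cm = ConfMat(tp=2, tn=2, fp=0, fn=0)
first_by_name = cm.tp
first_by_index = cm[0]

# A namedtuple is still a tuple, so it can be iterated, summed and unpacked.
total = sum(cm)
tp, tn, fp, fn = cm

# _asdict() converts the tuple into a mapping from field names to values.
as_dict = dict(cm._asdict())

# The field names may equally be given as a list of strings.
ConfMatFromList = namedtuple('ConfMat', ['tp', 'tn', 'fp', 'fn'])
same_fields = ConfMatFromList._fields == ConfMat._fields
```

Named access such as ''cm.fp'' tends to be easier to read than positional access when the four counts are later combined into quality measures.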
 ===== Specifications =====
@@ Line 21: / Line 22: @@
 from collections import namedtuple
-ConfMat = namedtuple('ConfMat', 'tp, tn fp fn')
+ConfMat = namedtuple('ConfMat', 'tp tn fp fn')
 </code>
 Why do we need it?
   * Function ''compute_confusion_matrix()'' represents the basis for evaluation of the filter performance.
-  * The function can be used in the following way:<code python>
+The function can be used as follows. In the first example, both input dictionaries are empty, i.e. we have no information about any email.
+<code python>
 >>> cm1 = compute_confusion_matrix({}, {})
 >>> print(cm1)
 ConfMat(tp=0, tn=0, fp=0, fn=0)
-</code>or<code python>
+</code>
+In the following code, each of the TP, TN, FP, and FN cases occurs exactly once:
+<code python>
 >>> truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4':'OK'}
 >>> pred_dict = {'em1': 'SPAM', 'em2': 'OK', 'em3': 'OK', 'em4':'SPAM'}
@@ Line 38: / Line 45: @@
 </code>
-**Note**: You can expect that the dictionaries will have the same set of keys. Think about the situation when the keys would be different: what shall the method do?
+In the last example, the predictions perfectly match the real classes, so only TP and TN are nonzero:
+<code python>
+>>> truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4':'OK'}
+>>> pred_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4':'OK'}
+>>> cm2 = compute_confusion_matrix(truth_dict, pred_dict, pos_tag='SPAM', neg_tag='OK')
+>>> print(cm2)
+ConfMat(tp=2, tn=2, fp=0, fn=0)
+</code>
+Of course, the input dictionaries may contain any number of items, not just 4.
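The behaviour shown in the examples above could be sketched as follows. This is one possible approach, not the reference solution. It assumes that both dictionaries share the same keys, and it assumes the default values ''pos_tag='SPAM''' and ''neg_tag='OK''', which the examples suggest but do not state.

```python
from collections import namedtuple

ConfMat = namedtuple('ConfMat', 'tp tn fp fn')


def compute_confusion_matrix(truth_dict, pred_dict, pos_tag='SPAM', neg_tag='OK'):
    """Count TP, TN, FP and FN over the emails listed in truth_dict.

    Assumption: every key of truth_dict is also present in pred_dict.
    The default tag values are an assumption as well.
    """
    tp = tn = fp = fn = 0
    for name, truth in truth_dict.items():
        pred = pred_dict[name]
        if truth == pos_tag and pred == pos_tag:
            tp += 1  # spam correctly marked as spam
        elif truth == neg_tag and pred == neg_tag:
            tn += 1  # regular email correctly marked as OK
        elif truth == neg_tag and pred == pos_tag:
            fp += 1  # regular email wrongly marked as spam
        elif truth == pos_tag and pred == neg_tag:
            fn += 1  # spam wrongly marked as OK
    return ConfMat(tp=tp, tn=tn, fp=fp, fn=fn)


# Reproduces the example in which each case occurs exactly once.
truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4': 'OK'}
pred_dict = {'em1': 'SPAM', 'em2': 'OK', 'em3': 'OK', 'em4': 'SPAM'}
mixed = compute_confusion_matrix(truth_dict, pred_dict)
```

In this sketch, a key that is missing from ''pred_dict'' raises ''KeyError'', and any tag other than the two given ones is silently ignored. Both are design choices rather than requirements.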
 >{{page>courses:be5b33prg:internal:homeworks:spam:step2#compute_confusion_matrix&editbtn}}