Differences

--- courses:be5b33prg:homeworks:spam:step2 [2015/11/25 16:29] xposik [Preparation]
+++ courses:be5b33prg:homeworks:spam:step2 [2015/12/01 16:04] xposik [Specifications]
@@ Line 8: / Line 8: @@
     * what these abbreviations mean for the spam filtering problem, and
     * what we need to know to be able to compute them.
+  * See the documentation for ''[[https://docs.python.org/3.4/library/collections.html#collections.namedtuple|namedtuple]]''.
 ===== Specifications =====
@@ Line 26: / Line 27: @@
 Why do we need it?
   * Function ''compute_confusion_matrix()'' represents the basis for evaluation of the filter performance.
-  * The function can be used in the following way:<code python>
-    >>> cm1 = compute_confusion_matrix({}, {})
-    >>> print(cm1)
-    ConfMat(tp=0, tn=0, fp=0, fn=0)
-</code>or<code python>
-    >>> truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4':'OK'}
-    >>> pred_dict = {'em1': 'SPAM', 'em2': 'OK', 'em3': 'OK', 'em4':'SPAM'}
-    >>> cm2 = compute_confusion_matrix(truth_dict, pred_dict, pos_tag='SPAM', neg_tag='OK')
-    >>> print(cm2)
-    ConfMat(tp=1, tn=1, fp=1, fn=1)
+The function can be used in the following way. First, an example where both the input dictionaries are empty, i.e. we have no information about any email.
+<code python>
+>>> cm1 = compute_confusion_matrix({}, {})
+>>> print(cm1)
+ConfMat(tp=0, tn=0, fp=0, fn=0)
 </code>
-**Note**: You can expect that the dictionaries will have the same set of keys. Think about the situation when the keys would be different: what shall the method do?
+In the following code, each of the TP, TN, FP, and FN cases occurs exactly once:
+<code python>
+>>> truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4':'OK'}
+>>> pred_dict = {'em1': 'SPAM', 'em2': 'OK', 'em3': 'OK', 'em4':'SPAM'}
+>>> cm2 = compute_confusion_matrix(truth_dict, pred_dict, pos_tag='SPAM', neg_tag='OK')
+>>> print(cm2)
+ConfMat(tp=1, tn=1, fp=1, fn=1)
+</code>
+In the last example, the predictions match the true classes perfectly, so only TP and TN are nonzero:
+<code python>
+>>> truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4':'OK'}
+>>> pred_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4':'OK'}
+>>> cm2 = compute_confusion_matrix(truth_dict, pred_dict, pos_tag='SPAM', neg_tag='OK')
+>>> print(cm2)
+ConfMat(tp=2, tn=2, fp=0, fn=0)
+</code>
+Of course, the input dictionaries may contain any number of items, not just 4.
 >{{page>courses:be5b33prg:internal:homeworks:spam:step2#compute_confusion_matrix&editbtn}}
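For orientation, here is a minimal sketch of a ''compute_confusion_matrix()'' that reproduces the examples above; it is not the reference solution. It assumes that ''ConfMat'' is a ''namedtuple'' with fields ''tp'', ''tn'', ''fp'', ''fn'' (inferred from the printed output), that ''pos_tag'' and ''neg_tag'' default to '''SPAM''' and '''OK''' (the first example calls the function without them), and that both dictionaries share the same set of keys.

```python
from collections import namedtuple

# Assumption: ConfMat is a namedtuple whose fields match the printed output
# ConfMat(tp=..., tn=..., fp=..., fn=...) shown in the examples.
ConfMat = namedtuple('ConfMat', ['tp', 'tn', 'fp', 'fn'])


def compute_confusion_matrix(truth_dict, pred_dict, pos_tag='SPAM', neg_tag='OK'):
    """Count TP, TN, FP and FN over all emails in truth_dict.

    Assumptions (not taken from the assignment): pos_tag/neg_tag default to
    'SPAM'/'OK', and pred_dict contains every key of truth_dict.
    """
    tp = tn = fp = fn = 0
    for email, truth in truth_dict.items():
        # Raises KeyError if an email has no prediction (keys assumed equal).
        pred = pred_dict[email]
        if truth == pos_tag and pred == pos_tag:
            tp += 1      # spam correctly detected
        elif truth == neg_tag and pred == neg_tag:
            tn += 1      # OK email correctly let through
        elif truth == neg_tag and pred == pos_tag:
            fp += 1      # OK email wrongly flagged as spam
        elif truth == pos_tag and pred == neg_tag:
            fn += 1      # spam that slipped through
        # Labels equal to neither tag are silently ignored in this sketch.
    return ConfMat(tp=tp, tn=tn, fp=fp, fn=fn)


if __name__ == '__main__':
    print(compute_confusion_matrix({}, {}))
    # prints ConfMat(tp=0, tn=0, fp=0, fn=0)
```

Iterating over ''truth_dict'' and looking each key up in ''pred_dict'' makes a missing prediction fail loudly with a ''KeyError''; whether that is the right behaviour when the key sets differ is a design decision left to your solution.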