--- courses:be5b33prg:homeworks:spam:step2 [2015/11/25 16:00]
xposik [Spam filter - step 2]
+++ courses:be5b33prg:homeworks:spam:step2 [2015/11/25 16:29]
xposik [Preparation]
@@ Line 8: / Line 8: @@
     * what these abbreviations mean for the spam filtering problem, and
     * what we need to know to be able to compute them.
+===== Specifications =====
+Task:
+  * In module ''quality.py'', create function ''compute_confusion_matrix()''.
+  * The function will have 4 input arguments:
+    * ''truth_dict'', a dictionary with the true class of each email,
+    * ''pred_dict'', a dictionary with the class predicted for individual emails by a filter,
+    * ''pos_tag'' (optional, with default value ''True''), a class that will be considered positive, and
+    * ''neg_tag'' (optional, with default value ''False''), a class that will be considered negative. Thanks to these optional parameters, the function is generally usable, not only for the spam filter task (where ''pos_tag="SPAM"'' and ''neg_tag="OK"'').
+  * The function will compute four statistics, TP, TN, FP, FN, needed to evaluate a filter, and will return them as a ''namedtuple'' with the following definition:<code python>
+from collections import namedtuple
+ConfMat = namedtuple('ConfMat', 'tp tn fp fn')
+</code>
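For readers unfamiliar with ''namedtuple'', the short sketch below shows how a ''ConfMat'' value can be created and read. The numbers are made up purely for illustration; only the ''ConfMat'' definition comes from the specification above.

```python
from collections import namedtuple

# Same definition as in the specification.
ConfMat = namedtuple('ConfMat', 'tp tn fp fn')

# Illustrative values only (not taken from any real filter).
cm = ConfMat(tp=3, tn=5, fp=1, fn=2)

# Fields can be read by name ...
print(cm.tp, cm.fn)

# ... or unpacked like an ordinary tuple, in the declared order.
tp, tn, fp, fn = cm

# Since it is a tuple, built-ins such as sum() work on it directly.
total = sum(cm)
print(total)
```

Named access keeps the evaluation code readable, because the caller does not have to remember the position of each statistic.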
+Why do we need it?
+  * Function ''compute_confusion_matrix()'' represents the basis for evaluation of the filter performance.
+  * The function can be used in the following way:<code python>
+    >>> cm1 = compute_confusion_matrix({}, {})
+    >>> print(cm1)
+    ConfMat(tp=0, tn=0, fp=0, fn=0)
+</code>or<code python>
+    >>> truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4':'OK'}
+    >>> pred_dict = {'em1': 'SPAM', 'em2': 'OK', 'em3': 'OK', 'em4':'SPAM'}
+    >>> cm2 = compute_confusion_matrix(truth_dict, pred_dict, pos_tag='SPAM', neg_tag='OK')
+    >>> print(cm2)
+    ConfMat(tp=1, tn=1, fp=1, fn=1)
+</code>
+**Note**: You can assume that both dictionaries have the same set of keys. Think about the situation when the keys differ: what should the function do?
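One possible way to meet the specification is sketched below. It is a minimal illustration, not the reference solution. Two choices in it are assumptions that the assignment does not prescribe: raising ''ValueError'' when the key sets differ, and silently ignoring emails whose tag is neither ''pos_tag'' nor ''neg_tag''.

```python
from collections import namedtuple

ConfMat = namedtuple('ConfMat', 'tp tn fp fn')


def compute_confusion_matrix(truth_dict, pred_dict, pos_tag=True, neg_tag=False):
    """Count TP, TN, FP and FN for predictions compared against the truth."""
    # Assumption: mismatched key sets are treated as an error.
    # The assignment leaves this policy open.
    if truth_dict.keys() != pred_dict.keys():
        raise ValueError('truth_dict and pred_dict must have the same keys')

    tp = tn = fp = fn = 0
    for key, truth in truth_dict.items():
        pred = pred_dict[key]
        if truth == pos_tag and pred == pos_tag:
            tp += 1  # positive email correctly flagged as positive
        elif truth == neg_tag and pred == neg_tag:
            tn += 1  # negative email correctly left alone
        elif truth == neg_tag and pred == pos_tag:
            fp += 1  # negative email wrongly flagged as positive
        elif truth == pos_tag and pred == neg_tag:
            fn += 1  # positive email that was missed
        # Assumption: any other tag combination is ignored.
    return ConfMat(tp, tn, fp, fn)


truth = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4': 'OK'}
pred = {'em1': 'SPAM', 'em2': 'OK', 'em3': 'OK', 'em4': 'SPAM'}
print(compute_confusion_matrix(truth, pred, pos_tag='SPAM', neg_tag='OK'))
```

Raising an exception makes inconsistent inputs visible early. Counting only the keys common to both dictionaries would be an equally defensible policy, as long as it is documented.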
+>{{page>courses:be5b33prg:internal:homeworks:spam:step2#compute_confusion_matrix&editbtn}}