Warning
This page is located in the archive. Go to the latest version of these course pages.

courses:be5b33prg:homeworks:spam:step2 [2015/11/25 16:00]
xposik [Spam filter - step 2]
courses:be5b33prg:homeworks:spam:step2 [2015/11/25 16:29]
xposik [Preparation]
    * what these abbreviations mean for the spam filtering problem, and
    * what we need to know to be able to compute them.

===== Specifications =====

Task:
  * In module ''quality.py'', create function ''compute_confusion_matrix()''.
  * The function will have 4 input arguments:
    * ''truth_dict'', a dictionary with the true (correct) class of individual emails,
    * ''pred_dict'', a dictionary with the class predicted for individual emails by a filter,
    * ''pos_tag'' (optional, with default value ''True''), a class that will be considered positive, and
    * ''neg_tag'' (optional, with default value ''False''), a class that will be considered negative. Thanks to these optional parameters, the function will be generally usable, not only for the spam filter task with ''pos_tag="SPAM"'' and ''neg_tag="OK"''.
  * The function will compute four statistics, TP, TN, FP, and FN, needed to evaluate a filter, and will return them as a ''namedtuple'' with the following definition:<code python>
from collections import namedtuple

ConfMat = namedtuple('ConfMat', 'tp tn fp fn')
</code>
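A minimal sketch of the function follows. It is one possible implementation, not the reference solution. It assumes that both dictionaries have the same keys and that every value equals either ''pos_tag'' or ''neg_tag''; an email carrying any other value is simply not counted.

<code python>
from collections import namedtuple

ConfMat = namedtuple('ConfMat', 'tp tn fp fn')


def compute_confusion_matrix(truth_dict, pred_dict, pos_tag=True, neg_tag=False):
    """Count TP, TN, FP and FN over the emails listed in truth_dict.

    Sketch only: assumes pred_dict has the same keys as truth_dict.
    """
    tp = tn = fp = fn = 0
    for email, truth in truth_dict.items():
        pred = pred_dict[email]
        if truth == pos_tag and pred == pos_tag:
            tp += 1  # positive email correctly flagged as positive
        elif truth == neg_tag and pred == neg_tag:
            tn += 1  # negative email correctly passed as negative
        elif truth == neg_tag and pred == pos_tag:
            fp += 1  # negative email wrongly flagged as positive
        elif truth == pos_tag and pred == neg_tag:
            fn += 1  # positive email missed by the filter
    return ConfMat(tp=tp, tn=tn, fp=fp, fn=fn)
</code>

With the default tags the function works on plain booleans; for the spam filter it is called with ''pos_tag='SPAM''' and ''neg_tag='OK'''.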

Why do we need it?
  * Function ''compute_confusion_matrix()'' is the basis for evaluating filter performance.
  * The function can be used in the following way:<code python>
>>> cm1 = compute_confusion_matrix({}, {})
>>> print(cm1)
ConfMat(tp=0, tn=0, fp=0, fn=0)
</code>or<code python>
>>> truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4': 'OK'}
>>> pred_dict = {'em1': 'SPAM', 'em2': 'OK', 'em3': 'OK', 'em4': 'SPAM'}
>>> cm2 = compute_confusion_matrix(truth_dict, pred_dict, pos_tag='SPAM', neg_tag='OK')
>>> print(cm2)
ConfMat(tp=1, tn=1, fp=1, fn=1)
</code>
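To see where the four ones in ''cm2'' come from, it helps to classify each email separately. The helper ''outcome()'' below is hypothetical (not part of the assignment): it names the confusion-matrix cell one email falls into, where the first letter says whether the prediction was correct (T/F) and the second what the filter predicted (P/N). It assumes exactly two classes, with ''"SPAM"'' as the positive one.

<code python>
truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4': 'OK'}
pred_dict = {'em1': 'SPAM', 'em2': 'OK', 'em3': 'OK', 'em4': 'SPAM'}


def outcome(truth, pred, pos_tag='SPAM'):
    """Hypothetical helper: return 'TP', 'TN', 'FP' or 'FN' for one email.

    Assumes two classes, so any prediction other than pos_tag is negative.
    """
    correct = 'T' if truth == pred else 'F'      # was the prediction right?
    predicted = 'P' if pred == pos_tag else 'N'  # what did the filter say?
    return correct + predicted


for email in sorted(truth_dict):
    print(email, outcome(truth_dict[email], pred_dict[email]))
# em1 TP
# em2 FN
# em3 TN
# em4 FP
</code>

Each of the four cells occurs exactly once, which is why every count in ''cm2'' equals 1.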

**Note**: You can expect that the dictionaries will have the same set of keys. Think about the situation when the keys would be different: what should the function do?
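One possible answer, sketched under the assumption that a key mismatch signals a bug in the filter's output: check the keys before counting and raise an exception. The helper name ''check_same_emails()'' is hypothetical. Other policies are also defensible, e.g. evaluating only the emails present in both dictionaries (''truth_dict.keys() & pred_dict.keys()''), but then the statistics describe only part of the emails.

<code python>
def check_same_emails(truth_dict, pred_dict):
    """Hypothetical guard: raise ValueError if the dictionaries cover different emails."""
    missing = truth_dict.keys() - pred_dict.keys()  # emails with no prediction
    unknown = pred_dict.keys() - truth_dict.keys()  # predictions for unknown emails
    if missing or unknown:
        raise ValueError(
            'Key mismatch: no prediction for {}, unexpected emails {}'.format(
                sorted(missing), sorted(unknown)))


check_same_emails({'em1': 'SPAM'}, {'em1': 'OK'})  # same keys: passes silently
try:
    check_same_emails({'em1': 'SPAM', 'em2': 'OK'}, {'em1': 'SPAM'})
except ValueError as err:
    print(err)  # Key mismatch: no prediction for ['em2'], unexpected emails []
</code>

Calling such a check at the start of ''compute_confusion_matrix()'' would turn a silent miscount (or a bare ''KeyError'') into an explicit error message.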
courses/be5b33prg/homeworks/spam/step2.txt · Last modified: 2015/12/04 14:22 by svobodat