
Differences

This shows you the differences between two versions of the page.

courses:be5b33prg:homeworks:spam:step2 [2015/11/25 16:30]
xposik [Specifications]
courses:be5b33prg:homeworks:spam:step2 [2015/12/04 14:22]
svobodat [Specifications]
Line 8: Line 8:
     * what these abbreviations mean for the spam filtering problem, and     * what these abbreviations mean for the spam filtering problem, and
     * what we need to know to be able to compute them.     * what we need to know to be able to compute them.
 +  * See the documentation for ''[[https://docs.python.org/3.4/library/collections.html#collections.namedtuple|namedtuple]]''.
  
 ===== Specifications ===== ===== Specifications =====
Line 21: Line 22:
 from collections import namedtuple from collections import namedtuple
  
-ConfMat = namedtuple('ConfMat', 'tptn fp fn')
+ConfMat = namedtuple('ConfMat', 'tp tn fp fn')
 </​code>​ </​code>​
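As a brief illustration of why a named tuple is convenient for this purpose, the following sketch (not part of the assignment text) builds a ''ConfMat'' by hand and reads its fields by name and by position. The counts are made-up values chosen only for the example.

```python
from collections import namedtuple

ConfMat = namedtuple('ConfMat', 'tp tn fp fn')

# Made-up counts, purely for illustration.
cm = ConfMat(tp=3, tn=5, fp=1, fn=2)

print(cm.tp)          # access a field by name
print(cm[1])          # positional access still works (this is tn)
tp, tn, fp, fn = cm   # unpacking works as with an ordinary tuple
print(cm._fields)     # the field names, in declaration order
```

Because the fields are named, code that reads ''cm.fp'' stays readable even when the tuple is passed through several functions.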
  
 Why do we need it? Why do we need it?
   * Function ''​compute_confusion_matrix()''​ represents the basis for evaluation of the filter performance.   * Function ''​compute_confusion_matrix()''​ represents the basis for evaluation of the filter performance.
-  * The function can be used in the following way:<code python>
 +
 +The function can be used in the following way. First, an example where both input dictionaries are empty, i.e. we have no information about any email:
 +<code python>
 >>>​ cm1 = compute_confusion_matrix({},​ {}) >>>​ cm1 = compute_confusion_matrix({},​ {})
 >>>​ print(cm1) >>>​ print(cm1)
 ConfMat(tp=0,​ tn=0, fp=0, fn=0) ConfMat(tp=0,​ tn=0, fp=0, fn=0)
-</code>or<code python>
 +</code>
 + 
 +In the following code, each of the TP, TN, FP, and FN cases occurs exactly once:
 + 
 +<code python>
 >>>​ truth_dict = {'​em1':​ '​SPAM',​ '​em2':​ '​SPAM',​ '​em3':​ '​OK',​ '​em4':'​OK'​} >>>​ truth_dict = {'​em1':​ '​SPAM',​ '​em2':​ '​SPAM',​ '​em3':​ '​OK',​ '​em4':'​OK'​}
 >>>​ pred_dict = {'​em1':​ '​SPAM',​ '​em2':​ '​OK',​ '​em3':​ '​OK',​ '​em4':'​SPAM'​} >>>​ pred_dict = {'​em1':​ '​SPAM',​ '​em2':​ '​OK',​ '​em3':​ '​OK',​ '​em4':'​SPAM'​}
Line 38: Line 45:
 </​code>​ </​code>​
  
-**Note**: You can expect that the dictionaries will have the same set of keys. Think about the situation when the keys would be different: what shall the method do?
 +And in the last example, the predictions perfectly match the real classes, so only TP and TN are nonzero:
 + 
 +<code python>​ 
 +>>>​ truth_dict = {'​em1':​ '​SPAM',​ '​em2':​ '​SPAM',​ '​em3':​ '​OK',​ '​em4':'​OK'​} 
 +>>>​ pred_dict = {'​em1':​ '​SPAM',​ '​em2':​ '​SPAM',​ '​em3':​ '​OK',​ '​em4':'​OK'​} 
 +>>>​ cm2 = compute_confusion_matrix(truth_dict,​ pred_dict, pos_tag='​SPAM',​ neg_tag='​OK'​) 
 +>>>​ print(cm2) 
 +ConfMat(tp=2,​ tn=2, fp=0, fn=0) 
 +</​code>​ 
 + 
 +Of course, the input dictionaries may contain any number of items, not just 4.
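One possible way to obtain the behaviour shown in the examples is sketched below. It is only an illustration under stated assumptions, not the reference solution: the default values of ''pos_tag'' and ''neg_tag'' are guessed from the calls above, and emails that have no prediction are simply skipped, which is just one of several reasonable policies.

```python
from collections import namedtuple

ConfMat = namedtuple('ConfMat', 'tp tn fp fn')


def compute_confusion_matrix(truth_dict, pred_dict,
                             pos_tag='SPAM', neg_tag='OK'):
    """Count TP, TN, FP and FN for predictions against the true classes.

    Assumptions of this sketch: the default tags are 'SPAM' and 'OK',
    and emails missing from pred_dict are ignored.
    """
    tp = tn = fp = fn = 0
    for name, truth in truth_dict.items():
        if name not in pred_dict:
            continue  # assumed policy: skip emails without a prediction
        pred = pred_dict[name]
        if truth == pos_tag and pred == pos_tag:
            tp += 1   # spam correctly flagged
        elif truth == neg_tag and pred == neg_tag:
            tn += 1   # legitimate email correctly passed
        elif truth == neg_tag and pred == pos_tag:
            fp += 1   # legitimate email wrongly flagged
        elif truth == pos_tag and pred == neg_tag:
            fn += 1   # spam that slipped through
    return ConfMat(tp=tp, tn=tn, fp=fp, fn=fn)


# A three-email input, to show that the size is not fixed at 4.
truth = {'a': 'SPAM', 'b': 'OK', 'c': 'OK'}
pred = {'a': 'SPAM', 'b': 'SPAM', 'c': 'OK'}
print(compute_confusion_matrix(truth, pred))
```

Iterating over ''truth_dict'' keeps the true classes as the reference set; a stricter variant could instead raise an error when the two key sets differ.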
 + 
  
 >​{{page>​courses:​be5b33prg:​internal:​homeworks:​spam:​step2#​compute_confusion_matrix&​editbtn}} >​{{page>​courses:​be5b33prg:​internal:​homeworks:​spam:​step2#​compute_confusion_matrix&​editbtn}}
  
  
courses/be5b33prg/homeworks/spam/step2.txt · Last modified: 2015/12/04 14:22 by svobodat