Warning
This page is archived. Go to the latest version of the course pages.

Differences

This shows you the differences between two versions of the page.

courses:be5b33prg:homeworks:spam:step2 [2015/11/25 16:29]
xposik [Preparation]
courses:be5b33prg:homeworks:spam:step2 [2015/12/01 16:04]
xposik [Specifications]
Line 8: Line 8:
     * what these abbreviations mean for the spam filtering problem, and
     * what we need to know to be able to compute them.
+  * See the documentation for ''[[https://docs.python.org/3.4/library/collections.html#collections.namedtuple|namedtuple]]''.
 
 ===== Specifications =====
Line 26: Line 27:
 Why do we need it?
   * Function ''compute_confusion_matrix()'' represents the basis for evaluation of the filter performance.
-  * The function can be used in the following way:<code python>
-    >>> cm1 = compute_confusion_matrix({}, {})
-    >>> print(cm1)
-    ConfMat(tp=0, tn=0, fp=0, fn=0)
-</code>or<code python>
-    >>> truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4':'OK'}
-    >>> pred_dict = {'em1': 'SPAM', 'em2': 'OK', 'em3': 'OK', 'em4':'SPAM'}
-    >>> cm2 = compute_confusion_matrix(truth_dict, pred_dict, pos_tag='SPAM', neg_tag='OK')
-    >>> print(cm2)
-    ConfMat(tp=1, tn=1, fp=1, fn=1)
+
+The function can be used in the following way. First, an example where both the input dictionaries are empty, i.e. we have no information about any email.
+<code python>
+>>> cm1 = compute_confusion_matrix({}, {})
+>>> print(cm1)
+ConfMat(tp=0, tn=0, fp=0, fn=0)
 </code>
 
-**Note**: You can expect that the dictionaries will have the same set of keys. Think about the situation when the keys would be different: what shall the method do?
+In the following code, each of TP, TN, FP, FN cases happens exactly once:
+
+<code python>
+>>> truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4':'OK'}
+>>> pred_dict = {'em1': 'SPAM', 'em2': 'OK', 'em3': 'OK', 'em4':'SPAM'}
+>>> cm2 = compute_confusion_matrix(truth_dict, pred_dict, pos_tag='SPAM', neg_tag='OK')
+>>> print(cm2)
+ConfMat(tp=1, tn=1, fp=1, fn=1)
+</code>
+
+And in the last example, the predictions perfectly match the real classes, such that only TP and TN are nonzero:
+
+<code python>
+>>> truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4':'OK'}
+>>> pred_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4':'OK'}
+>>> cm2 = compute_confusion_matrix(truth_dict, pred_dict, pos_tag='SPAM', neg_tag='OK')
+>>> print(cm2)
+ConfMat(tp=2, tn=2, fp=0, fn=0)
+</code>
+
+Of course, the input dictionaries may have a different number of items than 4.
+
 
 >{{page>courses:be5b33prg:internal:homeworks:spam:step2#compute_confusion_matrix&editbtn}}
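
The usage examples in the revision above fix the name ''compute_confusion_matrix()'', the ''pos_tag'' and ''neg_tag'' keyword arguments, and the printed form ''ConfMat(tp=..., tn=..., fp=..., fn=...)''. The following is only a minimal sketch of one way such a function might be written with ''namedtuple''; it is not the course's reference solution. Two things are assumptions not stated on the page: the default tag values ('SPAM' and 'OK'), and the choice to count only emails that appear in both dictionaries.

```python
from collections import namedtuple

# Field order matches the printed output shown in the examples.
ConfMat = namedtuple('ConfMat', 'tp tn fp fn')


def compute_confusion_matrix(truth_dict, pred_dict,
                             pos_tag='SPAM', neg_tag='OK'):
    """Count TP, TN, FP and FN for the emails labelled in both dictionaries.

    The default values of pos_tag and neg_tag are assumptions; the page
    only shows that the function can be called without them.
    """
    tp = tn = fp = fn = 0
    # Assumption: an email missing from either dictionary is ignored.
    for key in truth_dict.keys() & pred_dict.keys():
        truth, pred = truth_dict[key], pred_dict[key]
        if truth == pos_tag and pred == pos_tag:
            tp += 1  # spam correctly flagged as spam
        elif truth == neg_tag and pred == neg_tag:
            tn += 1  # legitimate email correctly passed
        elif truth == neg_tag and pred == pos_tag:
            fp += 1  # legitimate email wrongly flagged
        elif truth == pos_tag and pred == neg_tag:
            fn += 1  # spam that slipped through
    return ConfMat(tp=tp, tn=tn, fp=fp, fn=fn)
```

Ignoring emails that are missing from one dictionary is only one possible policy; raising an error or counting a missing prediction as a mistake would be equally defensible, depending on how the filter is to be evaluated.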
  
  
courses/be5b33prg/homeworks/spam/step2.txt · Last modified: 2015/12/04 14:22 by svobodat