Differences
This shows you the differences between two versions of the page.
2015/12/14 14:24 xposik [Function ''quality_score()'']
2015/12/14 14:14 xposik [Function ''compute_quality_for_corpus()'']
2015/12/14 14:14 xposik [Function ''compute_quality_for_corpus()'']
2015/12/14 14:12 xposik [Function ''quality_score()'']
2015/11/25 16:34 xposik [Spam filter - step 3]
2015/11/25 16:01 xposik [Confusion Matrix]
2015/11/25 15:58 xposik [Preparation]
2015/10/15 15:12 xposik [Function ''compute_quality_for_corpus()'']
2015/10/15 15:05 xposik [Confusion Matrix]
2015/10/15 13:58 xposik
2015/10/15 13:58 xposik created
courses:be5b33prg:homeworks:spam:step3 [2015/11/25 15:58] xposik [Preparation]
courses:be5b33prg:homeworks:spam:step3 [2015/12/14 14:14] xposik [Function ''compute_quality_for_corpus()'']
Line 1:
Line 1:
====== Spam filter - step 3 ======
- Create a set of classes and functions needed to evaluate the filter quality.
+ Create additional functions needed to evaluate the filter quality.
- /**
- <WRAP round download>
- [[.unit_testing|Tests]] for step 3: {{:courses:a4b99rph:cviceni:spam:test3_quality.zip|}}
- </WRAP>
- **/
- =====Confusion Matrix=====
-
- Task:
- * In module ''confmat.py'', create class ''BinaryConfusionMatrix''.
- * The class shall encapsulate four-tuple of statistics, TP, TN, FP, FN, needed to evaluate a filter.
- * During the initialization, the class will take parameters ''pos_tag'' and ''neg_tag'', i.e. values that shall be considered positive and negative, respectively. (The class will then be generally usable, not only for the spam filter with values ''SPAM'' and ''OK'').
- * After the instance creation, all four statistics shall be set to 0.
- * The class shall have method ''as_dict()'' which returns the confusion matrix as a dictionary with items ''tp, tn, fp, fn''.
- * The class shall have method ''update(truth, prediction)'' which increases the value of relevant counter (TP, TN, FP, FN) by 1 based on the comparison of the ''truth'' and ''prediction'' values with ''pos_tag'' and ''neg_tag''. Raises a ''ValueError'', if the value of ''truth'' or ''prediction'' is different from both ''pos_tag'' and ''neg_tag''.
- * The class will have method ''compute_from_dicts(truth_dict, pred_dict)'' which computes the statistics TP, FP, TN, FN from two dictionaries: the first one shall contain the correct classification of emails, the second one shall contain the predictions of the filter.
-
- Why do we need it?
- * Class ''BinaryConfusionMatrix'' represents the basis for evaluation of the filter success.
- * The class can be used in the following way:<code python>
- >>> cm1 = BinaryConfusionMatrix(pos_tag='True', neg_tag='False')
- >>> cm1.as_dict()
- {'tp': 0, 'tn': 0, 'fp': 0, 'fn': 0}
- >>> cm1.update('True', 'True')
- >>> cm1.as_dict()
- {'tp': 1, 'tn': 0, 'fp': 0, 'fn': 0}
- </code>or<code python>
- >>> truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4':'OK'}
- >>> pred_dict = {'em1': 'SPAM', 'em2': 'OK', 'em3': 'OK', 'em4':'SPAM'}
- >>> cm2 = BinaryConfusionMatrix(pos_tag='SPAM', neg_tag='OK')
- >>> cm2.compute_from_dicts(truth_dict, pred_dict)
- >>> cm2.as_dict()
- {'tp': 1, 'tn': 1, 'fp': 1, 'fn': 1}
- </code>
-
- The class shall have at least 3 public methods: ''as_dict()'', ''update()'' and ''compute_from_dicts()''.
- ^ as_dict() ^ Returns conf. matrix in the form of dictionary. ^
- ^ Input: | Nothing. |
- ^ Output: | A dictionary with keys ''tp, tn, fp, fn'' and their values. |
- ^ Effects: | None. |
-
- ^ update(truth, pred) ^ Increase the value of one of the counters according to the values of ''truth'' and ''pred''. ^
- ^ Input: | The true and predicted class. |
- ^ Output: | None. |
- ^ Effects: | An increase of a single counter value TP, TN, FP, FN, or raise a ''ValueError''. |
-
- ^ compute_from_dicts(truth_dict, pred_dict) ^ Compute the whole confusion matrix from true classes and predictions. ^
- ^ Input: | Two dictionaries containing the true and predicted classes for individual emails. |
- ^ Output: | None. |
- ^ Effects: | The items of conf. matrix will be set to the numbers of observed TP, TN, FP, FN. |
-
- **Note**: You can expect that the dictionaries will have the same set of keys. Think about the situation when the keys would be different: what shall the method do?
-
- >{{page>courses:a4b99rph:internal:cviceni:spam:tyden08#binaryconfusionmatrix&editbtn}}
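For reference, the class described in the removed section could look roughly like the sketch below. It is only one possible reading of the specification above, not the course's reference solution. Two points are assumptions the text leaves open: ''compute_from_dicts()'' resets the counters before counting, and emails that appear in only one of the two dictionaries are skipped.

```python
class BinaryConfusionMatrix:
    """Counters TP, TN, FP, FN for a two-class problem (illustrative sketch)."""

    def __init__(self, pos_tag, neg_tag):
        self.pos_tag = pos_tag
        self.neg_tag = neg_tag
        self.tp = self.tn = self.fp = self.fn = 0

    def as_dict(self):
        """Return the confusion matrix as a dictionary."""
        return {'tp': self.tp, 'tn': self.tn, 'fp': self.fp, 'fn': self.fn}

    def update(self, truth, prediction):
        """Increase exactly one counter, or raise ValueError for an unknown tag."""
        valid = (self.pos_tag, self.neg_tag)
        if truth not in valid or prediction not in valid:
            raise ValueError('truth and prediction must be pos_tag or neg_tag')
        if prediction == self.pos_tag:
            if truth == self.pos_tag:
                self.tp += 1
            else:
                self.fp += 1
        else:
            if truth == self.neg_tag:
                self.tn += 1
            else:
                self.fn += 1

    def compute_from_dicts(self, truth_dict, pred_dict):
        """Set the counters from two {email_name: tag} dictionaries."""
        # Assumption: start from zero so the counters reflect only these dicts.
        self.tp = self.tn = self.fp = self.fn = 0
        for name, truth in truth_dict.items():
            # Assumption: emails without a prediction are silently skipped.
            if name in pred_dict:
                self.update(truth, pred_dict[name])
```

Skipping unmatched keys is only one answer to the question raised in the note; raising an exception would be an equally defensible choice.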
Line 64:
Line 11:
Task:
* Create function ''quality_score(tp, tn, fp, fn)'' in module ''quality.py''.
- * Function computes the quality score defined during the lab.
+ * Function computes the quality score defined during the lab (find it also [[courses:be5b33prg:homeworks:spam:evaluation#filter_quality_assessment|here]]).
^ ''quality_score(tp, tn, fp, fn) '' Compute the quality score based on the confusion matrix. ^^
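This page does not state the score formula itself; it only points to the lab and to the linked evaluation page. The sketch below therefore assumes a weighted accuracy in which a false positive (a legitimate email marked as spam) costs ten times as much as a false negative. Both the weight and the overall shape of the formula are assumptions and should be checked against the linked definition.

```python
def quality_score(tp, tn, fp, fn):
    """Return a quality score in [0, 1] computed from confusion matrix counts.

    Assumed formula (not given on this page):
        (tp + tn) / (tp + tn + 10 * fp + fn)
    An all-zero confusion matrix raises ZeroDivisionError in this sketch.
    """
    return (tp + tn) / (tp + tn + 10 * fp + fn)
```

Under this assumed formula a filter with no errors scores 1.0, and every false positive lowers the score much more than a false negative does.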
Line 76:
Line 23:
* In module ''quality.py'', create function ''compute_quality_for_corpus(corpus_dir)'' which evaluates the filter quality based on the information contained in files ''!truth.txt'' and ''!prediction.txt'' in the given corpus.
* The true and predicted classification can be read in the form of dictionaries using function ''read_classification_from_file()''.
- * The confusion matrix for the given corpus can be computed from the dictionaries using method ''compute_from_dicts()'' of ''BinaryConfusionMatrix'' class.
+ * The confusion matrix for the given corpus can be computed from the dictionaries using the ''compute_confusion_matrix()'' function from step 2.
- * The quality score can be computed from the confusion matrix using function ''quality_score()''.
+ * The quality score can be computed from the confusion matrix using function ''quality_score()'' from step 2.
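The steps listed above can be sketched as follows. To keep the example self-contained, the three helper functions are hypothetical stand-ins for the ones the task refers to, and their signatures and return types are assumptions. The sketch also assumes that each line of ''!truth.txt'' and ''!prediction.txt'' holds an email name and a tag separated by whitespace, that the tags are ''SPAM'' and ''OK'', and that the quality score is a weighted accuracy with false positives counted ten times.

```python
import os
from collections import namedtuple

# Hypothetical container for the four counters; the real step 2 type may differ.
ConfMat = namedtuple('ConfMat', 'tp tn fp fn')


def read_classification_from_file(path):
    """Stand-in: read 'name tag' lines into a {name: tag} dictionary."""
    classification = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:  # ignore empty or malformed lines
                classification[parts[0]] = parts[1]
    return classification


def compute_confusion_matrix(truth_dict, pred_dict, pos_tag='SPAM', neg_tag='OK'):
    """Stand-in: count TP, TN, FP, FN over the emails listed in truth_dict."""
    tp = tn = fp = fn = 0
    for name, truth in truth_dict.items():
        pred = pred_dict[name]  # assumes both files list the same emails
        if truth == pos_tag and pred == pos_tag:
            tp += 1
        elif truth == neg_tag and pred == neg_tag:
            tn += 1
        elif truth == neg_tag and pred == pos_tag:
            fp += 1
        elif truth == pos_tag and pred == neg_tag:
            fn += 1
    return ConfMat(tp, tn, fp, fn)


def quality_score(tp, tn, fp, fn):
    """Stand-in with an assumed weighting: a false positive costs ten times more."""
    return (tp + tn) / (tp + tn + 10 * fp + fn)


def compute_quality_for_corpus(corpus_dir):
    """Evaluate the filter from !truth.txt and !prediction.txt in corpus_dir."""
    truth = read_classification_from_file(os.path.join(corpus_dir, '!truth.txt'))
    pred = read_classification_from_file(os.path.join(corpus_dir, '!prediction.txt'))
    cm = compute_confusion_matrix(truth, pred)
    return quality_score(cm.tp, cm.tn, cm.fp, cm.fn)
```

Calling ''compute_quality_for_corpus()'' with the path of a corpus directory that contains both files returns a single number, which makes it easy to compare filters on the same corpus.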
Why do we need it?
courses/be5b33prg/homeworks/spam/step3.txt · Last modified: 2015/12/14 14:24 by xposik