Create a set of classes and functions needed to evaluate the filter quality.
Tests for step 3: test3_quality.zip
Task: In module confmat.py, create class BinaryConfusionMatrix.

The class shall be initialized with two parameters, pos_tag and neg_tag, i.e. the values that shall be considered positive and negative, respectively. (The class will then be generally usable, not only for the spam filter with the values SPAM and OK.)
The method as_dict() returns the confusion matrix as a dictionary with the items tp, tn, fp, fn.
The method update(truth, prediction) increases the value of the relevant counter (TP, TN, FP, FN) by 1, based on the comparison of the truth and prediction values with pos_tag and neg_tag. It raises a ValueError if the value of truth or prediction is different from both pos_tag and neg_tag.
The method compute_from_dicts(truth_dict, pred_dict) computes the statistics TP, FP, TN, FN from two dictionaries: the first one shall contain the correct classification of the emails, the second one shall contain the predictions of the filter.
Why do we need it?
BinaryConfusionMatrix represents the basis for the evaluation of the filter's success.
>>> cm1 = BinaryConfusionMatrix(pos_tag='True', neg_tag='False')
>>> cm1.as_dict()
{'tp': 0, 'tn': 0, 'fp': 0, 'fn': 0}
>>> cm1.update('True', 'True')
>>> cm1.as_dict()
{'tp': 1, 'tn': 0, 'fp': 0, 'fn': 0}
>>> truth_dict = {'em1': 'SPAM', 'em2': 'SPAM', 'em3': 'OK', 'em4': 'OK'}
>>> pred_dict = {'em1': 'SPAM', 'em2': 'OK', 'em3': 'OK', 'em4': 'SPAM'}
>>> cm2 = BinaryConfusionMatrix(pos_tag='SPAM', neg_tag='OK')
>>> cm2.compute_from_dicts(truth_dict, pred_dict)
>>> cm2.as_dict()
{'tp': 1, 'tn': 1, 'fp': 1, 'fn': 1}
The class shall have at least 3 public methods: as_dict(), update(), and compute_from_dicts().
| as_dict() | Returns conf. matrix in the form of a dictionary. |
|---|---|
| Input: | Nothing. |
| Output: | A dictionary with keys tp, tn, fp, fn and their values. |
| Effects: | None. |
| update(truth, pred) | Increase the value of one of the counters according to the values of truth and pred. |
|---|---|
| Input: | The true and predicted class. |
| Output: | None. |
| Effects: | An increase of a single counter value TP, TN, FP, FN, or a raised ValueError. |
| compute_from_dicts(truth_dict, pred_dict) | Compute the whole confusion matrix from true classes and predictions. |
|---|---|
| Input: | Two dictionaries containing the true and predicted classes for individual emails. |
| Output: | None. |
| Effects: | The items of the conf. matrix will be set to the numbers of observed TP, TN, FP, FN. |
Note: You can expect that the dictionaries will have the same set of keys. Still, think about the situation where the keys differ: what should the method do then?
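One possible implementation of the class described above might look as follows; this is a sketch, not the reference solution, and the internal representation (four integer attributes) is just one option:

```python
class BinaryConfusionMatrix:
    """Confusion matrix for a binary classification task."""

    def __init__(self, pos_tag, neg_tag):
        self.pos_tag = pos_tag
        self.neg_tag = neg_tag
        self.tp = self.tn = self.fp = self.fn = 0

    def as_dict(self):
        # Return the four counters as a dictionary.
        return {'tp': self.tp, 'tn': self.tn,
                'fp': self.fp, 'fn': self.fn}

    def update(self, truth, prediction):
        # Reject any value other than pos_tag or neg_tag.
        for value in (truth, prediction):
            if value not in (self.pos_tag, self.neg_tag):
                raise ValueError(f'unexpected class value: {value!r}')
        # Increase exactly one counter.
        if truth == self.pos_tag:
            if prediction == self.pos_tag:
                self.tp += 1
            else:
                self.fn += 1
        else:
            if prediction == self.pos_tag:
                self.fp += 1
            else:
                self.tn += 1

    def compute_from_dicts(self, truth_dict, pred_dict):
        # Assumes both dictionaries have the same set of keys.
        for key, truth in truth_dict.items():
            self.update(truth, pred_dict[key])
```

Note that compute_from_dicts() simply reuses update(), so invalid class values raise a ValueError there as well.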
Task: Create function quality_score(tp, tn, fp, fn) in module quality.py.
| quality_score(tp, tn, fp, fn) | Compute the quality score based on the confusion matrix. |
|---|---|
| Inputs: | A 4-tuple of values TP, TN, FP, FN. |
| Outputs: | A number between 0 and 1 showing the prediction quality measure. |
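The specification above only requires a number between 0 and 1; it does not fix the formula. Plain accuracy is one simple measure that satisfies it (choosing accuracy here is an assumption; a measure that, e.g., penalizes false positives more heavily would also fit):

```python
def quality_score(tp, tn, fp, fn):
    # Plain accuracy: the fraction of correctly classified emails.
    # Guard against an empty confusion matrix (all counters zero).
    total = tp + tn + fp + fn
    if total == 0:
        return 0.0
    return (tp + tn) / total
```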
Task: In module quality.py, create function compute_quality_for_corpus(corpus_dir) which evaluates the filter quality based on the information contained in the files !truth.txt and !prediction.txt in the given corpus. You can build on:

- the function read_classification_from_file(),
- the method compute_from_dicts() of the BinaryConfusionMatrix class,
- the function quality_score().
Why do we need it?
| compute_quality_for_corpus(corpus_dir) | Compute the quality of predictions for a given corpus. |
|---|---|
| Inputs: | A corpus directory evaluated by a filter (i.e. a directory containing the !truth.txt and !prediction.txt files). |
| Outputs: | Quality of the filter as a number between 0 and 1. |
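The building blocks can be composed roughly as sketched below. In your solution you would import read_classification_from_file, BinaryConfusionMatrix, and quality_score from the earlier steps; here, simplified stand-ins are inlined so the example is self-contained, and the file format ("filename class" per line) and the SPAM/OK tags are assumptions:

```python
import os

def read_classification_from_file(filepath):
    # Simplified stand-in for the helper from an earlier step:
    # every non-empty line is assumed to hold "<email filename> <class>".
    with open(filepath, encoding='utf-8') as f:
        return dict(line.split() for line in f if line.strip())

def quality_score(tp, tn, fp, fn):
    # Placeholder measure (plain accuracy); plug in your own quality_score().
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

def compute_quality_for_corpus(corpus_dir):
    # 1. Read the true and the predicted classification of the corpus.
    truth = read_classification_from_file(os.path.join(corpus_dir, '!truth.txt'))
    pred = read_classification_from_file(os.path.join(corpus_dir, '!prediction.txt'))
    # 2. Count TP, TN, FP, FN -- this is what the compute_from_dicts()
    #    method does; SPAM is taken as positive, OK as negative.
    tp = tn = fp = fn = 0
    for name, true_tag in truth.items():
        predicted = pred[name]
        if true_tag == 'SPAM' and predicted == 'SPAM':
            tp += 1
        elif true_tag == 'OK' and predicted == 'OK':
            tn += 1
        elif true_tag == 'OK' and predicted == 'SPAM':
            fp += 1
        else:
            fn += 1
    # 3. Reduce the four counters to a single number in [0, 1].
    return quality_score(tp, tn, fp, fn)
```

With the example truth/prediction dictionaries from step 3 written into a corpus directory, this sketch yields 0.5 (one of each of TP, TN, FP, FN).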