Spam filter - step 3

Create a set of classes and functions needed to evaluate the filter quality.

Tests for step 3: test3_quality.zip

Preparation

Confusion Matrix

Task:

Why do we need it?

The class shall have at least 3 public methods: as_dict(), update() and compute_from_dicts().

as_dict(): Returns the confusion matrix as a dictionary.
Input: Nothing.
Output: A dictionary with the keys tp, tn, fp, fn and their values.
Effects: None.

update(truth, pred): Increases the value of one of the counters according to the values of truth and pred.
Input: The true and the predicted class.
Output: None.
Effects: Exactly one of the counters TP, TN, FP, FN is increased by one, or a ValueError is raised.

compute_from_dicts(truth_dict, pred_dict): Computes the whole confusion matrix from true classes and predictions.
Input: Two dictionaries containing the true and the predicted classes for individual emails.
Output: None.
Effects: The items of the confusion matrix are set to the numbers of observed TP, TN, FP, FN.

Note: You can expect the two dictionaries to have the same set of keys. Think about what the method should do if the keys differed.
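
The three methods described above could be sketched roughly as follows. This is only one possible reading of the specification, not the reference solution: the class name BinaryConfusionMatrix and the constructor parameters pos_tag and neg_tag are assumptions made for this example, as is the decision to raise ValueError when a class is neither of the two tags.

```python
class BinaryConfusionMatrix:
    """Counts TP, TN, FP and FN for a two-class problem (illustrative sketch)."""

    def __init__(self, pos_tag, neg_tag):
        # Assumption: the positive and negative class labels are passed in,
        # e.g. pos_tag="SPAM" and neg_tag="OK". The task text does not fix this.
        self.pos_tag = pos_tag
        self.neg_tag = neg_tag
        self.tp = self.tn = self.fp = self.fn = 0

    def as_dict(self):
        """Return the four counters as a dictionary; no side effects."""
        return {"tp": self.tp, "tn": self.tn, "fp": self.fp, "fn": self.fn}

    def update(self, truth, pred):
        """Increase exactly one counter, or raise ValueError for an unknown class."""
        valid = (self.pos_tag, self.neg_tag)
        if truth not in valid or pred not in valid:
            raise ValueError(f"unknown class in pair ({truth!r}, {pred!r})")
        if pred == self.pos_tag:
            if truth == self.pos_tag:
                self.tp += 1   # predicted positive, really positive
            else:
                self.fp += 1   # predicted positive, really negative
        else:
            if truth == self.neg_tag:
                self.tn += 1   # predicted negative, really negative
            else:
                self.fn += 1   # predicted negative, really positive

    def compute_from_dicts(self, truth_dict, pred_dict):
        """Set the counters from two dictionaries that share the same keys."""
        # Assumption: the counters are reset first, so the result reflects
        # only the given dictionaries.
        self.tp = self.tn = self.fp = self.fn = 0
        for key, truth in truth_dict.items():
            # A key missing from pred_dict raises KeyError here; that is just
            # one possible answer to the question in the note above.
            self.update(truth, pred_dict[key])
```

With pos_tag="SPAM" and neg_tag="OK", calling update("OK", "SPAM") would count one false positive, because a legitimate email was predicted to be spam.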

Function quality_score()

Task:

quality_score(tp, tn, fp, fn): Computes the quality score based on the confusion matrix.
Inputs: The four values TP, TN, FP, FN.
Outputs: A number between 0 and 1 measuring the prediction quality.
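
The text here does not state the formula for the score, so the sketch below uses plain accuracy, (TP + TN) / (TP + TN + FP + FN), purely as a stand-in that satisfies the interface and stays between 0 and 1. Both the choice of accuracy and the value returned for an empty matrix are assumptions; the formula required by the assignment should replace them.

```python
def quality_score(tp, tn, fp, fn):
    """Return a score in [0, 1]; plain accuracy is used as a placeholder."""
    total = tp + tn + fp + fn
    if total == 0:
        # Assumption: with no observations at all, report the lowest score
        # instead of dividing by zero.
        return 0.0
    return (tp + tn) / total
```

A formula that penalizes false positives more heavily than false negatives would fit the same signature; only the return expression would change.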

Function compute_quality_for_corpus()

Task:

Why do we need it?

compute_quality_for_corpus(corpus_dir): Computes the quality of predictions for the given corpus.
Inputs: A corpus directory evaluated by a filter (i.e. a directory containing the !truth.txt and !prediction.txt files).
Outputs: The quality of the filter as a number between 0 and 1.
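
A minimal sketch of how the pieces might fit together is given below. Several things in it are assumptions rather than facts from this page: the file format (one "<file name> <tag>" pair per line), the tags SPAM and OK, the helper read_classification(), and the use of plain accuracy as the score. To stay self-contained, the sketch counts the four outcomes inline; in the actual solution the confusion matrix class and quality_score() would be reused instead.

```python
import os

POS_TAG = "SPAM"  # assumed label of the positive class
NEG_TAG = "OK"    # assumed label of the negative class


def read_classification(path):
    """Hypothetical helper: parse lines of the assumed form '<file name> <tag>'."""
    result = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:      # skip blank or malformed lines
                name, tag = parts
                result[name] = tag
    return result


def compute_quality_for_corpus(corpus_dir):
    """Compare !truth.txt with !prediction.txt and return a score in [0, 1]."""
    truth = read_classification(os.path.join(corpus_dir, "!truth.txt"))
    pred = read_classification(os.path.join(corpus_dir, "!prediction.txt"))

    tp = tn = fp = fn = 0
    for name, true_tag in truth.items():
        pred_tag = pred[name]        # KeyError if a prediction is missing
        if true_tag == POS_TAG and pred_tag == POS_TAG:
            tp += 1
        elif true_tag == NEG_TAG and pred_tag == NEG_TAG:
            tn += 1
        elif true_tag == NEG_TAG and pred_tag == POS_TAG:
            fp += 1
        elif true_tag == POS_TAG and pred_tag == NEG_TAG:
            fn += 1
        else:
            raise ValueError(f"unknown tag for {name!r}")

    total = tp + tn + fp + fn
    # Placeholder score (plain accuracy); substitute the required formula.
    return (tp + tn) / total if total else 0.0
```

The function only reads the two files, so it can be run repeatedly on the same corpus after different filters have written their !prediction.txt.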