See the general homework guidelines!
Most of all, we appreciate your effort. You should not be afraid of failing the course because of this assignment. It is considered done as long as you satisfy very loose requirements.
The due dates and deadlines are specified in the upload system.
Breakdown of the spam filter evaluation:
compute_quality_for_corpus
Filter quality is going to be determined according to the following formula:
$q = \frac{TP + TN}{TP + TN + 10 \cdot FP + FN}$.
Positive cases (P) are emails classified as spam by the filter, negative cases (N) are emails classified as normal email messages by the filter. FP is thus the number of normal emails incorrectly flagged as spam, and FN is the number of spam emails getting through without being flagged. Because FP is multiplied by 10 in the denominator, flagging a normal email as spam costs ten times as much as letting a spam email through. Note that TP, FP, TN, and FN are frequencies (numbers of cases), not percentages.
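The formula above can be sketched in Python as follows. This is only an illustration, not the reference implementation: the helper names `count_outcomes` and `quality_score`, the label strings `"SPAM"` and `"OK"`, the dictionary-based input format, and the handling of an empty corpus are all assumptions made for this example.

```python
def count_outcomes(truth, predicted, spam_tag="SPAM"):
    """Count (TP, TN, FP, FN) from two dicts mapping email name -> label.

    The dict format and the spam_tag value are assumptions for this sketch.
    """
    tp = tn = fp = fn = 0
    for name, true_label in truth.items():
        is_spam = true_label == spam_tag
        flagged = predicted[name] == spam_tag
        if flagged and is_spam:
            tp += 1      # spam correctly flagged
        elif flagged and not is_spam:
            fp += 1      # normal email incorrectly flagged as spam
        elif not flagged and is_spam:
            fn += 1      # spam that got through
        else:
            tn += 1      # normal email correctly passed
    return tp, tn, fp, fn


def quality_score(tp, tn, fp, fn):
    """q = (TP + TN) / (TP + TN + 10 * FP + FN), using raw counts."""
    denominator = tp + tn + 10 * fp + fn
    if denominator == 0:
        return 0.0  # assumption: an empty corpus scores 0
    return (tp + tn) / denominator


# Tiny made-up example: one email in each of the four outcome categories.
truth = {"a": "SPAM", "b": "OK", "c": "OK", "d": "SPAM"}
predicted = {"a": "SPAM", "b": "SPAM", "c": "OK", "d": "OK"}
tp, tn, fp, fn = count_outcomes(truth, predicted)
print(round(quality_score(tp, tn, fp, fn), 3))  # 2 / 13 -> prints 0.154
```

In this toy example a plain accuracy would be 0.5, but the single false positive pulls the quality down to about 0.154, which shows how strongly the formula penalizes flagging normal emails as spam.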
Your filter will be evaluated on 3 different data sets (one of them is available to you on these web pages). For each data set, your filter may get 0 to 3 points: