Evaluation

See the general homework guidelines!

Above all, we value your effort. You need not fear failing the course because of this assignment: it counts as completed as long as you meet a few very lenient requirements.

The due dates and deadlines are specified in the upload system.

Breakdown of the spam filter evaluation:

Evaluation category	min	max	note
Submission 1			See the upload system for the due date.
`compute_quality_for_corpus`	0	5	Function works correctly (yes/no).
Submission 2			See the upload system for the due date.
Filter runs	0	2	When applied to a dataset, the filter produces predictions, raises no errors, … This rewards your ability to write a simple working spam filter in Python.
A non-trivial filter	0	2	Rewards the effort to create a reasonable filter. Trivial filters such as “everything is spam”, “everything is OK”, or “decide randomly” do not earn these points. Filters using simple if-then rules are eligible.
A learning filter	0	2	(+ full points from the preceding categories) The filter adapts its model of the world to the characteristics of the training dataset.
Code quality	0	6	Suitable names for variables and functions. Readability, understandability, comments. Use of classes and OOP.
Evaluation of filter quality	0	9	See below.
Presentation	0	2	5-minute presentation of your code
Total	0	28
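To illustrate what separates a learning filter from a fixed if-then rule, here is a minimal sketch. It assumes each email is available as a plain-text body keyed by an ID and labelled "SPAM" or "OK"; the names `WordCountFilter`, `train`, `predict` and `tokenize` are hypothetical and may differ from the interface the assignment actually requires. The filter's decisions depend entirely on the word counts it collects during training.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase words made of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


class WordCountFilter:
    """Hypothetical learning filter: learns per-word spam/OK counts from labelled emails."""

    def __init__(self) -> None:
        self.spam_counts: Counter[str] = Counter()
        self.ok_counts: Counter[str] = Counter()

    def train(self, emails: dict[str, str], labels: dict[str, str]) -> None:
        # The model (the two counters) changes according to the training data.
        for email_id, body in emails.items():
            target = self.spam_counts if labels[email_id] == "SPAM" else self.ok_counts
            target.update(tokenize(body))

    def predict(self, body: str) -> str:
        votes = 0
        for word in tokenize(body):
            # A word votes for the class in which it was seen more often; unseen words abstain.
            if self.spam_counts[word] > self.ok_counts[word]:
                votes += 1
            elif self.ok_counts[word] > self.spam_counts[word]:
                votes -= 1
        # Ties default to "OK", since false positives are penalised more heavily.
        return "SPAM" if votes > 0 else "OK"


if __name__ == "__main__":
    spam_filter = WordCountFilter()
    spam_filter.train(
        {"a": "Cheap pills, buy now!", "b": "Meeting notes for tomorrow"},
        {"a": "SPAM", "b": "OK"},
    )
    print(spam_filter.predict("buy cheap pills"))        # SPAM
    print(spam_filter.predict("notes for the meeting"))  # OK
```

A practical filter would need more than raw counts (for example smoothing, a better tokenizer, and a decision threshold tuned to the quality formula), but even this sketch changes its behaviour based on the training data.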

Filter quality assessment

Filter quality is computed using the following formula:

<latex>

q = \frac{TP + TN}{TP + TN + 10 \cdot FP + FN}.

</latex>

Positive cases (P) are emails the filter classifies as spam; negative cases (N) are emails it classifies as normal messages. Thus FP is the number of normal emails incorrectly flagged as spam, and FN is the number of spam emails that get through without being flagged. Each false positive adds 10 to the denominator while each false negative adds only 1, so wrongly flagging a normal email is penalized ten times more heavily than letting a spam through. Note that TP, FP, TN and FN are counts (numbers of emails), not percentages.
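The formula can be sketched in Python as follows. This is an illustration only, not the required `compute_quality_for_corpus` function; it assumes the true and predicted labels are given as dictionaries mapping an email ID to "SPAM" or "OK", and the helper names `confusion_counts` and `quality_score` are hypothetical.

```python
SPAM = "SPAM"  # assumed label for spam messages
OK = "OK"      # assumed label for normal messages


def confusion_counts(truth: dict[str, str], prediction: dict[str, str]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN); 'positive' means classified as spam by the filter."""
    tp = tn = fp = fn = 0
    for email_id, actual in truth.items():
        predicted = prediction[email_id]
        if predicted == SPAM:
            if actual == SPAM:
                tp += 1  # spam correctly flagged
            else:
                fp += 1  # normal email wrongly flagged as spam
        else:
            if actual == SPAM:
                fn += 1  # spam that got through
            else:
                tn += 1  # normal email correctly let through
    return tp, tn, fp, fn


def quality_score(tp: int, tn: int, fp: int, fn: int) -> float:
    """q = (TP + TN) / (TP + TN + 10 * FP + FN), computed from counts, not percentages."""
    denominator = tp + tn + 10 * fp + fn
    if denominator == 0:
        raise ValueError("cannot compute quality for an empty set of emails")
    return (tp + tn) / denominator


if __name__ == "__main__":
    truth = {"e1": SPAM, "e2": OK, "e3": OK, "e4": SPAM}
    prediction = {"e1": SPAM, "e2": SPAM, "e3": OK, "e4": OK}
    counts = confusion_counts(truth, prediction)  # (1, 1, 1, 1)
    print(quality_score(*counts))  # 2 / 13, dragged down by the single false positive
```

For example, on a hypothetical dataset with 40 spam and 60 normal emails, a filter that labels everything as OK gets TP = 0, TN = 60, FP = 0, FN = 40 and q = 60/100 = 0.6, whereas labelling everything as spam gives TP = 40, TN = 0, FP = 60, FN = 0 and q = 40/640 = 0.0625.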

Your filter will be evaluated on 3 different datasets (one of them is available to you on these web pages). On each dataset, your filter can earn 0 to 3 points depending on its quality q, for a maximum of 9 points:

q	points
[0, 0.3)	0
[0.3, 0.5)	1
[0.5, 0.7)	2
[0.7, 0.9)	2.5
[0.9, 1]	3
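The table above can be expressed as a small lookup. This is a sketch; the names `QUALITY_BANDS` and `points_for_quality` are hypothetical. Each band is closed on the left and open on the right, except the top band [0.9, 1], which includes q = 1.

```python
# Lower bound of each quality band and the points it earns, highest band first.
QUALITY_BANDS = [(0.9, 3.0), (0.7, 2.5), (0.5, 2.0), (0.3, 1.0)]


def points_for_quality(q: float) -> float:
    """Map a quality score q in [0, 1] to the points earned on one dataset."""
    if not 0.0 <= q <= 1.0:
        raise ValueError(f"quality must lie in [0, 1], got {q}")
    for lower_bound, points in QUALITY_BANDS:
        if q >= lower_bound:
            return points
    return 0.0  # q lies in [0, 0.3)


if __name__ == "__main__":
    # Three hypothetical dataset scores; the maximum total is 3 * 3 = 9 points.
    scores = [0.95, 0.75, 0.4]
    print(sum(points_for_quality(q) for q in scores))  # 6.5
```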