
Spam filter - step 4

Create three simple non-adaptive filters (paranoid, naive, and random) and evaluate their quality.

Preparation

Required features of Python:

You should think about and write down on a piece of paper:

Optional (for more advanced programmers): Read about how inheritance works in object-oriented programming in Python. You can find more information here:

Simple filters

Tasks:

Why do we need it?

Specifications

To facilitate later automatic testing of the final filter, we require your filter to be named MyFilter and defined in the module filter.py. In this step, however, you shall create three classes called NaiveFilter, ParanoidFilter, and RandomFilter, placed in a module named simplefilters.py.

A filter will be represented by a class with at least two public methods: train() and test(). Filters that cannot learn from data will probably leave the train() method empty. The rest of the class structure is up to you.
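The calling contract described above (train first, then test, with both methods returning None) can be illustrated with a small driver. This is only a sketch: the function run_filter is hypothetical and not part of the assignment, and it assumes that test() writes !prediction.txt into the directory it is given.

```python
import os


def run_filter(spam_filter, train_dir, test_dir):
    """Hypothetical driver showing how a filter object is meant to be used.

    The filter is first trained on a corpus containing !truth.txt and then
    asked to classify a corpus without it. Both methods return None; the only
    observable result is the !prediction.txt file (assumed to be written
    into test_dir).
    """
    spam_filter.train(train_dir)
    spam_filter.test(test_dir)
    prediction_path = os.path.join(test_dir, "!prediction.txt")
    # Fail loudly if test() did not produce the expected output file.
    if not os.path.isfile(prediction_path):
        raise RuntimeError("test() did not create !prediction.txt")
    return prediction_path
```

Because the driver only relies on the two public methods, any class that provides train() and test() can be passed in, regardless of how its internals are organized.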

Method train():

Inputs: A path to the training corpus, i.e., to a directory with emails that also contains the !truth.txt file. (Irrelevant for the simple filters.)
Outputs: None.
Effects: Sets up the inner data structures of the filter so that they can later be used to classify emails with the test() method.

Method test():

Inputs: A path to the corpus to be evaluated. (The directory will not contain the !truth.txt file.)
Outputs: None.
Effects: Creates the !prediction.txt file containing the predictions of the filter.
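A possible shape for simplefilters.py is sketched below. Several details are assumptions, not part of the specification above: the behaviour of each filter is inferred from its name (naive marks every email as ham, paranoid marks every email as spam, random guesses), the labels "OK" and "SPAM" and the line format "<filename> <label>" in !prediction.txt are guesses, and emails are taken to be all regular files whose names do not start with "!". The helper names SimpleFilter, classify, list_emails, HAM_TAG, SPAM_TAG, and PREDICTION_FILE are hypothetical.

```python
import os
import random

# Assumed labels and output file name; check them against the assignment.
HAM_TAG = "OK"
SPAM_TAG = "SPAM"
PREDICTION_FILE = "!prediction.txt"


def list_emails(corpus_dir):
    """Return sorted email file names, skipping special files such as !truth.txt."""
    return sorted(
        name for name in os.listdir(corpus_dir)
        if not name.startswith("!")
        and os.path.isfile(os.path.join(corpus_dir, name))
    )


class SimpleFilter:
    """Hypothetical common base class for the non-adaptive filters."""

    def train(self, train_corpus_dir):
        # Non-adaptive filters learn nothing, so train() is empty.
        pass

    def classify(self, email_name):
        # Each subclass decides the label of a single email.
        raise NotImplementedError

    def test(self, test_corpus_dir):
        # Write one "<filename> <label>" line per email into !prediction.txt.
        path = os.path.join(test_corpus_dir, PREDICTION_FILE)
        with open(path, "w", encoding="utf-8") as f:
            for name in list_emails(test_corpus_dir):
                f.write(f"{name} {self.classify(name)}\n")


class NaiveFilter(SimpleFilter):
    """Trusts everything: every email is predicted as ham."""

    def classify(self, email_name):
        return HAM_TAG


class ParanoidFilter(SimpleFilter):
    """Trusts nothing: every email is predicted as spam."""

    def classify(self, email_name):
        return SPAM_TAG


class RandomFilter(SimpleFilter):
    """Picks ham or spam uniformly at random for each email."""

    def __init__(self, seed=None):
        # A private generator makes results reproducible when a seed is given.
        self._rng = random.Random(seed)

    def classify(self, email_name):
        return self._rng.choice((HAM_TAG, SPAM_TAG))
```

Putting the file-writing logic into the shared test() method means each concrete filter only has to say how it labels a single email; this is one use of the inheritance mentioned in the optional reading, not a required design.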

Evaluating the quality of simple filters

Create a simple script that computes the quality of a specified filter. The script shall: