Create 3 simple non-adaptive filters, paranoid, naive, and random, and evaluate their quality.
Required features of Python:
You should think about and write down on a piece of paper:
Optional (for more advanced programmers): Read how the inheritance of OOP works in Python. You can find more information here:
Tasks:
In module simplefilters.py, create 3 classes representing 3 simple filters:
NaiveFilter which classifies all the emails as OK,
ParanoidFilter which classifies all the emails as SPAM, and
RandomFilter which assigns the labels OK and SPAM randomly.
Optionally, derive these classes from a common ancestor class BaseFilter in module basefilter.py.
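The three classes might be sketched as follows. This is only a sketch of one possible design: the common ancestor BaseFilter, the predict() helper, and the one-prediction-per-line output format are assumptions, not requirements.

```python
import os
import random


class BaseFilter:
    """Common ancestor of the simple filters: never learns anything."""

    def train(self, train_corpus_dir):
        # The simple filters ignore the training data entirely.
        pass

    def predict(self, filename):
        # Each concrete filter decides the label for a single email.
        raise NotImplementedError

    def test(self, test_corpus_dir):
        # Write one "<filename> <label>" line per email into !prediction.txt.
        prediction_path = os.path.join(test_corpus_dir, '!prediction.txt')
        with open(prediction_path, 'w', encoding='utf-8') as f:
            for name in sorted(os.listdir(test_corpus_dir)):
                if name.startswith('!'):
                    continue  # skip special files such as !prediction.txt
                f.write('{} {}\n'.format(name, self.predict(name)))


class NaiveFilter(BaseFilter):
    def predict(self, filename):
        return 'OK'


class ParanoidFilter(BaseFilter):
    def predict(self, filename):
        return 'SPAM'


class RandomFilter(BaseFilter):
    def predict(self, filename):
        return random.choice(['OK', 'SPAM'])
```

With this layout, each subclass only has to override predict(); the shared test() logic lives in one place.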
Why do we need it?
To facilitate later automatic testing of the final filter, we require your filter to be named MyFilter and defined in module filter.py. In this step, however, you shall create 3 classes called NaiveFilter, ParanoidFilter, and RandomFilter, placed in a module named simplefilters.py.
A filter will be represented by a class with at least 2 public methods: train() and test(). Filters unable to learn from data will probably have an empty train() method. The rest of the class structure is up to you.
Method train():
| Inputs | A path to a training corpus, i.e. a directory with emails that also contains the !truth.txt file. (Irrelevant for the simple filters.) |
|---|---|
| Outputs | None. |
| Effects | Setup of the inner data structures of the filter, so that they can be later used to classify emails using the test() method. |
Method test():
| Inputs | A path to a corpus to be evaluated. (The directory will not contain the !truth.txt file.) |
|---|---|
| Outputs | None. |
| Effects | Creates the !prediction.txt file containing the predictions of the filter. |
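Assuming one `<email filename> <label>` pair per line (the same format commonly used by !truth.txt; the filenames below are made up), the generated !prediction.txt might look like:

```
email1 OK
email2 SPAM
email3 OK
```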
Create a simple script that computes the quality of a specified filter. The script shall:
call train() on the first dataset,
call test() on the second dataset,
call compute_quality_for_corpus() for the second corpus, and
finally delete the !prediction.txt file from the corpus.
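The steps above can be sketched as a single helper function. The placeholder compute_quality_for_corpus() below is an assumption: here it simply returns the fraction of predictions matching !truth.txt, but its real signature and scoring are up to your quality module, and the filter is trained and tested on the same directory only for brevity.

```python
import os


def compute_quality_for_corpus(corpus_dir):
    """Placeholder quality measure: the fraction of emails whose predicted
    label in !prediction.txt matches the true label in !truth.txt."""
    def read_labels(path):
        with open(path, encoding='utf-8') as f:
            # Each line is "<filename> <label>".
            return dict(line.split() for line in f if line.strip())

    truth = read_labels(os.path.join(corpus_dir, '!truth.txt'))
    prediction = read_labels(os.path.join(corpus_dir, '!prediction.txt'))
    hits = sum(1 for name in truth if prediction.get(name) == truth[name])
    return hits / len(truth)


def evaluate_filter(the_filter, train_dir, test_dir):
    """Train on one corpus, test on another, score, and clean up."""
    the_filter.train(train_dir)
    the_filter.test(test_dir)  # creates test_dir/!prediction.txt
    quality = compute_quality_for_corpus(test_dir)
    os.remove(os.path.join(test_dir, '!prediction.txt'))  # clean up
    return quality
```

The cleanup step matters: a leftover !prediction.txt would otherwise be picked up (or overwritten) by the next evaluation run.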