Create 3 simple non-adaptive filters (paranoid, naive, and random) and evaluate their quality.
Required features of Python:
You should think about and write down on a piece of paper:
Optional (for more advanced programmers): Read how inheritance works in Python's object-oriented programming. You can find more information here:
Tasks:
In module `simplefilters.py`, create 3 classes representing 3 simple filters: `NaiveFilter`, which classifies all the emails as OK, `ParanoidFilter`, which classifies all the emails as SPAM, and `RandomFilter`, which assigns the labels OK and SPAM randomly. Optionally, you may derive them from a common ancestor class `BaseFilter` in module `basefilter.py`.
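The decision logic of the three filters could be sketched as follows. This is only an illustration: the method name `classify` is a hypothetical per-email hook (the required public interface, `train()` and `test()`, is specified later in this assignment), and deriving from a common `BaseFilter` is optional.

```python
import random


class BaseFilter:
    """Optional common ancestor of the simple filters."""

    def classify(self, email_name):
        """Return 'OK' or 'SPAM' for a single email; overridden in subclasses."""
        raise NotImplementedError


class NaiveFilter(BaseFilter):
    """Classifies every email as OK."""

    def classify(self, email_name):
        return "OK"


class ParanoidFilter(BaseFilter):
    """Classifies every email as SPAM."""

    def classify(self, email_name):
        return "SPAM"


class RandomFilter(BaseFilter):
    """Assigns the labels OK and SPAM randomly."""

    def classify(self, email_name):
        return random.choice(("OK", "SPAM"))
```

Note that the subclasses only differ in the single decision they make per email; everything shared can live in the base class.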
Why do we need it?
To facilitate later automatic testing of the final filter, we require your filter to be named `MyFilter` and defined in module `filter.py`. In this step, however, you shall create 3 classes called `NaiveFilter`, `ParanoidFilter`, and `RandomFilter`, placed in a module named `simplefilters.py`.
A filter will be represented by a class with at least 2 public methods: `train()` and `test()`. Filters unable to learn from data will probably have an empty `train()` method. The rest of the class structure is up to you.

Method `train()`:
Inputs | A path to the training corpus, i.e. to a directory with emails that also contains the `!truth.txt` file. (Irrelevant for the simple filters.)
---|---
Outputs | None.
Effects | Sets up the inner data structures of the filter, so that they can later be used to classify emails using the `test()` method.
Method `test()`:

Inputs | A path to a corpus to be evaluated. (The directory will not contain the `!truth.txt` file.)
---|---
Outputs | None.
Effects | Creates the `!prediction.txt` file containing the predictions of the filter.
Create a simple script that computes the quality of a specified filter. The script shall:

1. call `train()` on the first dataset,
2. call `test()` on the second dataset,
3. call `compute_quality_for_corpus()` for the second corpus, and
4. remove the `!prediction.txt` file from the corpus.