See the general homework guidelines!
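Your task is to create a class MyFilter in module (file) filter.py which provides two methods, MyFilter.train(train_corpus_dir) and MyFilter.test(test_corpus_dir). The test() method shall create the !prediction.txt file in the test corpus folder, and it shall work even without a prior call to train().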
A corpus can contain some special files. For us, those files always have names starting with ! (e.g. !truth.txt) and do not contain any email messages.
More detailed information is given in the following sections.
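The rule about special files can be illustrated with a short sketch. It assumes that every regular file in the corpus folder whose name does not start with ! is an email message; the helper name email_filenames is hypothetical and not part of the assignment.

```python
import os


def email_filenames(corpus_dir):
    """Yield names of email files in corpus_dir, skipping special files.

    Hypothetical helper: assumes every regular file whose name does not
    start with '!' is an email message.
    """
    for name in sorted(os.listdir(corpus_dir)):
        if name.startswith('!'):
            continue  # special file such as !truth.txt, not an email
        if os.path.isfile(os.path.join(corpus_dir, name)):
            yield name
```

Sorting the names is only a convenience that makes the order of processing reproducible; the assignment text does not require it.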
Class MyFilter shall be defined in a module called filter.py. The class will be used as follows:
    from filter import MyFilter

    filter = MyFilter()
    filter.train('/path/to/training/corpus')  # This folder will contain the !truth.txt file
    filter.test('/path/to/testing/corpus')    # The method shall create the !prediction.txt file in this folder
Since the test() method shall be able to work without a prior call to the train() method, the following usage is also allowed (and must be supported):
    from filter import MyFilter

    filter = MyFilter()
    filter.test('/path/to/testing/corpus')  # The method shall create the !prediction.txt file in this folder
When computing the quality of your filter, we will always call the train() method before test().
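A minimal skeleton of the required interface might look as follows. It is a sketch under stated assumptions, not a working filter: the label string 'OK', the "one filename and label per line" layout of !prediction.txt, and the attributes default_label and trained_on are all assumptions made for illustration, and the decision rule is a placeholder that gives every email the same label.

```python
import os


class MyFilter:
    """Interface sketch only; the decision rule is a placeholder."""

    def __init__(self):
        # Fallback so that test() works without a prior call to train().
        self.default_label = 'OK'  # assumed label string
        self.trained_on = None

    def train(self, train_corpus_dir):
        # A real filter would read !truth.txt and the emails here.
        # This sketch only remembers which corpus it was given.
        self.trained_on = train_corpus_dir

    def test(self, test_corpus_dir):
        # Collect email files, skipping special files that start with '!'.
        names = sorted(
            name for name in os.listdir(test_corpus_dir)
            if not name.startswith('!')
            and os.path.isfile(os.path.join(test_corpus_dir, name))
        )
        # Assumed format: one "<filename> <label>" pair per line.
        path = os.path.join(test_corpus_dir, '!prediction.txt')
        with open(path, 'w', encoding='utf-8') as f:
            for name in names:
                f.write(f'{name} {self.default_label}\n')
```

Setting the fallback in the constructor is one simple way to satisfy the requirement that test() be callable on a freshly created object.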
You shall hand in a ZIP archive with the module quality.py and, possibly, other modules needed by quality.py. These files shall be placed in the root of the archive, and the archive should not contain any folders. If you have followed the suggestions, your archive should probably contain files quality.py, utils.py, and maybe others.
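One way to produce and check such a flat archive is sketched below with the standard zipfile module. The function names build_flat_archive and archive_is_flat are hypothetical helpers, not part of the assignment, and any archiving tool that puts the files in the archive root works equally well.

```python
import os
import zipfile


def build_flat_archive(archive_path, file_paths):
    """Write the given files into the root of a new ZIP archive."""
    with zipfile.ZipFile(archive_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for path in file_paths:
            # An arcname without a directory part keeps the file in the root.
            zf.write(path, arcname=os.path.basename(path))


def archive_is_flat(archive_path):
    """Return True if no entry in the archive lives inside a folder."""
    with zipfile.ZipFile(archive_path) as zf:
        # ZIP entry names always use '/' as the separator.
        return all('/' not in name for name in zf.namelist())
```

A call such as build_flat_archive('homework.zip', ['quality.py', 'utils.py']) followed by archive_is_flat('homework.zip') would be a quick sanity check before handing in; the archive name here is only an example.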
Only the function compute_quality_for_corpus() (i.e. the solution of step 3) will be subject to testing in this phase. The goal of this submission is to ensure that you all have a function which correctly computes the quality of the filter.
Hand in a ZIP archive with your filter and all other files it needs to run. These files should be in the root of the archive, and the archive should not contain any subdirectories. If you followed our instructions, your archive should contain the following files:
basefilter.py
BaseFilter
read_classification_from_file
write_classification_to_file
Do not hand in:
quality
confmat