Create class TrainingCorpus by deriving it from class Corpus. The class shall encapsulate a corpus with known true classification of the email messages, i.e., it shall represent a corpus usable for filter training.
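Below is a minimal sketch of the derivation. It rests on two assumptions. First, Corpus takes the corpus directory in its constructor, and its emails() method yields (filename, body) pairs for every file whose name does not start with '!'. Second, !truth.txt holds one "filename LABEL" pair per line. The Corpus class below is a hypothetical stand-in for the one from the previous step, not the assignment's own code.

```python
import os


class Corpus:
    """Hypothetical stand-in for Corpus (assumed interface)."""

    def __init__(self, corpus_dir):
        self.corpus_dir = corpus_dir

    def emails(self):
        # Yield (filename, body) pairs; files starting with '!' are special.
        for fname in sorted(os.listdir(self.corpus_dir)):
            if fname.startswith('!'):
                continue
            path = os.path.join(self.corpus_dir, fname)
            with open(path, encoding='utf-8') as f:
                yield fname, f.read()


class TrainingCorpus(Corpus):
    """Corpus whose emails have known true labels."""

    def __init__(self, corpus_dir):
        super().__init__(corpus_dir)  # keep all behavior of Corpus
        # Assumed !truth.txt format: one 'filename LABEL' pair per line.
        self.truth = {}
        truth_path = os.path.join(corpus_dir, '!truth.txt')
        with open(truth_path, encoding='utf-8') as f:
            for line in f:
                parts = line.split()
                if len(parts) == 2:  # skip blank or malformed lines
                    fname, label = parts
                    self.truth[fname] = label
```

Because TrainingCorpus derives from Corpus, it inherits emails() unchanged. The only addition is the self.truth dictionary, which is filled from the truth file.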
Tests for step 5:
Class TrainingCorpus is not obligatory, and its implementation is not fixed. You can implement only those methods that you find useful. The provided tests target all the methods mentioned below; if you decide not to implement them all, delete (or comment out) the related tests in class TrainingCorpusClass.
By now, you should know everything you need to successfully implement the TrainingCorpus class. The only remaining thing to prepare:
Task:
In trainingcorpus.py, create class TrainingCorpus.
Why do we need it?
TrainingCorpus shall simplify the creation of learning filters. It will let us walk the corpus while knowing the true label of each email, as recorded in file !truth.txt.
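Walking the corpus with known labels comes down to reading !truth.txt and pairing each listed file with its label. The sketch below assumes the file holds one "filename LABEL" pair per line. The helper names load_truth and walk_labeled are hypothetical and not part of the assignment.

```python
import os


def load_truth(corpus_dir, truth_name='!truth.txt'):
    """Hypothetical helper: return {filename: label} read from the truth file.

    Assumes each non-empty line has the form 'filename LABEL', where LABEL
    is 'OK' or 'SPAM'.
    """
    truth = {}
    with open(os.path.join(corpus_dir, truth_name), encoding='utf-8') as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:  # ignore blank or malformed lines
                fname, label = parts
                truth[fname] = label
    return truth


def walk_labeled(corpus_dir):
    """Hypothetical helper: yield (filename, body, label) for listed emails."""
    truth = load_truth(corpus_dir)
    for fname in sorted(truth):
        with open(os.path.join(corpus_dir, fname), encoding='utf-8') as f:
            yield fname, f.read(), truth[fname]
```

A learning filter could consume the (filename, body, label) triples one at a time. Because the function is a generator, it never needs to keep all email bodies in memory at once.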
Specifications for this class are not fixed; it is up to you to decide which methods you need. The following methods can serve as inspiration (and the tests provided for this class assume that these methods exist):
get_class(filename) returns the true label (OK or SPAM) of the email message stored in file filename.
is_ham(filename) and is_spam(filename) return a Boolean value (True or False) with the obvious meaning for the message stored in file filename.
spams() and hams() return generators that walk the spams and hams in the training corpus, in the same way as method emails() walks all emails of the Corpus class.
It is entirely up to you whether you implement any of these methods.
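The methods listed above could be implemented along the following lines. This is only a sketch. The Corpus stand-in and its emails() interface, the "filename LABEL" format of !truth.txt, and the constant names SPAM_TAG and HAM_TAG are all assumptions made for this example.

```python
import os


class Corpus:
    """Hypothetical stand-in for Corpus (assumed interface)."""

    def __init__(self, corpus_dir):
        self.corpus_dir = corpus_dir

    def emails(self):
        # Yield (filename, body) for each email; skip special '!' files.
        for fname in sorted(os.listdir(self.corpus_dir)):
            if not fname.startswith('!'):
                path = os.path.join(self.corpus_dir, fname)
                with open(path, encoding='utf-8') as f:
                    yield fname, f.read()


class TrainingCorpus(Corpus):
    SPAM_TAG = 'SPAM'  # assumed label strings, as used in !truth.txt
    HAM_TAG = 'OK'

    def __init__(self, corpus_dir):
        super().__init__(corpus_dir)
        # Assumed !truth.txt format: one 'filename LABEL' pair per line.
        self.truth = {}
        with open(os.path.join(corpus_dir, '!truth.txt'), encoding='utf-8') as f:
            for line in f:
                parts = line.split()
                if len(parts) == 2:
                    self.truth[parts[0]] = parts[1]

    def get_class(self, filename):
        """Return the true label; raises KeyError for an unlisted file."""
        return self.truth[filename]

    def is_ham(self, filename):
        # .get() makes unlisted files count as neither ham nor spam.
        return self.truth.get(filename) == self.HAM_TAG

    def is_spam(self, filename):
        return self.truth.get(filename) == self.SPAM_TAG

    def spams(self):
        """Yield (filename, body) pairs of spam emails only."""
        for fname, body in self.emails():
            if self.is_spam(fname):
                yield fname, body

    def hams(self):
        """Yield (filename, body) pairs of ham emails only."""
        for fname, body in self.emails():
            if self.is_ham(fname):
                yield fname, body
```

In this sketch, get_class raises KeyError for a file missing from !truth.txt. By contrast, is_ham and is_spam simply return False for such a file, so spams() and hams() skip unlabeled files silently. Whether that is the desired behavior is a design choice left open by the specification.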