====== Spam filter - step 5 ======

Create class ''TrainingCorpus'' by deriving it from class ''Corpus''. The class shall encapsulate a corpus in which the true classification of each email message is known, i.e. it shall represent a corpus usable for filter training.

[[.unit_testing|Tests]] for step 5:
  * for step 5 only {{:courses:a4b99rph:cviceni:spam:test5_trainingcorpus.zip|}} or
  * together with tests for the preceding steps {{:courses:a4b99rph:cviceni:spam:test5_all.zip|}}.

Class ''TrainingCorpus'' is not obligatory, and its implementation is not fixed. You can implement only those methods that you find useful. The provided tests target all the methods mentioned below. If you decide not to implement all of them, delete (or comment out) the related tests in class ''TrainingCorpusClass''.

===== Preparation =====

By now, you should know everything you need to implement the ''TrainingCorpus'' class successfully. The only remaining thing to prepare:
  * Think about what the class should be able to do so that it simplifies the training of your filter.

===== Training data corpus =====

Task:
  * In module ''trainingcorpus.py'', create class ''TrainingCorpus''.

Why do we need it?
  * Class ''TrainingCorpus'' shall simplify the creation of learning filters. It will allow us to walk the corpus while knowing the true labels of the emails, which are stored in file ''!truth.txt''.

==== Specifications ====

Specifications for this class are not fixed. It is up to you to decide which methods you need. The following methods can serve as an inspiration (the tests provided for this class assume that these methods exist):
  * method ''get_class(filename)'' returns the true label (OK or SPAM) of the email message stored in file ''filename''.
  * methods ''is_ham(filename)'' and ''is_spam(filename)'' return a Boolean value (''True'' or ''False'') saying whether the message stored in file ''filename'' is a ham or a spam, respectively.
  * methods ''spams()'' and ''hams()'' return generators that walk only the spams or only the hams in the training corpus, in the same way as method ''emails()'' walks all emails of the ''Corpus'' class.
  * etc.

It is entirely up to you which of these methods, if any, you implement.

>{{page>courses:a4b99rph:internal:cviceni:spam:tyden09#TrainingCorpus&editbtn}}
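The methods listed in the specifications could be sketched roughly as follows. This is only a minimal illustration, not the required implementation. It assumes that each line of ''!truth.txt'' holds a ''filename label'' pair (label ''OK'' or ''SPAM'') and that ''Corpus.emails()'' yields ''(filename, body)'' pairs while skipping files whose names start with ''!''. The small ''Corpus'' class inside is a hypothetical stand-in for your own class from the previous step, included only so that the sketch runs on its own.

```python
import os

TRUTH_FILENAME = '!truth.txt'  # file with the true labels
HAM_TAG = 'OK'
SPAM_TAG = 'SPAM'


class Corpus:
    """Hypothetical stand-in for the Corpus class from the previous step.

    Assumption: emails() yields (filename, body) pairs for all files in the
    directory, skipping special files whose names start with '!'.
    """

    def __init__(self, corpus_dir):
        self.corpus_dir = corpus_dir

    def emails(self):
        for fname in sorted(os.listdir(self.corpus_dir)):
            if fname.startswith('!'):
                continue  # skip !truth.txt and similar special files
            path = os.path.join(self.corpus_dir, fname)
            with open(path, 'rt', encoding='utf-8') as f:
                yield fname, f.read()


class TrainingCorpus(Corpus):
    """Corpus whose true labels are known from the !truth.txt file."""

    def __init__(self, corpus_dir):
        super().__init__(corpus_dir)
        # Read the labels once; later lookups are plain dictionary accesses.
        self.truth = self._read_truth()

    def _read_truth(self):
        # Assumption: every non-empty line is "<filename> <OK|SPAM>".
        truth = {}
        path = os.path.join(self.corpus_dir, TRUTH_FILENAME)
        with open(path, 'rt', encoding='utf-8') as f:
            for line in f:
                parts = line.split()
                if len(parts) == 2:
                    fname, label = parts
                    truth[fname] = label
        return truth

    def get_class(self, filename):
        # Returns 'OK' or 'SPAM'; raises KeyError for a file not in !truth.txt.
        return self.truth[filename]

    def is_ham(self, filename):
        # Files missing from !truth.txt are treated as neither ham nor spam.
        return self.truth.get(filename) == HAM_TAG

    def is_spam(self, filename):
        return self.truth.get(filename) == SPAM_TAG

    def hams(self):
        # Walk only the emails labelled OK, reusing Corpus.emails().
        for fname, body in self.emails():
            if self.is_ham(fname):
                yield fname, body

    def spams(self):
        # Walk only the emails labelled SPAM, reusing Corpus.emails().
        for fname, body in self.emails():
            if self.is_spam(fname):
                yield fname, body
```

Reading ''!truth.txt'' once in the constructor keeps ''get_class()'' a simple dictionary lookup, and building ''hams()'' and ''spams()'' on top of ''emails()'' reuses the corpus-walking logic instead of duplicating it.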