Spam filter - step 1

We are going to create a function that reads the information from the files !truth.txt or !prediction.txt into a dictionary.

Preparation

Working with a dictionary

Working with (text) files

Using the if __name__ == "__main__": section

The split() method of strings
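
As a quick warm-up, three of the topics above (dictionaries, split() and the if __name__ == "__main__": section) can be seen together in a short sketch. The sample text and the helper name parse_pairs are made up for illustration and are not part of the assignment.

```python
# Hypothetical sample: two lines, each holding a name and a label.
sample = "email01 OK\nemail02 SPAM\n"


def parse_pairs(text):
    """Build a dictionary from lines holding two whitespace-separated strings."""
    result = {}
    for line in text.splitlines():
        # split() with no argument splits on any run of whitespace.
        name, label = line.split()
        result[name] = label
    return result


if __name__ == "__main__":
    # This part runs only when the file is executed directly,
    # not when it is imported as a module.
    print(parse_pairs(sample))
```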

Reading classification from a file

Task:

  • In a module called utils.py, create a function read_classification_from_file() that will read the mail classes from a text file.

Why do we need it:

  • We will need this function to create a learning filter and to evaluate the quality of a filter.

Specifications

Function read_classification_from_file() (in module utils.py) has to conform to the following specifications:

read_classification_from_file(fpath)
Input: The path to the text file (most likely either !truth.txt or !prediction.txt).
Output: A dictionary containing the label SPAM or OK for each file name in the email corpus.

The function loads a text file containing a pair of strings per line, separated by a single space, like this:

email01 OK
email02 OK
email03 SPAM
email1234 OK

and creates a dictionary (the order of the individual items in the following listing is not important):

{'email1234': 'OK', 'email03': 'SPAM', 'email02': 'OK', 'email01': 'OK'}

If the file is empty, it returns an empty dictionary.
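
One possible way to meet this specification is sketched below. It is only a sketch under stated assumptions: the UTF-8 encoding and the decision to skip blank or malformed lines are not given in the specification, so adjust them if your course materials say otherwise.

```python
def read_classification_from_file(fpath):
    """Return a dictionary mapping email file names to their labels."""
    classification = {}
    # Assumption: the file is UTF-8 encoded; the specification names no encoding.
    with open(fpath, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                # Assumption: blank or malformed lines are skipped silently.
                continue
            name, label = parts
            classification[name] = label
    return classification
```

An empty file produces no loop iterations, so the function returns the empty dictionary it started with.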

 

Writing classification (predictions) to a file

Task:

  • In the module utils.py, create a function write_classification_to_file() that will write the (usually predicted) mail classes to a text file.

Why do we need it:

  • The function will come in handy when writing the filter; it can be used to create the !prediction.txt file.

Specifications

Function write_classification_to_file() (in module utils.py) should conform to the following specifications:

write_classification_to_file(cls_dict, fpath)
Inputs: (1) A dictionary containing the email file names as keys and the email classes (SPAM or OK) as values.
(2) The path to the text file that shall be created.
Output: None.

The following code

>>> cls_dict = {'email1234': 'OK', 'email03': 'SPAM', 'email02': 'OK', 'email01': 'OK'}
>>> fpath = '1/!prediction.txt'
>>> write_classification_to_file(cls_dict, fpath)

shall create the file !prediction.txt in the directory 1 (the directory must already exist) with the following contents:

email01 OK
email02 OK
email03 SPAM
email1234 OK

The actual order of individual rows in the file is not important.

If the cls_dict is empty, the function shall create an empty file.
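
A minimal sketch of a function matching this specification follows. The UTF-8 encoding is an assumption, since the specification names none, and the rows are written in whatever order the dictionary yields them.

```python
def write_classification_to_file(cls_dict, fpath):
    """Write one 'name label' line per dictionary item to the file fpath."""
    # Assumption: UTF-8 encoding. The directory in fpath must already exist.
    # Mode "w" creates the file even if cls_dict is empty.
    with open(fpath, "w", encoding="utf-8") as f:
        for name, label in cls_dict.items():
            f.write(f"{name} {label}\n")
```

The function has no return statement, so it returns None, as the specification requires.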

courses/be5b33prg/homeworks/spam/step1.txt · Last modified: 2018/08/13 09:48 (external edit)