Spam filter - step 1

We are going to create a function that reads the information from a `!truth.txt` or `!prediction.txt` file into a dictionary, and a function that writes such a dictionary back to a file.

Preparation

Working with a dictionary

Working with (text) files

The usage of the `if __name__ == "__main__":` section

The `split()` method of string values
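
The last two preparation topics can be tried out together in a few lines. The following is only a minimal sketch: the helper name `parse_line` is hypothetical and is not part of the assignment. It shows how `split()` without arguments breaks a line on whitespace (dropping the trailing newline), and how the `if __name__ == "__main__":` section keeps the demo code from running when the module is imported.

```python
def parse_line(line):
    """Split one 'name label' line into its two parts.

    parse_line is a hypothetical helper used only for illustration.
    """
    # split() with no arguments splits on any whitespace and
    # ignores leading/trailing whitespace, including the newline.
    name, label = line.split()
    return name, label


if __name__ == "__main__":
    # This part runs only when the file is executed directly,
    # not when it is imported from another module.
    print(parse_line("email01 OK\n"))  # prints ('email01', 'OK')
```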

Reading classification from a file

Task:

In a module called utils.py, create a function read_classification_from_file() that will read the mail classes from a text file.

Why do we need it:

We will need this function both to create a learning filter and to evaluate the quality of a filter.

Specifications

Function read_classification_from_file() (in module utils.py) has to conform to the following specifications:

`read_classification_from_file(fpath)`
Input	The path to the text file (most likely either `!truth.txt` or `!prediction.txt`).
Output	A dictionary that maps each filename in the email corpus to its label, either `SPAM` or `OK`.

The function loads a text file containing one pair of strings per line, separated by a single space, like this:

email01 OK
email02 OK
email03 SPAM
email1234 OK

and creates a dictionary (the order of individual “rows” in the following listing is not important):

{'email1234': 'OK', 'email03': 'SPAM', 'email02': 'OK', 'email01': 'OK'}

If the file is empty, it returns an empty dictionary.
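
One possible way to meet this specification is sketched below. It is not the only valid solution. Two choices here are assumptions that the specification does not state: blank lines are skipped, and the file is opened as UTF-8.

```python
def read_classification_from_file(fpath):
    """Return a dict mapping each filename to its label ('SPAM' or 'OK')."""
    classification = {}
    # Assumption: the file is UTF-8 encoded text.
    with open(fpath, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                # Assumption: blank lines are ignored rather than
                # treated as an error.
                continue
            name, label = line.split()
            classification[name] = label
    # For an empty file the loop body never runs,
    # so an empty dictionary is returned.
    return classification
```

A line that does not contain exactly two whitespace-separated strings makes the unpacking fail with a `ValueError`. For input that is supposed to follow the format above, that is a reasonable way to report a malformed file.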

Writing classification (predictions) to a file

Task:

In the module utils.py, create a function write_classification_to_file() that will write the (usually predicted) mail classes to a text file.

Why do we need it:

The function will come in handy when writing the filter; it can be used to create the `!prediction.txt` file.

Specifications

Function write_classification_to_file() (in module utils.py) should conform to the following specifications:

`write_classification_to_file(cls_dict, fpath)`
Inputs	(1) A dictionary containing the email file names as keys and the email classes (`SPAM` or `OK`) as values.
	(2) The path to the text file that shall be created.
Output	None.

The following code

>>> cls_dict = {'email1234': 'OK', 'email03': 'SPAM', 'email02': 'OK', 'email01': 'OK'}
>>> fpath = '1/!prediction.txt'
>>> write_classification_to_file(cls_dict, fpath)

shall create the file `!prediction.txt` in the directory `1` (the directory must already exist) with the following contents:

email01 OK
email02 OK
email03 SPAM
email1234 OK

The actual order of individual rows in the file is not important.

If `cls_dict` is empty, the function shall create an empty file.
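
A minimal sketch of a function that behaves as specified is shown below. It is one possible approach, not a reference solution. The UTF-8 encoding is an assumption, and the rows are written in whatever order the dictionary yields them, which is acceptable because the row order is not important.

```python
def write_classification_to_file(cls_dict, fpath):
    """Write one 'filename label' pair per line to the file at fpath."""
    # Opening in "wt" mode creates the file (or truncates an existing
    # one), so an empty dictionary results in an empty file.
    # Assumption: UTF-8 is used as the text encoding.
    with open(fpath, "wt", encoding="utf-8") as f:
        for name, label in cls_dict.items():
            f.write(f"{name} {label}\n")
```

The function returns nothing explicitly, so its result is `None`. If the target directory does not exist, `open()` raises `FileNotFoundError`.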
