
Spam filter - step 1

We are going to create a function that reads the information from the file !truth.txt or !prediction.txt into a dictionary.

Preparation

Working with a dictionary

Working with (text) files

The use of the ''if __name__ == "__main__":'' section
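
As a brief illustration of this idiom, the sketch below shows a module whose demonstration code runs only when the file is executed directly, not when it is imported from another module. The function greet() is a hypothetical placeholder and is not part of the assignment.

```python
def greet(name):
    """Hypothetical helper, used only to illustrate the idiom."""
    return f"Hello, {name}!"


if __name__ == "__main__":
    # This branch runs when the file is started as a script,
    # but is skipped when the file is imported as a module.
    print(greet("world"))
```

Code placed in this section is a convenient spot for quick manual tests of the functions defined in the module.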

Method ''split()'' of string values
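
A small sketch of how split() behaves on a line in the format used later on this page, and how the resulting parts can be stored in a dictionary. The variable names are chosen only for illustration.

```python
# One line as it might be read from a classification file.
line = "email03 SPAM\n"

# Without arguments, split() separates on any whitespace
# and discards the trailing newline.
parts = line.split()

# Unpack the two parts and store them as a key-value pair.
name, label = parts
labels = {}
labels[name] = label
```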

Reading classification from a file

Task:

Why do we need it:

Specifications

Function read_classification_from_file() (in module utils.py) has to conform to the following specifications:

read_classification_from_file(fpath)
Input: The path to the text file (most likely either !truth.txt or !prediction.txt).
Output: A dictionary containing either the SPAM or the OK label for each file name in the email corpus.

The function loads a text file containing a pair of strings per line, separated by a single space, like this:

email01 OK
email02 OK
email03 SPAM
email1234 OK
and creates a dictionary (the order of individual “rows” in the following listing is not important):
{'email1234': 'OK', 'email03': 'SPAM', 'email02': 'OK', 'email01': 'OK'}

If the file is empty, it returns an empty dictionary.
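
One possible way to meet this specification is sketched below. It is not the only valid solution. It assumes that the file is UTF-8 encoded, that file names contain no spaces, and that blank lines may be skipped; none of these assumptions is stated in the specification. The temporary file in the demonstration is made up for the example.

```python
import os
import tempfile


def read_classification_from_file(fpath):
    """Return a dict mapping each email file name to its label."""
    classification = {}
    # Assumption: the file is UTF-8 encoded text.
    with open(fpath, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                # Assumption: blank lines carry no information.
                continue
            # Each line holds exactly two strings: file name and label.
            name, label = line.split()
            classification[name] = label
    return classification


# Small self-check on a temporary file (contents are illustrative).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "!truth.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("email01 OK\nemail03 SPAM\n")
    demo_result = read_classification_from_file(demo_path)
```

An empty file never enters the loop body, so the function returns the empty dictionary it started with.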

Writing classification (predictions) to a file

Task:

Why do we need it:

Specifications

Function write_classification_to_file() (in module utils.py) should conform to the following specifications:

write_classification_to_file(cls_dict, fpath)
Inputs: (1) A dictionary containing the email file names as keys and the email classes (SPAM or OK) as values.
(2) The path to the text file that shall be created.
Output: None.

The following code

>>> cls_dict = {'email1234': 'OK', 'email03': 'SPAM', 'email02': 'OK', 'email01': 'OK'}
>>> fpath = '1/!prediction.txt'
>>> write_classification_to_file(cls_dict, fpath)

shall create file !prediction.txt in directory 1 (the directory must exist) with the following contents:

email01 OK
email02 OK
email03 SPAM
email1234 OK

The actual order of individual rows in the file is not important.

If cls_dict is empty, the function shall create an empty file.
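
A minimal sketch of a function that would satisfy this specification, assuming UTF-8 output and that the target directory already exists. The temporary directory in the demonstration is used only so that the example can run anywhere.

```python
import os
import tempfile


def write_classification_to_file(cls_dict, fpath):
    """Write one 'name label' pair per line to the file at fpath."""
    # Opening in mode "w" creates the file even if nothing is written,
    # so an empty dictionary yields an empty file.
    with open(fpath, "w", encoding="utf-8") as f:
        for name, label in cls_dict.items():
            f.write(f"{name} {label}\n")


# Small self-check: write a dictionary and read the lines back.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "!prediction.txt")
    write_classification_to_file({"email01": "OK", "email03": "SPAM"}, demo_path)
    with open(demo_path, "r", encoding="utf-8") as f:
        demo_lines = sorted(f.read().splitlines())
```

The rows are written in the iteration order of the dictionary, which is acceptable because the order of rows in the file is not important.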