======Spam filter - step 1======
We are going to create a function that reads the information from the file ''!truth.txt'' or ''!prediction.txt'' into a //dictionary// data structure.

<WRAP round download>
  * [[.unit_testing|Tests]] for step 1: {{:courses:a4b99rph:cviceni:spam:test1_readclassification.zip|}}
</WRAP>


=====Preparation=====

  * Working with a //dictionary// (see {[a4b99rph:Pilgrim2009]}, chapter [[http://diveinto.org/python3/native-datatypes.html#dictionaries|2.7]], or {[a4b99rph:Wentworth2012]}, chapter [[http://openbookproject.net/thinkcs/python/english3e/dictionaries.html|20]]).
    * How to create an empty dictionary.
    * How to add a key-value pair.
    * How to read the value stored under a key.
    * How to iterate over the items of a dictionary using the ''items()'' method:<code python>
eng_to_cz = {'cat': 'kocka', 'dog': 'pes', 'house': 'dum' }
for eng, cz in eng_to_cz.items():
    print(eng, ',', cz)
</code>
  * Working with (text) files (see {[a4b99rph:Pilgrim2009]}, chapter [[http://diveinto.org/python3/files.html|11]], or {[a4b99rph:Wentworth2012]}, chapter [[http://openbookproject.net/thinkcs/python/english3e/files.html|13]]).
    * How to open and close a text file.
    * How to use the ''with'' statement.
    * Reading a file line by line.
    * Reading the whole file contents as a single string.
  * The use of the <code>if __name__ == "__main__":</code> section (see {[a4b99rph:Pilgrim2009]}, chapter [[http://diveinto.org/python3/your-first-python-program.html#runningscripts|1.10]]).
  * The ''split()'' method of string values (see the Python docs for [[http://docs.python.org/py3k/library/stdtypes.html?highlight=split#str.split|str.split()]]).
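
The items above can be tried out in one short script. The following is only a minimal sketch for experimenting, not part of the assignment; the function name ''demo()'' and the file name ''demo.txt'' are made up for illustration.

<code python>
import os
import tempfile


def demo():
    # Create an empty dictionary and add key-value pairs.
    eng_to_cz = {}
    eng_to_cz['cat'] = 'kocka'
    eng_to_cz['dog'] = 'pes'

    # Read the value stored under a key.
    cat_cz = eng_to_cz['cat']

    # Write a small text file into a temporary directory
    # (the file name demo.txt is hypothetical).
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, 'demo.txt')
        with open(path, 'w', encoding='utf-8') as f:
            f.write('first line\nsecond line\n')

        # Read the file line by line; the file is closed
        # automatically when the with block ends.
        with open(path, 'r', encoding='utf-8') as f:
            lines = [line.rstrip('\n') for line in f]

        # Read the whole file contents as a single string.
        with open(path, 'r', encoding='utf-8') as f:
            whole = f.read()

    # split() without arguments splits on any run of whitespace.
    words = lines[0].split()
    return cat_cz, lines, whole, words


if __name__ == "__main__":
    # Runs only when the file is executed as a script,
    # not when it is imported as a module.
    print(demo())
</code>

Because the call to ''demo()'' sits inside the main section, importing this file from another module defines the function without running it.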


===== Reading classification from a file =====

Task:
  * In a module called ''utils.py'', create a function ''read_classification_from_file()'' that will read the mail classes from a text file.

Why do we need it:
  * We will need this function when creating a learning filter and when evaluating the quality of a filter.

==== Specifications ====
The function ''read_classification_from_file()'' (in module ''utils.py'') must conform to the following specification:

^  Input  | The path to the text file (most likely either ''!truth.txt'' or ''!prediction.txt'')  |
^  Output  | A dictionary containing either the ''SPAM'' or the ''OK'' label for each filename in the email corpus.  |

The function loads a text file containing one pair of strings per line, separated by a single space, like this:
<code>
email01 OK
email02 OK
email03 SPAM
email1234 OK
...
</code>
and creates a dictionary (the order of individual "rows" in the following listing is not important):
<code python>
{'email1234': 'OK', 'email03': 'SPAM', 'email02': 'OK', 'email01': 'OK'}
</code>

If the file is empty, it returns an empty dictionary.
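
One possible shape of such a function is sketched below. It is an illustration under stated assumptions, not the reference solution: it assumes the file is UTF-8 encoded and it skips blank lines, neither of which the specification above prescribes.

<code python>
def read_classification_from_file(path):
    """Return a dictionary mapping email filenames to their labels."""
    classification = {}
    # Assumption: the file is UTF-8 encoded.
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            parts = line.split()
            # Assumption: blank lines are skipped.
            if not parts:
                continue
            # Each remaining line must hold exactly two strings;
            # anything else raises ValueError here.
            filename, label = parts
            classification[filename] = label
    return classification
</code>

For an empty file the loop body never runs, so the function returns the empty dictionary created on its first line.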

