# Computer Lab 09, Spam Filter I

• Q/A
• Intro to spam filter
• Practical exercises

## Practical work

### Statistics for numbers in a file

Assume we have a text file (e.g. numbers.txt) containing integers separated by spaces:

1 2 1 3 1 4

In module filestats.py, create a function compute_file_statistics() that takes a path to a text file as its argument, reads all the numbers in the file, and returns a named tuple Statistics with the fields mean, median, min, and max. The named tuple shall be defined as:

from collections import namedtuple
Statistics = namedtuple('Statistics', 'mean median min max')

Suggestions:

• You may want to implement a helper function, e.g. compute_statistics(), that accepts a list of numbers and returns the required data structure. Then the main function only needs to read the data and pass it to this helper.
• Note that for a collection with an even number of items, the median is defined as the average of the two middle items (when the collection is sorted).
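The suggestions above can be sketched as follows. This is only one possible approach, not a reference solution: it assumes the input is non-empty and that the file contains only whitespace-separated integers. The helper name compute_statistics() comes from the suggestion; the variable names are illustrative.

```python
from collections import namedtuple

Statistics = namedtuple('Statistics', 'mean median min max')


def compute_statistics(numbers):
    """Return Statistics for a non-empty list of numbers (assumption)."""
    ordered = sorted(numbers)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the median is the single middle item.
        median = ordered[middle]
    else:
        # Even count: the median is the average of the two middle items.
        median = (ordered[middle - 1] + ordered[middle]) / 2
    return Statistics(mean=sum(ordered) / n, median=median,
                      min=ordered[0], max=ordered[-1])


def compute_file_statistics(fpath):
    """Read whitespace-separated integers from fpath and summarize them."""
    with open(fpath, encoding='utf-8') as f:
        numbers = [int(token) for token in f.read().split()]
    return compute_statistics(numbers)
```

Keeping the computation separate from the file handling makes it possible to try compute_statistics() on plain lists without creating any files.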

#### Usage example

>>> from filestats import compute_file_statistics
>>> compute_file_statistics('numbers.txt')
Statistics(mean=2.0, median=1.5, min=1, max=4)

### Countries and capitals

Suppose we have a text file, e.g. capitals.csv (the .csv extension stands for “comma-separated values”), containing a pair of strings on each line. The first string is the name of a country, the second is the name of its capital:

Czech Republic,Prague
USA,Washington
Germany,Berlin
Russia,Moscow

In module geography.py, create a function load_capitals() that takes a path to a file containing countries and their capitals as its argument, and reads the file into a dictionary that maps each country to its capital.

#### Usage example

>>> from geography import load_capitals
>>> capitals = load_capitals('capitals.csv')
>>> print(capitals)
{'Czech Republic': 'Prague', 'USA': 'Washington', 'Germany': 'Berlin', 'Russia': 'Moscow'}
The order of the individual key-value pairs may be different.
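A minimal sketch of one way to approach load_capitals(), using the standard csv module. Skipping lines that do not contain exactly two fields is an assumption made here; the assignment does not say how malformed lines should be handled.

```python
import csv


def load_capitals(fpath):
    """Read 'country,capital' lines from a CSV file into a dictionary."""
    capitals = {}
    with open(fpath, newline='', encoding='utf-8') as f:
        for row in csv.reader(f):
            if len(row) != 2:
                # Assumption: silently skip blank or malformed lines.
                continue
            country, capital = row
            capitals[country] = capital
    return capitals
```

Splitting each line with line.strip().split(',') would work just as well for the simple file shown above; the csv module additionally copes with quoted fields.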

### Collection with unique elements?

In module utils.py, create a function all_elements_unique() that checks whether all items of a collection (passed as the argument) are unique.

#### Usage example

>>> from utils import all_elements_unique
>>> all_elements_unique('abcdef')
True
>>> all_elements_unique([1, 2, 7])
True
>>> all_elements_unique([1, 1, 2, 7])
False
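One possible sketch, assuming that the items are hashable (true for the strings and integers in the examples above). It remembers the items seen so far in a set and stops at the first repeat.

```python
def all_elements_unique(collection):
    """Return True if no item occurs more than once in the collection."""
    seen = set()
    for item in collection:
        if item in seen:
            # A repeated item was found, no need to look further.
            return False
        seen.add(item)
    return True
```

For sized collections, comparing len(set(collection)) with len(collection) is a shorter alternative, although it always processes the whole collection.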

### Unique words

In module texttools.py, create a function get_unique_words(fpath1, fpath2) that takes paths to two files and returns a 2-tuple:

• set of words which were found in the first file but not in the second, and
• set of words found in the second file, but not in the first one.

#### Usage example

Given, e.g., the following files text1.txt (the first two lines below) and text2.txt (the last two lines):

When I was one,
I had just begun.

When I was two,
I was nearly new.

the result of executing the function may look like this:

>>> from texttools import get_unique_words
>>> first, second = get_unique_words('text1.txt', 'text2.txt')
>>> print(first)
{'one', 'had', 'just', 'begun'}
>>> print(second)
{'two', 'nearly', 'new'}
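A possible sketch built on set difference. The helper read_words() is a hypothetical name introduced here. The assignment does not define what counts as a word, so the sketch assumes that words are separated by whitespace, that surrounding punctuation is stripped (as the output above suggests), and that letter case is preserved.

```python
import string


def read_words(fpath):
    """Hypothetical helper: return the set of words found in a file."""
    with open(fpath, encoding='utf-8') as f:
        tokens = f.read().split()
    # Assumption: drop punctuation around each token, e.g. 'one,' -> 'one'.
    words = {token.strip(string.punctuation) for token in tokens}
    words.discard('')  # tokens made only of punctuation become empty
    return words


def get_unique_words(fpath1, fpath2):
    """Return (words only in the first file, words only in the second)."""
    words1 = read_words(fpath1)
    words2 = read_words(fpath2)
    return words1 - words2, words2 - words1
```

Because sets are unordered, the words may be printed in a different order than shown in the usage example.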