Computer Lab 09, Spam Filter I

  • Q/A
  • Intro to spam filter
  • Practical exercises

Spam filter task - introduction

Practical work

Statistics for numbers in a file

Assume we have a text file (e.g. numbers.txt) containing integers separated by spaces:

1 2 1 3 1 4

In module filestats.py, create function compute_file_statistics() that takes a path to a text file as its argument, reads in all the numbers, and returns a named tuple Statistics with fields mean, median, min and max. The Statistics named tuple shall be defined as:

from collections import namedtuple
Statistics = namedtuple('Statistics', 'mean median min max')
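If collections.namedtuple is new to you, the short sketch below shows how the Statistics type behaves. The values are the ones from the numbers.txt usage example on this page; this is only an illustration, not part of the required solution.

```python
from collections import namedtuple

# The named tuple type required by the task: an ordinary tuple whose
# four positions can also be accessed by name.
Statistics = namedtuple('Statistics', 'mean median min max')

# Instances can be created with keyword (or positional) arguments.
stats = Statistics(mean=2.0, median=1.5, min=1, max=4)

print(stats)         # Statistics(mean=2.0, median=1.5, min=1, max=4)
print(stats.median)  # 1.5
print(stats[0])      # 2.0 (indexing works because it is still a tuple)
```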

Suggestions:

  • You should implement another function, e.g. compute_statistics(), that accepts a list of numbers and returns the required data structure. The main function then only needs to read the data in and pass them to this function.
  • Note that for a collection with an even number of items, the median is defined as the average of the two middle items (when the collection is sorted).
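Following the first suggestion, a possible compute_statistics() helper is sketched below. It is one way to do it, not a reference solution: it applies the even-count median rule by hand and, as an assumption the task does not state, raises ValueError for an empty list.

```python
from collections import namedtuple

Statistics = namedtuple('Statistics', 'mean median min max')


def compute_statistics(numbers):
    """Return a Statistics tuple for a non-empty list of numbers."""
    if not numbers:
        # Assumption: an empty input is treated as an error.
        raise ValueError('cannot compute statistics of an empty list')
    ordered = sorted(numbers)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the median is the single middle item.
        median = ordered[middle]
    else:
        # Even count: the median is the average of the two middle items.
        median = (ordered[middle - 1] + ordered[middle]) / 2
    mean = sum(ordered) / n  # true division always yields a float
    return Statistics(mean=mean, median=median, min=ordered[0], max=ordered[-1])
```

For the numbers from numbers.txt, compute_statistics([1, 2, 1, 3, 1, 4]) returns Statistics(mean=2.0, median=1.5, min=1, max=4): the sorted list is 1 1 1 2 3 4, and the two middle items 1 and 2 average to 1.5.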

Usage example

>>> from filestats import compute_file_statistics
>>> compute_file_statistics('numbers.txt')
Statistics(mean=2.0, median=1.5, min=1, max=4)
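One possible compute_file_statistics(), sketched under the assumptions that the file holds only whitespace-separated integers, contains at least one of them, and is UTF-8 encoded. To keep the block self-contained it uses the standard library's statistics.median, which also averages the two middle items for an even count, instead of a hand-written helper; in your own solution you may call compute_statistics() instead.

```python
import statistics
from collections import namedtuple

Statistics = namedtuple('Statistics', 'mean median min max')


def compute_file_statistics(fpath):
    """Read whitespace-separated integers from fpath and summarize them."""
    with open(fpath, encoding='utf-8') as f:
        # split() with no argument splits on any run of whitespace,
        # so the numbers may also be spread over several lines.
        numbers = [int(token) for token in f.read().split()]
    return Statistics(
        mean=sum(numbers) / len(numbers),   # float, as in the expected output
        median=statistics.median(numbers),  # averages the two middle items for even counts
        min=min(numbers),
        max=max(numbers),
    )
```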

Countries and capitals

Suppose we have a text file, e.g. capitals.csv (the .csv extension stands for “comma-separated values”), containing a pair of strings on each line. The first string is the name of a country, the second string is the name of its capital:

Czech Republic,Prague
USA,Washington
Germany,Berlin
Russia,Moscow

In module geography.py, create function load_capitals() that takes a path to a file containing countries and their capitals as its argument and returns a dictionary mapping each country to its capital.

Usage example

>>> from geography import load_capitals
>>> capitals = load_capitals('capitals.csv')
>>> print(capitals)
{'Czech Republic': 'Prague', 'USA': 'Washington', 'Germany': 'Berlin', 'Russia': 'Moscow'}
The order of the individual key-value pairs may be different.
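A minimal sketch of load_capitals(), assuming each non-blank line contains exactly two comma-separated fields and the file is UTF-8 encoded. It uses the standard csv module rather than str.split(','), so a quoted field such as "Korea, Republic of" would still be read as a single value.

```python
import csv


def load_capitals(fpath):
    """Return a dict mapping each country in fpath to its capital."""
    capitals = {}
    # newline='' is how the csv module documentation recommends opening files.
    with open(fpath, newline='', encoding='utf-8') as f:
        for row in csv.reader(f):
            if not row:
                continue  # csv.reader yields an empty list for a blank line
            country, capital = row  # assumes exactly two fields per line
            capitals[country] = capital
    return capitals
```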

Collection with unique elements?

In module utils.py, create function all_elements_unique() that takes a collection as its argument and returns True if all its items are unique, and False otherwise.

Usage example

>>> from utils import all_elements_unique
>>> all_elements_unique('abcdef')
True
>>> all_elements_unique([1, 2, 7])
True
>>> all_elements_unique('abracadabra')
False
>>> all_elements_unique([1, 1, 2, 7])
False
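One compact sketch of all_elements_unique(): a set stores each item only once, so the collection has no duplicates exactly when the set built from it has the same length as the collection. This assumes the items are hashable and that the collection supports len(); it would not work for, e.g., a list of lists or a generator.

```python
def all_elements_unique(collection):
    """Return True if no item occurs more than once in collection."""
    # Duplicates collapse in the set, making it shorter than the input.
    return len(set(collection)) == len(collection)
```

A variant that loops over the items, remembers them in a set, and returns False at the first item it has already seen would also accept iterators and could stop early.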

Unique words

In module texttools.py, create function get_unique_words(fpath1, fpath2) that takes paths to 2 files and returns a 2-tuple:

  • the set of words found in the first file but not in the second, and
  • the set of words found in the second file but not in the first one.

Usage example

Given, e.g., the following files:

text1.txt:

When I was one,
I had just begun.

text2.txt:

When I was two,
I was nearly new.

the result of executing the function may look like this:

>>> from texttools import get_unique_words
>>> first, second = get_unique_words('text1.txt', 'text2.txt')
>>> print(first)
{'one', 'had', 'just', 'begun'}
>>> print(second)
{'two', 'nearly', 'new'}

Again, the order of individual words in the printouts of the resulting sets may differ.
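A possible get_unique_words(), sketched under two assumptions the task leaves open: words are whitespace-separated tokens with leading and trailing punctuation removed (so begun. becomes begun), and the comparison is case-sensitive. The helper read_words() is a hypothetical name introduced only for this sketch.

```python
import string


def read_words(fpath):
    """Hypothetical helper: the set of words in fpath, punctuation stripped."""
    words = set()
    with open(fpath, encoding='utf-8') as f:
        for token in f.read().split():
            # 'one,' -> 'one', 'begun.' -> 'begun'; inner apostrophes are kept
            word = token.strip(string.punctuation)
            if word:
                words.add(word)
    return words


def get_unique_words(fpath1, fpath2):
    """Return (words only in fpath1, words only in fpath2)."""
    words1 = read_words(fpath1)
    words2 = read_words(fpath2)
    # Set difference keeps the items of the left set missing from the right one.
    return words1 - words2, words2 - words1
```

The union of the two returned sets equals the symmetric difference words1 ^ words2.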

Homework

Solve the homework assignment “working with files”. See the deadline in UploadSystem.

And:

courses/be5b33prg/labs/week_09.txt · Last modified: 2019/09/20 14:29 by nemymila