
04 - Files

See the general homework guidelines!

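Specifications: You should hand in a ZIP file (with an arbitrary name) containing 2 modules: library.py with the functions load_library(), index_by_author(), and report_author_counts(), and texttools.py with the function compute_word_importance().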

Specifications for individual functions can be found below.

Book library

Assume you have a text file books.txt which represents the contents of your book library. The file may look like this:

Foundation|Asimov, Isaac
Foundation and Empire|Asimov, Isaac
Second Foundation|Asimov, Isaac
Dune|Herbert, Frank
Children of Dune|Herbert, Frank
RUR|Capek, Karel
2001: A Space Odyssey|Clarke, Arthur C.
2010: Odyssey Two|Clarke, Arthur C.

Let's make the following simplifications:

You shall create a small set of functions for working with such files. You should implement the following 3 functions:

Load library

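In module library.py, implement function load_library(), which takes the path to a library file and returns a dictionary mapping each book title to its author.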

You can expect that the input text file will always exist, but it may be empty. In that case, the function shall return an empty dictionary.

It can then be used as follows:

>>> from library import load_library
>>> book_author = load_library('books.txt')
>>> print(book_author['RUR'])
Capek, Karel
>>> print(book_author['Dune'])
Herbert, Frank
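A minimal sketch of load_library(), assuming the file is UTF-8 encoded, that each non-empty line holds exactly one title|author pair, and that titles are unique (a repeated title would simply overwrite the earlier entry). The parameter name fpath is our own choice, not part of the assignment, and malformed lines without a | are not handled.

```python
def load_library(fpath):
    """Return a dict mapping book title -> author, read from fpath."""
    book_author = {}
    # Assumption: the library file is UTF-8 encoded.
    with open(fpath, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                # Skip blank lines, e.g. a trailing empty line.
                continue
            # Split only at the first '|' so the author part stays intact.
            title, author = line.split('|', 1)
            book_author[title] = author
    # An empty file never enters the loop body, so {} is returned.
    return book_author
```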

Index by author

In module library.py, create function index_by_author(), which, in a sense, inverts the dictionary of books produced by load_library(): it returns a dictionary mapping each author to the list of their book titles.

If the input dictionary is empty, the function shall produce an empty dictionary as well.

For example, running the function on the following book dictionary (with reduced contents for the sake of brevity) would produce the results shown below:

>>> from library import index_by_author
>>> book_author = {'RUR': 'Capek, Karel', 'Dune': 'Herbert, Frank', 'Children of Dune': 'Herbert, Frank'}
>>> books_by = index_by_author(book_author)
>>> print(books_by)
{'Herbert, Frank': ['Dune', 'Children of Dune'], 'Capek, Karel': ['RUR']}
>>> books_by['Capek, Karel']
['RUR']
>>> books_by['Herbert, Frank']
['Dune', 'Children of Dune']
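One possible way to write the inversion is with dict.setdefault(); collections.defaultdict(list) would work equally well. This is a sketch, not a reference solution. Titles in each author's list keep the order in which they appear in the input dictionary. Since plain dicts preserve insertion order on Python 3.7+, printing the result of this sketch lists 'Capek, Karel' first rather than second; the key order of a dictionary does not affect its equality with another.

```python
def index_by_author(book_author):
    """Invert a title -> author dict into an author -> list-of-titles dict."""
    books_by = {}
    for title, author in book_author.items():
        # setdefault inserts an empty list the first time an author is seen,
        # then returns the stored list so the title can be appended.
        books_by.setdefault(author, []).append(title)
    # An empty input dict yields an empty output dict.
    return books_by
```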

Report author counts

In module library.py, create function report_author_counts(lib_fpath, rep_filepath), where lib_fpath is the path to the library file and rep_filepath is the path of the report to create. The function shall compute the number of books by each author and the total number of books, and store this information in the report file.

Assuming the file books.txt has the same contents as above, running the function like this:

>>> from library import report_author_counts
>>> report_author_counts('books.txt', 'report.txt')

shall create a new text file report.txt with the following contents:

Clarke, Arthur C.: 2
Herbert, Frank: 2
Capek, Karel: 1
Asimov, Isaac: 3
TOTAL BOOKS: 8

The order of the lines is irrelevant. Do not forget the TOTAL BOOKS line! If the input file is empty, the output file shall contain just the line TOTAL BOOKS: 0.

Suggestion: There are basically 2 ways to implement this function. You can either

Both options are possible, provided the function accepts the specified arguments and produces the right file contents. The choice is up to you.
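A minimal sketch of one possible approach, which counts authors directly with collections.Counter instead of reusing load_library(). It assumes the same title|author format and UTF-8 encoding. Because it counts lines, a duplicated title would be counted twice, whereas going through load_library() keeps one entry per title; the two agree whenever titles are unique.

```python
from collections import Counter


def report_author_counts(lib_fpath, rep_filepath):
    """Write per-author book counts and a TOTAL BOOKS line to rep_filepath."""
    counts = Counter()
    # Assumption: same 'title|author' format as for load_library(), UTF-8.
    with open(lib_fpath, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            _title, author = line.split('|', 1)
            counts[author] += 1

    with open(rep_filepath, 'w', encoding='utf-8') as rep:
        for author, n in counts.items():
            rep.write(f'{author}: {n}\n')
        # The total line is written even for an empty library ("TOTAL BOOKS: 0").
        rep.write(f'TOTAL BOOKS: {sum(counts.values())}\n')
```

Called as report_author_counts('books.txt', 'report.txt') on the library above, this sketch writes the same five lines as the expected report, possibly in a different order, which the specification allows.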

Working with Counters

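In module texttools.py, create function compute_word_importance(fpath1, fpath2), which produces a Counter that, for each word found in file fpath1 or fpath2, stores the number of occurrences in fpath1 minus the number of occurrences in fpath2. As the example below shows, words are the whitespace-separated tokens of the text, with case and punctuation kept as they are (e.g. 'language.' and 'Spam.').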
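One detail worth checking before implementing this: the binary - operator on Counter objects keeps only positive results, so words with a zero or negative difference would silently disappear. Counter.subtract() works in place and keeps zero and negative counts. The small word lists below are made up purely for illustration.

```python
from collections import Counter

c1 = Counter(['spam', 'spam', 'eggs'])
c2 = Counter(['spam', 'spam', 'ham'])

# The '-' operator drops 'spam' (difference 0) and 'ham' (difference -1).
print(c1 - c2)   # Counter({'eggs': 1})

# subtract() modifies the counter in place and keeps every difference.
diff = Counter(c1)
diff.subtract(c2)
print(diff)      # Counter({'eggs': 1, 'spam': 0, 'ham': -1})
```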

For example, having the first file text1.txt with the following contents

This text is about Python programming language.

and the second file text2.txt with the following contents

This text is about Spam.

then the following code shall produce the displayed result:

>>> from texttools import compute_word_importance
>>> c = compute_word_importance('text1.txt', 'text2.txt')
>>> print(c)
Counter({'language.': 1, 'Python': 1, 'programming': 1, 'about': 0, 'This': 0, 'is': 0, 'text': 0, 'Spam.': -1})

The function shall return an empty Counter when both the input files are empty.
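A minimal sketch of compute_word_importance(), assuming UTF-8 text and the whitespace-token notion of a word used in the example (str.split() with no arguments, case and punctuation untouched). It relies on Counter.subtract() so that zero and negative differences are kept. On current Python versions a Counter prints its entries from the highest count to the lowest, with ties in insertion order, so the printed output may list the words in a different order than shown above; the contents are the same.

```python
from collections import Counter


def compute_word_importance(fpath1, fpath2):
    """Return a Counter of (count in fpath1) - (count in fpath2) per word."""
    # Assumption: UTF-8 text; words are whitespace-separated tokens,
    # so 'Spam.' and 'spam' count as different words.
    with open(fpath1, encoding='utf-8') as f1:
        counts = Counter(f1.read().split())
    with open(fpath2, encoding='utf-8') as f2:
        # subtract() keeps zero and negative results, unlike the '-' operator.
        counts.subtract(Counter(f2.read().split()))
    # Two empty files give Counter() with no entries.
    return counts
```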