04 - Files

See the general homework guidelines!

Specifications: You should hand in a ZIP file (with an arbitrary name) containing 2 modules with the following functions:

module library.py with functions
- load_library()
- index_by_author()
- report_author_counts()
module texttools.py with function
- compute_word_importance()

Specifications for individual functions can be found below.

Book library

Assume you have a text file books.txt which represents the contents of your book library. The file may look like this:

Foundation|Asimov, Isaac
Foundation and Empire|Asimov, Isaac
Second Foundation|Asimov, Isaac
Dune|Herbert, Frank
Children of Dune|Herbert, Frank
RUR|Capek, Karel
2001: A Space Odyssey|Clarke, Arthur C.
2010: Odyssey Two|Clarke, Arthur C.

Let's make the following simplifications:

- book titles are separated from author names by |,
- book titles are unique,
- each book has only a single author, but
- we may have several books from the same author.
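Under these simplifications, each line holds exactly one record that can be split at the first | into a title and an author. The following is a minimal sketch of parsing a single record; the helper name parse_record() is hypothetical and is not one of the required functions. It assumes titles never contain a | character.

```python
def parse_record(line):
    """Split one 'Title|Author' record into a (title, author) tuple.

    Hypothetical helper for illustration; not required by the assignment.
    Assumes the title itself never contains a '|' character.
    """
    # strip() removes the trailing newline that file iteration leaves on each line.
    title, sep, author = line.strip().partition('|')
    if not sep:
        raise ValueError(f'missing "|" separator in record: {line!r}')
    return title, author


print(parse_record('2001: A Space Odyssey|Clarke, Arthur C.\n'))
# ('2001: A Space Odyssey', 'Clarke, Arthur C.')
```

Using partition() (or split('|', 1)) splits only at the first separator, so a colon or comma in the title or author name does no harm.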

You shall create a small set of functions for working with such files. Implement the following three functions:

- load_library(),
- index_by_author(), and
- report_author_counts().

Load library

In module library.py, implement function load_library().

Inputs:
- Path to a text file (with contents similar to those above) containing the individual books.
Outputs:
- The function shall produce a dictionary where the book titles are used as keys and the authors' names are stored as values.

You can expect that the input text file will always exist, but it may be empty. In that case, the function shall return an empty dictionary.

It can then be used as follows:

>>> from library import load_library
>>> book_author = load_library('books.txt')
>>> print(book_author['RUR'])
Capek, Karel
>>> print(book_author['Dune'])
Herbert, Frank
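One possible way to write load_library() is sketched below. It is not the only valid solution; it assumes the file is UTF-8 encoded and, as an extra precaution beyond the specification, skips blank lines (for example a trailing empty line). The usage demo writes a small temporary library file so the block runs on its own.

```python
import os
import tempfile


def load_library(fpath):
    """Return a dict mapping book titles to author names.

    A minimal sketch: assumes UTF-8 and one 'Title|Author' record per line.
    Blank lines are skipped; an empty file yields an empty dict.
    """
    book_author = {}
    with open(fpath, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            # Split only at the first '|': title on the left, author on the right.
            title, author = line.split('|', 1)
            book_author[title] = author
    return book_author


# Usage demo with temporary files (a regular and an empty library).
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'books.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('RUR|Capek, Karel\nDune|Herbert, Frank\n')
    book_author = load_library(path)

    empty_path = os.path.join(tmpdir, 'empty.txt')
    open(empty_path, 'w', encoding='utf-8').close()
    empty = load_library(empty_path)

print(book_author['RUR'])  # Capek, Karel
print(empty)               # {}
```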

Index by author

In module library.py, create function index_by_author(), which - in a sense - inverts the dictionary of books produced by load_library().

Inputs:
- A dictionary with book titles as keys and book authors as values (the same structure as produced by load_library() function).
Outputs:
- A dictionary containing book authors as keys and a list of all books of the respective author as values.

If the input dictionary is empty, the function shall produce an empty dictionary as well.

For example, running the function on the following book dictionary (with reduced contents for the sake of brevity) would produce the results shown below:

>>> from library import index_by_author
>>> book_author = {'RUR': 'Capek, Karel', 'Dune': 'Herbert, Frank', 'Children of Dune': 'Herbert, Frank'}
>>> books_by = index_by_author(book_author)
>>> print(books_by)
{'Herbert, Frank': ['Dune', 'Children of Dune'], 'Capek, Karel': ['RUR']}
>>> books_by['Capek, Karel']
['RUR']
>>> books_by['Herbert, Frank']
['Dune', 'Children of Dune']
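A minimal sketch of index_by_author() follows. It uses dict.setdefault() to create an empty list the first time an author is encountered; within each author's list, titles keep the order in which they appear in the input dictionary. Because dictionary comparison ignores key order, the order of authors in the printed result may differ from the example above without affecting correctness.

```python
def index_by_author(book_author):
    """Invert a {title: author} dict into {author: [titles]}.

    A minimal sketch; an empty input produces an empty dict.
    """
    books_by = {}
    for title, author in book_author.items():
        # setdefault() inserts an empty list for a new author and returns it.
        books_by.setdefault(author, []).append(title)
    return books_by


book_author = {'RUR': 'Capek, Karel', 'Dune': 'Herbert, Frank',
               'Children of Dune': 'Herbert, Frank'}
books_by = index_by_author(book_author)
print(books_by['Herbert, Frank'])  # ['Dune', 'Children of Dune']
print(index_by_author({}))         # {}
```

An equivalent variant uses collections.defaultdict(list); if you choose it, converting the result with dict() before returning avoids surprising callers who look up a missing author.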

Report author counts

In module library.py, create function report_author_counts(lib_fpath, rep_filepath) which shall compute the number of books of each author and the total number of books, and shall store this information in another text file.

Inputs:
- Path to a library text file (containing records for individual books).
- Path to report text file that shall be created by this function.
Outputs: None

Assuming the file books.txt has the same contents as above, running the function like this:

>>> from library import report_author_counts
>>> report_author_counts('books.txt', 'report.txt')

shall create a new text file report.txt with the following contents:

Clarke, Arthur C.: 2
Herbert, Frank: 2
Capek, Karel: 1
Asimov, Isaac: 3
TOTAL BOOKS: 8

The order of the lines is irrelevant. Do not forget the TOTAL BOOKS line! If the input file is empty, the output file shall contain just the line TOTAL BOOKS: 0.

Suggestion: There are basically two ways to implement this function. You can either

- use the two functions above to load the library, transform it using index_by_author(), and then easily iterate over the resulting dictionary, or
- work directly with the source text file, extract the author names, and count their occurrences.

Both options are acceptable, provided the function accepts the specified arguments and produces the correct file contents. The choice is up to you.
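As an illustration of the second approach, here is a minimal sketch that reads the library file directly and counts author names with collections.Counter. It assumes UTF-8 files with one 'Title|Author' record per line and skips blank lines. Since every record is one book, the total equals the sum of the per-author counts, which is 0 for an empty file.

```python
import os
import tempfile
from collections import Counter


def report_author_counts(lib_fpath, rep_filepath):
    """Write 'Author: count' lines followed by a 'TOTAL BOOKS: n' line.

    A minimal sketch of the "work directly with the text file" approach.
    """
    counts = Counter()
    with open(lib_fpath, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                # Everything after the first '|' is the author name.
                counts[line.split('|', 1)[1]] += 1
    with open(rep_filepath, 'w', encoding='utf-8') as f:
        for author, n in counts.items():
            f.write(f'{author}: {n}\n')
        # One record per book, so the total is the sum of all author counts.
        f.write(f'TOTAL BOOKS: {sum(counts.values())}\n')


# Usage demo with temporary files.
with tempfile.TemporaryDirectory() as tmpdir:
    lib = os.path.join(tmpdir, 'books.txt')
    rep = os.path.join(tmpdir, 'report.txt')
    with open(lib, 'w', encoding='utf-8') as f:
        f.write('Dune|Herbert, Frank\n'
                'Children of Dune|Herbert, Frank\n'
                'RUR|Capek, Karel\n')
    report_author_counts(lib, rep)
    with open(rep, encoding='utf-8') as f:
        report_lines = f.read().splitlines()

    # An empty library should produce only the TOTAL BOOKS line.
    empty_lib = os.path.join(tmpdir, 'empty.txt')
    open(empty_lib, 'w', encoding='utf-8').close()
    report_author_counts(empty_lib, rep)
    with open(rep, encoding='utf-8') as f:
        empty_report = f.read().splitlines()

print(report_lines)  # ['Herbert, Frank: 2', 'Capek, Karel: 1', 'TOTAL BOOKS: 3']
print(empty_report)  # ['TOTAL BOOKS: 0']
```

The first approach would instead call load_library() and index_by_author() and write len() of each author's list; both produce the same report up to line order.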

Working with Counters

In module texttools.py, create function compute_word_importance(fpath1, fpath2), which returns a Counter that, for each word found in fpath1 or fpath2, stores the number of occurrences of that word in fpath1 minus the number of its occurrences in fpath2.

Inputs:
- Path to the first text file.
- Path to the second text file.
Output:
- Counter object containing for each word the difference between counts of that word in the first and in the second file.
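A detail worth knowing before implementing this: Counter offers two ways to subtract, and they behave differently. The binary - operator keeps only entries with a positive result, whereas the subtract() method updates the Counter in place (returning None) and keeps zero and negative counts. The word lists below are made up purely for illustration.

```python
from collections import Counter

c1 = Counter(['Python', 'text'])
c2 = Counter(['text', 'Spam'])

# The '-' operator drops zero and negative results ('text' and 'Spam' vanish).
print(c1 - c2)  # Counter({'Python': 1})

# subtract() works in place and keeps every key, including 0 and negative counts.
diff = Counter(c1)  # copy, so c1 stays unchanged
diff.subtract(c2)
print(diff)  # Counter({'Python': 1, 'text': 0, 'Spam': -1})
```

Since the expected output in the example below contains both zero and negative values, subtract() is the behavior that matches this specification.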

For example, given a first file text1.txt with the following contents

This text is about Python programming language.

and the second file text2.txt with the following contents

This text is about Spam.

the following code shall produce the result shown:

>>> from texttools import compute_word_importance
>>> c = compute_word_importance('text1.txt', 'text2.txt')
>>> print(c)
Counter({'language.': 1, 'Python': 1, 'programming': 1, 'about': 0, 'This': 0, 'is': 0, 'text': 0, 'Spam.': -1})

The function shall return an empty Counter when both input files are empty.
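A minimal sketch of compute_word_importance() is shown below. It assumes that words are whitespace-separated tokens (so punctuation stays attached, consistent with 'language.' and 'Spam.' in the example above) and that the files are UTF-8 encoded. It counts the words of the first file and then calls subtract() with the words of the second file, which keeps zero and negative differences.

```python
import os
import tempfile
from collections import Counter


def compute_word_importance(fpath1, fpath2):
    """Return a Counter of (count in fpath1) - (count in fpath2) for each word.

    A minimal sketch assuming whitespace tokenization and UTF-8 files.
    """
    with open(fpath1, encoding='utf-8') as f:
        importance = Counter(f.read().split())
    with open(fpath2, encoding='utf-8') as f:
        # subtract() accepts an iterable of words and keeps 0 / negative counts.
        importance.subtract(f.read().split())
    return importance


# Usage demo reproducing the example texts in temporary files.
with tempfile.TemporaryDirectory() as tmpdir:
    p1 = os.path.join(tmpdir, 'text1.txt')
    p2 = os.path.join(tmpdir, 'text2.txt')
    with open(p1, 'w', encoding='utf-8') as f:
        f.write('This text is about Python programming language.\n')
    with open(p2, 'w', encoding='utf-8') as f:
        f.write('This text is about Spam.\n')
    c = compute_word_importance(p1, p2)

    # Two empty files should give an empty Counter.
    e1 = os.path.join(tmpdir, 'empty1.txt')
    e2 = os.path.join(tmpdir, 'empty2.txt')
    for p in (e1, e2):
        open(p, 'w', encoding='utf-8').close()
    empty = compute_word_importance(e1, e2)

print(c['Python'], c['about'], c['Spam.'])  # 1 0 -1
print(empty)                                # Counter()
```

The order in which a printed Counter lists items with equal counts depends on insertion order, so your output may list them differently from the example; the stored values are what matter.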
