See the general homework guidelines!
Specifications: You should hand in a ZIP file (with an arbitrary name) containing 2 modules with the following functions:
library.py with functions
load_library()
index_by_author()
report_author_counts()
texttools.py with function
compute_word_importance()
Specifications for individual functions can be found below.
Assume you have a text file books.txt which represents the contents of your book library. The file may look like this:
Foundation|Asimov, Isaac
Foundation and Empire|Asimov, Isaac
Second Foundation|Asimov, Isaac
Dune|Herbert, Frank
Children of Dune|Herbert, Frank
RUR|Capek, Karel
2001: A Space Odyssey|Clarke, Arthur C.
2010: Odyssey Two|Clarke, Arthur C.
Let's make the following simplifications:
|,
You shall create a small set of functions for working with such files. You should implement the following 3 functions:
load_library(),
index_by_author(), and
report_author_counts().
In module library.py, implement function load_library().
You can expect that the input text file will always exist, but it may be empty. In that case, the function shall return an empty dictionary.
It can then be used as follows:
>>> from library import load_library
>>> book_author = load_library('books.txt')
>>> print(book_author['RUR'])
Capek, Karel
>>> print(book_author['Dune'])
Herbert, Frank
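A minimal sketch of how such a loader might look is given here. It assumes that each line of the file holds exactly one book in the form title|author, as in the sample books.txt above, and that blank lines can be skipped; the UTF-8 encoding is also an assumption, not part of the specification.

```python
def load_library(fpath):
    """Return a dict mapping book title -> author (illustrative sketch)."""
    book_author = {}
    with open(fpath, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines; an empty file yields {}
            # Assumption: the first '|' separates the title from the author.
            title, author = line.split('|', 1)
            book_author[title] = author
    return book_author
```

Because the dictionary is only filled while lines are read, an empty input file naturally results in an empty dictionary.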
In module library.py, create function index_by_author(), which - in a sense - inverts the dictionary of books produced by load_library().
load_library() function).
If the input dictionary is empty, the function shall produce an empty dictionary as well.
For example, running the function on the following book dictionary (with reduced contents for the sake of brevity) would produce results shown below in the code:
>>> book_author = {'RUR': 'Capek, Karel', 'Dune': 'Herbert, Frank', 'Children of Dune': 'Herbert, Frank'}
>>> books_by = index_by_author(book_author)
>>> print(books_by)
{'Herbert, Frank': ['Dune', 'Children of Dune'], 'Capek, Karel': ['RUR']}
>>> books_by['Capek, Karel']
['RUR']
>>> books_by['Herbert, Frank']
['Dune', 'Children of Dune']
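One possible way to perform this inversion is sketched here. It is only an illustration, not the required solution; it relies on dict.setdefault() to create the list of titles the first time an author is seen.

```python
def index_by_author(book_author):
    """Invert a title -> author dict into an author -> [titles] dict (sketch)."""
    books_by = {}
    for title, author in book_author.items():
        # Create an empty list for a new author, then append the title to it.
        books_by.setdefault(author, []).append(title)
    return books_by
```

The titles of each author appear in the order in which they are stored in the input dictionary. An empty input dictionary never enters the loop, so the result is empty as well.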
In module library.py, create function report_author_counts(lib_fpath, rep_filepath) which shall compute the number of books of each author and the total number of books, and shall store this information in another text file.
Assuming the file books.txt has the same contents as above, running the function like this:
>>> report_author_counts('books.txt', 'report.txt')
shall create a new text file report.txt with the following contents:
Clarke, Arthur C.: 2
Herbert, Frank: 2
Capek, Karel: 1
Asimov, Isaac: 3
TOTAL BOOKS: 8
The order of the lines is irrelevant. Do not forget the TOTAL BOOKS line! If the input file is empty, the output file shall contain just the line TOTAL BOOKS: 0.
Suggestion: There are basically two ways to implement this function. You can either
index_by_author() and then easily iterate over the dictionary, or
Both options are possible, provided the function will accept the specified arguments and will produce the right file contents. The choice is up to you.
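As an illustration only, the following self-contained sketch counts the authors directly while reading the library file, using collections.Counter. It assumes the title|author line format of the sample books.txt and writes the TOTAL BOOKS line last; the specification does not prescribe the line order.

```python
from collections import Counter


def report_author_counts(lib_fpath, rep_filepath):
    """Write per-author book counts and the total to rep_filepath (sketch)."""
    counts = Counter()
    with open(lib_fpath, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                # Assumption: the first '|' separates the title from the author.
                _title, author = line.split('|', 1)
                counts[author] += 1
    with open(rep_filepath, 'w', encoding='utf-8') as f:
        for author, n in counts.items():
            f.write(f'{author}: {n}\n')
        # Written even when there are no books, so an empty library
        # produces just "TOTAL BOOKS: 0".
        f.write(f'TOTAL BOOKS: {sum(counts.values())}\n')
```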
In module texttools.py, create function compute_word_importance(fpath1, fpath2) which produces a Counter which, for each word found in files fpath1 and fpath2, stores the difference between the number of occurrences in the first file and the number of occurrences in the second file.
Counter object containing for each word the difference between counts of that word in the first and in the second file.
For example, having the first file text1.txt with the following contents
This text is about Python programming language.
and the second file text2.txt with the following contents
This text is about Spam.
then the following code shall have the displayed result:
>>> from texttools import compute_word_importance
>>> c = compute_word_importance('text1.txt', 'text2.txt')
>>> print(c)
Counter({'language.': 1, 'Python': 1, 'programming': 1, 'about': 0, 'This': 0, 'is': 0, 'text': 0, 'Spam.': -1})
The function shall return an empty Counter when both the input files are empty.
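A possible sketch of this function follows. It assumes that words are simply the whitespace-separated tokens of each file (so punctuation stays attached, as in 'language.' and 'Spam.' in the example); the helper _count_words() is a hypothetical name introduced only for this illustration.

```python
from collections import Counter


def _count_words(fpath):
    """Hypothetical helper: count whitespace-separated words in a file."""
    with open(fpath, encoding='utf-8') as f:
        return Counter(f.read().split())


def compute_word_importance(fpath1, fpath2):
    """Return per-word count differences (first file minus second file)."""
    importance = _count_words(fpath1)
    # Counter.subtract() keeps zero and negative counts.
    importance.subtract(_count_words(fpath2))
    return importance
```

The subtract() method is used here instead of the - operator because subtracting two Counters with - discards entries whose result is zero or negative, whereas the example output keeps words such as 'about': 0 and 'Spam.': -1.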