====== Homework: Files ======

See the general [[courses:be5b33prg:tutorials:homeworks|homework guidelines]]!

**Specifications:** You should hand in a ZIP file (with an arbitrary name) containing 2 modules with the following functions:
  * module ''library.py'' with functions
    * ''load_library()''
    * ''index_by_author()''
    * ''report_author_counts()''
  * module ''texttools.py'' with function
    * ''compute_word_importance()''

Specifications for individual functions can be found below.

===== Book library =====

Assume you have a text file ''books.txt'' which represents the contents of your book library. The file may look like this:
<code>
Foundation|Asimov, Isaac
Foundation and Empire|Asimov, Isaac
Second Foundation|Asimov, Isaac
Dune|Herbert, Frank
Children of Dune|Herbert, Frank
RUR|Capek, Karel
2001: A Space Odyssey|Clarke, Arthur C.
2010: Odyssey Two|Clarke, Arthur C.
</code>

Let's make the following simplifications:
  * book titles are separated from author names by ''|'',
  * book titles are unique,
  * each book has only a single author, but
  * we may have several books from the same author.

You shall create a small set of functions for working with such files. You should implement the following 3 functions:
  * ''load_library()'',
  * ''index_by_author()'', and
  * ''report_author_counts()''.

==== Load library ====

In module ''library.py'', implement function ''load_library()''.
  * Inputs:
    * Path to a text file (with contents similar to those above) containing the individual books.
  * Outputs:
    * The function shall produce a dictionary where the book titles are used as keys and the authors' names are stored as values.

You can expect that the input text file will always exist, but it may be empty. In that case, the function shall return an empty dictionary.
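As a hint, here is a minimal sketch of one possible way to split a single record into its title and author. The helper name ''parse_record()'' is hypothetical and is not part of the required interface. The sketch relies on the simplification that ''|'' separates the title from the author.

```python
def parse_record(line):
    """Split one 'title|author' record into a (title, author) tuple.

    Hypothetical helper for illustration; not part of the required interface.
    """
    # Strip the trailing newline, then split at the first '|' only.
    title, _, author = line.rstrip("\n").partition("|")
    return title, author


record = parse_record("2001: A Space Odyssey|Clarke, Arthur C.\n")
```

Here ''record'' is a tuple holding the title and the author's name, with the trailing newline already removed.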
''load_library()'' can then be used as follows:
<code>
>>> from library import load_library
>>> book_author = load_library('books.txt')
>>> print(book_author['RUR'])
Capek, Karel
>>> print(book_author['Dune'])
Herbert, Frank
</code>

==== Index by author ====

In module ''library.py'', create function ''index_by_author()'', which, in a sense, inverts the dictionary of books produced by ''load_library()''.
  * Inputs:
    * A dictionary with book titles as keys and book authors as values (the same structure as produced by the ''load_library()'' function).
  * Outputs:
    * A dictionary containing book authors as keys and a **list** of all books of the respective author as values.

If the input dictionary is empty, the function shall produce an empty dictionary as well.

For example, running the function on the following book dictionary (with reduced contents for the sake of brevity) would produce the results shown in the code below:
<code>
>>> book_author = {'RUR': 'Capek, Karel', 'Dune': 'Herbert, Frank', 'Children of Dune': 'Herbert, Frank'}
>>> books_by = index_by_author(book_author)
>>> print(books_by)
{'Herbert, Frank': ['Dune', 'Children of Dune'], 'Capek, Karel': ['RUR']}
>>> books_by['Capek, Karel']
['RUR']
>>> books_by['Herbert, Frank']
['Dune', 'Children of Dune']
</code>

==== Report author counts ====

In module ''library.py'', create function ''report_author_counts(lib_fpath, rep_filepath)'' which shall compute the number of books of each author **and the total number of books**, and shall store this information in another text file.
  * Inputs:
    * Path to a library text file (containing records for individual books).
    * Path to the report text file that shall be created by this function.
  * Outputs: None

Assuming the file ''books.txt'' has the same contents as above, running the function like this:
<code>
>>> report_author_counts('books.txt', 'report.txt')
</code>
shall create a new text file ''report.txt'' with the following contents:
<code>
Clarke, Arthur C.: 2
Herbert, Frank: 2
Capek, Karel: 1
Asimov, Isaac: 3
TOTAL BOOKS: 8
</code>

The order of the lines is irrelevant. **Do not forget the TOTAL BOOKS line!** If the input file is empty, the output file shall contain just the line ''TOTAL BOOKS: 0''.

**Suggestion:** There are basically 2 ways to implement this function. You can either
  * use the 2 functions above: load the library with ''load_library()'', transform it using ''index_by_author()'', and then easily iterate over the resulting dictionary, or
  * work directly with the source text file, extract the author names, and count their occurrences.

Both options are acceptable, provided the function accepts the specified arguments and produces the right file contents. The choice is up to you.

===== Working with Counters =====

In module ''texttools.py'', create function ''compute_word_importance(fpath1, fpath2)'' which produces a ''Counter'' which, for each word found in file ''fpath1'' or ''fpath2'', stores the difference between the number of its occurrences in ''fpath1'' and the number of its occurrences in ''fpath2''.
  * Inputs:
    * Path to the first text file.
    * Path to the second text file.
  * Output:
    * ''Counter'' object containing for each word the difference between counts of that word in the first and in the second file.

For example, having the first file ''text1.txt'' with the following contents
<code>
This text is about Python programming language.
</code>
and the second file ''text2.txt'' with the following contents
<code>
This text is about Spam.
</code>
then the following code shall have the displayed result:
<code>
>>> from texttools import compute_word_importance
>>> c = compute_word_importance('text1.txt', 'text2.txt')
>>> print(c)
Counter({'language.': 1, 'Python': 1, 'programming': 1, 'about': 0, 'This': 0, 'is': 0, 'text': 0, 'Spam.': -1})
</code>

The function shall return an empty ''Counter'' when both the input files are empty.
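
**Hint:** The expected result above contains zero and negative counts. The following minimal sketch uses made-up word lists (not the homework files) to show how the standard ''collections.Counter'' treats such values: the ''subtract()'' method keeps zero and negative counts, whereas the ''-'' operator drops them.

```python
from collections import Counter

# Made-up word lists for illustration only; not the homework files.
first = Counter("this text is about python".split())
second = Counter("this text is about spam".split())

# In-place subtraction keeps zero and negative counts.
kept = Counter(first)
kept.subtract(second)

# The binary minus operator keeps only strictly positive counts.
dropped = first - second
```

In this sketch, ''kept'' still holds the words with counts 0 and -1, while ''dropped'' contains only the word ''python''.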