Search
Your task is to implement a program that will search a database of people and their personal data. Each person and their personal details are considered as one database entry, and as it is a very simple database, each entry is a structured text file (see overview below). Thus, your program should search through those files and process the read entries in the following way.
For more details, see details below.
Your program should use the producer-consumer scheme for the database query by utilizing tools from the library pthread.h. In your main thread, you should create two pools of threads. The first pool of threads should represent the producers. The producers are responsible for reading the database content and providing it to the consumers through a memory buffer of the size given by a macro BUFFER_SIZE. When the buffer is full, the producers should wait for it to have available space. The second pool of threads should represent the consumers that should read the entries from the buffer and analyse them. When the buffer is empty, the consumers should wait for an entry to appear. The number of producers is provided by the macro PRODUCER_COUNT, and the number of consumers is provided by the macro CONSUMER_COUNT.
pthread.h
BUFFER_SIZE
PRODUCER_COUNT
CONSUMER_COUNT
When there are no files left to be read, the producer threads should terminate cleanly and notify the consumers about the event, so they will not expect more data in a buffer. When any consumer finds the sought database entry (see point C above), it should notify all other threads about it, and they should stop and terminate cleanly. The consumer thread that found the entry should provide the entry to the main thread of your program, which should print it and then terminate cleanly. In case there is no such entry found that would satisfy the requirements (set in point C above), the consumer threads terminate cleanly, and the main thread is notified about the fact that no entry was found. Then, the main thread prints on the standard error output a message No entries found!.
No entries found!
Each personal entry contains the field Salary. The consumers should be able to collectively calculate the average salary of all the persons’ entries processed by them so far. Here, we are interested in the field Date of birth with the structure YYYY-MM-DD (see overview below). The histogram (table with the number of occurrences of particular values) will be formed from the month part. The consumers should be able to collectively create the histogram. The reference n-th birthday will be the first command line argument, and the day and month for the birthday search will be the second command line argument in the form of DD-MM with a dash separating the fields. E.g. ./main 40 23-02 to search for a person who will be 40 years old this year (2022) on 23rd February. Once the first entry with the personal details fulfilling this query is found, all created threads (producers and consumers) should stop and terminate correctly while releasing their unnecessary resources. The values calculated in points A, B and C should be given to the main thread of the program, which will print them on the standard output.
Salary
Date of birth
YYYY-MM-DD
DD-MM
./main 40 23-02
The threads should always handle the resources in a decently safe manner, thus always closing the opened files and deallocating all dynamic memory. Create a Makefile for compiling your program. Separate your program into modules with safe headers. Functions should be commented on in English about what they do, their inputs and outputs. The main program should then print the calculated values as follows:
#Average salary: <33333> #Month histogram: +-----+-----+------+-----+-----+-----+-----+-----+-----+-------+-----+-----+ | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | +-----+-----+------+-----+-----+-----+-----+-----+-----+-------+-----+-----+ | 0 | 456 | 1000 | 88 | 0 | 1 | 75 | 64 | 3 | 94687 | 4 | 0 | +-----+-----+------+-----+-----+-----+-----+-----+-----+-------+-----+-----+ #Found entry: Name: <John Doe> Address: <Street 123, City, Country> Phone: <123456789> Date of birth: <YYYY-MM-DD> Occupation: <company, position> Salary: <40000>
|
+
The database has the form of a large number of simple text files. The files are located in a directory given by the macro DATABASE_DIR, and the number of files is given by the macro DATABASE_SIZE. Each contains exactly one database entry of a form:
DATABASE_DIR
DATABASE_SIZE
Name: <John Doe> Address: <Street 123, City, Country> Phone: <123 456 789> Date of birth: <YYYY-MM-DD> Occupation: <company, position> Salary: <40000>
All the files have the same name format, which is their sequence number starting with one, and it is left-padded with zeros, e.g. file names 313.dat and 002.dat for a database with the size 513.
313.dat
002.dat
The maximum number of points is 25.