===== Semestral project =====

{{indexmenu>be4m36ds2:homework#1|js}}

**General Assignment**

  * Model and implement an analysis of user activity on an online platform (for example, educational, commercial, or social) using several NoSQL databases.
  * Formulate realistic analytical tasks (queries) for the subject domain you have chosen.
  * Based on these tasks, select suitable NoSQL DBMSs and storage structures so that the specified queries are executed efficiently and correctly. Justify your choices in detail.
  * Describe all steps, data structures, and query parameters so that another student could reproduce your experiment.
  * Provide brief explanations for all scripts, templates, and examples.
  * Using the same subject domain as other students or groups is permitted.
  * Using identical datasets and tasks (queries) is prohibited.
  * The project may be completed individually or in groups of up to three students.

**Project Task List**

  - Formulate the subject domain and scenario:
    * Define the type of platform and the main types of users, objects, and events.
  - Compile a general set of related data (users, objects, events, relationships) that reflects the logic of user activity on the platform.
  - For each DBMS, decide in advance which analytical tasks (queries) will be executed there:
    * Define at least two key queries for each DBMS.
    * For each query, explain why the platform needs it, why this DBMS was chosen, and what storage structure the query requires.
  - Prepare (generate) the data (CSV/JSON) needed to implement the selected queries.
  - Prepare and import the data into the corresponding DBMSs: MongoDB, Cassandra, Redis, and Neo4j.
    * For each DBMS, prepare exactly the data needed to implement its selected analytical queries. Do not duplicate data structures across DBMSs unnecessarily; duplication is allowed only for a comparative performance analysis.
    * In the report, explain why this data is loaded into this DBMS in this structure and why other alternatives were not used (with a brief comparison).
    * Import the data. Scripts or GUI tools may be used for loading.
  - Execute the selected analytical tasks in the DBMSs, following the pre-formulated queries:
    * Include the commands used, sample outputs, and brief interpretations in the report.
  - Perform one optimization task for each DBMS: analyze its performance and implement a technique that improves it.
  - Compare the convenience, limitations, and strengths of each DBMS in practice, based on the queries you implemented.
  - Present the results in a report covering the scenario, data structure, selection of queries and DBMSs, imports, implementation of the analytics, comparative analysis, and conclusions.

**Report**

  * Description of the platform and the data model/schema (including relationships where applicable).
  * List of all queries and the justification of the DBMS chosen for each.
  * The format and an example of each data structure in each DBMS.
  * Commands and results of all queries.
  * Attach a separate file **raw-queries.txt** containing all native DB commands, as well as CSV exports of the query results (one per DBMS). For optimizations, attach screenshots or output of EXPLAIN/PROFILE/TRACING.
  * Description of each optimization: what was done, why, and the result.
  * Comparison tables where appropriate.
  * A summary answering "What would you do differently if the data volume grew 10×?" and "Which DBMS turned out to be the most convenient for your scenario, and why?"
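The data-preparation steps in the Project Task List (compiling related data and generating CSV/JSON) can be illustrated with a small, seeded generator. This is only a sketch under stated assumptions, not a required solution: it assumes a hypothetical educational platform, and the entity names (''users'', ''courses'', ''events''), field names, event types, and output file names are invented for this example. A fixed random seed keeps the dataset reproducible, so another student can regenerate exactly the same files.

<code python>
import csv
import json
import random
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Hypothetical event types for an assumed educational platform.
EVENT_TYPES = ["view", "enroll", "comment", "complete"]


def generate_dataset(n_users, n_courses, n_events, seed=42):
    """Return (users, courses, events) as lists of flat dicts.

    The same seed always produces the same dataset (reproducibility).
    """
    rng = random.Random(seed)
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)

    users = [
        {"user_id": f"u{i}",
         "role": rng.choice(["student", "teacher"]),
         "country": rng.choice(["CZ", "DE", "SK"])}
        for i in range(1, n_users + 1)
    ]
    courses = [
        {"course_id": f"c{i}", "category": rng.choice(["db", "ml", "web"])}
        for i in range(1, n_courses + 1)
    ]
    events = []
    for i in range(1, n_events + 1):
        # Random timestamp within a 90-day window after the start date.
        ts = start + timedelta(seconds=rng.randrange(90 * 24 * 3600))
        events.append({
            "event_id": f"e{i}",
            "user_id": rng.choice(users)["user_id"],      # reference by ID
            "course_id": rng.choice(courses)["course_id"],
            "event_type": rng.choice(EVENT_TYPES),
            "ts": ts.isoformat(),
        })
    # Same ISO format and time zone everywhere, so string order = time order.
    events.sort(key=lambda e: e["ts"])
    return users, courses, events


def write_csv(path, rows):
    """Write a non-empty list of flat dicts to CSV with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def write_json(path, rows):
    """Write rows as a single JSON array (mongoimport reads this with --jsonArray)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2)


if __name__ == "__main__":
    out = Path("data")
    out.mkdir(exist_ok=True)
    users, courses, events = generate_dataset(100, 10, 5000)
    write_csv(out / "users.csv", users)
    write_csv(out / "courses.csv", courses)
    write_csv(out / "events.csv", events)
    write_json(out / "events.json", events)
</code>

Running the script writes ''users.csv'', ''courses.csv'', ''events.csv'', and ''events.json'' into a ''data/'' directory. Because events reference users and courses only by ID, the same files can then be reshaped for each DBMS according to its chosen storage structure.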