
Semester Project

General Assignment

Model and implement an analysis of user activity on an online platform (for example, educational, commercial, or social) using several NoSQL databases.
Formulate realistic analytical tasks (queries) for the subject domain you have chosen.
Based on these tasks, select suitable NoSQL DBMSs and storage structures so that the specified queries are executed efficiently and correctly. Justify your choices in detail.
Describe all steps, data structures, and query parameters so that another student could reproduce your experiment.
Provide brief explanations for all scripts/templates/examples.
Different students or groups may choose the same subject domain.
Using identical datasets and tasks (queries) is prohibited.
The project may be completed individually or in groups of up to three students.
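The query-first approach described above (formulate the queries, then choose the storage structure) can be illustrated without any database. The following is a minimal sketch in plain Python, assuming a hypothetical event list and two hypothetical queries: "latest N events of a user" and "top K most active objects". The first layout loosely resembles a partition per user with time-ordered rows, as one might model it in Cassandra; the second resembles a per-object counter, as one might keep in a Redis sorted set. All names and data here are illustrative assumptions, not part of the assignment.

```python
from collections import Counter, defaultdict

# Hypothetical raw events: (user_id, object_id, ISO timestamp).
EVENTS = [
    ("u1", "c1", "2024-01-01T10:00:00"),
    ("u2", "c1", "2024-01-01T10:05:00"),
    ("u1", "c2", "2024-01-02T09:00:00"),
    ("u1", "c1", "2024-01-03T08:30:00"),
    ("u3", "c2", "2024-01-03T12:00:00"),
]


def by_user_newest_first(events):
    """Layout for 'latest N events of a user': one partition per user, newest first."""
    partitions = defaultdict(list)
    for user_id, object_id, ts in events:
        partitions[user_id].append((ts, object_id))
    for rows in partitions.values():
        rows.sort(reverse=True)  # ISO timestamps sort chronologically as strings
    return partitions


def activity_counts(events):
    """Layout for 'top K objects': one counter per object, incremented per event."""
    return Counter(object_id for _, object_id, _ in events)


if __name__ == "__main__":
    partitions = by_user_newest_first(EVENTS)
    print(partitions["u1"][:2])                     # latest two events of u1
    print(activity_counts(EVENTS).most_common(1))   # most active object
```

The point of the sketch is that each query reads directly from a structure built for it, which is the kind of justification the report asks for when explaining why a given DBMS and structure were chosen.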

Project Task List

Formulate the subject domain and scenario:
- Define the type of platform, the main types of users, objects, and events.
Compile a common set of related data (users, objects, events, relationships) that reflects how users act on the platform.
For each DBMS, determine in advance which analytical tasks/queries will be executed there:
- At least two key queries for each DBMS.
- For each query: explain why the platform needs it, why this DBMS was chosen, and what storage structure is required for this query.
Prepare (generate) the necessary data (CSV/JSON) to implement the selected queries.
Prepare the data and import it into the corresponding DBMSs: MongoDB, Cassandra, Redis, and Neo4j.
- For each DBMS, prepare exactly the data needed to implement the selected analytical queries. Do not duplicate data structures across DBMSs unnecessarily; the only exception is duplication needed for comparative performance analysis.
- In the report, explain why this data is loaded into this DBMS in this structure, and why the alternatives were not used (with a brief comparison).
- Import the data. Scripts or GUI tools may be used to load data.
Execute the selected analytical tasks in the DBMSs (according to the pre-formulated queries):
- Include the commands used, sample outputs, and brief interpretations in the report.
Perform one optimization task for each DBMS: analyze and implement a technique to improve performance.
Compare the convenience, limitations, and strengths of each DBMS in practice, based on the queries you implemented.
Present the results in a report: the scenario, data structure, selection of queries and DBMSs, imports, analytics implementation, comparative analysis, and conclusions.
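As an illustration of the data-generation step in the task list, the following is a minimal sketch of how a reproducible synthetic dataset might be produced using only the Python standard library. The platform type (an educational site), field names, event types, file names, and dataset sizes are all hypothetical choices made for this example, not requirements of the assignment. A fixed random seed keeps the output identical between runs, which helps another student reproduce the experiment.

```python
import csv
import json
import random
from datetime import datetime, timedelta

SEED = 42  # fixed seed so every run produces the same dataset
EVENT_TYPES = ["view", "enroll", "submit", "comment"]  # hypothetical event types


def generate_users(rng, n_users):
    """Return user records with hypothetical fields."""
    return [
        {
            "user_id": f"u{i}",
            "role": rng.choice(["student", "teacher"]),
            "country": rng.choice(["CZ", "DE", "PL"]),
        }
        for i in range(1, n_users + 1)
    ]


def generate_events(rng, users, n_objects, n_events, start):
    """Return activity events that link users to objects (e.g. courses)."""
    events = []
    for i in range(1, n_events + 1):
        # Spread events uniformly over a 30-day window after `start`.
        ts = start + timedelta(seconds=rng.randrange(30 * 24 * 3600))
        events.append({
            "event_id": f"e{i}",
            "user_id": rng.choice(users)["user_id"],
            "object_id": f"c{rng.randint(1, n_objects)}",
            "type": rng.choice(EVENT_TYPES),
            "ts": ts.isoformat(),
        })
    return events


def write_csv(path, rows):
    """Write a list of flat dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def write_jsonl(path, rows):
    """Write one JSON document per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def build_dataset(n_users=100, n_objects=20, n_events=1000):
    """Generate users and events deterministically from SEED."""
    rng = random.Random(SEED)
    users = generate_users(rng, n_users)
    events = generate_events(rng, users, n_objects, n_events, datetime(2024, 1, 1))
    return users, events


if __name__ == "__main__":
    users, events = build_dataset()
    write_csv("users.csv", users)        # e.g. for a CSV-based import
    write_jsonl("events.jsonl", events)  # e.g. for a document-oriented import
```

Because every event refers to an existing `user_id`, the generated files stay mutually consistent. In a real project, each DBMS would receive only the subset and shape of this data that its queries need.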

Report

Description of the platform and data model/schema (including relationships where applicable).
List of all queries and justifications for the choice of DBMS.
The format and an example of each data structure for each DBMS.
Commands and results of all queries.
Attach a separate file raw-queries.txt with all native DB commands, as well as CSV exports of the query results (one export per DBMS). For optimizations, attach screenshots or output of EXPLAIN/PROFILE/TRACING.
Description of each optimization: what was done, why, and what the result was.
Comparison tables where appropriate.
A summary answering two questions: what you would do differently if the data volume grew 10×, and which DBMS turned out to be the most convenient for your scenario and why.
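To report the result of an optimization, it can help to time the same query before and after the change in a uniform way. The sketch below is one possible harness using only the Python standard library. Here `run_query` stands for any callable that executes your query through whichever driver you use; the two workloads in the usage part are placeholders that only simulate a slow and a fast lookup, not real database calls. Client-side timing like this only complements the server-side EXPLAIN/PROFILE/TRACING output requested above; it does not replace it.

```python
import statistics
import time


def time_query(run_query, repeats=5, warmup=1):
    """Run a callable several times and return wall-clock timings in milliseconds."""
    for _ in range(warmup):
        run_query()  # unmeasured runs, so cold caches do not skew the first timing
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def summarize(label, timings):
    """Reduce raw timings to one row of a comparison table."""
    return {
        "variant": label,
        "runs": len(timings),
        "median_ms": round(statistics.median(timings), 3),
        "min_ms": round(min(timings), 3),
        "max_ms": round(max(timings), 3),
    }


def format_table(rows):
    """Render rows as a plain-text table with aligned columns."""
    headers = list(rows[0].keys())
    widths = [max(len(h), *(len(str(r[h])) for r in rows)) for h in headers]
    lines = ["  ".join(h.ljust(w) for h, w in zip(headers, widths))]
    for r in rows:
        lines.append("  ".join(str(r[h]).ljust(w) for h, w in zip(headers, widths)))
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder workloads: a linear scan versus a dictionary lookup.
    data = list(range(200_000))
    index = {value: pos for pos, value in enumerate(data)}
    before = summarize("before (scan)", time_query(lambda: data.index(199_999)))
    after = summarize("after (index)", time_query(lambda: index[199_999]))
    print(format_table([before, after]))
```

Reporting the median over several runs, rather than a single timing, makes the before/after comparison less sensitive to one-off delays.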