===== Semestral project =====

{{indexmenu>be4m36ds2:homework#1|js}}

**General Assignment**

  * Model and implement an analysis of user activity on an online platform (for example, educational, commercial, or social) using several NoSQL databases.
  * Formulate realistic analytical tasks (queries) for the subject domain you have chosen.
  * Based on these tasks, select suitable NoSQL DBMSs and storage structures so that the specified queries are executed efficiently and correctly. Justify your choices in detail.
  * Describe all steps, data structures, and query parameters so that another student could reproduce your experiment.
  * Provide brief explanations for all scripts/templates/examples.
  * Different students or groups may choose the same subject domain.
  * However, using identical datasets and tasks (queries) is prohibited.
  * The project may be completed individually or in groups of up to three students.

**Project Task List**

  - Formulate the subject domain and scenario:
      * Define the type of platform, the main types of users, objects, and events.
  - Define a common set of interrelated data (users, objects, events, relationships) that reflects the logic of activity on the platform.
  - For each DBMS, determine in advance which analytical tasks/queries will be executed there:
      * At least two key queries for each DBMS.
      * For each query: explain why the platform needs it, why this DBMS was chosen, and what storage structure is required for this query.
  - Prepare (generate) the necessary data (CSV/JSON) to implement the selected queries.
  - Prepare and import the data into the corresponding DBMSs: MongoDB, Cassandra, Redis, and Neo4j.
      * For each DBMS, prepare only the data needed to implement the analytical queries selected for it. Do not duplicate data structures across DBMSs unnecessarily; the only permitted exception is comparative performance analysis.
      * In the report, explain why this data is loaded into this DBMS in this structure, and why the alternatives were not used (with a brief comparison).
      * Import the data. Scripts or GUI tools may be used to load data.
  - Execute the selected analytical tasks in the DBMSs (according to the pre-formulated queries):
      * Include the commands used, sample outputs, and brief interpretations in the report.
  - Perform one optimization task for each DBMS: analyze and implement a technique to improve performance.
  - Compare the convenience, limitations, and strengths of each DBMS in practice, based on the queries you implemented.
  - Present the results in a report: the scenario, data structure, selection of queries and DBMSs, imports, analytics implementation, comparative analysis, and conclusions.
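
The data-preparation step can be made reproducible with a seeded generator. The following is only a minimal sketch using the Python standard library: the field names (''user_id'', ''object_id'', ''type'', ''ts''), the event types, the dataset sizes, and all function names are hypothetical placeholders that each team should replace with its own domain model.

```python
import csv
import json
import random
from datetime import datetime, timedelta, timezone

# Hypothetical event types for an educational platform; adapt to your domain.
EVENT_TYPES = ["view", "enroll", "submit", "comment"]


def generate_users(n, rng):
    """Return n user records with illustrative fields."""
    roles = ["student", "teacher"]
    return [
        {"user_id": i, "name": f"user{i}", "role": rng.choice(roles)}
        for i in range(1, n + 1)
    ]


def generate_events(users, n_objects, n_events, rng, start):
    """Return n_events activity records that reference existing users and objects."""
    events = []
    for event_id in range(1, n_events + 1):
        user = rng.choice(users)
        # Spread timestamps uniformly over a 30-day window after `start`.
        ts = start + timedelta(seconds=rng.randrange(30 * 24 * 3600))
        events.append({
            "event_id": event_id,
            "user_id": user["user_id"],
            "object_id": rng.randrange(1, n_objects + 1),
            "type": rng.choice(EVENT_TYPES),
            "ts": ts.isoformat(),
        })
    return events


def write_csv(path, rows):
    """Write a non-empty list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def write_json_lines(path, rows):
    """Write one JSON document per line (convenient for document-store imports)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def build_dataset(seed=42, n_users=100, n_objects=20, n_events=1000):
    """Build users and events deterministically from a seed."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    users = generate_users(n_users, rng)
    events = generate_events(users, n_objects, n_events, rng, start)
    return users, events
```

Calling ''build_dataset()'' and passing the results to ''write_csv("users.csv", users)'' or ''write_json_lines("events.jsonl", events)'' produces files that can then be loaded with each DBMS's own import tooling. Stating the seed and sizes in the report lets another student regenerate exactly the same data.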

**Report**

  * Description of the platform and data model/schema (including relationships where applicable).
  * List of all queries and justifications for the choice of DBMS.
  * The format and an example of each data structure for each DBMS.
  * Commands and results of all queries.
  * Attach a separate file **raw-queries.txt** with all native DB commands, as well as CSV exports of the results (one per DBMS). For optimizations, attach screenshots or output of EXPLAIN/PROFILE/TRACING.
  * Description of each optimization: what was done, why, and what the result was.
  * Comparison tables where appropriate.
  * A summary answering two questions: what would you do differently if the data volume grew 10×, and which DBMS turned out to be the most convenient for your scenario and why?
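
For the optimization description and the CSV exports, it may help to time each query the same way before and after the change and to serialize the result rows uniformly. The sketch below is one possible approach, not a required method: ''time_query'', ''rows_to_csv'', and ''fake_query'' are hypothetical names, and ''fake_query'' is an in-memory stand-in for a real driver call. Client-side timings complement, but do not replace, the EXPLAIN/PROFILE/TRACING output.

```python
import csv
import io
import statistics
import time


def time_query(run_query, repeats=5):
    """Run a zero-argument callable several times.

    Returns the median wall-clock duration in seconds and the last result.
    """
    durations = []
    result = None
    for _ in range(repeats):
        started = time.perf_counter()
        result = run_query()
        durations.append(time.perf_counter() - started)
    return statistics.median(durations), result


def rows_to_csv(rows):
    """Serialize a non-empty list of dicts (one per result row) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fake_query():
    """Stand-in for a real driver call (assumption): counts events per user."""
    events = [{"user_id": i % 3} for i in range(9)]
    counts = {}
    for event in events:
        counts[event["user_id"]] = counts.get(event["user_id"], 0) + 1
    return [{"user_id": u, "events": c} for u, c in sorted(counts.items())]


median_seconds, rows = time_query(fake_query)
csv_text = rows_to_csv(rows)
```

In a real project, ''fake_query'' would be replaced by a function that issues the native command through the DBMS client, and the median timings measured before and after the optimization would go into the comparison table.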