Semestral project

General Assignment

  • Model and implement an analysis of user activity on an online platform (for example, educational, commercial, or social) using several NoSQL databases.
  • Formulate realistic analytical tasks (queries) for the subject domain you have chosen.
  • Based on these tasks, select suitable NoSQL DBMSs and storage structures so that the specified queries are executed efficiently and correctly. Justify your choices in detail.
  • Describe all steps, data structures, and query parameters so that another student could reproduce your experiment.
  • Provide brief explanations for all scripts/templates/examples.
  • Several students or groups may choose the same subject domain.
  • Identical datasets and identical tasks (queries) are prohibited.
  • The project may be completed individually or in groups of up to three students.

Project Task List

  1. Formulate the subject domain and scenario:
    • Define the type of platform, the main types of users, objects, and events.
  2. Compile a common set of related data (users, objects, events, relationships) that reflects the logic of user activity on the platform.
  3. For each DBMS, determine in advance which analytical tasks/queries will be executed there:
    • At least two key queries for each DBMS.
    • For each query: explain why the platform needs it, why this DBMS was chosen, and what storage structure is required for this query.
  4. Prepare (generate) the necessary data (CSV/JSON) to implement the selected queries.
  5. Prepare and import the data into the corresponding DBMSs — MongoDB, Cassandra, Redis, Neo4j.
    • For each DBMS, prepare exactly the data needed to implement the selected analytical queries. Do not duplicate data structures across DBMSs unnecessarily; the only allowed exception is comparative performance analysis.
    • In the report, explain why this data is loaded into this DBMS in this structure, and why other alternatives are not used (with a brief comparison).
    • Import the data. Scripts or GUI tools may be used to load data.
  6. Execute the selected analytical tasks in the DBMSs (according to the pre-formulated queries):
    • Include the commands used, sample outputs, and brief interpretations in the report.
  7. Perform one optimization task for each DBMS: analyze and implement a technique to improve performance.
  8. Compare the convenience, limitations, and strengths of each DBMS in practice, based on the queries you implemented.
  9. Present the results in a report: the scenario, data structure, selection of queries and DBMSs, imports, analytics implementation, comparative analysis, and conclusions.
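
Step 4 (generating CSV/JSON data) can be sketched as follows. This is only an illustration under stated assumptions: the educational-platform domain, the event types, the field names (user_id, event_id, object_id, ts) and the helper names are all hypothetical, not prescribed by the assignment. A fixed seed keeps the dataset reproducible, which helps another student repeat the experiment.

```python
import csv
import json
import random
from datetime import datetime, timedelta

# Hypothetical domain: an educational platform (an assumption for illustration).
EVENT_TYPES = ["login", "view_lesson", "submit_quiz", "post_comment"]


def generate_users(n, rng):
    """Return n user records with illustrative fields."""
    return [
        {
            "user_id": f"u{i:04d}",
            "role": rng.choice(["student", "teacher"]),
            "country": rng.choice(["CZ", "DE", "PL"]),
        }
        for i in range(1, n + 1)
    ]


def generate_events(users, n, rng, start=datetime(2025, 9, 1)):
    """Return n activity events spread over 30 days, each tied to an existing user."""
    events = []
    for i in range(1, n + 1):
        user = rng.choice(users)
        ts = start + timedelta(seconds=rng.randrange(30 * 24 * 3600))
        events.append({
            "event_id": f"e{i:06d}",
            "user_id": user["user_id"],
            "type": rng.choice(EVENT_TYPES),
            "object_id": f"lesson{rng.randrange(1, 51):03d}",
            "ts": ts.isoformat(),
        })
    return events


def write_csv(path, rows):
    """Write a list of flat dicts as CSV with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def write_jsonl(path, rows):
    """Write one JSON document per line (newline-delimited JSON)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
```

For example, with rng = random.Random(42), calling generate_users(100, rng) and generate_events(users, 10000, rng) and then passing the results to write_csv and write_jsonl produces one flat CSV file and one newline-delimited JSON file. CSV is a common input for Cassandra and Neo4j imports, and newline-delimited JSON for MongoDB, but which file goes to which DBMS should follow from the queries selected in step 3.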

Report

  • Description of the platform and data model/schema (including relationships where applicable).
  • List of all queries and justifications for the choice of DBMS.
  • The format and an example of each data structure for each DBMS.
  • Commands and results of all queries.
  • Attach a separate file raw-queries.txt with all native DB commands, as well as CSV exports of results — one per DBMS. For optimizations, attach screenshots/output of EXPLAIN/PROFILE/TRACING.
  • Description of optimization: what was done and why, and the result.
  • Comparison tables where appropriate.
  • A summary that answers two questions: what you would do differently if the data volume grew 10×, and which DBMS turned out to be the most convenient for your scenario and why.
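
To back the optimization description with numbers, a before/after measurement can be collected on the client side. The sketch below is one possible approach, not a required method: time_query and its parameters are hypothetical names, and run_query stands for any zero-argument function that executes one of your queries through the driver you use. Client-side timings include network and driver overhead, so they complement the EXPLAIN/PROFILE/TRACING output rather than replace it.

```python
import math
import statistics
import time


def time_query(run_query, repeats=20, warmup=3):
    """Run run_query several times and return latency statistics in milliseconds.

    run_query is a hypothetical zero-argument callable that executes one query.
    Warm-up runs are executed first and excluded from the statistics.
    """
    for _ in range(warmup):
        run_query()

    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - t0) * 1000.0)

    samples.sort()
    n = len(samples)
    # Nearest-rank 95th percentile, clamped to a valid index.
    p95_index = max(0, min(n - 1, math.ceil(0.95 * n) - 1))
    return {
        "runs": n,
        "min_ms": samples[0],
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
    }
```

Running time_query once before and once after the optimization (for example, before and after creating an index) gives two comparable sets of figures for the report.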
courses/be4m36ds2/homework/start.txt · Last modified: 2025/09/21 18:38 by prokoyul