HW 8 – Bonus tasks (optional)

(0–10 points)

Choose any combination of tasks; the maximum bonus is 10 points, so the combined point total must not exceed 10.

  1. Cross-analytics across DBMSs (5 p.)
    • Pick at least two NoSQL DBMSs (e.g., MongoDB and Cassandra).
    • Define one analytical task/query that is identical for both DBMSs.
    • Store the same data in both DBMSs (structures may differ to fit each model).
    • Run analogous queries; record response time, correctness, and implementation effort.
    • Analyze differences: which was simpler/faster/more convenient; when to prefer each DBMS.
    • Report: storage structures, queries and results, pros/cons, recommendations.
    • Use the HW7 measurement protocol for timings.
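    • Illustration: a minimal Python sketch of one such paired query, here "events per user in the last 30 days" run against MongoDB and Cassandra. The connection details, collection/table names, and field names are all assumptions; adapt them to your own data.

      import time
      from datetime import datetime, timedelta, timezone

      from pymongo import MongoClient
      from cassandra.cluster import Cluster

      since = datetime.now(timezone.utc) - timedelta(days=30)

      # MongoDB: server-side aggregation over an assumed 'events' collection
      mongo = MongoClient("mongodb://localhost:27017")
      events = mongo["hw8"]["events"]
      t0 = time.perf_counter()
      mongo_rows = list(events.aggregate([
          {"$match": {"ts": {"$gte": since}}},
          {"$group": {"_id": "$user_id", "n": {"$sum": 1}}},
      ]))
      t_mongo = time.perf_counter() - t0

      # Cassandra: no cross-partition GROUP BY, so the counting happens
      # client-side; this structural difference is part of the analysis
      session = Cluster(["127.0.0.1"]).connect("hw8")
      t0 = time.perf_counter()
      counts = {}
      rows = session.execute(
          "SELECT user_id FROM events_by_user WHERE ts >= %s ALLOW FILTERING",
          (since,))
      for row in rows:
          counts[row.user_id] = counts.get(row.user_id, 0) + 1
      t_cassandra = time.perf_counter() - t0

      print(f"MongoDB:   {t_mongo:.3f} s, {len(mongo_rows)} users")
      print(f"Cassandra: {t_cassandra:.3f} s, {len(counts)} users")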
  2. Integrating analytics from different DBMSs (5 p.)
    • Implement at least one task that aggregates or matches results from two or more DBMSs.
    • Obtain part of the data/aggregate from one DBMS (e.g., top-10 active users).
    • Use it as a filter/seed in another DBMS (e.g., build their relationship graph).
    • Compare, analyze, and (optionally) visualize the final result.
    • Example (educational platform): MongoDB — users, activities; Neo4j — relationship graph.
      • Part 1: in MongoDB, find users with ≥3 courses in the last month → a list of user ids.
      • Part 2: Neo4j subgraph for those users (all relationship types).
      • Part 3: visualize; compute hubs/clusters/isolated users.
      • Report integration steps, difficulties, and added value vs. single-DB analysis.
    • Other integration examples:
      • Redis: top popular courses; MongoDB: details (titles, authors).
      • Cassandra: event logs; Neo4j: “who interacted with the same object.”
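    • Illustration: a hypothetical Python sketch of the MongoDB → Neo4j hand-off from the educational-platform example above. Collection names, node labels, properties, and credentials are assumptions; substitute your own schema.

      from datetime import datetime, timedelta, timezone

      from pymongo import MongoClient
      from neo4j import GraphDatabase

      since = datetime.now(timezone.utc) - timedelta(days=30)

      # Part 1: MongoDB, user ids with >= 3 distinct courses last month
      mongo = MongoClient("mongodb://localhost:27017")
      active = [d["_id"] for d in mongo["hw8"]["activities"].aggregate([
          {"$match": {"ts": {"$gte": since}}},
          {"$group": {"_id": "$user_id",
                      "courses": {"$addToSet": "$course_id"}}},
          {"$match": {"$expr": {"$gte": [{"$size": "$courses"}, 3]}}},
      ])]

      # Part 2: Neo4j, all relationships among exactly those users
      # (the undirected pattern returns each relationship twice)
      driver = GraphDatabase.driver("bolt://localhost:7687",
                                    auth=("neo4j", "secret"))
      with driver.session() as s:
          result = s.run(
              "MATCH (u:User)-[r]-(v:User) "
              "WHERE u.id IN $ids AND v.id IN $ids "
              "RETURN u.id AS a, type(r) AS rel, v.id AS b",
              ids=active)
          edges = [(rec["a"], rec["rel"], rec["b"]) for rec in result]

      print(f"{len(active)} active users, {len(edges)} edges among them")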
  3. Changing requirements (2 p.)
    • A new event type appears (e.g., “joint project execution”). Redesign one component (structures/import/optimizations) and explain the impact.
  4. Storage-structure experiment (3 p.)
    • Change the storage design for one analytical task and compare the effect on time, ease of querying, and admin effort.
      • Examples: separate docs vs. nested arrays (MongoDB); different partition keys (Cassandra).
      • Record what changed and which design was more effective.
    • Use the HW7 measurement protocol for timings.
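    • Illustration: a minimal sketch comparing two assumed MongoDB designs for the same question ("activities per user"): one collection with an embedded array versus one flat collection of activity documents. Collection and field names are assumptions.

      import time
      from pymongo import MongoClient

      db = MongoClient("mongodb://localhost:27017")["hw8"]

      # Design A: one document per user with an embedded 'activities' array
      t0 = time.perf_counter()
      a = list(db.users_embedded.aggregate([
          {"$project": {"n": {"$size": "$activities"}}},
      ]))
      t_a = time.perf_counter() - t0

      # Design B: one document per activity, grouped at query time
      t0 = time.perf_counter()
      b = list(db.activities_flat.aggregate([
          {"$group": {"_id": "$user_id", "n": {"$sum": 1}}},
      ]))
      t_b = time.perf_counter() - t0

      print(f"embedded: {t_a:.3f} s, flat: {t_b:.3f} s")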
  5. Fault-tolerance analysis (2 p.)
    • Model a failure (e.g., node/partition loss in Cassandra; replica failure in MongoDB; partial data loss in Redis) and describe:
      • Impact on availability/data integrity.
      • Actions to restore normal operation.
      • Reliability conclusions for each DBMS used.
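    • Illustration: a minimal MongoDB-only sketch for observing a replica failure. It lists replica-set member states, then attempts a majority write with a short timeout while one member is stopped. The connection string, replica-set name, and collection are assumptions.

      from pymongo import MongoClient, WriteConcern
      from pymongo.errors import WTimeoutError

      client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")

      # Which members are up, and who is primary?
      for m in client.admin.command("replSetGetStatus")["members"]:
          print(m["name"], m["stateStr"])

      # With one member stopped, a w=majority write may hit wtimeout
      coll = client["hw8"].get_collection(
          "events", write_concern=WriteConcern(w="majority", wtimeout=2000))
      try:
          coll.insert_one({"probe": True})
          print("majority write succeeded")
      except WTimeoutError:
          print("majority write timed out -> availability impact observed")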
  6. Horizontal scaling (2 p.)
    • Measure the same query as data volume grows using 3–5 levels (e.g., 10%, 30%, 100% of your dataset).
    • Plot response time vs. volume and briefly analyze when latency growth becomes noticeable and what to optimize.
    • Use the HW7 measurement protocol for timings.
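    • Illustration: a hypothetical measurement loop that times the same MongoDB aggregation at several data volumes loaded into throwaway collections. The source collection and the query are assumptions; take the numbers you actually report under the HW7 protocol.

      import time
      from pymongo import MongoClient

      db = MongoClient("mongodb://localhost:27017")["hw8"]
      events = list(db.all_events.find())      # full dataset, loaded once

      points = []
      for frac in (0.1, 0.3, 0.6, 1.0):        # 4 volume levels
          name = f"events_{int(frac * 100)}"
          db.drop_collection(name)
          db[name].insert_many(events[: int(len(events) * frac)])

          runs = []
          for _ in range(5):                   # repeat runs, take the median
              t0 = time.perf_counter()
              list(db[name].aggregate(
                  [{"$group": {"_id": "$user_id", "n": {"$sum": 1}}}]))
              runs.append(time.perf_counter() - t0)
          points.append((frac, sorted(runs)[len(runs) // 2]))

      for frac, t in points:
          print(f"{int(frac * 100):>3} % -> {t:.3f} s")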
  7. Mini-dashboard (3 p.)
    • Present 2–3 analytics results as a mini-dashboard (table/chart). Focus on meaningful metrics. Screenshots from Excel, Matplotlib, Google Sheets, or any BI tool are fine.
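    • Illustration: a minimal Matplotlib sketch that arranges two result sets into a one-figure dashboard. The numbers below are placeholders; plug in your own query results.

      import matplotlib.pyplot as plt

      # placeholders: replace with your own query results
      top_courses = {"Databases": 120, "Graphs": 95, "Streams": 60}
      events_per_day = [30, 42, 38, 55, 61, 48, 70]

      fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
      ax1.bar(list(top_courses), list(top_courses.values()))
      ax1.set_title("Top courses by enrollments")
      ax2.plot(range(1, len(events_per_day) + 1), events_per_day, marker="o")
      ax2.set_title("Events per day (last week)")
      ax2.set_xlabel("day")
      fig.tight_layout()
      fig.savefig("hw8_dashboard.png")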
  8. Performance profiling & visualization (1 p.)
    • Run a series of queries and plot response time versus sample size or the number of indexes.
    • Use the HW7 measurement protocol for timings.
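    • Illustration: a minimal sketch timing the same MongoDB query without and with an index and plotting the comparison. The collection, field, and lookup value are assumptions; warm up and repeat runs per the HW7 protocol before reporting.

      import time
      import matplotlib.pyplot as plt
      from pymongo import MongoClient

      coll = MongoClient("mongodb://localhost:27017")["hw8"]["events"]

      def timed():
          t0 = time.perf_counter()
          list(coll.find({"user_id": "u42"}))   # assumed lookup query
          return time.perf_counter() - t0

      coll.drop_indexes()                       # keep only the _id index
      t_without = timed()
      coll.create_index("user_id")
      t_with = timed()

      plt.bar(["no index", "index on user_id"], [t_without, t_with])
      plt.ylabel("response time (s)")
      plt.savefig("hw8_profiling.png")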
  9. Append-only stream simulation (2 p.)
    • Generate a growing append-only event log and compare DBMS behavior under append-only load.
    • Use the HW7 measurement protocol for timings.
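    • Illustration: a hypothetical generator that appends synthetic events to MongoDB in batches and records how insert latency evolves as the log grows. Batch sizes, counts, and the event shape are assumptions.

      import random
      import time
      from datetime import datetime, timezone

      from pymongo import MongoClient

      log = MongoClient("mongodb://localhost:27017")["hw8"]["event_log"]
      log.drop()

      for batch_no in range(20):                # 20 batches x 1000 events
          batch = [{
              "ts": datetime.now(timezone.utc),
              "user_id": f"u{random.randint(1, 500)}",
              "type": random.choice(["view", "submit", "join"]),
          } for _ in range(1000)]
          t0 = time.perf_counter()
          log.insert_many(batch)                # append-only: inserts only
          dt = (time.perf_counter() - t0) * 1000
          print(f"batch {batch_no:2d}: {dt:6.1f} ms, "
                f"total {log.estimated_document_count()} docs")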

Submission

  • Submit HW8.docx (report) to the BRUTE system. If you include code/notebooks, attach them as a single archive (e.g., hw8.zip).

Deadline

  • Sunday, 4 January 2026, by 23:59