HW 8 – Bonus tasks (optional)

(0–15 points)

Choose any combination of tasks; the maximum bonus is 15 points, so the combined total must not exceed that.

  1. Cross-analytics across DBMSs (5 p.)
    • Pick at least two NoSQL DBMSs (e.g., MongoDB and Cassandra).
    • Define one analytically identical task/query.
    • Store the same data in both DBMSs (structures may differ to fit each model).
    • Run analogous queries; record response time, correctness, and implementation effort.
    • Analyze differences: which was simpler/faster/more convenient; when to prefer each DBMS.
    • Report: storage structures, queries and results, pros/cons, recommendations.
    • Use the HW7 measurement protocol for timings and read-volume metrics; a timing-harness sketch follows this task.
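
A minimal sketch of what this comparison could look like, assuming a hypothetical `events` collection in MongoDB and a pre-aggregated `events_per_user` counter table in Cassandra (all database, keyspace, table, and field names here are invented). The `timed` helper is only a stand-in for the HW7 protocol, which is not restated here:

```python
import time
import statistics
from pymongo import MongoClient
from cassandra.cluster import Cluster

def timed(fn, runs=5):
    """Median wall-clock latency over several runs (stand-in for the HW7 protocol)."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

# MongoDB: "top-10 users by event count" as one aggregation pipeline.
events = MongoClient("mongodb://localhost:27017")["course"]["events"]

def mongo_top_users():
    return list(events.aggregate([
        {"$group": {"_id": "$user_id", "n": {"$sum": 1}}},
        {"$sort": {"n": -1}},
        {"$limit": 10},
    ]))

# Cassandra: no server-side "sort by count", so the same question is
# answered from a pre-aggregated counter table plus client-side sorting.
session = Cluster(["127.0.0.1"]).connect("course")

def cassandra_top_users():
    rows = session.execute("SELECT user_id, n FROM events_per_user")
    return sorted(rows, key=lambda r: r.n, reverse=True)[:10]

print("MongoDB  :", timed(mongo_top_users))
print("Cassandra:", timed(cassandra_top_users))
```

Note how the same question maps to a single aggregation pipeline in MongoDB but to a precomputed table plus client-side sorting in Cassandra; that is exactly the kind of implementation-effort difference the task asks you to record.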
  2. Integrating analytics from different DBMSs (5 p.)
    • Implement at least one task that aggregates or matches results from two+ DBMSs used in this course.
    • Obtain part of the data/aggregate from one DBMS (e.g., top-10 active users).
    • Use it as a filter/seed in another DBMS (e.g., build their relationship graph).
    • Compare, analyze, and (optionally) visualize the final result.
    • Example (educational platform): MongoDB — users, activities; Neo4j — relationship graph.
      • Part 1: find users with ≥3 courses in the last month → a list of user IDs.
      • Part 2: Neo4j subgraph for those users (all relationship types).
      • Part 3: visualize; compute hubs/clusters/isolated users.
      • Report integration steps, difficulties, and added value vs. single-DB analysis; parts 1–2 are sketched in code after this task.
    • Other integration examples:
      • Redis: top popular courses; MongoDB: details (titles, authors).
      • Cassandra: event logs; Neo4j: “who interacted with the same object.”
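
The educational-platform example above could be wired together roughly as follows. The `enrollments` collection, the `:User` label, the `id` property, and the connection settings are all assumptions, not part of the assignment:

```python
from datetime import datetime, timedelta, timezone
from pymongo import MongoClient
from neo4j import GraphDatabase

enrollments = MongoClient("mongodb://localhost:27017")["platform"]["enrollments"]

# Part 1 (MongoDB): users enrolled in >= 3 distinct courses in the last month.
since = datetime.now(timezone.utc) - timedelta(days=30)
active_ids = [doc["_id"] for doc in enrollments.aggregate([
    {"$match": {"enrolled_at": {"$gte": since}}},
    {"$group": {"_id": "$user_id", "courses": {"$addToSet": "$course_id"}}},
    {"$match": {"$expr": {"$gte": [{"$size": "$courses"}, 3]}}},
])]

# Part 2 (Neo4j): the subgraph around those users, all relationship types.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))
with driver.session() as session:
    result = session.run(
        "MATCH (u:User)-[r]-(m) WHERE u.id IN $ids "
        "RETURN u.id AS user, type(r) AS rel, m",
        ids=active_ids,
    )
    for record in result:
        print(record["user"], record["rel"])
driver.close()
```

Part 3 (visualization and hub/cluster detection) is left out of the sketch; the returned records can be fed into any graph-drawing tool.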
  3. Changing requirements (2 p.)
    • A new event type appears (e.g., “joint project execution”). For one chosen DBMS and one query/analytics task, redesign one component (storage structures, import pipeline, or optimization strategy) and explain the impact. A sketch of one possible redesign follows.
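
As one illustration, a new multi-participant event type could be absorbed into a hypothetical MongoDB event collection like this (all field names invented); the point is that an existing per-user pipeline needs an `$unwind` step once events stop carrying a single `user_id`:

```python
from pymongo import MongoClient

events = MongoClient("mongodb://localhost:27017")["platform"]["events"]

# The new event type carries a participant list instead of a single user_id.
events.insert_one({
    "type": "joint_project",
    "participants": ["u1", "u2", "u3"],  # replaces the flat user_id field
    "project_id": "p42",
})

# Impact on an existing "events per user" pipeline: participants must be
# unwound so each one is counted, while old single-user events still work.
per_user = events.aggregate([
    {"$addFields": {"who": {"$ifNull": ["$participants", ["$user_id"]]}}},
    {"$unwind": "$who"},
    {"$group": {"_id": "$who", "n": {"$sum": 1}}},
])
```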
  4. Storage-structure experiment (3 p.)
    • Change the storage design for one analytical task (use a subset of the dataset) and compare the effect on time, ease of querying, and admin effort.
      • Examples: separate docs vs. nested arrays (MongoDB); different partition keys (Cassandra).
      • Record what changed and which design was more effective.
    • Use the HW7 measurement protocol for timings; a comparison sketch follows this task.
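
A minimal sketch of the MongoDB “separate docs vs. nested arrays” example, with invented collection names and a single illustrative query; replace the ad-hoc timing below with the HW7 protocol:

```python
import time
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["platform"]
flat = db["activities_flat"]      # design A: one document per activity
nested = db["activities_nested"]  # design B: one document per user

def query_flat(user):
    # Design A: {user_id, kind, ts}; a filter over the whole collection.
    return list(flat.find({"user_id": user}))

def query_nested(user):
    # Design B: {_id: user_id, activities: [...]}; a single point lookup.
    doc = nested.find_one({"_id": user})
    return doc["activities"] if doc else []

for name, fn in [("flat", query_flat), ("nested", query_nested)]:
    t0 = time.perf_counter()
    fn("u1")
    print(name, f"{time.perf_counter() - t0:.4f}s")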
  5. Fault-tolerance analysis (2 p.)
    • Model a failure (e.g., node/partition loss in Cassandra; replica failure in MongoDB; partial data loss in Redis). This is a conceptual exercise; you do not need to actually bring down nodes. Base your analysis on documentation and lecture material, and describe:
      • Impact on availability/data integrity.
      • Actions to restore normal operation.
      • Reliability conclusions for each DBMS used.
  6. Horizontal scaling (2 p.)
    • Measure the same query as data volume grows using 3–5 levels (e.g., 10%, 30%, 100% of your dataset; use subsets or sampling on a single-node deployment).
    • Plot response time vs. volume and briefly analyze when latency growth becomes noticeable and what to optimize.
    • Use the HW7 measurement protocol for timings; a measurement-and-plot sketch follows this task.
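
One way this could look, assuming the subsets were materialized in advance as separate collections (names invented) and that plain wall-clock timing stands in for the HW7 protocol:

```python
import time
import matplotlib.pyplot as plt
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["platform"]
levels = [10, 30, 100]  # percent of the full dataset
latencies = []

for pct in levels:
    coll = db[f"events_{pct}pct"]  # e.g., events_10pct, events_30pct, ...
    t0 = time.perf_counter()
    list(coll.aggregate([{"$group": {"_id": "$user_id", "n": {"$sum": 1}}}]))
    latencies.append(time.perf_counter() - t0)

plt.plot(levels, latencies, marker="o")
plt.xlabel("dataset size (% of full)")
plt.ylabel("response time (s)")
plt.savefig("scaling.png")
```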
  7. Mini-dashboard (3 p.)
    • Present 2–3 analytics results from your project as a mini-dashboard (table/chart). Focus on meaningful metrics. Screenshots from Excel/Matplotlib/Google Sheets/any BI are fine. A minimal Matplotlib sketch follows.
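
If you prefer code over spreadsheets, a two-panel dashboard takes only a few Matplotlib calls; the numbers below are placeholders, to be replaced with your own results:

```python
import matplotlib.pyplot as plt

# Placeholder results; substitute the metrics from your own project.
top_courses = {"Databases": 120, "Algorithms": 95, "Networks": 60}
daily_events = [340, 410, 385, 520, 480]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.bar(top_courses.keys(), top_courses.values())
ax1.set_title("Enrollments per course")
ax2.plot(range(1, len(daily_events) + 1), daily_events, marker="o")
ax2.set_title("Events per day")
fig.savefig("dashboard.png")
```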
  8. Performance profiling & visualization (1 p.)
    • Run a series of queries where one parameter changes (e.g., sample size or number of indexes) and plot response time vs. this parameter.
    • Use the HW7 measurement protocol for timings; a sketch follows this task.
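
A possible shape for such a series, here varying the result limit of one MongoDB query (collection name invented; substitute the parameter you actually study and the HW7 timing procedure):

```python
import time
import matplotlib.pyplot as plt
from pymongo import MongoClient

events = MongoClient("mongodb://localhost:27017")["platform"]["events"]

limits = [100, 1_000, 10_000, 100_000]  # the single varied parameter
times = []
for n in limits:
    t0 = time.perf_counter()
    list(events.find({}, limit=n))
    times.append(time.perf_counter() - t0)

plt.plot(limits, times, marker="o")
plt.xscale("log")
plt.xlabel("result limit (documents)")
plt.ylabel("response time (s)")
plt.savefig("profile.png")
```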
  9. Append-only stream simulation (2 p.)
    • For at least one DBMS, generate a growing append-only event log (e.g., time-ordered events) and insert events in batches as the log grows.
    • Measure write performance (throughput and/or latency) at several log sizes and, optionally, the latency of one selected read query on top of this log.
    • Use the HW7 measurement protocol for timings; a batched-insert sketch follows this task.
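
A minimal sketch against MongoDB, with invented names and sizes; the same loop structure works for Redis streams or Cassandra writes:

```python
import time
from datetime import datetime, timezone
from pymongo import MongoClient

log = MongoClient("mongodb://localhost:27017")["platform"]["event_log"]
BATCH = 5_000

for checkpoint in range(1, 11):  # 10 batches -> 50k events in total
    batch = [
        {"seq": (checkpoint - 1) * BATCH + i,
         "ts": datetime.now(timezone.utc),
         "payload": "x" * 64}
        for i in range(BATCH)
    ]
    t0 = time.perf_counter()
    log.insert_many(batch, ordered=False)  # append-only: inserts only
    dt = time.perf_counter() - t0
    print(f"size={log.estimated_document_count():>7}  "
          f"throughput={BATCH / dt:,.0f} events/s")
```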

Submission

Deadline