HW 7 – Optimization and Comparative Analysis

(10 points)

  1. For each NoSQL DBMS (MongoDB, Cassandra, Neo4j), after completing the target analytical queries:
    • Analyze the performance of one of the queries (execution time and volume of data scanned).
    • Implement one technique to improve performance (for example: creating an index, changing key structures/partitioning, optimizing the storage schema, reducing the volume of returned data, etc.).
    • Compare the result before and after optimization and explain.
    • Briefly describe in the report what you did, why this technique suits your DBMS and your task, and how it affected the result.
  2. Measurement protocol:
    • Dataset: use the Stretch dataset.
    • DBMS-specific instrumentation:
      • MongoDB: explain("executionStats") on find/aggregate.
      • Cassandra: TRACING ON (capture the query trace; note consistency level if relevant).
      • Neo4j: PROFILE (you may use EXPLAIN to inspect the plan without executing).
    • Report “before/after”: time and read volume/plan hits (docs/keys examined for MongoDB, partitions touched for Cassandra, DB hits for Neo4j).
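
As an illustration of the "before/after" reporting step for MongoDB, the sketch below pulls the headline counters out of an explain("executionStats") result. It assumes the explain output is already available as a Python dict (for example, as returned by a driver) and that it comes from a find query; aggregate pipelines may nest these counters inside per-stage output, so adapt as needed. The helper name summarize_explain and the numbers in the two sample documents are placeholders for illustration, not measurements.

```python
from typing import Any, Dict


def summarize_explain(explain_doc: Dict[str, Any]) -> Dict[str, int]:
    """Extract time and read-volume counters from a find() explain result.

    Assumes the top-level "executionStats" section is present, which is the
    case when explain is run in "executionStats" verbosity on a find query.
    """
    stats = explain_doc["executionStats"]
    return {
        "time_ms": stats["executionTimeMillis"],
        "docs_examined": stats["totalDocsExamined"],
        "keys_examined": stats["totalKeysExamined"],
        "returned": stats["nReturned"],
    }


# Placeholder explain documents (made-up values, NOT real measurements).
before = {
    "executionStats": {
        "executionTimeMillis": 120,
        "totalDocsExamined": 50000,
        "totalKeysExamined": 0,
        "nReturned": 40,
    }
}
after = {
    "executionStats": {
        "executionTimeMillis": 3,
        "totalDocsExamined": 40,
        "totalKeysExamined": 40,
        "nReturned": 40,
    }
}

for label, doc in (("before", before), ("after", after)):
    print(label, summarize_explain(doc))
```

A docs_examined value far above returned usually points to a collection scan, which is the situation an index is meant to fix. The same before/after pairing applies to the Cassandra trace and the Neo4j PROFILE output, with partitions touched and DB hits as the read-volume counters.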

Examples (guidance)

Comparative Analysis

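| DBMS | Query / task | Optimization technique | Median_before | Median_after | Read volume before | Read volume after |
|------|--------------|------------------------|---------------|--------------|--------------------|-------------------|
| MongoDB | | | | | | |
| Cassandra | | | | | | |
| Neo4j | | | | | | |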
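
One possible way to fill the Median_before and Median_after columns is to repeat the same query several times and take the median of the timings. The sketch below is a minimal version of that idea; the helper names (time_runs, comparison_row), the default of five repeats, and the use of client-side wall-clock time are assumptions rather than requirements of the assignment. Client-side timing includes network and driver overhead, so the time reported by the DBMS itself may be more appropriate where available.

```python
import statistics
import time
from typing import Callable, List, Sequence


def time_runs(run_query: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call run_query `repeats` times; return wall-clock durations in ms."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def comparison_row(
    dbms: str,
    task: str,
    technique: str,
    before_ms: Sequence[float],
    after_ms: Sequence[float],
    read_before: int,
    read_after: int,
) -> str:
    """Format one row of the comparison table using median timings."""
    cells = [
        dbms,
        task,
        technique,
        f"{statistics.median(before_ms):.1f}",
        f"{statistics.median(after_ms):.1f}",
        str(read_before),
        str(read_after),
    ]
    return "| " + " | ".join(cells) + " |"


# Hypothetical stand-in for a real query call against your DBMS.
def fake_query() -> int:
    return sum(range(1000))


timings = time_runs(fake_query, repeats=5)
print(len(timings), "timings collected")
```

In practice run_query would wrap the actual driver call, and the read-volume arguments would come from the DBMS instrumentation output rather than from the timing loop.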

Scaling, schema changes, and “what if it doesn’t fit?”

Submission Instructions

Deadline