HW 7 – Optimization and Comparative Analysis

(10 points)

  1. For each NoSQL DBMS (MongoDB, Cassandra, Neo4j), after completing the target analytical queries:
    • Analyze the performance of one of the queries (by execution time, volume of data scanned).
    • Implement one technique to improve performance (for example: creating an index, changing key structures/partitioning, optimizing the storage schema, reducing the volume of returned data, etc.).
    • Compare the result before and after optimization and explain.
    • Briefly describe in the report what you did, why this technique is applicable specifically for your DBMS and your task, and how it affected the result.
  2. Measurement protocol:
    • Dataset: use the Stretch dataset.
    • DBMS-specific instrumentation:
      • MongoDB: explain("executionStats") on find/aggregate.
      • Cassandra: TRACING ON (capture the query trace; note consistency level if relevant).
      • Neo4j: PROFILE (you may use EXPLAIN to inspect the plan without executing).
    • Report the before/after metrics: execution time and read volume / plan statistics (documents/keys scanned, partitions touched, DB hits). A sketch of one way to capture these metrics programmatically is given right below.
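
If you prefer to script the before/after measurements instead of reading them off the shells, the sketch below shows one possible way to pull comparable numbers from each DBMS with the official Python drivers (pymongo, cassandra-driver, neo4j). All connection strings, database/keyspace, collection/table, label and field names used here (hw7, events, events_by_user, User, DID, Action, user_id) are hypothetical placeholders; substitute your own.

  # Hedged sketch: capturing before/after metrics with the Python drivers.
  from pymongo import MongoClient
  from cassandra.cluster import Cluster
  from neo4j import GraphDatabase

  # --- MongoDB: the explain command with executionStats verbosity ---
  mongo_db = MongoClient("mongodb://localhost:27017")["hw7"]          # placeholder DB
  stats = mongo_db.command(
      "explain",
      {"find": "events", "filter": {"user_id": 42}},                  # query under test
      verbosity="executionStats",
  )["executionStats"]
  print("mongo ms:", stats["executionTimeMillis"],
        "docs examined:", stats["totalDocsExamined"],
        "keys examined:", stats["totalKeysExamined"])

  # --- Cassandra: per-query trace (driver equivalent of TRACING ON) ---
  session = Cluster(["127.0.0.1"]).connect("hw7")                     # placeholder keyspace
  rs = session.execute("SELECT * FROM events_by_user WHERE user_id = 42", trace=True)
  trace = rs.get_query_trace()
  print("cassandra duration:", trace.duration)                        # total server-side time
  for ev in trace.events:                                             # individual trace events
      print("  ", ev.source_elapsed, ev.description)

  # --- Neo4j: PROFILE and the dbHits in the returned plan tree ---
  def total_db_hits(plan):
      # sum dbHits over the whole profiled plan (key name hedged across versions)
      own = plan.get("dbHits", plan.get("args", {}).get("DbHits", 0))
      return own + sum(total_db_hits(c) for c in plan.get("children", []))

  driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
  with driver.session() as s:
      summary = s.run("PROFILE MATCH (u:User {id: 42})-[:DID]->(a:Action) "
                      "RETURN count(a)").consume()
      print("neo4j db hits:", total_db_hits(summary.profile),
            "available after (ms):", summary.result_available_after)

Record the same numbers with and without your optimization to fill in the before/after comparison.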

Examples (guidance; illustrative Python driver sketches follow the list below)

  • MongoDB
    • Add indexes on key fields (e.g., user_id, date, action type).
    • Compare execution time with vs. without the index (record results).
    • Assess impact of document nesting vs. flatter schema (multikey costs).
    • Prefer compound index order: equality → sort → range (the ESR rule); use covered queries where possible.
  • Cassandra
    • Review partitioning (how partition/clustering keys affect latency).
    • Compare latency across key designs (the partition key must drive queries).
    • Assess batch size on inserts (prefer batches to the same partition).
  • Neo4j
    • Evaluate how graph size and traversal depth impact response time.
    • Use indexes or uniqueness constraints where applicable.
    • Compare a simple query vs. MATCH + WHERE on an indexed property.
    • Capture DB Hits and avoid NodeByLabelScan/CartesianProduct where possible.
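
MongoDB sketch (pymongo), illustrating the compound-index and covered-query bullets above. The collection and field names (events, user_id, date) are hypothetical; adapt them to your data.

  from pymongo import MongoClient, ASCENDING, DESCENDING

  db = MongoClient("mongodb://localhost:27017")["hw7"]                # placeholder names
  events = db["events"]

  # Compound index following the ESR rule: equality (user_id) before sort (date).
  events.create_index([("user_id", ASCENDING), ("date", DESCENDING)],
                      name="user_date_idx")

  # Explain the query under test; the projection keeps it covered by the index,
  # so totalDocsExamined should ideally drop to 0.
  explain = db.command(
      "explain",
      {"find": "events",
       "filter": {"user_id": 42},
       "projection": {"_id": 0, "user_id": 1, "date": 1},
       "sort": {"date": -1}},
      verbosity="executionStats",
  )["executionStats"]
  print("ms:", explain["executionTimeMillis"],
        "docs:", explain["totalDocsExamined"],
        "keys:", explain["totalKeysExamined"])
  # For the "before" numbers, drop the index (events.drop_index("user_date_idx"))
  # and rerun the same explain.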
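
Cassandra sketch (DataStax Python driver), illustrating the partition-key and single-partition-batch bullets. The keyspace, table and columns (hw7, events_by_user, user_id, event_ts, action) are hypothetical.

  from datetime import datetime, timezone
  from cassandra.cluster import Cluster
  from cassandra.query import BatchStatement, BatchType

  session = Cluster(["127.0.0.1"]).connect("hw7")                     # placeholder keyspace

  # Query-driven table: the partition key (user_id) matches what the target
  # query filters on, so a read touches exactly one partition.
  session.execute("""
      CREATE TABLE IF NOT EXISTS events_by_user (
          user_id  int,
          event_ts timestamp,
          action   text,
          PRIMARY KEY ((user_id), event_ts)
      ) WITH CLUSTERING ORDER BY (event_ts DESC)
  """)

  # Single-partition batch: every statement shares the same partition key, so the
  # batch stays on one replica set (large multi-partition batches are an anti-pattern).
  insert = session.prepare(
      "INSERT INTO events_by_user (user_id, event_ts, action) VALUES (?, ?, ?)")
  batch = BatchStatement(batch_type=BatchType.UNLOGGED)
  for action in ("login", "search", "purchase"):
      batch.add(insert, (42, datetime.now(timezone.utc), action))
  session.execute(batch)

  # Read driven by the partition key, traced for the before/after comparison.
  rs = session.execute("SELECT * FROM events_by_user WHERE user_id = 42", trace=True)
  print("duration:", rs.get_query_trace().duration)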
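
Neo4j sketch (Python driver), illustrating the index/constraint and PROFILE-comparison bullets. The labels, relationship types and properties (User, id, DID, Action) are hypothetical placeholders.

  from neo4j import GraphDatabase

  driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
  QUERY = "PROFILE MATCH (u:User {id: $id})-[:DID]->(a:Action) RETURN count(a)"

  def total_db_hits(plan):
      # sum dbHits over the profiled plan tree (key name hedged across versions)
      own = plan.get("dbHits", plan.get("args", {}).get("DbHits", 0))
      return own + sum(total_db_hits(c) for c in plan.get("children", []))

  with driver.session() as s:
      # "Before": without an index the plan starts from NodeByLabelScan + Filter.
      before = s.run(QUERY, id=42).consume().profile

      # A uniqueness constraint also creates a backing index on :User(id).
      s.run("CREATE CONSTRAINT user_id_unique IF NOT EXISTS "
            "FOR (u:User) REQUIRE u.id IS UNIQUE").consume()
      s.run("CALL db.awaitIndexes()").consume()        # wait until the index is online

      # "After": the same query should now start from a NodeIndexSeek.
      after = s.run(QUERY, id=42).consume().profile

  print("db hits before:", total_db_hits(before), "after:", total_db_hits(after))
  driver.close()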

Comparative Analysis

  • Compare how easily, quickly, and conveniently your queries could be expressed and executed in each DBMS, and note any limitations or surprises you encountered.
  • Analyze the optimization techniques you applied and how effective they were.
  • For each task, explain why this query or type of analysis is not implemented (or is implemented poorly) in the other DBMSs. Example: relationship search in Neo4j, and why it is impossible (or inefficient) in Cassandra/Redis.
  • Create a mini-table (for the queries you actually implemented):

| DBMS | Query / task | Optimization technique | Median time before (ms) | Median time after (ms) | Read volume before | Read volume after |

Scaling, schema changes, and “what if it doesn’t fit?”

  • Briefly describe, for each implemented query or analytical task, what limitations, compromises, or potential problems may arise when scaling, changing the schema, or increasing the data volume. If a new, unexpected business query would be difficult or impossible with the chosen design, state what you would do (e.g., add a query-driven table, adjust partition keys, precompute rollups, or introduce a specialized store).

Submission Instructions

  • Submit to the BRUTE system:
    • HW7.docx file.

Deadline

  • Sunday, 28. 12. 2025, by 23:59