====== HW 7 – Optimization and Comparative Analysis ======

**(10 points)**

  - For each NoSQL DBMS (MongoDB, Cassandra, Neo4j), after completing the target analytical queries:
    * Analyze the performance of **one** of the queries (execution time and volume of data scanned).
    * Implement **one** technique to improve performance (for example: creating an index, changing key structures/partitioning, optimizing the storage schema, reducing the volume of returned data, etc.).
    * Compare the result **before and after** the optimization and explain the difference.
    * Briefly describe in the report **what you did, why this technique is applicable** specifically to your DBMS and your task, and **how it affected** the result.
  - **Measurement protocol**:
    * **Dataset**: use the Stretch dataset.
    * **DBMS-specific instrumentation** (minimal Python sketches for each tool are given in the appendix at the end of this page):
      * MongoDB: **explain("executionStats")** on find/aggregate.
      * Cassandra: **TRACING ON** (capture the query trace; note the consistency level if relevant).
      * Neo4j: **PROFILE** (you may use **EXPLAIN** to inspect the plan without executing it).
    * **Report "before/after"**: time and read volume/plan hits (documents/keys scanned, partitions touched, DB hits).

**Examples (guidance)**

  * MongoDB
    * Add indexes on key fields (e.g., user_id, date, action type).
    * Compare execution time with and without the index (record the results).
    * Assess the impact of document nesting vs. a flatter schema (multikey index costs).
    * Prefer the compound-index order equality → range → sort; use covered queries where possible.
  * Cassandra
    * Review the partitioning (how partition and clustering keys affect latency).
    * Compare latency across key designs (the partition key must drive the queries).
    * Assess the effect of batch size on inserts (prefer batches to the same partition).
  * Neo4j
    * Evaluate how graph size and traversal depth affect response time.
    * Use indexes or uniqueness constraints where applicable.
    * Compare a simple query vs. MATCH + WHERE on an indexed property.
    * Capture **DB Hits** and avoid NodeByLabelScan/CartesianProduct where possible.

**Comparative Analysis**

  * Compare how easily, quickly, and conveniently your queries were executed in each DBMS, and which **limitations or surprises** you encountered.
  * Analyze the **optimization techniques** applied.
  * For each task, explain **why this query/type of analysis is not implemented (or is implemented poorly) in the other DBMSs**. //Example: relationship traversal in Neo4j, and why it is impossible (or inefficient) in Cassandra/Redis.//
  * Create a **mini-table** (for the queries you actually implemented):

^ DBMS ^ Query / task ^ Optimization technique ^ Median before ^ Median after ^ Read volume before ^ Read volume after ^

**Scaling, schema changes, and "what if it doesn't fit?"**

  * Briefly describe, for each implemented query or analytic, **what limitations, compromises, or potential problems** may arise when scaling, changing the schema, or increasing the data volume. If a new, unexpected business query becomes difficult or impossible with the chosen design, state **what you would do** (e.g., add a query-driven table, adjust the partition keys, precompute rollups, or introduce a specialized store).

**Submission Instructions**

  * Submit to the BRUTE system:
    * the **HW7.docx** file.

**Deadline**

  * Sunday **28. 12. 2025** until 23:59
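
====== Appendix – Measurement Sketches (Python) ======

The sketches below show one possible way to capture the "before/after" numbers from each driver; they are guidance, not a required implementation. All connection settings, collection/table names, labels, and fields used here (events, events_by_user, User, user_id, ts, name) are placeholder assumptions; substitute the ones from your own schema.

**Median latency.** The mini-table asks for medians, so wall-clock timings should be repeated rather than taken from a single run. A minimal helper that works with any of the three drivers:

<code python>
import statistics
import time

def median_ms(run_query, repeats=11):
    """Run a zero-argument query function repeatedly and return the
    median wall-clock latency in milliseconds."""
    run_query()  # warm-up run: fills caches so the median is steadier
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)

# Usage (PyMongo example; list() forces the cursor to be fully consumed):
#   before = median_ms(lambda: list(db.events.find({"user_id": 42})))
#   ... apply the optimization ...
#   after = median_ms(lambda: list(db.events.find({"user_id": 42})))
</code>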
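
**MongoDB: explain("executionStats") before and after an index.** A minimal PyMongo sketch, assuming a hypothetical events collection queried by user_id and ts; it captures executionStats, creates a compound index in equality → range order, and captures the stats again for comparison.

<code python>
from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
db = client["hw7"]                                 # placeholder database

# Hypothetical query: events of one user within a date range.
find_cmd = {
    "find": "events",
    "filter": {"user_id": 42, "ts": {"$gte": "2025-01-01"}},
}

def print_execution_stats(label):
    # The explain command with executionStats verbosity reports
    # executionTimeMillis, totalDocsExamined and totalKeysExamined.
    out = db.command({"explain": find_cmd, "verbosity": "executionStats"})
    es = out["executionStats"]
    print(label, es["executionTimeMillis"], "ms,",
          es["totalDocsExamined"], "docs examined,",
          es["totalKeysExamined"], "keys examined")

print_execution_stats("before:")  # expect COLLSCAN: docs examined ~ collection size

# Compound index in equality -> range order: user_id (equality), ts (range).
db.events.create_index([("user_id", ASCENDING), ("ts", DESCENDING)])

print_execution_stats("after:")   # expect IXSCAN: far fewer documents examined
</code>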
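
**Cassandra: query tracing from the driver.** TRACING ON is a cqlsh command; from Python, the same server-side trace is requested per statement. A minimal sketch, assuming a hypothetical events_by_user table whose partition key is user_id, so the SELECT touches a single partition:

<code python>
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])   # placeholder contact point
session = cluster.connect("hw7")   # placeholder keyspace

query = "SELECT * FROM events_by_user WHERE user_id = 42"

# trace=True asks the coordinator to record a trace for this statement,
# the driver-side equivalent of cqlsh's TRACING ON.
result = session.execute(query, trace=True)
trace = result.get_query_trace()

print("total duration:", trace.duration)
for event in trace.events:
    # Each event names the node, the elapsed time within the request,
    # and the step performed (e.g. sstables read, tombstones scanned).
    print(event.source, event.source_elapsed, event.description)

cluster.shutdown()
</code>

A single trace can be noisy, so repeat the query, report the median duration, and note the number of partitions and sstables the trace events show being touched.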
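
**Neo4j: PROFILE and DB Hits from the driver.** A minimal sketch, assuming a hypothetical User label with a name property; it profiles a MATCH + WHERE query, sums DB Hits across the profiled plan, then creates an index on the property and profiles again:

<code python>
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))  # placeholder auth

def total_db_hits(plan):
    # The profiled plan is a tree of operators; depending on the driver
    # version, DB Hits appear as plan["dbHits"] or plan["args"]["DbHits"].
    hits = plan.get("dbHits", plan.get("args", {}).get("DbHits", 0))
    return hits + sum(total_db_hits(c) for c in plan.get("children", []))

def profile(query):
    with driver.session() as session:
        summary = session.run("PROFILE " + query).consume()
        print(query, "->", total_db_hits(summary.profile), "db hits")

query = "MATCH (u:User) WHERE u.name = 'alice' RETURN u"

profile(query)  # before: expect NodeByLabelScan over all :User nodes

with driver.session() as session:
    session.run("CREATE INDEX user_name IF NOT EXISTS "
                "FOR (u:User) ON (u.name)").consume()

profile(query)  # after: expect NodeIndexSeek with far fewer DB Hits

driver.close()
</code>

If the second run still shows a label scan, the index may still be populating; running CALL db.awaitIndexes() before the second PROFILE waits for it to come online.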