Choose several tasks. Maximum bonus is 15 points; choose any combination so that the total does not exceed 15 points.
Cross-analytics across DBMSs (5 p.)
Pick at least two NoSQL DBMSs (e.g., MongoDB and Cassandra).
Define one analytically identical task/query.
Store the same data in both DBMSs (structures may differ to fit each model).
Run analogous queries; record response time, correctness, and implementation effort.
Analyze differences: which was simpler/faster/more convenient; when to prefer each DBMS.
Report: storage structures, queries and results, pros/cons, recommendations.
Use the HW7 measurement protocol for timings and read-volume metrics.
Integrating analytics from different DBMSs (5 p.)
Implement at least one task that aggregates or matches results from two+ DBMSs used in this course.
Obtain part of the data/aggregate from one DBMS (e.g., top-10 active users).
Use it as a filter/seed in another DBMS (e.g., build their relationship graph).
Compare, analyze, and (optionally) visualize the final result.
Example (educational platform): MongoDB — users, activities; Neo4j — relationship graph.
Part 1: find users with ≥3 courses in the last month → list of ids.
Part 2: Neo4j subgraph for those users (all relationship types).
Part 3: visualize; compute hubs/clusters/isolated users.
Report integration steps, difficulties, and added value vs. single-DB analysis.
Other integration examples:
Redis: top popular courses; MongoDB: details (titles, authors).
Cassandra: event logs; Neo4j: “who interacted with the same object.”
Changing requirements (2 p.)
A new event type appears (e.g., “joint project execution”). For one chosen DBMS and one query/analytics task, redesign one component (storage structures, import pipeline, or optimization strategy) and explain the impact.
Storage-structure experiment (3 p.)
Change the storage design for one analytical task (use a subset of the dataset) and compare the effect on time, ease of querying, and admin effort.
Use the HW7 measurement protocol for timings.
Fault-tolerance analysis (2 p.)
Model a failure (e.g., node/partition loss in Cassandra; replica failure in MongoDB; partial data loss in Redis). This is a conceptual exercise; you do not need to actually bring down nodes. Base your analysis on documentation and lecture material, and describe:
Impact on availability/data integrity.
Actions to restore normal operation.
Reliability conclusions for each DBMS used.
Horizontal scaling (2 p.)
Measure the same analytical query as data volume grows using at least 3 levels (ideally 4–5), e.g., 10%, 30%, 60%, 100% of your dataset. You may use subsets or sampling on a single-node deployment, but the query must remain identical across levels.
For each level, run the query using the HW7 measurement protocol and collect a measurement table with:
workload level (dataset fraction / number of records)
query execution time (ms)
amount of data involved in the query (e.g., documents/rows scanned, keys examined, bytes read — depending on DBMS)
Build a chart from the table in Excel or Matplotlib (execution time vs. workload level) and interpret the trend (2–5 sentences).
Mini-dashboard (3 p.)
Choose one of your HW1 tasks that can be counted/measured by time.
Examples: purchases per week, active users per day, new registrations per month, ratings per month, subscriptions per week.
Write a query that returns a time series table with at least 6 time periods: time_period | value.
Time_period must be a real date/week/month start, e.g., day / week_start / month_start.
Example: week_start | purchases_count.
Draw a chart from this table (Excel / Matplotlib / Google Sheets / any BI). The x-axis must be time (day/week/month). The y-axis is the value from the table.
Write 3–5 sentences explaining what the chart shows (growing, falling, spikes, seasonal changes, etc.).
Submit: the table, the chart, and the query you used.
The value must be a real-world metric (counts/sums/averages/percentages of events), not a system metric (CPU, memory, query time).
Measurement reliability (1 p.)
Pick one query and one fixed workload level.
Run it N times (e.g., 10) using the HW7 protocol (warm-up + measured runs).
Report a table with all runs and summarize variability (min/median/max and at least one dispersion metric: standard deviation; coefficient of variation is optional). If you cannot compute dispersion reliably, min/median/max is acceptable, but state this explicitly.
Provide 3–5 lines explaining likely sources of noise (cache, disk I/O, GC, background load) and how you can control them.
Append-only stream simulation (2 p.)
For at least one DBMS, generate a growing append-only event log (e.g., time-ordered events) and insert events in batches as the log grows.
Measure write performance (throughput and/or latency) at several log sizes and, optionally, the latency of one selected read query on top of this log.
Use the HW7 measurement protocol for timings.
Redis cache evaluation (MongoDB + Redis) (1 p.)
Use your HW4 MongoDB query caching with Redis.
Pick one cached analytical query.
Measure and report (HW7 protocol): cache miss vs cache hit.
Submit: the query text, one measurement table (miss vs hit: time + read-volume metric), and 2–4 sentences of interpretation.
Redis query caching for Neo4j (Neo4j + Redis) (2 p.)
Implement Redis caching of the RESULT of one existing Neo4j analytical query from your project.
Requirements: cache key includes query name + parameters; TTL is required.
Measure and report (HW7 protocol): cache miss vs cache hit.
Submit: the Cypher query, cache key format + TTL, one measurement table (miss vs hit: time + read-volume metric), and 3–6 sentences about when the cache may become stale and why it is still useful.
Redis key naming convention (1 p.)
Describe the naming rule you used for Redis keys so another person can understand your database.
Write your key format as a template (e.g., prefix:entity:id:subtype or prefix:entity:{id}:metric:window).
Provide at least 10 real key examples from your project and explain each one in one line:
what object it belongs to (user/movie/course/etc.)
what data type it stores (HASH/SET/ZSET/etc.)
what the key is used for (which query/task)
Submit: the template + the 5 examples with explanations.
Idempotent import proof (1 p.)
Missing data handling (1 p.)
Show how your queries handle missing or null fields in the dataset (e.g., missing date, missing user_id, missing rating).
Do this:
Choose one analytical query from HW1.
Identify which field(s) can be missing/null and would affect the result.
Show how you handle it in the query (ignore such records, replace with a default, or a safe fallback).
Submit:
the query (or the relevant part),
1–2 concrete examples of problematic input (missing/null values) and what happens,
3–5 sentences: why your handling is correct for the task.
Index experiment (MongoDB) (1 p.)