====== HW 8 – Bonus tasks (optional) ======
**(0 - 15 points)**

**UPDATED! New optional tasks added**

Choose several tasks. Maximum bonus is 15 points; choose any combination so that the total does not exceed 15 points.

  - **Cross-analytics across DBMSs** (5 p.)
    * Pick at least two NoSQL DBMSs (e.g., MongoDB and Cassandra).
    * Define one analytically identical task/query.
    * Store the **same** data in both DBMSs (structures may differ to fit each model).
    * Run analogous queries; record response time, correctness, and implementation effort.
    * Analyze differences: which was simpler/faster/more convenient; when to prefer each DBMS.
    * Report: storage structures, queries and results, pros/cons, recommendations.
    * Use the HW7 measurement protocol for timings and read-volume metrics.
  - **Integrating analytics from different DBMSs** (5 p.)
    * Implement at least one task that aggregates or matches results from two+ DBMSs used in this course.
    * Obtain part of the data/aggregate from one DBMS (e.g., top-10 active users).
    * Use it as a filter/seed in another DBMS (e.g., build their relationship graph).
    * Compare, analyze, and (optionally) visualize the final result.
    * Example (educational platform): MongoDB — users, activities; Neo4j — relationship graph.
      * Part 1: find users with ≥3 courses in the last month → list of ids.
      * Part 2: Neo4j subgraph for those users (all relationship types).
      * Part 3: visualize; compute hubs/clusters/isolated users.
      * Report integration steps, difficulties, and added value vs. single-DB analysis.
    * Other integration examples:
      * Redis: top popular courses; MongoDB: details (titles, authors).
      * Cassandra: event logs; Neo4j: “who interacted with the same object.”
  - **Changing requirements** (2 p.)
    * A new event type appears (e.g., "joint project execution"). For one chosen DBMS and one query/analytics task, redesign one component (storage structures, import pipeline, or optimization strategy) and explain the impact.
  - **Storage-structure experiment** (3 p.)
    * Change the storage design for one analytical task (use a subset of the dataset) and compare the effect on time, ease of querying, and admin effort.
      * Examples: separate docs vs. nested arrays (MongoDB); different partition keys (Cassandra).
      * Record what changed and which design was more effective.
    * Use the HW7 measurement protocol for timings.
  - **Fault-tolerance analysis** (2 p.)
    * Model a failure (e.g., node/partition loss in Cassandra; replica failure in MongoDB; partial data loss in Redis). This is a conceptual exercise; you do not need to actually bring down nodes. Base your analysis on documentation and lecture material, and describe:
      * Impact on availability/data integrity.
      * Actions to restore normal operation.
      * Reliability conclusions for each DBMS used.
  - **Horizontal scaling** (2 p.)
    * Measure the same analytical query as data volume grows using at least 3 levels (ideally 4–5), e.g., 10%, 30%, 60%, 100% of your dataset. You may use subsets or sampling on a single-node deployment, but the query must remain identical across levels. 
    * For each level, run the query using the HW7 measurement protocol and collect a measurement table with:
      * workload level (dataset fraction / number of records)
      * query execution time (ms)
      * amount of data involved in the query (e.g., documents/rows scanned, keys examined, bytes read — depending on DBMS)
    * Build a chart from the table in Excel or Matplotlib (execution time vs. workload level) and interpret the trend (2–5 sentences).
  - **Mini-dashboard** (3 p.)
    * Choose one of your HW1 tasks that can be counted/measured by time.
      * Examples: purchases per week, active users per day, new registrations per month, ratings per month, subscriptions per week.
    * Write a query that returns a time series table with at least 6 time periods: time_period | value. 
      * Time_period must be a real date/week/month start, e.g., day / week_start / month_start.
      * Example: week_start | purchases_count.
    * Draw a chart from this table (Excel / Matplotlib / Google Sheets / any BI). The x-axis must be time (day/week/month). The y-axis is the value from the table.
    * Write 3–5 sentences explaining what the chart shows (growing, falling, spikes, seasonal changes, etc.).
    * Submit: the table, the chart, and the query you used.
    * The value must be a real-world metric (counts/sums/averages/percentages of events), not a system metric (CPU, memory, query time).
  - **Measurement reliability** (1 p.)
    * Pick one query and one fixed workload level.
    * Run it N times (e.g., 10) using the HW7 protocol (warm-up + measured runs).
    * Report a table with all runs and summarize variability (min/median/max and at least one dispersion metric: standard deviation; coefficient of variation is optional). If you cannot compute dispersion reliably, min/median/max is acceptable, but state this explicitly.
    * Provide 3–5 lines explaining likely sources of noise (cache, disk I/O, GC, background load) and how you can control them.
  - **Append-only stream simulation** (2 p.)
    * For at least one DBMS, generate a growing append-only event log (e.g., time-ordered events) and insert events in batches as the log grows.
    * Measure write performance (throughput and/or latency) at several log sizes and, optionally, the latency of one selected read query on top of this log.
    * Use the HW7 measurement protocol for timings.
  - **Redis cache evaluation (MongoDB + Redis)** (1 p.)
    * Use your HW4 MongoDB query caching with Redis.
    * Pick one cached analytical query.
    * Measure and report (HW7 protocol): cache miss vs cache hit.
    * Submit: the query text, one measurement table (miss vs hit: time + read-volume metric), and 2–4 sentences of interpretation.
  - **Redis query caching for Neo4j (Neo4j + Redis)** (2 p.)
    * Implement Redis caching of the RESULT of one existing Neo4j analytical query from your project.
    * Requirements: cache key includes query name + parameters; TTL is required.
    * Measure and report (HW7 protocol): cache miss vs cache hit.
    * Submit: the Cypher query, cache key format + TTL, one measurement table (miss vs hit: time + read-volume metric), and 3–6 sentences about when the cache may become stale and why it is still useful.
  - **Redis key naming convention** (1 p.)
    * Describe the naming rule you used for Redis keys so another person can understand your database.
    * Write your key format as a template (e.g., prefix:entity:id:subtype or prefix:entity:{id}:metric:window).
    * Provide at least 10 real key examples from your project and explain each one in one line:
      * what object it belongs to (user/movie/course/etc.)
      * what data type it stores (HASH/SET/ZSET/etc.)
      * what the key is used for (which query/task)
    * Submit: the template + the 5 examples with explanations.
  - **Idempotent import proof** (1 p.)
    * Prove that your import script is idempotent (running it again does not create duplicates).
    * Do this:
      * Run the import once.
      * Record counts of the main loaded objects (e.g., number of users, events, items).
      * Run the same import again without deleting the database.
      * Record the same counts again.
    * Submit:
      * a small table “after 1st run vs after 2nd run” (counts must be identical),
      * 2–3 sentences explaining how you avoid duplicates (unique keys / upsert / checks).
  - **Missing data handling** (1 p.)
    * Show how your queries handle missing or null fields in the dataset (e.g., missing date, missing user_id, missing rating).
    * Do this:
      * Choose one analytical query from HW1.
      * Identify which field(s) can be missing/null and would affect the result.
      * Show how you handle it in the query (ignore such records, replace with a default, or a safe fallback).
    * Submit:
      * the query (or the relevant part),
      * 1–2 concrete examples of problematic input (missing/null values) and what happens,
      * 3–5 sentences: why your handling is correct for the task.
  - **Index experiment (MongoDB)** (1 p.)
    * Pick one MongoDB query that filters and sorts by some field(s).
    * Show the difference between a “wrong” and a “correct” compound index:
      * Wrong index: the same fields but in the wrong order (so the sort is not efficiently supported).
      * Good case: a correct index that matches the query.
    * Do this:
      * Run explain() for the query with the wrong index and record execution stats.
      * Create the correct index and run explain() again.
      * Compare the plans and performance.
    * Submit:
      * the query text,
      * both index definitions (wrong vs correct),
      * a before/after table with: execution time + docs examined + keys examined (from explain),
      * 3–5 sentences: why the “good” index fits the query and what improved.

Submission
  * Submit **HW8.docx** (report) to the BRUTE system. If you include code/notebooks, attach them as a single archive (e.g., hw8.zip).


**Deadline**
  * Sunday 4. 1. 2026 until 23:59