HW 5 – Wide-column data stores. Cassandra

(5 points)

  1. Design the table schemas based on the analytical tasks (see HW1) that are intended to be solved with Cassandra.
    • The schemas must strictly follow Cassandra’s query-based modeling principles: correct selection of partition keys and clustering columns, and denormalization.
    • Materialized views are not allowed.
    • ALLOW FILTERING is forbidden. All queries must be supported by schema design.
  2. Insert the data into Cassandra using the CSV/JSON generated for the project (see HW0).
    • Load only the data required for the analytical tasks. Avoid unnecessary attributes or tables.
  3. Implement and run the queries (see HW1).
    • Provide efficient CQL queries that leverage the partitioning and clustering design.
    • Demonstrate how the schema design lets each query run without ALLOW FILTERING or full-table scans.
    • For time-series data, show partitioning and clustering strategies for scalability.
  4. Prepare a detailed report including:
    • The text from HW0 and HW1 (edited if necessary);
    • Full schema creation statements with a brief explanation of design decisions;
    • The data import commands/scripts;
    • All analytical queries:
      • Task number (as in HW1);
      • Description of the analytical question;
      • Full CQL query with a short explanation of how the schema supports it;
      • Screenshot of the query result.
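
As an illustration of the time-series requirement in step 3, here is a minimal sketch of day-based bucketing. The table readings_by_sensor_day, its columns, and the helper names are hypothetical and are not part of the assignment. The sketch assumes that one partition per (sensor_id, day) is an acceptable partition size for your dataset. Each generated query names the full partition key, so it needs no ALLOW FILTERING.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schema: one partition per (sensor_id, day),
# rows inside a partition clustered by timestamp, newest first.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS readings_by_sensor_day (
    sensor_id text,
    day date,
    ts timestamp,
    value double,
    PRIMARY KEY ((sensor_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
"""


def day_bucket(ts):
    """Return the UTC calendar day (YYYY-MM-DD) used as the partition bucket."""
    return ts.astimezone(timezone.utc).date().isoformat()


def buckets_between(start, end):
    """List every day bucket touched by the interval [start, end]."""
    first = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    return [
        (first + timedelta(days=i)).isoformat()
        for i in range((last - first).days + 1)
    ]


def select_for_day(sensor_id, day):
    """Build a single-partition SELECT; the full partition key is specified."""
    safe_id = sensor_id.replace("'", "''")  # escape quotes for a CQL text literal
    return (
        "SELECT ts, value FROM readings_by_sensor_day "
        f"WHERE sensor_id = '{safe_id}' AND day = '{day}';"
    )


if __name__ == "__main__":
    start = datetime(2025, 11, 30, 22, 0, tzinfo=timezone.utc)
    end = datetime(2025, 12, 2, 3, 0, tzinfo=timezone.utc)
    # One query per partition instead of one scan across partitions.
    for day in buckets_between(start, end):
        print(select_for_day("s-17", day))
```

A range that spans several days becomes one query per bucket. The bucket size (day, week, month) is a design decision to justify in the report.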

Important:

  • If certain analytical tasks are not feasible under Cassandra’s data model, replace them with other meaningful tasks and update earlier work (HW0/HW1).
  • Use denormalization and query-driven schema design to avoid ALLOW FILTERING.
  • You are not required to submit a separate Python bulk-load script for Cassandra (unlike in HW2/HW3). However, you must actually create and populate the Cassandra tables with the project dataset at the scale defined in HW0.
  • If some of your analytical tasks require additional derived tables (for example, pre-aggregated or reorganized tables created by an external script), you must include this script in your submission (e.g. as hw5_extra_load.py or a similar file) and briefly document:
    • what tables it creates or populates,
    • how and when it is intended to be run.
  • Your hw5.cql file must contain:
    • all CREATE TABLE statements,
    • a small sample of INSERT statements for each table, so that every analytical query returns a non-empty result on a fresh database,
    • all analytical SELECT queries for the HW1 tasks for Cassandra (each preceded by a comment with the task number and description).
    • Every analytical query must return a non-empty result when hw5.cql is executed on a fresh database.
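
One possible way to produce the small INSERT sample for hw5.cql is to generate it from the first rows of the project CSV. The sketch below uses only the Python standard library. The function names, the table and column names, and the two-way split into "text" (quoted; also used here for date and timestamp values) and "number" (unquoted) are assumptions made for illustration. Other CQL types such as uuid or collections would need their own handling.

```python
import csv
import io


def cql_value(raw, kind):
    """Render one CSV field as a CQL literal.

    kind == "number": emitted unquoted after checking that it parses as a number.
    anything else:    emitted as a quoted literal with single quotes doubled.
    """
    if kind == "number":
        float(raw)  # raises ValueError if the field is not numeric
        return raw.strip()
    return "'" + raw.replace("'", "''") + "'"


def sample_inserts(csv_text, table, columns, limit=5):
    """Build INSERT statements for the first `limit` rows of a CSV document.

    `columns` maps each column name to "text" or "number" (hypothetical convention).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    names = ", ".join(columns)
    statements = []
    for i, row in enumerate(reader):
        if i >= limit:
            break
        values = ", ".join(cql_value(row[c], k) for c, k in columns.items())
        statements.append(f"INSERT INTO {table} ({names}) VALUES ({values});")
    return statements


if __name__ == "__main__":
    data = (
        "sensor_id,day,ts,value\n"
        "s-17,2025-12-01,2025-12-01 10:00:00+0000,3.5\n"
    )
    kinds = {"sensor_id": "text", "day": "text", "ts": "text", "value": "number"}
    for stmt in sample_inserts(data, "readings_by_sensor_day", kinds):
        print(stmt)
```

Pick the sample rows so that they fall into the partitions that the analytical SELECT queries actually read; otherwise a query can still come back empty on a fresh database.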

Submission:

  • Submit the HW5.docx file to the BRUTE system.
  • Submit the hw5.cql file, containing Cassandra CQL commands with brief explanatory comments, and (optionally) hw5_extra_load.py to the NoSQL server (nosql.felk.cvut.cz).

Execution:

  • Execute the following shell command to evaluate the whole CQL script:

cqlsh -u $username -p $password -k $KeyspaceName -f $ScriptFile

Don’t forget to run the homework submission script!

  • Deadline

Sunday 7. 12. 2025 until 23:59

courses/b4m36ds2/homework/hw5.txt · Last modified: 2025/12/01 22:52 by prokoyul