HW 5 – Wide-column data stores. Cassandra

(5 points)

  1. Design the table schemas based on the analytical tasks (see HW1) that are intended to be solved with Cassandra.
    • The schemas must strictly follow Cassandra’s query-based modeling principles: correct selection of partition keys and clustering columns, and denormalization.
    • Materialized views are not allowed.
    • ALLOW FILTERING is forbidden. All queries must be supported by schema design.
  2. Insert the data into Cassandra using the CSV/JSON generated for the project (see HW0).
    • Load only the data required for the analytical tasks. Avoid unnecessary attributes or tables.
  3. Implement and run the queries (see HW1).
    • Provide efficient CQL queries that leverage the partitioning and clustering design.
    • Demonstrate how the schema design makes each query efficient (no ALLOW FILTERING and no full-table scans).
    • For time-series data, show partitioning and clustering strategies for scalability.
  4. Prepare a detailed report including:
    • The text from HW0 and HW1 (edited if necessary);
    • Full schema creation statements with a brief explanation of design decisions;
    • The data import commands/scripts;
    • All analytical queries:
      • Task number (as in HW1);
      • Description of the analytical question;
      • Full CQL query with a short explanation of how the schema supports it;
      • Screenshot of the query result.
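As a minimal sketch of items 1 to 3, assume a hypothetical analytical task: "list the readings of one sensor on a given day, newest first". The table, column, and file names (sensor_readings_by_day, sensor_id, day, reading_ts, value, readings.csv) are illustrative assumptions and are not taken from the project data. The CSV file is assumed to contain exactly the four listed columns, in that order, with a header row.

```cql
-- Hypothetical time-series table. The composite partition key (sensor_id, day)
-- creates one partition per sensor per day, so partitions stay bounded in size.
-- The clustering column reading_ts stores rows sorted, newest first.
CREATE TABLE IF NOT EXISTS sensor_readings_by_day (
    sensor_id  text,
    day        date,
    reading_ts timestamp,
    value      double,
    PRIMARY KEY ((sensor_id, day), reading_ts)
) WITH CLUSTERING ORDER BY (reading_ts DESC);

-- cqlsh COPY command: loads only the attributes the task needs.
COPY sensor_readings_by_day (sensor_id, day, reading_ts, value)
FROM 'readings.csv' WITH HEADER = true;

-- The query names the full partition key and restricts the clustering column
-- with a range, so it reads one slice of one partition and needs no
-- ALLOW FILTERING.
SELECT reading_ts, value
FROM sensor_readings_by_day
WHERE sensor_id = 's-001'
  AND day = '2025-01-15'
  AND reading_ts >= '2025-01-15 08:00:00+0000'
  AND reading_ts <  '2025-01-15 12:00:00+0000';
```

The daily bucket is only one possible choice. A coarser or finer bucket (month, hour) may fit better depending on how many rows each sensor produces.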

Important:

  • If certain analytical tasks are not feasible under Cassandra’s data model, replace them with other meaningful tasks and update earlier work (HW0/HW1).
  • Use denormalization and query-driven schema design to avoid ALLOW FILTERING.
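A minimal sketch of denormalization, assuming two hypothetical tasks over the same order data: "latest orders of a customer" and "orders of a product since a given date". All table and column names below are assumptions made for illustration. Each task gets its own table whose partition key matches the equality condition of the query, and every order is written to both tables.

```cql
-- Hypothetical table for the task "latest orders of a customer".
CREATE TABLE IF NOT EXISTS orders_by_customer (
    customer_id text,
    order_ts    timestamp,
    order_id    text,
    product_id  text,
    total       decimal,
    PRIMARY KEY ((customer_id), order_ts, order_id)
) WITH CLUSTERING ORDER BY (order_ts DESC, order_id ASC);

-- The same data duplicated for the task "orders of a product since a date".
CREATE TABLE IF NOT EXISTS orders_by_product (
    product_id  text,
    order_ts    timestamp,
    order_id    text,
    customer_id text,
    total       decimal,
    PRIMARY KEY ((product_id), order_ts, order_id)
) WITH CLUSTERING ORDER BY (order_ts DESC, order_id ASC);

-- One logical order is written to both tables.
BEGIN BATCH
    INSERT INTO orders_by_customer (customer_id, order_ts, order_id, product_id, total)
    VALUES ('c-42', '2025-01-15 08:00:00+0000', 'o-1001', 'p-7', 59.90);
    INSERT INTO orders_by_product (product_id, order_ts, order_id, customer_id, total)
    VALUES ('p-7', '2025-01-15 08:00:00+0000', 'o-1001', 'c-42', 59.90);
APPLY BATCH;

-- Each query hits exactly one partition of the table designed for it.
SELECT order_id, order_ts, total
FROM orders_by_customer
WHERE customer_id = 'c-42'
LIMIT 10;

SELECT order_id, customer_id, total
FROM orders_by_product
WHERE product_id = 'p-7'
  AND order_ts >= '2025-01-01 00:00:00+0000';
```

Running the second query against orders_by_customer instead would require ALLOW FILTERING, because product_id is not part of that table's primary key.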

Submission:

  • Submit the HW5.docx file to the BRUTE system.
  • Submit the hw5.cql file to the NoSQL server (nosql.felk.cvut.cz), containing Cassandra CQL commands with brief explanatory comments.

Execution:

  • Execute the following shell command to evaluate the whole CQL script:

cqlsh -u $username -p $password -k $KeyspaceName -f $ScriptFile

  • $KeyspaceName is the name of the keyspace to use (it must already exist), e.g. f241_login
  • $ScriptFile is the file with the CQL queries to be executed, e.g. script.cql
  • Tools:
  • References:
  • Server: nosql.felk.cvut.cz
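A possible skeleton of the submitted script is sketched below. The table name, columns, and sample row are hypothetical. Because the keyspace is passed with -k, the sketch assumes the script does not need its own CREATE KEYSPACE or USE statement and refers to tables without a keyspace prefix.

```cql
-- hw5.cql: hypothetical skeleton, all names are placeholders.

-- Schema. Dropping the table first lets the script be re-run from scratch.
DROP TABLE IF EXISTS events_by_user;
CREATE TABLE events_by_user (
    user_id    text,
    event_ts   timestamp,
    event_type text,
    PRIMARY KEY ((user_id), event_ts)
) WITH CLUSTERING ORDER BY (event_ts DESC);

-- Data.
INSERT INTO events_by_user (user_id, event_ts, event_type)
VALUES ('u-1', '2025-01-15 08:00:00+0000', 'login');

-- Task 1 (numbering as in HW1): latest events of one user.
-- Supported by the partition key user_id and the descending clustering order.
SELECT event_ts, event_type
FROM events_by_user
WHERE user_id = 'u-1'
LIMIT 5;
```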

Don’t forget to run the homework submission script!

Deadline:

  • Sunday 7. 12. 2025 until 23:59

courses/be4m36ds2/homework/hw5.txt · Last modified: 2025/09/21 18:42 by prokoyul