HW 5 – Wide-column data stores. Cassandra

(5 points)

Design the table schemas based on the analytical tasks (see HW1) that are intended to be solved with Cassandra.
- The schemas must strictly follow Cassandra’s query-based modeling principles: correct selection of partition keys and clustering columns, and denormalization.
- Materialized views are not allowed.
- ALLOW FILTERING is forbidden. All queries must be supported by schema design.
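
As an illustration of these principles, the sketch below models two hypothetical queries ("list a customer's orders, newest first" and "list the orders that contain a given product"). The table and column names are assumptions made for this example only and are not part of any project dataset. Each query gets its own table, the partition key matches the equality condition of the query, and the clustering columns define the sort order inside the partition.

```cql
-- Hypothetical query 1: all orders of one customer, newest first.
-- Partition key: customer_id (equality in WHERE).
-- Clustering columns: order_time (sort order), order_id (uniqueness).
CREATE TABLE IF NOT EXISTS orders_by_customer (
    customer_id   text,
    order_time    timestamp,
    order_id      text,
    customer_name text,      -- denormalized: copied here instead of joined
    total_price   decimal,
    PRIMARY KEY ((customer_id), order_time, order_id)
) WITH CLUSTERING ORDER BY (order_time DESC, order_id ASC);

-- Hypothetical query 2: all orders that contain one product.
-- The same facts are stored again under a different partition key.
CREATE TABLE IF NOT EXISTS orders_by_product (
    product_id    text,
    order_time    timestamp,
    order_id      text,
    customer_name text,      -- denormalized
    quantity      int,
    PRIMARY KEY ((product_id), order_time, order_id)
) WITH CLUSTERING ORDER BY (order_time DESC, order_id ASC);
```

Duplicating the data in two tables is what makes both queries answerable without ALLOW FILTERING or materialized views.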
Insert the data into Cassandra using the CSV/JSON generated for the project (see HW0).
- Load only the data required for the analytical tasks. Avoid unnecessary attributes or tables.
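
One possible way to import a CSV file is the cqlsh COPY command, sketched below. The table name, column list, and file name are hypothetical, and the example assumes that the table already exists and that the CSV has a header row with the columns in the listed order. Note that COPY is a cqlsh command rather than a CQL statement, so it works only when run through cqlsh.

```cql
-- Assumption: orders_by_customer exists and orders_by_customer.csv
-- was exported from the HW0 data with exactly these columns.
COPY orders_by_customer (customer_id, order_time, order_id, customer_name, total_price)
FROM 'orders_by_customer.csv'
WITH HEADER = TRUE AND DELIMITER = ',';
```

Listing the columns explicitly keeps the import limited to the attributes the analytical tasks actually need.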
Implement and run the queries (see HW1).
- Provide efficient CQL queries that leverage the partitioning and clustering design.
- Demonstrate how queries are optimized by schema design (no filtering scans).
- For time-series data, show partitioning and clustering strategies for scalability.
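
A minimal sketch of a time-series design follows, assuming a hypothetical table of sensor readings (all names and values are illustrative). The composite partition key adds a day bucket so that a single sensor does not grow one unbounded partition, and the clustering column keeps the readings sorted by time within each bucket.

```cql
-- One partition per (sensor, day); rows inside are ordered newest first.
CREATE TABLE IF NOT EXISTS readings_by_sensor_day (
    sensor_id    text,
    day          date,
    reading_time timestamp,
    value        double,
    PRIMARY KEY ((sensor_id, day), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);

-- Both partition key columns are restricted by equality and the
-- clustering column by a range, so no ALLOW FILTERING is needed.
SELECT reading_time, value
FROM readings_by_sensor_day
WHERE sensor_id = 'sensor-001'
  AND day = '2025-11-30'
  AND reading_time >= '2025-11-30 08:00:00+0000'
  AND reading_time <  '2025-11-30 12:00:00+0000'
LIMIT 100;
```

The bucket size (day, week, month) is a design choice that depends on how many rows one source produces; a query spanning several buckets is issued once per bucket.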
Prepare a detailed report including:
- The text from HW0 and HW1 (edited if necessary);
- Full schema creation statements with a brief explanation of design decisions;
- The data import commands/scripts;
- All analytical queries:
  - Task number (as in HW1);
  - Description of the analytical question;
  - Full CQL query with a short explanation of how the schema supports it;
  - Screenshot of the query result.

Important:

If certain analytical tasks are not feasible under Cassandra’s data model, replace them with other meaningful tasks and update earlier work (HW0/HW1).
Use denormalization and query-driven schema design to avoid ALLOW FILTERING.
You are not required to submit a separate Python bulk-load script for Cassandra (unlike in HW2/HW3). However, you must actually create and populate the Cassandra tables with the project dataset at the scale defined in HW0.
If some of your analytical tasks require additional derived tables (for example, pre-aggregated or reorganized tables created by an external script), you must include this script in your submission (e.g. as hw5_extra_load.py or a similar file) and briefly document:
- what tables it creates or populates,
- how and when it is intended to be run.
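
For illustration, a derived table that such a script might populate could look like the sketch below. The table, its columns, and the sample values are hypothetical; the aggregation itself (counting and summing per category and day) is assumed to happen in the external script, which then writes one row per result.

```cql
-- Hypothetical pre-aggregated table: one row per category and day.
CREATE TABLE IF NOT EXISTS daily_sales_by_category (
    category    text,
    day         date,
    order_count int,
    revenue     decimal,
    PRIMARY KEY ((category), day)
) WITH CLUSTERING ORDER BY (day DESC);

-- Reading the pre-computed values for one month of one category.
SELECT day, order_count, revenue
FROM daily_sales_by_category
WHERE category = 'books'
  AND day >= '2025-11-01'
  AND day <= '2025-11-30';
```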
Your hw5.cql file must contain:
- all CREATE TABLE statements,
- a small sample of INSERT statements for each table, so that every analytical query returns a non-empty result on a fresh database,
- all analytical SELECT queries for the HW1 tasks for Cassandra (each preceded by a comment with the task number and description).
Every analytical query must return a non-empty result when hw5.cql is executed on a fresh database.
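
The required layout can be sketched as follows. The task, table, and data are hypothetical placeholders; only the order of the parts (table creation, sample rows, commented query) reflects the requirements listed above.

```cql
-- Schema
CREATE TABLE IF NOT EXISTS movies_by_genre (
    genre        text,
    release_year int,
    title        text,
    rating       double,
    PRIMARY KEY ((genre), release_year, title)
) WITH CLUSTERING ORDER BY (release_year DESC, title ASC);

-- Sample data, so the query below returns rows on a fresh database
INSERT INTO movies_by_genre (genre, release_year, title, rating)
VALUES ('drama', 2019, 'Example Title A', 7.8);
INSERT INTO movies_by_genre (genre, release_year, title, rating)
VALUES ('drama', 2016, 'Example Title B', 6.9);

-- Task 1 (hypothetical): drama movies released since 2015, newest first
SELECT title, release_year, rating
FROM movies_by_genre
WHERE genre = 'drama'
  AND release_year >= 2015;
```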

Submission:

Submit the HW5.docx file to the BRUTE system.
Submit the hw5.cql file, containing the Cassandra CQL commands with brief explanatory comments, and (optionally) hw5_extra_load.py to the NoSQL server (nosql.felk.cvut.cz).

Execution:

cqlsh -u $username -p $password -k $KeyspaceName -f $ScriptFile

$KeyspaceName is the name of the keyspace to use (it must already exist), e.g. f241_login
$ScriptFile is the file with the CQL queries to be executed, e.g. script.cql
Tools:
- Apache Cassandra 4.1.6 (installed on the NoSQL server)
References:
- The Cassandra Documentation
- Examples of Cassandra tasks from HW1 (available for university accounts)
Server: nosql.felk.cvut.cz

Don’t forget to run the homework submission script!

Deadline: Sunday 7. 12. 2025, 23:59