Homework Assignments
VSCode and JetBrains IDE remote connections that do not work over SSHFS are prohibited because they put a heavy load on the server. There is no documented way to install a single shared instance of the required tools on the server for all users.
If you still want to use these IDEs, mount the NoSQL server file system on your PC and open your projects as if they were on your local file system.
For VSCode there is an SSHFS extension:
For any other IDE, you can mount the remote filesystem manually on Linux/macOS:
For Windows, use any tool like SSHFS-Win (follow the tool's documentation):
Note: on Linux with GNOME/Ubuntu you can do this directly from the Files app:
For the latest GNOME 47, use the “Networks” section at the top of the sidebar to add a connection
For older versions, use the “Other” section at the bottom of the sidebar
The address should be
sftp://$USERNAME@nosql.felk.cvut.cz/
Submissions
Use sftp or WinSCP to upload your submission files to the NoSQL server
Put these files into the directory ~/assignments/name/, where name is the name of the given homework
I.e. postgresql, mapreduce, redis, cassandra, mongodb, neo4j (case sensitive)
Use ssh or PuTTY to open a remote shell connection to the NoSQL server
Based on the instructions provided for a given homework assignment, verify that everything is working as expected
Go to the ~/assignments/ directory and execute
sudo submit_execute name
where name is the name of the homework
Wait for the confirmation of success. Otherwise, your homework is not considered to be submitted
Should any complications appear, send your solution by e-mail to prokoyul@fel.cvut.cz
Just for your convenience, you can check the submitted files in the ~/submissions/ directory
Upload to BRUTE the same script + a screenshot of its execution on the NoSQL server.
Once the homework is assessed, you will find points and comments in BRUTE
Requirements:
Respect the prescribed names of individual files to be submitted (case sensitive)
Place all the files in the root directory of your submission
Do not include shared libraries or files that are not requested
I.e. do not submit files that were not explicitly requested
Do not redirect or suppress either the standard output or the error output in your shell scripts
All your files must be syntactically correct and executable without errors
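The file-related requirements above can be pre-checked locally before you run the submission command. The following is only a sketch: check_submission is a hypothetical helper (it is not the course's submit_execute script), and the list of expected file names is whatever the given homework prescribes.

```python
from pathlib import Path


def check_submission(directory, expected_names):
    """Return a list of problems found in a submission directory.

    Hypothetical helper: `expected_names` are the file names prescribed by
    the given homework (compared case sensitively).
    """
    root = Path(directory)
    present = {p.name for p in root.iterdir() if p.is_file()}
    problems = []
    # Every prescribed file must sit directly in the submission root.
    for name in expected_names:
        if name not in present:
            problems.append(f"missing: {name}")
    # Anything else (extra files, subdirectories) was not requested.
    for entry in sorted(root.iterdir(), key=lambda p: p.name):
        if entry.is_dir():
            problems.append(f"unexpected directory: {entry.name}")
        elif entry.name not in expected_names:
            problems.append(f"unexpected file: {entry.name}")
    return problems
```

An empty list means that the directory contains exactly the prescribed files. It says nothing about whether the files are syntactically correct or executable, which you still have to verify on the server.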
General homework assignment
Develop an application that uses multiple databases for different tasks within a project (polyglot persistence).
The project aims to demonstrate how different database systems can work together effectively to provide high performance, scalability, and usability.
Requirements:
The project should make reasonable use of the following database systems:
PostgreSQL
Redis
Cassandra
MongoDB
Neo4j
Describe the subject area and write the functional requirements for the project (1 point).
Describe each project block, select an appropriate system, and justify your choice (2 points).
Implement basic queries for each of the five databases according to the specific requirements below (14 points).
Integrate these queries into a working web application.
Successful integration of each database system (5 points, 1 point per database)
Simple graphical interface (2 points).
Demonstrate user interaction with data from each database through the web interface (1 point).
Write brief explanations of how each database is used in the project (1 point)
Total base points: 26.
Optional extensions.
Design each project block as a microservice (+3 points).
Implement data synchronization between different blocks
Basic synchronization (e.g., clear cart in Redis at checkout) (+1 point)
Advanced synchronization across multiple databases (+3 points)
Implement error handling and data validation (+2 points)
You may form teams of up to 3 students to work on the project.
In that case, either the required number of queries is multiplied by the number of students in the team, or each student implements different queries in the homework assignment
HW0: Topic selection
1. Choose your distinct topic.
Example: The online store will specialize in custom furniture, allowing users to select designs, materials, and dimensions. The platform will have various features such as product inventory, user sessions, shopping carts, purchase history, activity logs, and personalized recommendations.
2. Describe the subject area - tell about your project, its participants, and the tasks inside the project.
3. Write the functional requirements for the project.
4. Separate the parts of your project so that each of the listed database systems (PostgreSQL, Redis, MongoDB, Cassandra, Neo4j) will be used at least once.
Example:
Product inventory – PostgreSQL,
User accounts – PostgreSQL,
User sessions – Redis,
Shopping carts – Redis,
Purchase history – MongoDB,
Activity logs – Cassandra,
Personalized recommendations – Neo4j,
Caching frequently accessed data – Redis.
5. Write the arguments for each choice.
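The example split above can be written down as a small mapping and checked for coverage of all five systems. This is only an illustrative sketch: the block names are copied from the example, and unused_databases is a hypothetical helper name.

```python
# The five systems every project has to use at least once.
REQUIRED = {"PostgreSQL", "Redis", "MongoDB", "Cassandra", "Neo4j"}

# Block-to-database plan taken from the furniture store example.
PLAN = {
    "Product inventory": "PostgreSQL",
    "User accounts": "PostgreSQL",
    "User sessions": "Redis",
    "Shopping carts": "Redis",
    "Purchase history": "MongoDB",
    "Activity logs": "Cassandra",
    "Personalized recommendations": "Neo4j",
    "Caching frequently accessed data": "Redis",
}


def unused_databases(plan, required=REQUIRED):
    """Return the required systems that no project block uses yet."""
    return sorted(required - set(plan.values()))
```

For the example plan the result is an empty list, i.e. each system is used at least once.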
HW1: Relational database (PostgreSQL)
1. Create the first block of the project. It must be a relational database. Create an ER model and tables.
2. Fill these tables with data. The data must be realistic.
3. Create indexes to speed up common queries.
4. Extra: use triggers for insert, update, etc.
5. Extra: use JSON types and write queries to process data stored in a JSON object (use @> and ->).
Review the subsequent homework assignments and prepare data suitable for all of them. The main table must have over 50 rows.
HW2: Redis
Redis data types – 2 points (+1 bonus)
Use Redis to organize a user's shopping cart, user sessions, etc., and perform all the following operations:
Strings: 5 insertions (SET), 1 read (GET), 1 update (APPEND, SETRANGE, INCR, …), 1 removal (DEL).
Lists: 5 insertions (LPUSH, RPUSH, …), 2 different reads (LPOP, RPOP, LINDEX, LRANGE), 1 removal (LREM).
Sets: 5 insertions (SADD), 2 different reads (SISMEMBER, SUNION, SINTER, SDIFF), 1 removal (SREM), 1 SCARD.
Sorted sets: 5 insertions (ZADD), 1 read (ZRANGE, ZRANGEBYSCORE), 1 update (ZINCRBY), 1 removal (ZREM, ZREMRANGEBYSCORE), 1 ZCARD or ZCOUNT.
Hashes: 5 insertions (HSET), 2 different reads (HGET, HMGET, HKEYS, HVALS, …), 1 removal (HDEL).
Geographic coordinates: 5 insertions, 1 GEOSEARCH, 1 GEODIST.
Extra: Add the necessary data, create an index, and implement at least one nontrivial search and one nontrivial aggregation query using RediSearch. Describe in natural language the inserted data and the queries over it.
Extra: Add the necessary data, create an index, and implement at least three different nontrivial search queries using RedisJSON. Describe in natural language the inserted data and the queries over it.
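The command counts listed above can be roughly self-checked before submitting. The sketch below covers only the Strings and Sets rules, and it rests on several assumptions: each command is counted once per line of the script, only the commands named explicitly above are recognized (the list of string updates is open-ended in the assignment), and unmet_rules is a hypothetical helper rather than part of the course tooling.

```python
from collections import Counter

# (label, accepted commands, minimum, whether the commands must be different)
RULES = [
    ("string insertions", {"SET"}, 5, False),
    ("string reads", {"GET"}, 1, False),
    ("string updates", {"APPEND", "SETRANGE", "INCR"}, 1, False),
    ("string removals", {"DEL"}, 1, False),
    ("set insertions", {"SADD"}, 5, False),
    ("set reads", {"SISMEMBER", "SUNION", "SINTER", "SDIFF"}, 2, True),
    ("set removals", {"SREM"}, 1, False),
    ("set cardinality", {"SCARD"}, 1, False),
]


def unmet_rules(script_text, rules=RULES):
    """Return labels of rules that the Redis command script does not satisfy."""
    # Count the first word of every non-empty line, ignoring letter case.
    counts = Counter(
        line.split()[0].upper()
        for line in script_text.splitlines()
        if line.strip()
    )
    unmet = []
    for label, commands, minimum, distinct in rules:
        if distinct:
            # "2 different reads": count how many distinct commands appear.
            total = sum(1 for c in commands if counts[c] > 0)
        else:
            total = sum(counts[c] for c in commands)
        if total < minimum:
            unmet.append(label)
    return unmet
```

Calling unmet_rules(open("script.txt").read()) returns an empty list when these particular rules are met. It does not check whether the commands make sense for your project.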
Submission:
script_sql.txt: text file with SQL queries; script_pg.py: Python script; script.txt: text file with Redis database commands.
Submit to BRUTE these three files and a screenshot of the execution on the NoSQL server
Execution:
If you cannot connect to your Redis instance remotely, create an ssh tunnel first:
Your script will be tested on the server. Before submitting, you should run it and check if it works.
HW3: MongoDB
mongosh --port 42222 -u $login -p $password $database $file
$login is your username, e.g. f24_login
$database is the database to connect to (same as your login)
$password is your password (the same one you received for your account at the beginning of the semester)
$file is a file with MongoDB queries to be executed, i.e. script.js
Note the double dash before port (--port)
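If you prefer to launch the script from Python, the mongosh invocation above can be assembled as an argument list. This is a minimal sketch: mongosh_command is a hypothetical helper, "secret" in the usage note is a placeholder password, and the port and argument order are taken from the command line shown above.

```python
import shlex


def mongosh_command(login, password, script="script.js", port=42222):
    """Build the argument list for the mongosh call described above."""
    # The database name is the same as the login.
    database = login
    return [
        "mongosh",
        "--port", str(port),   # long option, hence the double dash
        "-u", login,
        "-p", password,
        database,
        script,
    ]


def as_shell_line(arguments):
    """Render the argument list as one shell-quoted command line."""
    return shlex.join(arguments)
```

For example, as_shell_line(mongosh_command("f24_login", "secret")) gives a line you can paste into a terminal, and the same list can be passed to subprocess.run.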
Tools:
MongoDB 7.0.14 (installed on the NoSQL server)
References:
Server: nosql.felk.cvut.cz
Deadline: Sunday 24. 11. 2024 until 23:59
HW4: Cassandra
Points: 3
Assignment:
Implement the project block with Cassandra
Define a schema for a table
Insert 10 rows into your table
Express at least 3 update statements
You must perform the replace, add, and remove primitive operations (all of them) on columns of all collection types (list, set, and map)
I.e. you must use at least 9 different primitive operations on such columns altogether (3 operations × 3 collection types)
Express 3 select statements to retrieve data
Create and use at least 1 secondary index
Create a materialized view
Write 3 queries to obtain statistical information. Use the materialized view at least once
Add comments with descriptions to all queries
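The count of 9 primitive operations follows from combining the three operations with the three CQL collection types. A short checklist sketch makes this explicit. Here missing_combinations is a hypothetical helper, and you record the covered pairs by hand while writing your update statements.

```python
from itertools import product

COLLECTION_TYPES = ("list", "set", "map")    # CQL collection types
OPERATIONS = ("replace", "add", "remove")    # primitive operations required


def required_combinations():
    """All (collection type, operation) pairs the assignment asks for."""
    return list(product(COLLECTION_TYPES, OPERATIONS))


def missing_combinations(done):
    """Pairs not covered yet; `done` is an iterable of (type, operation)."""
    done = set(done)
    return [pair for pair in required_combinations() if pair not in done]
```

required_combinations() contains 3 × 3 = 9 pairs, and missing_combinations shows which of them your script still has to cover.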
Requirements:
Only use your own keyspace when working on the assignment
Do not switch to your keyspace (i.e. do not issue USE) inside your script
Note that a different dedicated keyspace will be used when assessing your homework
Comments:
Error from server: code=1300 [Replica(s) failed to execute read]...
cqlsh -u $username -p $password -k $KeyspaceName -f $ScriptFile
$KeyspaceName is the name of the keyspace to be used (it must already exist), e.g. f241_login
$ScriptFile is a file with CQL queries to be executed, i.e. script.cql
Tools:
References:
Server: nosql.felk.cvut.cz
Deadline: Sunday 1. 12. 2024 until 23:59
HW5: Neo4j
Points: 3
Assignment: Implement the project block with Neo4j
Insert realistic nodes and relationships into your embedded Neo4j database
Use a single CREATE statement for this purpose
Insert altogether at least 10 nodes for entities of at least 2 different types (i.e. different labels)
Insert altogether at least 15 relationships of at least 2 different types
Include properties (both for nodes and relationships)
Associate all your nodes with user-defined identifiers
Create an index and a constraint; demonstrate and describe their usage
Express 5 Cypher read query expressions
Use each of the MATCH, OPTIONAL MATCH, RETURN, WITH, WHERE, and ORDER BY (sub)clauses at least once
Use comparison and logical operators
Use pattern conditions (EXISTS, IN)
Use string operations (CONTAINS, STARTS WITH)
Use regular expressions
Use different aggregations at least twice
Use size(), collect() at least once
Use variable length paths at least once
Perform date/time interval queries
Use list operations, e.g. use functions like head(), last(), tail(), and reduce()
Use CASE expressions
Use subqueries
Find the shortest path between two nodes
Express 5 Cypher write or read/write query expressions
Use each of the CREATE (with MATCH), DELETE, SET, REMOVE, and DETACH (sub)clauses at least once
Requirements:
Submission: BRUTE and NoSQL server
Execution:
Tools:
Neo4j 5.23.0 (Java 21 *) (installed on the NoSQL server)
References:
Deadline: Sunday 15. 12. 2024 until 23:59
HW6: MapReduce
Points: 3 (this assignment is not a part of the project; all students must submit individual files)
Assignment:
Create an input text file. This file must be large and can contain data from your project or third-party information.
Put each entity on a separate line, i.e. assume that each line of the input file yields one input record
Organize the actual entity attributes in whatever way you can easily parse
Implement a non-trivial MapReduce job
Choose from aggregation, grouping, filtering or any other general MapReduce usage pattern
Use WordCount.java source file as a basis for your own implementation
Both the Map and Reduce functions should be non-trivial, each about 10 lines of code
It is not necessary to implement the Combine function
Comment the source file and also provide a description of the problem you are solving
You may also create a shell script that allows for the execution of your entire MapReduce job
I.e. compile source files, deploy input file, execute the actual job, retrieve its result, …
However, this script is not supposed to be submitted and serves just for your own convenience
Even if you do so, it will not be used for the purpose of homework assessment in any way
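Before writing the Java implementation, it can help to prototype the map and reduce logic locally. The sketch below is not Hadoop and is not what gets submitted; it only imitates the map, shuffle, and reduce phases in plain Python. The record format category;price and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_record(line):
    """Map: parse one input line 'category;price' and emit (category, price)."""
    category, price = line.strip().split(";")
    yield category, float(price)


def reduce_group(key, values):
    """Reduce: turn all prices of one category into (count, average)."""
    values = list(values)
    return key, (len(values), sum(values) / len(values))


def run_job(lines):
    """Run map, shuffle (group values by key), and reduce over input lines."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue                      # skip empty lines
        for key, value in map_record(line):
            groups[key].append(value)     # shuffle phase
    return dict(reduce_group(k, v) for k, v in sorted(groups.items()))
```

Each input line yields one record, as required above. In the Java job the same two functions correspond to the Mapper and Reducer classes derived from WordCount.java.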
Requirements:
You may split your MapReduce job implementation into multiple Java source files
They all must be located in the submission root directory
At least MapReduce.java source file with its public MapReduce class is required
This class is expected to represent the main class of the entire MapReduce job
Do not change the way command-line arguments are processed
Do not use packages in order to organize your Java source files
Assume that only hadoop-common-3.1.1.jar and hadoop-mapreduce-client-core-3.1.1.jar libraries will be linked with your project
Do not submit your NetBeans (or any other) project directory, and do not submit Hadoop (or any other) libraries
Use Java Standard Edition version 7 or newer
You are free to use your /user/f24_login/ HDFS home directory for debugging
Submission (NoSQL server):
readme.txt: description of the input data structure and objective of the MapReduce job
input.txt: text file with your sample input data (i.e. only one input file is permitted)
MapReduce.java and possibly additional *.java: Java source files with your MapReduce implementation
output.txt: expected output of your MapReduce job
Upload to BRUTE: readme.txt and screenshot (or screenshots) of the execution of your homework on the NoSQL server.
Tools:
References:
Server: nosql.felk.cvut.cz
Deadline: Sunday 22. 12. 2024 until 23:59
Bonus assignment #1 (BA1)
Analysis of Sharding and Replication Strategies in NoSQL Databases (+3 p.)
For each of the following sharding strategies (lecture 3):
Complete the following tasks:
Create a table listing the advantages and potential issues of the strategy.
Formulate the conditions under which this strategy would be most effective.
Provide a specific example of data distribution using this strategy.
Create a graphical illustration of this example (diagram, chart, or drawing).
Propose a suitable replication strategy to complement this sharding strategy and briefly justify your choice.
Prepare a brief conclusion comparing the effectiveness of the examined strategies in various NoSQL database usage scenarios.
Deadline: Sunday 22. 12. 2024 until 23:59.
Bonus assignment #2 (BA2)
Multi-Database Integration with Redis (+2 p.)
Individual Topics