====== B0M33BDT – Big Data Technologies ======

**Flash news:**

==== Important links ====

  * [[https://portal.azure.com/|Azure Portal]]
  * [[https://intranet.fel.cvut.cz/cz/education/rozvrhy-ng/public/html/predmety/47/73/p4773206.html|Timetable]]

==== Prerequisites ====

Basic knowledge of the following technologies is required:

  * SQL (CREATE TABLE, SELECT, aggregate SELECT, JOIN)
  * Python
    * types list, tuple, dict, set
    * string manipulation
    * flow control (if, while, for)
    * function definitions (def), lambda functions
    * basic regular expressions

We recommend bringing your own laptop that can connect to the internet. A smart text editor for writing Python and SQL scripts (Notepad++, PSPad, etc.) is also useful.

==== Schedule ====

Classes are always held **on Wednesdays**. Classes were planned to be held in the building on Charles Square. For the duration of distance learning, links to the online classes will be listed with the respective week.

  * **odd week:**
    * lecture 9:15–10:45, room KN:E-126
    * practice 11:00–12:30, room KN:E-310
  * **even week, option A (S-A):**
    * lecture 9:15–10:45, room KN:E-126

==== Syllabus and schedule ====

  * **1. week (27.9.2023):** Organization, classification, motivation, overview {{ :courses:b0m33bdt:b0m33bdt-1p-intro_2023_en.pdf |}}
    * Practices - Intro to Azure and Databricks
  * **2. week (4.10.2023):** Introduction to Databricks {{ :courses:b0m33bdt:b0m33bdt-2p-intro_databricks_2023_en.pdf |}}
  * **3. week (11.10.2023):** Spark basics in Databricks {{ :courses:b0m33bdt:b0m33bdt-3p-apache_spark_basics_2023_en.pdf |}}
    * Practices - Intro to Databricks
  * **4. week (18.10.2023):** Hadoop and parallel data processing 1 {{ :courses:b0m33bdt:b0m33bdt-hadoop.pdf |}}
  * **5. week (25.10.2023):** Hadoop and parallel data processing 2 {{ :courses:b0m33bdt:b0m33bdt-hadoop.pdf |}}
    * Practices - Batch processing in Databricks
  * **6. week (1.11.2023):** Data import {{ :courses:b0m33bdt:b0m33bdt-nifi-kafka.pdf |}}
  * **7. week (8.11.2023):** Streaming {{ :courses:b0m33bdt:b0m33bdt-7p-spark-databricks-streaming_2023_en.pdf |}}
    * Practices - Batch processing in Databricks
  * **8. week (15.11.2023):** Advanced Spark in practice {{ :courses:b0m33bdt:b0m33bdt_8p_advancedspark_2023_en.pdf |}}
  * **9. week (22.11.2023):** Cloud introduction {{ :courses:b0m33bdt:b0m33bdt-9p-cloud_2023_en.pdf |}}
    * Practices - Stream processing in Databricks + **mid-term test**
  * **10. week (29.11.2023):** Cloud - Azure {{ :courses:b0m33bdt:b0m33bdt-10p-azure_2023_en.pdf |}}
  * **11. week (6.12.2023):** **The lecture is cancelled**
    * Practices - Stream processing in Databricks + **homework assignment**
  * **12. week (13.12.2023):** Databricks Advanced {{ :courses:b0m33bdt:b0m33bdt-12p-advanced-databricks_2023_en.pdf |}}
  * **13. week (20.12.2023):** Big Data Science (**lecture given only as a presentation, which will be added here**)
    * Practices - Homework consultations
  * **14. week (3.1.2024):** Winter holidays - **the lecture is cancelled**
  * **15. week (10.1.2024):** Homework consultations + reserve (Serverless)
    * Practices - Homework consultations + **final test**

==== Results ====

If you do not want to be listed in the table, contact us and we will anonymize your row.

^ Name, surname ^ Homework number ^ Midterm test (20) ^ Homework (20) ^ Final practice test (20) ^ Credit ^
| | | | | | |

==== Classification requirements (credit, examination) ====

=== How to get the credit ===

  * Obtain at least 30 of the 60 possible points for the mid-term test, the homework and the final test:
    * mid-term test during the semester: at most 20 points
    * homework: at most 20 points
    * final test at the end of the semester: at most 20 points
  * The final test can be retaken once in a make-up period by agreement with the instructor. The result of the first attempt is then cancelled and the result of the second attempt is valid, even if it is worse than the first.
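The credit rule combines a point threshold with the retake rule for the final test. The following is a minimal illustrative sketch, not an official grading tool: the function names ''final_test_score'' and ''earns_credit'' are hypothetical, and it assumes that only the 30-point total matters (no per-component minimum is stated above).

```python
from typing import Optional

MAX_POINTS = 20        # maximum for each component: mid-term test, homework, final test
CREDIT_THRESHOLD = 30  # at least 30 of the 60 possible points are needed


def final_test_score(first_attempt: float, retake: Optional[float] = None) -> float:
    """Return the valid final-test result.

    If a retake was taken, it replaces the first attempt even when it is worse.
    """
    return retake if retake is not None else first_attempt


def earns_credit(midterm: float, homework: float,
                 final_first: float, final_retake: Optional[float] = None) -> bool:
    """Hypothetical check: does the student reach the credit threshold?

    Assumes only the total counts (no per-component minimum).
    """
    scores = [midterm, homework, final_test_score(final_first, final_retake)]
    for score in scores:
        if not 0 <= score <= MAX_POINTS:
            raise ValueError(f"score {score} outside 0..{MAX_POINTS}")
    return sum(scores) >= CREDIT_THRESHOLD


# 12 + 10 + 8 = 30 points, exactly the threshold
print(earns_credit(12, 10, 8))                    # True
# The retake (5) replaces the first attempt (15): 12 + 10 + 5 = 27 points
print(earns_credit(12, 10, 15, final_retake=5))   # False
```

The second call shows why a retake is a risk: a worse second attempt still overrides a better first one.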
=== Homework ===

Details will be specified during the semester.

==== How to deliver the result? ====

Send your homework by e-mail, including the source code and its output, or send a link to your repository, which must likewise contain both the source code and the output. The homework must be completed and submitted at least one week before the exam.

=== Exams ===

The exam has a written part (20 points) and an oral part (20 points). Both parts are compulsory, and an unsatisfactory result in either may require retaking the exam.

==== Contact ====

[[vyukaFEL@profinit.eu|VyukaFEL]]

==== Literature ====

TBA