====== BE0M33BDT – Big Data Technologies ======

**Due to Covid restrictions, lectures and practice sessions will be held remotely until further notice. If you understand Czech, please follow [[https://cw.fel.cvut.cz/wiki/courses/b0m33bdt/start | the Czech version of the course]]. You will receive an email about the next steps.**

==== Schedule ====

To be specified. If you understand Czech, we recommend that you take part in [[https://cw.fel.cvut.cz/wiki/courses/b0m33bdt/start | the Czech version of the course]].

==== Prerequisites ====

  * registration in [[https://www.metacentrum.cz/en/Sluzby/Hadoop/index.html|Metacentrum]] (group CVUT:FEL:B0M33BDT or CVUT:FEL:A4M33BDT)
  * basic Linux skills (file and directory management)
  * basic SQL skills (creating a table, simple SELECT, GROUP BY, JOIN)
  * basic Python skills (list, tuple, dict, string manipulation and functions, basic regular expressions)
  * general programming/scripting skills, including use of the console and shell

==== Contents ====

=== Theory ===

  * {{ :courses:be0m33bdt:big-data-technologies-what-you-need-to-know.pdf |Syllabus in questions}}

Presentations for the workshops will be posted here.

=== Practice, Hands-on training ===

Task lists for the workshops will be posted here.

=== Homework ===

Homework tasks will be posted here.
==== Assessment and exam requirements ====

  * for the assessment: at least 25 of the 50 possible points from tests and homework; the more points you earn, the better your starting position for the exam
  * for the exam: a short interview on theoretical topics; the final mark is a "sum" of the assessment points and the exam performance

==== Useful links ====

  * [[https://wiki.metacentrum.cz/wiki/Hadoop|Metacentrum Hadoop reference page]]
  * [[https://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-common/FileSystemShell.html|HDFS DFS commands]]
  * [[https://learnxinyminutes.com/docs/python3/|Learn Python in Y minutes]]
  * [[https://docs.python.org/3/|Official Python documentation]]
  * [[https://ryanstutorials.net/regular-expressions-tutorial/regular-expressions-basics.php|Regular expressions at Ryan's tutorials]]
  * [[https://cwiki.apache.org/confluence/display/Hive/LanguageManual|Hive language manual]]
  * [[https://spark.apache.org/docs/1.6.0/|Apache Spark manual]]
  * [[http://spark.apache.org/docs/1.6.0/api/python/pyspark.sql.html|PySpark SQL manual]]
  * [[https://github.com/databricks/spark-csv|CSV files import/export]]

==== Contact ====

Course coordinator: [[mailto:jan.hucin@profinit.eu|Jan Hučín]]

==== Literature ====

Hadoop: The Definitive Guide, 4th Edition, by Tom White