====== XP36RGM -- Reading group in data mining and machine learning ======

**Data mining** aims to reveal non-trivial, hidden and ultimately applicable knowledge in large data. Data size and data heterogeneity are two key technical issues that data mining must solve. The main goal is to understand the patterns that drive the processes generating the data. **Machine learning** focuses on computer algorithms that improve automatically through experience and by the use of data. It often emphasizes the performance that the algorithms achieve. The distinction between DM and ML is not strict, as machine learning is often used as a means of conducting useful data mining. For this reason, we cover both areas in the same course.

The course takes the form of a **reading and discussion group**. Each student gives two 1-hour lectures, each followed by a 30-minute discussion. One of the lectures should be general (MMDS book chapters, recent tutorials at major ML/DM conferences, etc.); the second one can present your own research (if ML/DM related) or an ML/DM topic closely related to your research or research interests. Each student is supposed to read the review paper recommended for the topic before the other students' presentations. It is assumed that students have completed at least some of the master courses on machine learning and data analysis (B4M36SAN, B4M46SMU, BE4M33SSU). When presenting, go beyond the literature, provide your own insight, offer your own illustrative examples, etc.

The students who do not present are supposed to **read the recommended reading** and **prepare a couple of questions** before the class. The questions will be discussed during or after the talk.

===== Fall 2025 =====

^ L ^ Date ^ Presenter ^ Contents ^ Reading ^ Talk, other links ^
| 1 | Sept 26 | JK | Course overview, introduction, research interests | | {{ :courses:xp36rgm:rgm_intro.pdf | Course overview}} |
| 2 | Oct 10 | Azad Afandizada | Graph neural networks | [[https://arxiv.org/pdf/1812.08434|GNN_review]] | {{ :courses:xp36rgm:graph_neural_networks.pdf | Talk}} |
| 3 | Oct 17 | Oleksii Shuhailo | Bigger, Better, Faster: How Neural Language Models Scale | [[https://arxiv.org/pdf/2001.08361/1000|LLM_scaling]] | {{ | Talk}} |
| 4 | Oct 24 | Karolina Drabent | Program Synthesis/Code Generation | [[https://arxiv.org/pdf/2108.13643|Synthesis]] | {{ | Talk}} |
| 5 | Oct 31 | Lukáš Viceník | Geometric Deep Learning | [[https://arxiv.org/abs/2104.13478|GDL]] | {{ | Talk}} |
| 6 | Nov 21 | Adéla Kubíková | tbd | [[|tbd]] | {{ | Talk}} |

===== References =====

  * A couple of influential papers:
    * [[https://arxiv.org/pdf/2001.08361/1000|Kaplan et al.: Scaling Laws for Neural Language Models, 2020]],
    * [[https://proceedings.neurips.cc/paper/2021/file/f1c1592588411002af340cbaedd6fc33-Paper.pdf|Ying et al.: Do Transformers Really Perform Bad for Graph Representation?, 2021]],
    * [[https://proceedings.neurips.cc/paper_files/paper/2022/file/9d5609613524ecf4f15af0f7b31abca4-Paper-Conference.pdf|Wei et al.: Chain-of-thought prompting elicits reasoning in large language models, 2022]],
    * [[http://proceedings.mlr.press/v97/ghorbani19c/ghorbani19c.pdf|Ghorbani et al.: Data Shapley: Equitable valuation of data for machine learning, 2019]],
    * [[https://www.sciencedirect.com/science/article/pii/S1566253521002360|Shwartz-Ziv et al.: Tabular data: Deep learning is not all you need, 2022]].
  * Freely accessible books:
    * Molnar, C.: [[https://christophm.github.io/interpretable-ml-book/|Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (2nd ed.)]], Leanpub, 2022,
    * Zhang, A., Lipton, Z., Li, M., Smola, A.: [[https://d2l.ai/|Dive into Deep Learning]], interactive book and Cambridge University Press, 2023,
    * Leskovec, J., Rajaraman, A., Ullman, J. D.: [[http://www.mmds.org/|Mining of Massive Datasets]], Cambridge University Press, 2nd edition, 2014,
    * Han, J., Kamber, M., Pei, J.: [[http://myweb.sabanciuniv.edu/rdehkharghani/files/2016/02/The-Morgan-Kaufmann-Series-in-Data-Management-Systems-Jiawei-Han-Micheline-Kamber-Jian-Pei-Data-Mining.-Concepts-and-Techniques-3rd-Edition-Morgan-Kaufmann-2011.pdf|Data Mining: Concepts and Techniques]], Morgan Kaufmann, 3rd edition, 2011.
  * Recent tutorials, major ML/DM conferences:
    * ICML: [[https://icml.cc/Conferences/2025/Schedule?type=Tutorial|2025]], KDD: [[https://kdd2025.kdd.org/tutorials/|2025]], ECML/PKDD: [[https://ecmlpkdd.org/2024/program-workshops-tutorials/|2024]], NeurIPS: [[https://neurips.cc/Conferences/2024/Schedule?type=Tutorial|2024]],
  * External seminars:
    * [[http://ai.ms.mff.cuni.cz/~sui/|ML seminars at MFF]], [[http://praguecomputerscience.cz/|PIS]], [[https://www.mlmu.cz/|Machine Learning Meetups]], [[https://keg.vse.cz/seminars.php|FIS KEG]],
  * Past runs of this course:
    * [[https://cw.fel.cvut.cz/b241/courses/xp36rgm/start|2024]], [[https://cw.fel.cvut.cz/b231/courses/xp36rgm/start|2023]], [[https://cw.fel.cvut.cz/b211/courses/xp36rgm/start|2021]].

===== Links =====

  * Lecturers: [[http://ida.felk.cvut.cz/klema/|Jiří Kléma]], [[http://ida.felk.cvut.cz/zelezny/|Filip Železný]],
  * [[https://fel.cvut.cz/cz/education/rozvrhy-ng.B251/public/html/predmety/66/33/p6633906.html|Class schedule]],
  * [[https://intranet.fel.cvut.cz/en/education/bk/predmety/66/33/p6633906.html|Course syllabus]].

===== Evaluation, requirements =====

  * every student must deliver their two presentations (the primary requirement for this type of course),
  * attend the classes and actively participate in discussions of the other students' presentations,
  * pass the exam, i.e., demonstrate knowledge of the basic concepts covered in the course.