====== XP36RGM -- Reading group in data mining and machine learning ======

**Data mining** aims at revealing non-trivial, hidden and ultimately applicable knowledge in large data. Data size and data heterogeneity are the two key technical issues that data mining has to solve. The main goal is to understand the patterns that drive the processes generating the data. **Machine learning** focuses on computer algorithms that improve automatically through experience and the use of data. It often puts emphasis on the performance that the algorithms reach. The distinction between DM and ML is not strict, as machine learning is often used as a means of conducting useful data mining. For this reason, we cover both areas in the same course.

The course will take the form of a **reading and discussion group**. Each student gives two one-hour lectures, each followed by a 30-minute discussion. One of the lectures should be general (MMDS book chapters, recent tutorials at major ML/DM conferences, etc.); the second one can present your own research (if ML/DM related) or an ML/DM topic that is closely related to your research or research interests. Each student is supposed to read the review paper recommended for the topic before the other students' presentations. It is assumed that students have completed at least some of the master courses on Machine Learning and Data Analysis (B4M36SAN, B4M46SMU, BE4M33SSU). Go beyond the literature: provide your own insight, offer your own illustrative examples, etc. The students who do not present are supposed to **read the recommended reading** and **prepare a couple of questions** before the class. The questions will be discussed during or after the talk.

===== Fall 2021 =====

^ L ^ Date ^ Presenter ^ Contents ^ Reading ^ Talk, other links ^
| 1 | Oct 1 | JK, FZ | Course overview, introduction, research interests | | {{ :courses:xp36rgm:rgm_intro.pdf | Course overview}} |
| 2 | Oct 15 | Xuzhe Dang | Model-based reinforcement learning | [[https://www.nature.com/articles/s41586-020-03051-4|MuZero]] | {{ :courses:xp36rgm:model-based_rl.pdf | Talk}} |
| 3 | Oct 22 | Peter Jung | How neural networks changed the landscape of graph processing | [[https://arxiv.org/ftp/arxiv/papers/1812/1812.08434.pdf|GNNs]] | {{ :courses:xp36rgm:gnns.pdf | Talk}} |
| 4 | Nov 5 | Herbert Ullrich | Automated Fact-Checking | [[https://arxiv.org/pdf/2108.11896.pdf|AFC]] | [[https://campuscvut-my.sharepoint.com/:p:/g/personal/ullriher_cvut_cz/EZiYK3NlZ0pHhcgAfVCCEOcBz36e2ISJfLDbRTozCpXP2w?e=TaJtPO|Talk]] |
| 5 | Nov 5 | Jaroslav Moravec | Outlier detection methods -- RANSAC | [[https://cmp.felk.cvut.cz/~chum/papers/Raguram-PAMI13.pdf|USAC]] | {{ :courses:xp36rgm:ransac_presentation.pdf |Talk}} |
| 6 | Nov 12 | Michaela Urbanovska | Neural Algorithmic Reasoning | [[https://arxiv.org/pdf/2105.02761.pdf|NAR]] | {{ :courses:xp36rgm:rgm_presentation_neural_algorithmic_reasoning_1_.pdf |Talk}} |
| 7 | Nov 19 | Ondrej Lukas | Explainable AI | [[https://proceedings.neurips.cc/paper/2020/file/2c29d89cc56cdb191c60db2f0bae796b-Paper.pdf|ExpNN]] | {{ :courses:xp36rgm:explainai.pdf |Talk}} |
| 8 | Nov 26 | Lukas Korel | ML with ontologies | [[https://academic.oup.com/bib/article/22/4/bbaa199/5922325|MLwO]] | {{ :courses:xp36rgm:machine_learning_with_ontologies.pdf |Talk}} |
| 9 | Dec 3 | Peter Jung | Generative Adversarial Networks | [[https://arxiv.org/abs/1511.06434|GANs]] | {{ :courses:xp36rgm:gan_pj.pdf|Talk}} |
| 10 | Dec 10 | Xuzhe Dang | StyleGANs | [[https://webmail.fel.cvut.cz/horde4/imp/attachment.php?id=618bd1b4-8144-48c9-b6e7-65b69320d2a8&u=dangxuzh|StyleGANs]] | {{ :courses:xp36rgm:stylegan.pdf|Talk}} |
| 11 | Dec 17 | Herbert Ullrich | BERT-like encoder models | [[https://arxiv.org/pdf/1810.04805.pdf|BERT]] | [[https://campuscvut-my.sharepoint.com/:p:/g/personal/ullriher_cvut_cz/Edfr66CrGoZHojeerDfl_wYBcu7CvlP_26f_4o3hqAg1Qg?e=C7q3c4|TempTalk]], [[http://bertik.net/bert/|Demo]] |
| 12 | Jan 7 | Ondrej Lukas | Federated learning | [[http://proceedings.mlr.press/v54/mcmahan17a.html|FederatedAveraging]] | {{ :courses:xp36rgm:rgm_federated_learning_olukas.pdf |Talk}} |
| 13 | Jan 14 | Jaroslav Moravec | Visual space odometry | [[https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhou_Unsupervised_Learning_of_CVPR_2017_paper.pdf|VSO]] | {{ :courses:xp36rgm:mvo_sfm.pdf |TempTalk}} |
| 14 | Jan 21 | Michaela Urbanovska | Deep learning for automated planning | [[https://arxiv.org/pdf/1806.02308.pdf|Geffner]] | {{ :courses:xp36rgm:dl_for_planning.pdf |Talk}} |
| 15 | Jan 28 | Lukas Korel | Video Scene Location Recognition | [[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5509682|ISR]] | |
| x | Jan 28 | JK, FZ | **exam -- see the plan in the table below** | | |

===== Final presentations = exam =====

Each participant prepares a 5-7 min talk that summarizes the main ideas presented earlier by another course participant. The topic assignment is:

^ Topic ^ Date of the original presentation ^ For exam presented by ^
| Model-based reinforcement learning | Oct 15 | Lukas Korel |
| How neural networks changed the landscape of graph processing | Oct 22 | Herbert Ullrich |
| Outlier detection methods -- RANSAC | Nov 5 | Xuzhe Dang |
| Neural Algorithmic Reasoning | Nov 12 | Peter Jung |
| Explainable AI | Nov 19 | Jaroslav Moravec |
| Generative Adversarial Networks | Dec 3 (Dec 10 could be used too) | Michaela Urbanovska |
| BERT-like encoder models | Dec 17 | Ondrej Lukas |

===== References =====

  * Recent papers: [[https://distill.pub/|Distill papers]], [[https://arxiv.org/pdf/1606.04838.pdf|Optimization Methods for Large-Scale Machine Learning]], [[https://arxiv.org/pdf/1701.07875.pdf|Wasserstein GAN]], [[https://dl.acm.org/ft_gateway.cfm?ftid=1775849&id=2939785|XGBoost: A Scalable Tree Boosting System]], [[https://arxiv.org/pdf/1702.08835.pdf|Deep Forest]], [[https://arxiv.org/pdf/1611.09347.pdf|Quantum Machine Learning]]
  * Rajaraman, A., Leskovec, J., Ullman, J. D.: [[http://www.mmds.org/|Mining of Massive Datasets]], Cambridge University Press, 2011.
  * [[http://bigdata-madesimple.com/27-free-data-mining-books/|Free Data Mining Books]]
  * Recent tutorials, major ML/DM conferences:
    * ICML: [[https://icml.cc/Conferences/2021/Schedule?type=Tutorial|2021]], [[https://icml.cc/virtual/2020/events/Tutorial|2020]], [[https://icml.cc/Conferences/2019/ScheduleMultitrack?session=&event_type=Tutorial&day=|2019]]
    * KDD: [[https://kdd.org/kdd2021/tutorials|2021]], [[https://www.kdd.org/kdd2020/tutorials/lecture-tutorials|2020]], [[https://www.kdd.org/kdd2019/hands-on-tutorials|2019]]
    * ECML/PKDD: [[https://2021.ecmlpkdd.org/?page_id=1705|2021]], [[https://ecmlpkdd2020.net/programme/workshops/|2020]], [[https://ecmlpkdd2019.org/programme/workshops/|2019]]
    * NeurIPS: [[https://neurips.cc/Conferences/2021/Schedule?type=Tutorial|2021]], [[https://neurips.cc/virtual/2020/public/e_tutorials.html|2020]], [[https://nips.cc/Conferences/2019/Schedule?type=Tutorial|2019]]
  * Review papers: [[http://www.cs.uvm.edu/~icdm/10Problems/10Problems-06.pdf|Yang, Wu: 10 Challenging Problems in Data Mining Research]], [[http://www.realtechsupport.org/UB/CM/algorithms/Wu_10Algorithms_2008.pdf|Wu et al.: Top 10 Algorithms in Data Mining]]
  * External seminars: [[http://ai.ms.mff.cuni.cz/~sui/|ML seminars at MFF]], [[http://praguecomputerscience.cz/|PIS]], [[http://www.mlmu.cz/program/|Machine Learning Meetups]], [[https://keg.vse.cz/seminars.php|FIS KEG]].

===== Links =====

  * Lecturers: [[http://ida.felk.cvut.cz/klema/|Jiří Kléma]], [[http://ida.felk.cvut.cz/zelezny/|Filip Železný]]
  * [[https://fel.cvut.cz/cz/education/rozvrhy-ng.B211/public/html/predmety/66/33/p6633906.html|Class schedule]]
  * [[https://www.fel.cvut.cz/cz/education/bk/predmety/29/68/p2968906|Course syllabus]] (the former XP36VPD)

===== Evaluation, requirements =====

  * every student must give their talks (the principal requirement in this type of course),
  * attendance and active discussion at the presentations of other students,
  * pass the exam, i.e., prove the knowledge of the basic concepts presented during the course.

===== Previous runs =====

  * XP36VPD -- Selected parts of data mining, with very similar content (despite its title it covered ML topics too), running since 2015.