====== xp36vpd -- Selected parts of data mining ======

**Data mining** aims at revealing non-trivial, hidden and ultimately applicable knowledge in large data. This course focuses on two key data mining issues: data size and data heterogeneity. When dealing with large data, it is important to resolve both technical issues, such as distributed computing or hashing, and general algorithmic complexity. This part of the course is motivated mainly by case studies on web and social network mining. The second part discusses approaches that merge heterogeneous prior knowledge with measured data; bioinformatics is the main application field here. Students are assumed to have completed at least some of the master courses on machine learning and data analysis (B4M36SAN, B4M46SMU, BE4M33SSU).

The course takes the form of a **reading and discussion group**. Each student gives two one-hour lectures, each followed by a 30-minute discussion. One of the lectures shall cover a general DM topic (MMDS book chapters, recent tutorials at major ML/DM conferences, etc.); the second one can present your own research (if DM related) or a DM topic that is closely related to your research or research interests. Each student is supposed to read the review paper recommended for the topic before the presentations of the other students. Go beyond the literature: provide your own insight, offer your own illustrative examples, etc. The students who do not present are supposed to **read the recommended reading** and **prepare a couple of questions** before the class. The questions will be discussed during or after the talk.

===== Fall 2019 =====

^ L ^ Date ^ Presents ^ Contents ^ Reading ^ Talk, other links ^
| 1 | Oct 11 | JK, FZ | Course overview, introduction, research interests. | | {{ :courses:xp36vpd:vpd_intro.pdf |Course overview}} |
| 2 | Oct 18 | Anh Vu Le | Non-linear dimensionality reduction -- UMAP | [[http://www.math.chalmers.se/Stat/Grundutb/GU/MSA220/S18/DimRed2.pdf|DR review]] | [[https://arxiv.org/pdf/1802.03426.pdf|UMAP]], [[https://distill.pub/2016/misread-tsne/|tSNE]], {{:courses:xp36vpd:umap.pdf |Talk}} |
| x | Oct 25 | Cancelled | | | |
| 3 | Nov 1 | Petr Marek | Text classification with transformers | [[http://jalammar.github.io/illustrated-transformer/|The Illustrated Transformer]] | [[https://arxiv.org/pdf/1706.03762.pdf|Transformers]], [[http://jalammar.github.io/illustrated-bert/|Illustrated BERT]], [[http://jalammar.github.io/illustrated-gpt2/|Illustrated GPT-2]], {{:courses:xp36vpd:text_classification.pdf |Talk}} |
| 4 | Nov 8 | Petr Tomasek | Solving imperfect information games | [[https://arxiv.org/pdf/1701.01724.pdf|DeepStack]] | [[https://era.library.ualberta.ca/items/e1a34eb8-e7e9-4397-a534-a1e45a73e6e9|Decision-making in large two-player zero-sum games]], {{:courses:xp36vpd:imperfectgames.pdf |Talk}} |
| 5 | Nov 15 | Dominik Seitz | Variational autoencoders | [[https://jaan.io/what-is-variational-autoencoder-vae-tutorial/|VAE Tutorial]] | [[https://arxiv.org/pdf/1312.6114.pdf|Auto-Encoding Variational Bayes]], [[https://arxiv.org/pdf/1906.02691.pdf|VAE Comprehensive Tutorial]], {{ :courses:xp36vpd:vae_pres_seitz.pdf |Talk}} |
| 6 | Nov 22 | Michal Bouska | Recurrent neural networks | [[https://colah.github.io/posts/2015-08-Understanding-LSTMs/|LSTM tutorial]] | [[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.676.4320&rep=rep1&type=pdf|LSTM paper]], [[http://papers.nips.cc/paper/5866-pointer-networks.pdf|Pointer networks]], {{ :courses:xp36vpd:2019vpd_-_recurrent_neural_networks.pdf |Talk}} |
| 7 | Nov 29 | Jianhang Ai | Quantum computing and machine learning | [[https://arxiv.org/pdf/1611.09347.pdf|Quantum ML]] | [[https://www.youtube.com/playlist?list=PLIxlJjN2V90w3KBWpELOE7jNQMICxoRwc|QC videos]], [[https://www.youtube.com/watch?v=IrbJYsep45E|Math in QC video]], {{ :courses:xp36vpd:quantum.pdf |Talk}} |
| 8 | Dec 6 | Milos Pragr | Incremental learning | [[https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2016-19.pdf|Incremental learning]] | [[https://pdfs.semanticscholar.org/d489/a0f2fa1fc7f588a7451ee2631da29f7df48f.pdf|Dataset shift]], [[https://homepages.inf.ed.ac.uk/svijayak/publications/vijayakumar-ICML2000.pdf|LWPR algorithm]], {{:courses:xp36vpd:incrementallearning.pdf|Talk}} |
| 9 | Dec 13 | Anh Vu Le | Partial least squares in Alzheimer disease research | [[http://users.cecs.anu.edu.au/~kee/pls.pdf|PLS]] | {{ :courses:xp36vpd:partial_least_squares_in_alzheimer.pdf |Talk}} |
| 9 | Dec 13 | Petr Marek | Intent Classification and Out-of-Scope Prediction | [[https://www.aclweb.org/anthology/D19-1131.pdf|Intent Classification]] | {{ :courses:xp36vpd:intent_classification.pdf |Talk}} |
| x | Dec 20 | Cancelled | | | |
| 10 | Jan 10 | Nela Grimova | Recent advances in active learning | [[https://minds.wisconsin.edu/handle/1793/60660|AL Survey]] | {{ :courses:xp36vpd:active_learning.pdf |Talk}} |
| 11 | Jan 17 | Petr Tomasek | Is continual resolving really almighty? | [[https://www.aaai.org/ocs/index.php/IAAI/IAAI16/paper/view/11814/12315|PAWS]] | {{ :courses:xp36vpd:contresolving.pdf |Talk}} |
| 11 | Jan 17 | Dominik Seitz | Variational Graph Auto-Encoders | [[https://arxiv.org/pdf/1611.07308.pdf|VGAE]], [[http://tkipf.github.io/graph-convolutional-networks/|GCN]] | {{ :courses:xp36vpd:seitz_vgae_pres_final.pdf|Talk}} |
| 12 | Jan 24 | Jianhang Ai | Concentration inequalities of U-statistics for sampling without replacement | [[https://arxiv.org/pdf/1309.4029.pdf|Concentration inequalities]] | {{:courses:xp36vpd:inequalities.pdf |Talk}} |
| 12 | Jan 24 | Michal Bouska | Heuristic Optimizer using Regression-based Decomposition Algorithm | [[https://papers.nips.cc/paper/7214-learning-combinatorial-optimization-algorithms-over-graphs.pdf|CombGraphs]] | {{:courses:xp36vpd:horda.pdf |Talk}} |
| x | Jan 31 | Cancelled | | | |
| 13 | Feb 7 | Milos Pragr | Gaussian Processes in Robotic Modeling | [[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7487232|GPOccupancy]] | {{ :courses:xp36vpd:gpinrobotics.pdf |Talk}} |
| 13 | Feb 7 | Nela Grimova | Machine learning from biosignals | | |
| 13 | Feb 7 | JK, FZ | **exam** | | |

===== References =====

  * Recent papers: [[https://distill.pub/|Distill papers]], [[https://arxiv.org/pdf/1606.04838.pdf|Optimization Methods for Large-Scale Machine Learning]], [[https://arxiv.org/pdf/1701.07875.pdf|Wasserstein GAN]], [[https://dl.acm.org/ft_gateway.cfm?ftid=1775849&id=2939785|XGBoost: A Scalable Tree Boosting System]], [[https://arxiv.org/pdf/1702.08835.pdf|Deep Forest]], [[https://arxiv.org/pdf/1611.09347.pdf|Quantum Machine Learning]]
  * Rajaraman, A., Leskovec, J., Ullman, J. D.: [[http://www.mmds.org/|Mining of Massive Datasets]], Cambridge University Press, 2011.
  * [[http://bigdata-madesimple.com/27-free-data-mining-books/|Free Data Mining Books]]
  * Recent tutorials at major ML/DM conferences: [[https://icml.cc/Conferences/2019/ScheduleMultitrack?session=&event_type=Tutorial&day=|ICML 2019]], [[https://www.kdd.org/kdd2019/hands-on-tutorials|KDD 2019]], [[https://ecmlpkdd2019.org/programme/workshops/|ECML/PKDD 2019]], [[https://nips.cc/Conferences/2018/Schedule?type=Workshop|NIPS 2018]]
  * Review papers: [[http://www.cs.uvm.edu/~icdm/10Problems/10Problems-06.pdf|Yang, Wu: 10 Challenging Problems in Data Mining Research]], [[http://www.realtechsupport.org/UB/CM/algorithms/Wu_10Algorithms_2008.pdf|Wu et al.: Top 10 Algorithms in Data Mining]]
  * External seminars: [[http://ai.ms.mff.cuni.cz/~sui/|ML seminars at MFF]], [[http://praguecomputerscience.cz/|PIS]], [[http://www.mlmu.cz/program/|Machine Learning Meetups]], [[https://keg.vse.cz/seminars.php|FIS KEG]].

===== Links =====

  * Lecturers: [[http://ida.felk.cvut.cz/klema/|Jiří Kléma]], [[http://ida.felk.cvut.cz/zelezny/|Filip Železný]]
  * [[https://www.fel.cvut.cz/cz/education/rozvrhy-ng.B181/public/html/predmety/29/68/p2968906.html|Class schedule]]; meetings every Friday at 11:00 in KN:E-205, NOT as in the official schedule!
  * [[https://www.fel.cvut.cz/cz/education/bk/predmety/29/68/p2968906|Course syllabus]].

===== Evaluation, requirements =====

  * every student must give their two talks (the principal requirement in this type of course),
  * attendance and active discussion at the presentations of other students,
  * pass the exam, i.e., prove knowledge of the basic concepts presented during the course.