Data mining aims at revealing non-trivial, hidden, and ultimately applicable knowledge in large data sets. This course focuses on two key data mining issues: data size and data heterogeneity. When dealing with large data, it is important to resolve both technical issues, such as distributed computing or hashing, and general algorithmic complexity. In this first part, the course will be motivated mainly by case studies on web and social network mining. The second part will discuss approaches that merge heterogeneous prior knowledge with measured data. Bioinformatics will be the main application field here. It is assumed that students have completed at least some of the master's courses on Machine Learning and Data Analysis (B4M36SAN, B4M46SMU, BE4M33SSU).
The course will take the form of a reading and discussion group. Each student gives two 1-hour lectures, followed by a 30-minute discussion. One of the lectures shall cover a general DM topic (MMDS book chapters, recent tutorials at major ML/DM conferences, etc.); the second can present your research (if DM related) or a DM topic that is closely related to your research or research interests. Each student is supposed to read the review paper recommended for the topic before the other students' presentations.
Go beyond the literature: provide your own insight, offer your own illustrative examples, etc.
Students who are not presenting are supposed to read the recommended reading and prepare a couple of questions before the class. The questions will be discussed during or after the talk.
Lecture | Date | Presenter | Contents | Reading | Talk, other links |
---|---|---|---|---|---|
1 | Oct 9 | JK, FZ | Course overview, introduction, research interests. | Course overview | |
2 | Oct 23 | Petr Lorenc | Few-shot learning | few-shot tut | Talk |
3 | Oct 30 | Martin Smolík | Automated evaluation when learning complex outputs | EvalBLEU | Talk |
4 | Nov 6 | Marek Dědič | Learning on graph-structured data | GNNIntro | Talk, Demo_pdf, Demo_zip |
5 | Nov 13 | Petr Cezner | Generative Adversarial Networks | GANs | Talk |
6 | Nov 20 | Vojtěch Jindra | Attention is all you need | Attention | Talk |
7 | Nov 27 | Petr Lorenc | Sentence embedding in NLP | Sentence Embedding | Talk |
8 | Dec 4 | Martin Smolík | Variational autoencoders and the math behind them | VAEs | Talk |
9 | Dec 11 | Marek Dědič | Multi-instance learning and its use for clustering | MIL (sections 1-6) | Talk, Demo_pdf, Demo_zip |
10 | Dec 18 | Petr Cezner | Gaussian processes | GPs | Talk |
11 | Jan 8 | Vojtěch Jindra | Multi-instance learning and learning to cluster for clustering newspapers’ texts | Learning2cluster | Talk |
13 | Jan 8 | JK, FZ | Exam | | |