xp36vpd -- Selected parts of data mining

Data mining aims at revealing non-trivial, hidden and ultimately applicable knowledge in large data. This course focuses on two key data mining issues: data size and data heterogeneity. When dealing with large data, it is important to address both technical issues, such as distributed computing or hashing, and general algorithmic complexity. This first part of the course will be motivated mainly by case studies on web and social network mining. The second part will discuss approaches that merge heterogeneous prior knowledge with measured data. Bioinformatics will be the main application field here. It is assumed that students have completed at least some of the master's courses on Machine Learning and Data Analysis (B4M36SAN, B4M46SMU, BE4M33SSU).

The course will take the form of a reading and discussion group. Each student gives two 1-hour lectures, each followed by a 30-minute discussion. One of the lectures shall cover a general DM topic (MMDS book chapters, recent tutorials at major ML/DM conferences, etc.); the second one can present your research (if DM related) or a DM topic that is closely related to your research or research interests. Each student is expected to read the review paper recommended for the topic before the other students' presentations.

In your talks, go beyond the literature: provide your own insight, offer your own illustrative examples, etc.

Students who are not presenting are expected to read the recommended reading and prepare a couple of questions before the class. The questions will be discussed during or after the talk.

Fall 2020

Lecture	Date	Presenter	Contents	Reading	Talk, other links
1	Oct 9	JK, FZ	Course overview, introduction, research interests.		Course overview
2	Oct 23	Petr Lorenc	Few-shot learning	few-shot tut	Talk
3	Oct 30	Martin Smolík	Automated evaluation when learning complex outputs	EvalBLEU	Talk
4	Nov 6	Marek Dědič	Learning on graph-structured data	GNNIntro	Talk, Demo_pdf, Demo_zip
5	Nov 13	Petr Cezner	Generative Adversarial Networks	GANs	Talk
6	Nov 20	Vojtěch Jindra	Attention is all you need	Attention	Talk
7	Nov 27	Petr Lorenc	Sentence embedding in NLP	Sentence Embedding	Talk
8	Dec 4	Martin Smolík	Variational autoencoders and the math behind them	VAEs	Talk
9	Dec 11	Marek Dědič	Multi-instance learning and its use for clustering	MIL (sections 1-6)	Talk, Demo_pdf, Demo_zip
10	Dec 18	Petr Cezner	Gaussian processes	GPs	Talk
11	Jan 8	Vojtěch Jindra	Multi-instance learning and learning to cluster for clustering newspapers’ texts	Learning2cluster	Talk
13	Jan 8	JK, FZ	exam

References

Recent papers: Distill papers, Optimization Methods for Large-Scale Machine Learning, Wasserstein GAN, XGBoost: A Scalable Tree Boosting System, Deep Forest, Quantum Machine Learning
Rajaraman, A., Leskovec, J., Ullman, J. D.: Mining of Massive Datasets, Cambridge University Press, 2011.
Free data mining books
Recent tutorials, major ML/DM conferences: ICML 2020, ICML 2019, KDD 2020, KDD 2019, ECML/PKDD 2020, ECML/PKDD 2019, NeurIPS 2019, NIPS 2018
Review papers: Yang, Wu: 10 Challenging Problems in Data Mining Research; Wu et al.: Top 10 Algorithms in Data Mining
External seminars: ML seminars at MFF, PIS, Machine Learning Meetups, FIS KEG.

Links

Lecturers: Jiří Kléma, Filip Železný
Class schedule: meetings take place every Friday at 10:30 in MS Teams, NOT as in the official schedule!
Course syllabus.

Evaluation, requirements

every student must give their talks (the principal requirement in this type of course),
attendance and active discussion at the presentations of other students,
passing the exam, i.e., demonstrating knowledge of the basic concepts presented during the course.
