====== xp36vpd -- Selected parts of data mining ======

**Data mining** aims to reveal non-trivial, hidden, and ultimately applicable knowledge in large data. This course focuses on two key data mining issues: data size and data heterogeneity. When dealing with large data, it is important to address both technical issues, such as distributed computing or hashing, and general algorithmic complexity. The first part of the course is motivated mainly by case studies in web and social network mining. The second part discusses approaches that merge heterogeneous prior knowledge with measured data; bioinformatics is the main application field here. Students are assumed to have completed the master's course Machine Learning and Data Analysis (A4M33SAD).

The course takes the form of a **reading and discussion group**. Each student gives two 1-hour lectures, each followed by a 30-minute discussion. One of the lectures should cover a general DM topic (MMDS book chapters, recent tutorials at major ML/DM conferences, etc.); the second one can present your own research (if DM-related) or a DM topic closely related to your research or research interests. Go beyond the literature: provide your own insight, offer your own illustrative examples, etc.

===== Fall 2017 =====

Meetings are held every Friday at 11:00 in KN:E-205, NOT as stated in the official schedule!

^ Lecture ^ Date ^ Presenter ^ Contents ^ Materials ^
| 1 | Oct 6 | JK, FZ | Course overview, introduction, research interests. | |
| 2 | Oct 13 | Ondřej Hubáček | Outcome Forecasting in Sports | {{:courses:xp36vpd:hubacek_pred_sport.pdf|}} |
| 3 | Oct 20 | Vladimír Kunc | Data Compression by Deep Learning | {{:courses:xp36vpd:kunc_dl_ae.pdf|}} |
| 4 | Oct 27 | | Cancelled | |
| 5 | Nov 3 | Jiří Kléma | Statistically significant does not mean important | {{:courses:xp36vpd:odvracena_strana_statisticke_vyznamnosti.pdf|}} |
| 6 | Nov 10 | Jan Pichl | Question Answering and Dialog Systems | {{:courses:xp36vpd:vpd_-_question_answering_and_dialogue_systems.pdf|pichl_qa_dialog_systems.pdf}} |
| 7 | Nov 17 | | Holiday | |
| 8 | Nov 24 | Magda Friedjungová | Introduction to Transfer Learning | {{:courses:xp36vpd:vpd-tl_friedmag.pdf|friedmag_tl.pdf}} |
| 9 | Dec 1 | Jakub Repický | Active Learning in Regression Tasks | {{:courses:xp36vpd:vpdd2017active.pdf|}} |
| 10 | Dec 8 | David Fiedler | Vehicle speed prediction based on road parameters | {{:courses:xp36vpd:fiedlervpd.pdf|}} |
| 11 | Dec 15 | Jan Skácel | Black sheep detection using remote sensing of vehicle emissions | {{:courses:xp36vpd:skacel_black_sheep_dec_2017.pdf|skacel_blacksheep.pdf}} |
| 12 | Jan 5 | Vladimír Kunc | Understanding Hinton's Capsule Networks | {{:courses:xp36vpd:xp36vpd_capsnets.pdf|capsnets.pdf}} |
| | | Ondřej Hubáček | Gradient boosted trees | {{:courses:xp36vpd:gradient_boosted_trees.pdf|gbtrees.pdf}} |
| 13 | Jan 12 | Jan Pichl | StarSpace: Embed All The Things! | {{:courses:xp36vpd:vpdstarspace.pdf|}} |
| | | Magda Friedjungová | Big data mining -- scalable algorithms, clustering, social data | Cancelled |
| 14 | Jan 19 | Jakub Repický | Bayesian hypothesis testing | {{:courses:xp36vpd:vpdd2018bayesian.pdf|}} |
| | | David Fiedler | kNN -- local weighting, efficiency in high-dimensional spaces and large datasets | {{:courses:xp36vpd:vpdknn.pdf|}} |
| 15 | Jan 26 | Jan Skácel | Fighting fake news | {{:courses:xp36vpd:skacel_fake_news_draft.pdf|vpdFakeNews.pdf}} |
| | | JK, FZ | **Exam** | |

===== References =====

  * Rajaraman, A., Leskovec, J., Ullman, J. D.: [[http://www.mmds.org/|Mining of Massive Datasets]], Cambridge University Press, 2011.
  * [[http://bigdata-madesimple.com/27-free-data-mining-books/|Free Data Mining Books]]
  * Recent tutorials at major ML/DM conferences: [[http://www.kdd.org/kdd2014/tutorials.html|KDD2014]], [[http://ds2014.ijs.si/index.php?page=invited|DS2014]], [[http://icml.cc/2016/?page_id=97|ICML16]], [[http://ecmlpkdd2016.org/program.html|ECML16]], [[https://nips.cc/Conferences/2016/Schedule|NIPS16]]
  * Review papers: [[http://www.cs.uvm.edu/~icdm/10Problems/10Problems-06.pdf|Yang, Wu: 10 Challenging Problems in Data Mining Research]], [[http://www.realtechsupport.org/UB/CM/algorithms/Wu_10Algorithms_2008.pdf|Wu et al.: Top 10 Algorithms in Data Mining]]
  * External seminars: [[http://ai.ms.mff.cuni.cz/~sui/|ML seminars at MFF]], [[http://praguecomputerscience.cz/|PIS]], [[http://www.mlmu.cz/program/|Machine Learning Meetups]], [[https://keg.vse.cz/seminars.php|FIS KEG]].

===== Links =====

  * Lecturers: [[http://ida.felk.cvut.cz/klema/|Jiří Kléma]], [[http://ida.felk.cvut.cz/zelezny/|Filip Železný]]
  * [[https://www.fel.cvut.cz/cz/education/rozvrhy-ng.B171/public/cz/predmety/29/68/p2968906.html|Class schedule]]
  * [[https://www.fel.cvut.cz/cz/education/bk/predmety/29/68/p2968906|Course syllabus]]

===== Evaluation and requirements =====

  * giving both talks (the principal requirement in this type of course),
  * attending and actively taking part in the discussion of other students' presentations,
  * passing the exam, i.e., demonstrating knowledge of the basic concepts presented during the course.