====== Project 1a: Spam classification ======

  * Updated on 24.03.2021 - JS - added a paragraph explaining the use of external libraries.
  * Updated on 24.03.2020 - JS - changed quality scoring: three points are given for macc >= 0.9 (previously macc >= 0.95).
  * Edited on 14.04.2020 - JS - added {{:courses:ui:tasks:st1_report_checklist.pdf|report checklist}}.

The goal of the task is to create a spam filter.

  * {{:courses:ui:tasks:spam_filter_description.pdf|Task description}}
  * Something to start with: ''{{:courses:ui:tasks:filter_template.py|filter_template.py}}''
  * {{:courses:ui:tasks:spam-data.zip|Data}}
  * Templates: {{:courses:ui:tasks:b3m33ui-report-word-template.zip|Word}}, {{:courses:ui:tasks:b3m33ui-report-latex-template.zip|LaTeX}}

You should submit
  * a Python module ''filter.py'' with the filter of your choice,
  * a report describing what you have done, and
  * Python modules/scripts demonstrating what you have done.

Make sure that your report contains all relevant information (see the {{:courses:ui:tasks:st1_report_checklist.pdf|report checklist}}).

**External libraries**
  * You are not allowed to use an out-of-the-box spam filter from another package as your solution. You can, of course, use such a spam filter for comparison with your own work.
  * Other external libraries can be used (e.g. NLTK, spaCy, Word2Vec, or FastText for preprocessing). The report must contain a proper description of, and a reference to, each library used. If the installation is not as straightforward as ''pip install nltk'', provide a short guide or a link describing how to install it.

The code in ''filter.py'' will be used to assess the quality of your filter and to run the contest of all filters.

The report may be written in Czech or in English. It shall have the form of a scientific article and should be concise and self-contained, showing everything the author wants to show.

**This task is individual**. Teams are not allowed.

**Deadline:** Find the exact date in BRUTE.
**Late policy:** Late solutions will be penalized by 4 points for each started week of delay.
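
The late-policy arithmetic can be illustrated with a short sketch. It assumes that the delay is measured in days after the deadline and that any started (even partial) week counts as a full week; the function name ''late_penalty'' is hypothetical and not part of the course materials, and the evaluation in BRUTE remains authoritative.

```python
import math

POINTS_PER_STARTED_WEEK = 4  # taken from the late policy above


def late_penalty(days_late: float) -> int:
    """Return the point penalty for a submission that is `days_late` days late.

    Assumption: every started week of delay costs the full weekly penalty,
    so the number of weeks is rounded up.
    """
    if days_late <= 0:
        return 0  # submitted on time, no penalty
    started_weeks = math.ceil(days_late / 7)
    return POINTS_PER_STARTED_WEEK * started_weeks


if __name__ == "__main__":
    for days in (0, 1, 7, 8, 15):
        print(days, "days late ->", late_penalty(days), "points")
```

Under these assumptions, a submission that is 1 to 7 days late loses 4 points, and one that is 8 to 14 days late loses 8 points.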