Project 1a: Spam classification

Updated on 24.03.2021 - JS - added paragraph explaining the use of external libraries.
Updated on 24.03.2020 - JS - changed quality scoring: three points are now given for macc >= 0.9 (previously macc >= 0.95).
Edited on 14.04.2020 - JS - added report checklist.

The goal of the task is to create a spam filter.

Task description
Something to start with: filter_template.py
Data
Templates: Word, LaTeX

You should submit

Python module filter.py with the filter of your choice,
report describing what you have done, and
Python modules/scripts demonstrating what you have done.

Make sure that your report contains all relevant information; see the report checklist.

External libraries

You may not use an out-of-the-box spam filter from another package as your solution. You can, of course, use such a filter for comparison with your own work.
Other external libraries are allowed (e.g. NLTK, spaCy, Word2Vec, or FastText for preprocessing). The report must properly describe and reference each library you use. If the installation is not as straightforward as pip install nltk, provide a short guide or a link explaining how to install it.
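As an illustration of using an external library for preprocessing only, here is a minimal sketch. It assumes NLTK's regex-based wordpunct_tokenize, which needs no extra corpus downloads. The preprocess function and the fallback tokenizer are hypothetical helpers, not part of filter_template.py, and the filtering choices (lowercasing, keeping alphabetic tokens only) are just one possible option.

```python
import re

try:
    # NLTK's regex-based tokenizer; it needs no extra corpus downloads.
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    # Fallback using the same splitting rule, so the sketch also runs without NLTK.
    def wordpunct_tokenize(text):
        return re.findall(r"\w+|[^\w\s]+", text)


def preprocess(text):
    """Lowercase the text and keep only purely alphabetic tokens.

    Hypothetical helper: the classification itself is still up to your filter.
    """
    tokens = wordpunct_tokenize(text.lower())
    return [tok for tok in tokens if tok.isalpha()]


if __name__ == "__main__":
    print(preprocess("WIN a FREE prize now!!!"))
```

The resulting token list can then be fed to whatever classifier you implement yourself in filter.py.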

The code in filter.py will be used to assess the quality of your filter and to rank it in the contest of all filters. The report may be written in Czech or in English. It should take the form of a scientific article: concise, self-contained, and showing everything the author wants to show.

This task is individual. Teams are not allowed.

Deadline: Find the exact date in BRUTE.

Late policy: late solutions will be penalized by 4 points for each started week of delay.
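The late policy can be sketched as a small calculation. This assumes that a week means 7 days counted from the deadline and that a delay of exactly 7 days still counts as one started week; how BRUTE handles that boundary is not stated here. The function name late_penalty is hypothetical.

```python
import math

POINTS_PER_STARTED_WEEK = 4  # taken from the late policy above


def late_penalty(days_late):
    """Return the penalty in points for a submission `days_late` days after the deadline.

    Assumption: every started 7-day period after the deadline costs 4 points.
    """
    if days_late <= 0:
        return 0  # on time, no penalty
    started_weeks = math.ceil(days_late / 7)
    return POINTS_PER_STARTED_WEEK * started_weeks


if __name__ == "__main__":
    for days in (0, 1, 7, 8, 15):
        print(days, "days late:", late_penalty(days), "points")
```

Under these assumptions, a solution submitted one day late loses 4 points, and one submitted eight days late loses 8 points.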