Spam filter

Spam filtering is a practical task with wide real-world application. It also represents a certain class of problems that we have to contend with in machine learning.

The problem

In this assignment, your main task is not to create a perfect spam filter. You do not yet know the methods that would allow you to do that. Your task is:

What will you learn?

Objectives

With this assignment, we want to show the following:

  1. For certain problem classes, the program's ability to adapt itself is essential.
  2. Automatic learning also has certain pitfalls that need to be avoided.
  3. There are tasks for which it is hard to judge the quality of a solution.

Data

We provide you with two data sets to work with. Although the final evaluation of your work will use a different data set, your spam filter should work on both of the provided sets. It is also important that you understand the format of the data that we will use; it is described on the page linked above.