Spam filter

This assignment will probably be introduced during week 4 or 5. Spam filtering is a very practical assignment with wide real-world application. It also represents a certain class of problems that we have to contend with in machine learning.

What will you learn?

  • You will see the basic principles of spam filtering in action.
  • You will learn (informally) what data mining is.
  • You will see how Python can be employed for machine processing of textual information.
  • You will have another opportunity to practice Python.

Objectives

In this assignment we want to show the following:

  1. For some problems, the program's ability to adapt is essential.
  2. Automatic learning also has certain pitfalls that need to be avoided.
  3. There are tasks for which it is hard to judge the quality of a solution.

The problem

In this assignment, your main task is not to create a perfect spam filter. You do not yet know the methods that would allow you to do that. Your task is:

  • To understand the problem, analyze the assignment, and decompose it.
  • To create a set of functions and objects in Python that would help you to use a spam filter (once you create one) and to evaluate its quality (compare two spam filters).
  • To create a simple (even a very trivial) spam filter, which could be used in such a framework.
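
To make the idea of such a framework more concrete, here is a minimal sketch of what it might look like. Everything in it is an assumption made for illustration only: the class names NaiveFilter and KeywordFilter, the classify method, the accuracy function, the labels "SPAM" and "OK", and the tiny made-up message list are not prescribed by the assignment, which may define its own interface and quality measure.

```python
# Hypothetical sketch: all names and labels here are illustrative
# assumptions, not the interface required by the assignment.
SPAM = "SPAM"
OK = "OK"


class NaiveFilter:
    """Trivial filter that labels every message as OK."""

    def classify(self, text):
        return OK


class KeywordFilter:
    """Trivial filter that flags a message containing any given keyword."""

    def __init__(self, keywords):
        self.keywords = [k.lower() for k in keywords]

    def classify(self, text):
        lowered = text.lower()
        return SPAM if any(k in lowered for k in self.keywords) else OK


def accuracy(spam_filter, labeled_messages):
    """Return the fraction of (text, true_label) pairs labeled correctly."""
    if not labeled_messages:
        raise ValueError("need at least one labeled message")
    correct = sum(
        1 for text, truth in labeled_messages
        if spam_filter.classify(text) == truth
    )
    return correct / len(labeled_messages)


# Tiny made-up data set, for illustration only.
messages = [
    ("Win a free prize now", SPAM),
    ("Meeting moved to 3 pm", OK),
    ("Free offer just for you", SPAM),
    ("Lecture notes attached", OK),
]

# Compare two filters on the same data.
for f in (NaiveFilter(), KeywordFilter(["free"])):
    print(type(f).__name__, accuracy(f, messages))
```

Because both filters expose the same classify method, the evaluation code does not need to know which filter it is measuring, so a better filter can be plugged in later without changes. Plain accuracy is only one possible measure of quality; it treats a lost legitimate message the same as a missed spam, which is one reason why judging a spam filter is harder than it first appears.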

Data

You are given two sets of data to work with. Although the final evaluation of your work will be done using a different set of data, your spam filter should work on both of the given sets.

courses/ae4b99rph/labs/spam/start.txt · Last modified: 2013/10/24 11:14 by xposik