12 Classifiers I

Basics about classifiers and teaching them.

Learning outcomes

After this practice session, the student

Program

Exercise I / Solving together

See Wikipedia first:

Use confusion matrices to determine which image classifier is better (safer, or leading to fewer unnecessary stops of the car). See the first part of the PDF.
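
As a warm-up for the exercise, here is a minimal sketch of how two classifiers can be compared from their confusion matrices. The counts, the two class names (obstacle, clear) and the helper `rates` are made up for illustration and are not taken from the PDF. The point is that "safer" and "fewer unnecessary stops" correspond to different cells of the matrix, and that overall accuracy can hide the difference.

```python
# Hypothetical confusion matrices for two obstacle detectors (all counts are invented).
# Layout: rows = true class, columns = predicted class, class order = (obstacle, clear).

def rates(cm):
    """Return the error rates that matter for a self-driving car."""
    (tp, fn), (fp, tn) = cm
    return {
        # Obstacles that were NOT detected: the car does not stop (safety problem).
        "miss_rate": fn / (tp + fn),
        # Clear road reported as obstacle: the car stops unnecessarily.
        "false_alarm_rate": fp / (fp + tn),
        # Fraction of all images classified correctly.
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

classifier_a = [[95, 5], [30, 870]]
classifier_b = [[80, 20], [5, 895]]

rates_a = rates(classifier_a)
rates_b = rates(classifier_b)

for name, r in (("A", rates_a), ("B", rates_b)):
    print(name, {key: round(value, 3) for key, value in r.items()})
```

With these invented counts, classifier B has the higher accuracy and stops the car less often without reason, yet it misses four times as many obstacles as classifier A. Which one is "better" depends on which error is more costly.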

Exercise II / Solving together

An intuitive introduction to linear classification: finding linear discriminant functions. See the second part of the PDF.
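
To make the idea concrete, here is a minimal sketch of classification with linear discriminant functions, assuming the common form g_k(x) = w_k · x + b_k with one function per class and the rule "pick the class whose function gives the largest value". The weights, the class labels and the helper names are hand-picked for illustration and are not the ones from the PDF.

```python
def discriminant(w, b, x):
    """Linear discriminant g(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x, discriminants):
    """Return the label whose discriminant function is largest at x."""
    return max(
        discriminants,
        key=lambda label: discriminant(*discriminants[label], x),
    )

# Hypothetical, hand-picked parameters: label -> (weights w, bias b).
discriminants = {
    "A": ([1.0, 0.0], 0.0),  # g_A(x) = x1
    "B": ([0.0, 1.0], 0.0),  # g_B(x) = x2
}

print(classify([2.0, 1.0], discriminants))
print(classify([0.5, 3.0], discriminants))
```

With these two functions the decision boundary is the line where g_A(x) = g_B(x), that is x1 = x2. Points below the line go to class A, points above it to class B.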

Bonus quiz

Homework

Bonus read

Even in the “age of AI” (neural networks, Transformers, etc.), more “traditional” and “simple” (e.g. statistical) methods such as Naive Bayes and kNN are still in use, and they sometimes perform very close to, or better than, the newer methods. These simple methods are less affected by some of the issues that arise in classification tasks, or sidestep them entirely (though they obviously have their own limits and downsides as well).

Examples involving language models, which are currently popular:

  1. This paper shows that kNN with a creative choice of distance estimate (based on gzip) comes close to neural network-based solutions (including Transformers) for sentence classification, and does better on out-of-domain datasets, without requiring any training, tuning, or parameters. Intuition for the distance: if two texts are similar, concatenating them barely increases the gzip-compressed size, because of how compression works.
  2. VarDial, a shared task where the goal is to build classifiers for languages that are very close (e.g. dialects of the same language), and where the training data is often Wikipedia, has historically seen the simplest models win.
  3. This paper, on classifying Perso-Arabic scripts, shows that a variant of Naive Bayes performs about as well as a Multi-Layer Perceptron, and significantly better than previous state-of-the-art methods.
  4. This paper uses n-gram models (no neural network involved) for language identification. It performed better than the state of the art at the time, while also allowing languages to be added to the classifier after training without needing to re-train from scratch (which is one of the big issues with Deep Learning).
  5. This work on forecasting methods shows that statistical methods can be as good as Deep Learning methods, for a fraction of the cost and time (~\$0.5 cents and 6 minutes of processing vs. ~\$11 000 and 14 days of processing). This was also discussed here, with some interesting comments.
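
The gzip intuition from the first example can be sketched in a few lines. The code below uses the normalized compression distance, a standard compression-based distance, together with a plain majority-vote kNN. It is an illustration of the idea under these assumptions, not a reproduction of the paper's exact setup, and the function names and toy training texts are made up.

```python
import gzip
from collections import Counter

def clen(text):
    """Length in bytes of the gzip-compressed text."""
    return len(gzip.compress(text.encode("utf-8")))

def ncd(a, b):
    """Normalized compression distance: small when a and b share content."""
    ca, cb = clen(a), clen(b)
    cab = clen(a + " " + b)  # similar texts add little to the compressed size
    return (cab - min(ca, cb)) / max(ca, cb)

def knn_predict(query, train, k=3):
    """Majority vote among the k training texts closest to the query."""
    nearest = sorted(train, key=lambda item: ncd(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented toy data: (text, label) pairs.
train = [
    ("the striker scored twice in the second half of the match", "sport"),
    ("the goalkeeper saved a penalty late in the match", "sport"),
    ("the central bank raised interest rates to fight inflation", "finance"),
    ("stock markets fell as investors worried about inflation", "finance"),
]

print(knn_predict("investors expect the bank to cut interest rates", train, k=3))
```

There is nothing to train here: all the work happens at prediction time, when the query is compressed together with every training text. On very short texts the gzip header overhead dominates, so distances on a toy set like this one are noisy.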

Some issues with classification, in particular with models using Deep Neural Networks and similar, include: