12 Classifiers I

Basics of classifiers and how to train them.

Learning outcomes

After this practice session, the student

understands the confusion matrix and what TP, FP, TN, FN mean;
can compute statistical measures like TP rate, FP rate, precision, recall, etc., from the confusion matrix;
is able to choose the right classifier for a particular goal.
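
For reference, the measures named above have the following standard definitions in terms of the confusion matrix counts. The notation is the common textbook one and is assumed here; the lab PDF may use different symbols.

```latex
\begin{align*}
\text{TP rate (recall, sensitivity)} &= \frac{TP}{TP + FN} \\
\text{FP rate} &= \frac{FP}{FP + TN} \\
\text{precision} &= \frac{TP}{TP + FP} \\
\text{accuracy} &= \frac{TP + TN}{TP + FP + TN + FN}
\end{align*}
```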

Program

Q/A
Discussion of the bonus quiz from last week
Exercise 1: Evaluating pedestrian detectors for an autonomous car
Introduction of the bonus quiz for this week

Exercise I / Solving together

See Wikipedia first:

Use confusion matrices to determine which image classifier is better (safer, or leading to fewer unnecessary stops of the car). See the first part of the PDF.
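
A minimal sketch of how such a comparison might look in code. The counts below are invented for illustration and are not the ones from the PDF; the class and variable names are hypothetical. "Safer" is read here as a higher TP rate (fewer missed pedestrians), and "fewer unnecessary stops" as a lower FP rate.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # pedestrian present, detector reports a pedestrian
    fp: int  # no pedestrian, detector reports one (unnecessary stop)
    tn: int  # no pedestrian, detector reports none
    fn: int  # pedestrian present, detector reports none (dangerous miss)

    def recall(self) -> float:
        """TP rate: fraction of real pedestrians that were detected."""
        return self.tp / (self.tp + self.fn)

    def fp_rate(self) -> float:
        """Fraction of pedestrian-free cases that triggered a detection."""
        return self.fp / (self.fp + self.tn)

    def precision(self) -> float:
        """Fraction of detections that were real pedestrians."""
        return self.tp / (self.tp + self.fp)


# Hypothetical counts, made up for this example only.
detector_a = ConfusionMatrix(tp=95, fp=40, tn=860, fn=5)
detector_b = ConfusionMatrix(tp=80, fp=10, tn=890, fn=20)

# Safety: prefer the detector that misses fewer pedestrians.
safer = "A" if detector_a.recall() > detector_b.recall() else "B"
# Comfort: prefer the detector that raises fewer false alarms.
fewer_stops = "A" if detector_a.fp_rate() < detector_b.fp_rate() else "B"

print(f"safer: {safer}, fewer unnecessary stops: {fewer_stops}")
```

With these made-up numbers the two criteria pick different detectors, which is the point of the exercise: "better" depends on the goal.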

Exercise II / Solving together

Intuitive intro to linear classification: find linear discriminant functions. See the second part of the PDF.
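
As a rough illustration of the idea (not the solution to the exercise): each class k gets a linear discriminant function g_k(x) = w_k · x + b_k, and a point is assigned to the class whose function gives the largest value. The weights, class labels, and function names below are hypothetical.

```python
def discriminant(w, b, x):
    """Value of the linear function g(x) = w . x + b for a 2D point x."""
    return w[0] * x[0] + w[1] * x[1] + b


def classify(x, params):
    """Return the label whose discriminant function is largest at x."""
    return max(params, key=lambda label: discriminant(*params[label], x))


# Hypothetical parameters (w, b) for two classes.
params = {
    "A": ((1.0, 0.0), 0.0),  # g_A(x) = x1
    "B": ((0.0, 1.0), 0.0),  # g_B(x) = x2
}

# The decision boundary is where g_A(x) = g_B(x), here the line x1 = x2.
label_1 = classify((3.0, 1.0), params)
label_2 = classify((1.0, 3.0), params)
print(label_1, label_2)
```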

Bonus quiz

Linear classification: find the coefficients of linear equations that separate classes on a 2D plane.
0.5 points
submit your solution to BRUTE lab12quiz by May 13, midnight
format: text file, photo of your solution on paper, or PDF, whichever is convenient for you
the solution will be discussed in the next lab
Students whose family name starts with A to K (inclusive) have to solve and upload subject A, while students whose family name starts with L to Z have to solve and upload subject B.

Homework

Work on the machine learning assignment.

Bonus read

Even in the “age of AI”, with neural networks, Transformers, etc., more “traditional” and “simple” (e.g. statistical) methods like Naive Bayes and kNN are still used and sometimes achieve very close or even better performance. These simple methods are less affected by some of the issues that come with classification tasks, or sidestep them entirely (though they obviously have their own limits and downsides as well).

Examples involving language models, which are currently popular:

This paper shows that kNN with a creative choice of distance (based on gzip) comes close to neural-network-based solutions (including Transformers) for sentence classification, and does better on out-of-domain datasets, without requiring any training, tuning, or parameters. Intuition for the distance: if two texts are similar, concatenating them barely increases the gzip-compressed size, because of how compression works.
VarDial, a shared task where the goal is to build classifiers for very closely related languages (e.g. dialects of the same language) and where the training data is often Wikipedia, has historically seen the simplest models win.
This paper, on classifying Perso-Arabic scripts, shows that a variation of Naive Bayes performs about as well as a multilayer perceptron, and significantly better than previous state-of-the-art methods.
This paper uses n-gram models (no neural network involved) for language identification and performs better than the state of the art at the time, while also allowing languages to be added to the classifier after training without needing to re-train from scratch (which is one of the big issues with deep learning).
This work on forecasting methods shows that statistical methods can be as good as deep learning methods for a fraction of the cost and time (~$0.5 cents and 6 minutes of processing vs. ~$11,000 and 14 days of processing). This was also discussed here, with some interesting comments.
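
The gzip-based kNN idea from the first example in the list above can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes the normalized compression distance as the distance measure, and the function names and the toy training data are made up.

```python
import gzip
from collections import Counter


def clen(text: str) -> int:
    """Length in bytes of the gzip-compressed text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: small when a and b share content."""
    ca, cb = clen(a), clen(b)
    cab = clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def knn_predict(query: str, train: list, k: int = 3) -> str:
    """Majority label among the k training texts closest to the query."""
    nearest = sorted(train, key=lambda item: ncd(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy (text, label) pairs, invented for illustration only.
train = [
    ("the match ended with a late goal from the striker", "sport"),
    ("the team won the cup after extra time and penalties", "sport"),
    ("the central bank raised interest rates again", "finance"),
    ("shares fell as investors worried about inflation", "finance"),
]

print(knn_predict("the striker scored a goal in extra time", train, k=1))
```

Very short texts compress poorly, so the toy data above only shows the mechanics. Note that there is no training step: all the work happens at prediction time.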

Some issues with classification, in particular with models using deep neural networks and similar, include:

Usually, the bigger the models, the more data they need (though other variables have to be taken into account, such as the quality and variability of the data). Depending on the application, there might not be enough data for the models to learn and converge.
The data might need to be labelled, and labels are not always available or feasible to produce, especially at the scale of data needed (e.g. tens of thousands of images, or languages that are poorly represented on the internet).
Black box: it is difficult to get a detailed idea of what is going on inside the model. There are ways to analyse it, but given the tendency towards ever bigger models with millions of parameters, this is extremely difficult at such scales.
Usually, the models are EXPENSIVE, both in terms of the (labelled?) data needed, which can itself be expensive to generate, and in terms of infrastructure, facilities, computing power and time, man-hours of preparation and fine-tuning, etc.
This is often also linked with pollution and extensive use of resources, for mere 0.01% improvements at specific tasks, which often have no real application value.
Scalability issues come in many forms. Modifying the classes usually requires a full re-training of the model (you recognize dogs and cats, now you want to add giraffes, so you have to re-train from scratch with a new dataset that includes giraffes, and that might need different fine-tuning). This might also require a bigger facility and more computing power, amplifying the related problems.
Reproducibility: because only those with access to a lot of computing power can use computationally heavy models, it is difficult, if not outright impossible, to reproduce the results and validate the authors' claims. It also means there will be little actual impact, as the model cannot be used in real-life applications unless the authors provide a way to access the trained model (e.g. the ChatGPT website and APIs).
