At this week's labs we will implement the AdaBoost algorithm. The AdaBoost algorithm takes a class of “weak” classifiers and boosts their performance by combining them into a “strong” classifier with adaptive weights assigned to each of the weak classifiers (AdaBoost stands for “adaptive boosting”). This way, even relatively simple classifiers can be boosted to practically interesting accuracy levels.
In AdaBoost, the classification function $H(x)$ has the form $$H(x) = \mbox{sign}(F(x))$$ $$F(x)=\sum^T_{t=1}\alpha_t h_t(x) $$ where $h_t(x)\;: \mathcal{X} \to \{-1, +1\}$ are the selected weak classifiers and $\alpha_t$ are the mixing weights.
The training algorithm itself is relatively simple:
The AdaBoost Algorithm
Input: Training data $\{(x_1,y_1),\dotsc,(x_n,y_n)\}$; $x_i \in \mathcal{X}$, $y_i \in \{-1, +1\}$; number of iterations $T$; a set of weak classifiers $\mathcal{H}$
Initialise training data weights: $D_1(i) = 1/n$ for $i = 1,\dots,n$.
For $t = 1$ to $T$:
1. Find the weak classifier $h_t \in \mathcal{H}$ with the lowest weighted error $\epsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} D_t(i)$.
2. If $\epsilon_t \geq 1/2$, stop.
3. Set the weak classifier weight $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$.
4. Update the training data weights: $D_{t+1}(i) = D_t(i)\exp(-\alpha_t y_i h_t(x_i))/Z_t$, where $Z_t$ is the normalisation factor that makes $D_{t+1}$ a distribution.
Output: “Strong” classifier $H(x)=\mbox{sign}(F(x)) = \mbox{sign}(\sum^T_{t=1}\alpha_t h_t(x))$
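In code, this training loop might look roughly as sketched below. This is only a minimal illustration: the helpers find_best_weak and predict_weak are passed in as parameters, and their interfaces (returning the best weak classifier with its weighted error, and the vector of ±1 predictions, respectively) are assumptions, not the required ones.

```python
import numpy as np

def adaboost_train(X, y, num_steps, find_best_weak, predict_weak):
    """Sketch of the AdaBoost training loop (assumed helper interfaces)."""
    n = X.shape[1]                                # number of samples (one per column)
    D = np.ones(n) / n                            # D_1(i) = 1/n
    wcs, alphas, wc_errors, upper_bound = [], [], [], []
    Z_prod = 1.0
    for t in range(num_steps):
        wc, err = find_best_weak(X, y, D)         # 1. lowest weighted error
        if err >= 0.5:                            # 2. no better than chance -> stop
            break
        alpha = 0.5 * np.log((1.0 - err) / err)   # 3. weak classifier weight
        h = predict_weak(wc, X)                   # +1/-1 predictions of h_t
        D = D * np.exp(-alpha * y * h)            # 4. re-weight the samples
        Z = D.sum()
        D = D / Z                                 # normalise to a distribution
        wcs.append(wc)
        alphas.append(alpha)
        wc_errors.append(err)
        Z_prod *= Z
        upper_bound.append(Z_prod)                # bound on the training error
    strong_classifier = {'wc': np.array(wcs, dtype=object),
                         'alpha': np.array(alphas)}
    return strong_classifier, np.array(wc_errors), np.array(upper_bound)
```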
In this assignment, we will use the AdaBoost learning algorithm to train a digit classifier. The input to the algorithm will be a set of 13×13 grayscale images of digits and their corresponding labels (0-9). We will select one of the digits as positive class ($y_i = +1$) and the rest of the images will form the negative part of the training set ($y_i = -1$). The task is then to distinguish the selected digit from the rest of the digits.
We are also free to choose the set of weak classifiers $\mathcal{H}$. Ideally, we would like them to be discriminative enough for the given task, but they do not necessarily need to be “too strong”. In our case, possible weak classifiers include local binary patterns (LBP), various histogram-based features, edge-based features, etc. We may use single-threshold decisions, build decision trees or use some form of regression. Note that this is the strength of the AdaBoost algorithm – you are free to combine various weak classifiers and the algorithm determines which ones are useful for the task. Our dataset is rather small and simple, so to see the effect of combining weak classifiers into a stronger one, we will keep the weak classifiers intentionally rather simple.
Each weak classifier $h_{r,c}(I; \theta, p)\in\mathcal{H}$ is based on the intensity value of one pixel and is parametrised by four numbers: its coordinates in the image $I$, row $r$ and column $c$, an intensity threshold $\theta\in\mathbb{R}$, and a parity flag $p \in \{+1, -1\}$ indicating whether the positive class is above or below the threshold. Each such weak classifier takes the image intensity at position $(r, c)$, compares it with the threshold $\theta$ and returns the class using the parity: $$h_{r,c}(I; \theta, p) = \mbox{sign}(p \cdot (I(r,c) - \theta))$$ Note that we have 13×13=169 weak classifiers, which are further parametrised by the threshold and parity. This gives us a large variety of weak classifiers while keeping them rather weak and simple (much stronger weak classifiers could easily be built if we considered relations between pixels, like the above-mentioned LBPs). When searching for the best weak classifier in step 1 of the AdaBoost learning algorithm, we thus search over the $\theta$ and $p$ parameters of each of the 169 “template” classifiers and select the one with the lowest weighted error.
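To make this concrete, the decision of one pixel-based weak classifier and the brute-force search over $\theta$ and $p$ for a single pixel could look as sketched below. The function names and the choice of candidate thresholds (midpoints between consecutive sorted intensities) are illustrative assumptions, not the interface of the provided code.

```python
import numpy as np

def weak_classify(I, r, c, theta, parity):
    """Decision of a single pixel-based weak classifier on image I."""
    return int(np.sign(parity * (I[r, c] - theta)))

def best_threshold_for_pixel(values, y, D):
    """Find the (theta, parity) with the lowest weighted error for one pixel.

    values: intensities of this pixel across all training images,
    y: labels (+1/-1), D: sample weights (summing to 1).
    """
    v = np.unique(values)
    thetas = (v[:-1] + v[1:]) / 2.0          # candidate thresholds between samples
    best_theta, best_parity, best_err = None, None, np.inf
    for theta in thetas:
        for parity in (+1, -1):
            pred = np.sign(parity * (values - theta))
            err = np.sum(D * (pred != y))    # weighted classification error
            if err < best_err:
                best_theta, best_parity, best_err = theta, parity, err
    return best_theta, best_parity, best_err
```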
General information for Python development.
To fulfil this assignment, you need to submit these files (all packed in a single .zip file) into the upload system:
- adaboost.ipynb
- adaboost.py – implementing the functions adaboost, adaboost_classify and compute_error
- error_evolution.png
- classification.png
- weak_classifiers.png
Use the template of the assignment. When preparing the zip file for the upload system, do not include any directories; the files have to be in the zip file root.
Implement the function adaboost with the following interface:

```python
strong_classifier, wc_errors, upper_bound = adaboost(X, y, num_steps)
```

Here X holds the training samples (one feature per row, one sample per column), y are their labels (+1/-1) and num_steps is the number of boosting iterations $T$. The returned strong_classifier is a dict containing the selected weak classifiers ('wc') and their mixing weights ('alpha'); wc_errors contains the weighted error of the weak classifier selected in each iteration, and upper_bound the upper bound on the training error after each iteration. The function find_best_weak(X, y, D) finds the weak classifier with the lowest weighted error for the current sample weights D.

A toy example:

```python
X = np.array([[1, 1.5, 2.5, 3.0, 3.5, 5, 5.5, 6, 7., 8],
              [1, 4, 2, 8, 1, 4, 3, 5, 7, 11]])
y = np.array([1, -1, 1, -1, 1, -1, 1, -1, 1, -1])
strong_classifier, wc_error, upper_bound = adaboost(X, y, 3)
print('strong_classifier: ', strong_classifier,
      '\nwc_error: ', wc_error,
      '\nupper_bound: ', upper_bound)
# -> strong_classifier: {'wc': array([{'idx': 1, 'theta': 3.5, 'parity': -1},
# ->                                  {'idx': 1, 'theta': 7.5, 'parity': -1},
# ->                                  {'idx': 0, 'theta': 6.5, 'parity': 1}], dtype=object),
# ->                     'alpha': array([1.09861229, 0.80471896, 0.80471896])}
# -> wc_error: [0.1 0.16666667 0.16666667]
# -> upper_bound: [0.6 0.4472136 0.33333333]
```
Implement also the function

```python
errors = compute_error(strong_classifier, X, y)
```

which returns the classification error of the strong classifier on the data X with labels y after each boosting iteration, i.e. the evolution of the error as the weak classifiers are added one by one:

```python
X = np.array([[1, 1.5, 2.5, 3.0, 3.5, 5, 5.5, 6, 7., 8],
              [1, 4, 2, 8, 1, 4, 3, 5, 7, 11]])
y = np.array([1, -1, 1, -1, 1, -1, 1, -1, 1, -1])
strong_classifier, wc_error, upper_bound = adaboost(X, y, 3)
trn_errors = compute_error(strong_classifier, X, y)
print('trn_error: ', trn_errors)
# -> trn_error: [0.1 0.1 0. ]
```
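A possible way to obtain this evolution is to accumulate the response $F(x)$ one weak classifier at a time. The sketch below assumes the strong_classifier structure from the toy example above (weak classifiers given by 'idx', 'theta', 'parity' and weights 'alpha'):

```python
import numpy as np

def compute_error_sketch(strong_classifier, X, y):
    """Error of the strong classifier after each boosting iteration (sketch)."""
    F = np.zeros(X.shape[1])                      # accumulated response F(x)
    errors = []
    for wc, alpha in zip(strong_classifier['wc'], strong_classifier['alpha']):
        h = wc['parity'] * np.sign(X[wc['idx']] - wc['theta'])
        F += alpha * h                            # add one weak classifier
        errors.append(np.mean(np.sign(F) != y))   # current classification error
    return np.array(errors)
```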
Finally, implement the classification function

```python
classify = adaboost_classify(strong_classifier, X)
```

which returns the labels (+1/-1) assigned by the strong classifier $H(x)$ to the samples in X.
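Classification amounts to evaluating $H(x) = \mbox{sign}(\sum_t \alpha_t h_t(x))$; a minimal sketch, again assuming the strong_classifier structure shown in the toy example:

```python
import numpy as np

def adaboost_classify_sketch(strong_classifier, X):
    """Labels (+1/-1) assigned by the strong classifier to the columns of X (sketch)."""
    F = np.zeros(X.shape[1])
    for wc, alpha in zip(strong_classifier['wc'], strong_classifier['alpha']):
        F += alpha * wc['parity'] * np.sign(X[wc['idx']] - wc['theta'])
    return np.sign(F)
```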
1. What strategy does the AdaBoost algorithm use to select the weak classifier in each iteration?
Solution
a) No. That would not take the sample weights D(i) into account.
b) No. Definitely not the one that works the worst.
c) No. Definitely not the one with the highest (weighted) error.
d) Yes.
2. How do the weights change in each iteration?
a) No. High weights are used for hard samples, not the other way around.
b) Yes.
c) No. The weak classifier weights $\alpha_t$ do not change once assigned.
d) No. The weak classifier weights $\alpha_t$ do not change once assigned.
3. When does the AdaBoost algorithm stop? Select all that apply.
a) No. No such thing happens in the algorithm.
b) No. The testing set is not even available during training.
c) Yes. At least when implemented this way.
d) Yes.
e) No. No such test in the algorithm.
4. How do the $\alpha$ coefficients change in time?
a) Yes. See slides.
b) No. See a).
c) No. See a).
5. Which of the digit classes is the easiest to train (the least number of iterations needed to get zero test error)?
The class 1.
Implement the AdaBoost classifier for the data described in the lecture slides [2]. Each weak classifier is a threshold/parity decision on a radial line from the data centre (the thresholds are actually what is drawn in the slides). Discretise the angles of the directions of the radial lines to a reasonable number (e.g. every 5 degrees).
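One way to read this is that each weak classifier projects the (centred) 2-D points onto the direction given by a discretised angle and thresholds that projection; a small sketch under this interpretation (the function name and interface are illustrative only):

```python
import numpy as np

def radial_weak_classify(X, centre, angle_deg, theta, parity):
    """Threshold/parity decision on the projection onto a radial direction (sketch).

    X: 2 x n array of points, centre: the data centre (2-vector),
    angle_deg: direction of the radial line, theta: threshold on the projection.
    """
    d = np.array([np.cos(np.radians(angle_deg)), np.sin(np.radians(angle_deg))])
    proj = d @ (X - centre.reshape(2, 1))   # signed distance along the direction
    return np.sign(parity * (proj - theta))

# candidate directions discretised every 5 degrees
angles = np.arange(0, 360, 5)
```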
[1] Short support text on AdaBoost
[2] AdaBoost lecture