Image retrieval, mean average precision, training embeddings (representations) with triplet loss.
Use the provided template. The package contains:
tools.py
triplet.py
view.ipynb
./models/*
In this lab we use a neural network to embed images $x$ as feature vectors $f \in \mathbb{R}^d$. These feature vectors can then be used to retrieve similar images by finding the nearest neighbors in the embedding space.
We will use the MNIST dataset and start with a network trained for classification. Its last hidden layer representation will serve as the embedding. The network is a small convolutional network defined as follows:
from torch import nn

class ConvNet(nn.Sequential):
    def __init__(self, num_classes: int = 10) -> None:
        layers = []
        layers += [nn.Conv2d(1, 32, kernel_size=3)]
        layers += [nn.ReLU(inplace=True)]
        layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        layers += [nn.Conv2d(32, 32, kernel_size=3)]
        layers += [nn.ReLU(inplace=True)]
        layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        layers += [nn.Conv2d(32, 64, kernel_size=3)]
        layers += [nn.ReLU(inplace=True)]
        layers += [nn.AdaptiveAvgPool2d((2, 2))]
        layers += [nn.Flatten()]
        layers += [nn.Linear(64 * 2 * 2, num_classes)]
        super().__init__(*layers)
        self.layers = layers

    def features(self, x):
        # All layers except the final classifier, followed by L2 normalization
        f = nn.Sequential(*self.layers[:-1]).forward(x)
        f = nn.functional.normalize(f, p=2, dim=1)
        return f
features(x)
evaluate_AP
evaluate_mAP(net, dataset)
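Retrieval by nearest neighbors needs the distance from every query to every database item. The following is a minimal sketch of one way to compute all squared Euclidean distances at once, using the identity $\|x-y\|_2^2 = \|x\|_2^2 + \|y\|_2^2 - 2x^\top y$. The function names `pairwise_sq_dists` and `retrieve` are hypothetical and not part of the template. Because `features` returns L2-normalized vectors, the identity reduces to $2 - 2x^\top y$, so ranking by distance is the same as ranking by decreasing dot product.

```python
import torch


def pairwise_sq_dists(queries: torch.Tensor, database: torch.Tensor) -> torch.Tensor:
    """Squared Euclidean distances between rows of queries (m, d) and database (n, d).

    Hypothetical helper, not part of the lab template.
    """
    q_sq = (queries ** 2).sum(dim=1, keepdim=True)    # (m, 1) squared norms
    db_sq = (database ** 2).sum(dim=1).unsqueeze(0)   # (1, n) squared norms
    d2 = q_sq + db_sq - 2.0 * queries @ database.T    # (m, n) via the Gram matrix
    return d2.clamp_min(0.0)  # rounding errors can give tiny negative values


def retrieve(queries: torch.Tensor, database: torch.Tensor, k: int) -> torch.Tensor:
    """Indices (m, k) of the k nearest database items for each query, nearest first."""
    d2 = pairwise_sq_dists(queries, database)
    return d2.topk(k, dim=1, largest=False).indices


if __name__ == "__main__":
    torch.manual_seed(0)
    # Random unit vectors stand in for network features in this demo.
    db = torch.nn.functional.normalize(torch.randn(100, 16), dim=1)
    print(retrieve(db[:3], db, k=5))
```

This avoids a Python loop over pairs and a single matrix product does most of the work. Note that in this demo the queries are themselves database items; whether the query should be excluded from its own result list is a choice the sketch leaves open.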
Report and discuss: method used to compute all Euclidean distances, figure with retrieved images from 1, precision-recall curve and mAP.
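As an illustration of the evaluation metric, here is a sketch of average precision (AP) for one query and its mean over queries (mAP). It assumes one common definition, the mean of the precision values at the ranks where a relevant item appears; the template's `evaluate_AP` and `evaluate_mAP` may use a different signature or a different interpolation of the precision-recall curve. The names `average_precision` and `mean_average_precision` are hypothetical.

```python
import torch


def average_precision(dists: torch.Tensor, relevant: torch.Tensor) -> float:
    """AP for one query (hypothetical helper, not the template's evaluate_AP).

    dists:    (n,) distances from the query to the database items
    relevant: (n,) bool, True where the item has the query's class
    """
    order = torch.argsort(dists)              # nearest items first
    rel = relevant[order].float()
    if rel.sum() == 0:
        return 0.0                            # no relevant items: define AP as 0
    hits = torch.cumsum(rel, dim=0)           # relevant items found up to each rank
    ranks = torch.arange(1, len(rel) + 1, dtype=torch.float32)
    precision_at_k = hits / ranks
    # Average the precision over the ranks that hold a relevant item.
    return float((precision_at_k * rel).sum() / rel.sum())


def mean_average_precision(dist_matrix: torch.Tensor,
                           query_labels: torch.Tensor,
                           db_labels: torch.Tensor) -> float:
    """Mean of the per-query AP values; dist_matrix has shape (m, n)."""
    aps = [average_precision(dist_matrix[i], db_labels == query_labels[i])
           for i in range(len(query_labels))]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    d = torch.tensor([0.1, 0.2, 0.3, 0.4])
    r = torch.tensor([True, False, True, False])
    print(average_precision(d, r))  # precision 1 at rank 1 and 2/3 at rank 3
```

A perfect ranking (all relevant items before all irrelevant ones) gives AP equal to 1 under this definition.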
In this part we will learn the embedding so that it directly supports retrieval, by training with the triplet loss. We will use exactly the same network architecture as in Part 1 but train it differently.

For an anchor (query) image $a$, let a positive example $p$ be of the same class as $a$ and a negative example $n$ be of a different class. We want each positive example to be closer to the anchor than every negative example by a margin $\alpha \geq 0$:
$$ d(f(a),f(p)) \leq d(f(a),f(n)) - \alpha \ \ \ \forall p,n.$$
The constraint is violated if $d(f(a),f(p)) - d(f(a),f(n)) + \alpha > 0$. Accordingly, we define the triplet loss as the total violation of these constraints:
$$ l(a) = \sum_{p,n} \max(d(f(a),f(p)) - d(f(a),f(n)) + \alpha, 0).$$
As the function $d$, use the squared Euclidean distance $d(x,y) = \| x-y\|_2^2$. For $\alpha$, select a value in the range $(0,2)$, for example $0.5$ (the features are L2-normalized, so squared distances lie in $[0,4]$).

We need to incorporate this loss into stochastic optimization with mini-batches. Your task is as follows:
triplet_loss
train_triplets
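The loss $l(a)$ above can be evaluated for a whole mini-batch by treating every sample as an anchor and forming all valid (anchor, positive, negative) triplets from the batch labels. The following is a minimal sketch under stated assumptions: the name `batch_triplet_loss` and its signature are hypothetical and need not match the template's `triplet_loss`, and averaging over the valid triplets (instead of summing, as in the formula) is a normalization chosen here for illustration.

```python
import torch


def batch_triplet_loss(feats: torch.Tensor, labels: torch.Tensor,
                       alpha: float = 0.5) -> torch.Tensor:
    """Mean hinge violation over all valid triplets in a batch (illustrative sketch).

    feats:  (B, d) embeddings, e.g. the output of net.features(x)
    labels: (B,) integer class labels
    """
    # Squared Euclidean distances without a square root, so gradients stay finite.
    sq = (feats ** 2).sum(dim=1)
    d = (sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T).clamp_min(0.0)  # (B, B)

    same = labels[:, None] == labels[None, :]
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos_mask = same & ~eye   # p has the anchor's class and is not the anchor itself
    neg_mask = ~same         # n has a different class

    # violation[a, p, n] = d(a, p) - d(a, n) + alpha
    violation = d[:, :, None] - d[:, None, :] + alpha
    valid = pos_mask[:, :, None] & neg_mask[:, None, :]   # (B, B, B)
    losses = torch.relu(violation)[valid]
    if losses.numel() == 0:
        return feats.sum() * 0.0   # no valid triplet: zero loss, still differentiable
    return losses.mean()           # assumption: mean instead of the sum in the formula


if __name__ == "__main__":
    torch.manual_seed(0)
    f = torch.nn.functional.normalize(torch.randn(8, 4), dim=1).requires_grad_()
    y = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
    loss = batch_triplet_loss(f, y)
    loss.backward()
    print(loss.item(), f.grad.shape)
```

The intermediate tensor has $B^3$ entries, which is acceptable for small batches but grows quickly; selecting a subset of triplets is a common alternative for larger batches.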
Report and discuss: optimization settings; a plot of the training progress using the saved history; evaluation of the trained network as in Part 1 (examples of retrieval and the mAP score); a t-SNE comparison; and the precision-recall curve compared with that of the classification-trained network (displayed in the same plot).
The reference solution has $\mathrm{mAP} = 0.98$.
[Figure: retrieved images for the exemplar queries, reference solution]
[Figure: reference precision-recall comparison]