Search
Image retrieval, mean average precision, training embeddings (representations) with triplet loss.
Use the provided template. The package contains:
tools.py
triplet.py
view.ipynb
./models/*
In this lab we want to use a neural network to produce an embedding of images $x$ as feature vectors $f \in \mathbb{R}^d$. These feature vectors can be used then to retrieve similar images by finding the closest neighbors in the embedding space.
We will use the MNIST dataset and start with a network trained for classification. We will use its last hidden layer representation as the embedding. The network is a small convolutional network defined as follows
class ConvNet(nn.Sequential): def __init__(self, num_classes: int = 10) -> None: layers = [] layers += [nn.Conv2d(1, 32, kernel_size=3)] layers += [nn.ReLU(inplace=True)] layers += [nn.MaxPool2d(kernel_size=2, stride=2)] layers += [nn.Conv2d(32, 32, kernel_size=3)] layers += [nn.ReLU(inplace=True)] layers += [nn.MaxPool2d(kernel_size=2, stride=2)] layers += [nn.Conv2d(32, 64, kernel_size=3)] layers += [nn.ReLU(inplace=True)] layers += [nn.AdaptiveAvgPool2d((2, 2))] layers += [nn.Flatten()] layers += [nn.Linear(64 * 2 * 2, num_classes)] super().__init__(*layers) self.layers = layers def features(self, x): f = nn.Sequential(*self.layers[:-1]).forward(x) f = nn.functional.normalize(f, p=2, dim=1) return f
features(x)
evaluate_AP
evaluate_mAP(net, dataset)
Report: method used to compute all Euclidean distances, figure with retrieved images from 1, precision-recall curve and mAP.
In this part we will learn the embedding to directly facilitate the retrieval performance by training with triplet loss. We will use exactly the same network architecture as in Part 1 but train it differently. For an anchor (query) image $a$ let the positive example $p$ be of the same class and a negative example $n$ be of different class than $a$. We want each positive example to be closer to the anchor than all negative examples with a margin $\alpha \geq 0$: $$ d(f(a),f(p)) \leq d(f(a),f(n)) - \alpha \ \ \ \forall p,n.$$ The constraint is violated if $d(f(a),f(p)) - d(f(a),f(n)) + \alpha \geq 0$. Accordingly we define the triplet loss as the total violation of these constraints: $$ l(a) = \sum_{p,n} \max(d(f(a),f(p)) - d(f(a),f(n)) + \alpha, 0).$$ As function $d$ use the squared Euclidean distance $d(x,y) = \| x-y\|_2^2$ and for alpha select a value in the range $(0,2)$, for example $0.5$. We need to incorporate this loss in the stochastic optimization with mini-batches. Your task is as follows:
triplet_loss
train_triplets
The reference precision-recall comparison: