Search
In the previous labs we were searching for correspondences between image pairs. A disadvantage of this approach is its quadratic time complexity in the number of images. In this lab we will look at image (or object) retrieval – a common problem in many computer vision applications. The goal is to find the images in a large database that correspond to a query image. A naive method – matching the query against every image in the database – is too slow for large databases. Moreover, the set of similar images may be only a tiny fraction of the whole database. Therefore, faster and more efficient methods have been developed.
One of the fastest methods for image retrieval is based on the so-called bag-of-words model. The basic idea is to represent each image in the database by a set of visual words from a visual vocabulary. A visual word can be thought of as a representative of a frequently occurring image patch. With a visual vocabulary, images can be represented by visual words, similarly to documents in natural language.
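To make the analogy concrete, here is a minimal sketch (not part of the assignment) of how an image could be represented once a visual vocabulary exists: its descriptors are assigned to the nearest visual words and counted into a histogram. The function name and array shapes are illustrative assumptions, not part of the provided lab code.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    # descriptors: (N, 128) SIFTs of one image, vocabulary: (K, 128) visual words
    # nearest visual word for each descriptor (brute-force squared Euclidean distance)
    d2 = (np.sum(descriptors**2, axis=1)[:, None]
          - 2.0 * descriptors @ vocabulary.T
          + np.sum(vocabulary**2, axis=1)[None, :])
    words = np.argmin(d2, axis=1)
    # bag-of-words: how many times each visual word occurs in the image
    return np.bincount(words, minlength=len(vocabulary))
```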
In the previous labs, the images were represented by sets of descriptors of normalized patches. Visual words can be chosen from the space of such descriptors – each descriptor can be understood as a point in this space. We will choose the visual words by vector quantization: one visual word for each cluster of descriptors. We will implement one method commonly used for vector quantization (clustering): k-means.
One of the simplest methods for vector quantization is k-means:

1. Initialize the K means, e.g. by choosing K data points at random.
2. Assign each data point to its nearest mean.
3. Recompute each mean as the centroid (average) of the points assigned to it.

Repeat steps 2 and 3 until the assignments in step 2 do not change, or until the global change of point-to-mean distances decreases below some threshold.
Your task is to quantize the Paris6K dataset. The Paris dataset consists of 6412 images collected from Flickr by searching for particular Paris landmarks. By quantizing Paris, the visual words you estimate with k-means will describe the architecture of the city. Remember that the SIFT descriptors were computed on distinctive image patches extracted from the Paris images. So if you look at the image patches associated with one cluster (SIFTs assigned to the same visual word) found by k-means, you might see recurring architectural motifs of the city. As with any learning task, we will not test the learned visual vocabulary (visual words) on the training set, Paris6K. Instead, in the following labs we will perform our image retrieval tasks on the Oxford5k dataset. This will test how well your vocabulary generalizes to a completely independent image set. You might ask how independent it really is – in other words, how much does Paris look like Oxford?
Implement k-means. You will write two functions (see the sketch below):

- label(data, means) – assigns each data point to its nearest mean,
- kmeans(data, K) – clusters the data into K clusters using the algorithm above.
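A minimal NumPy sketch of the two functions, assuming descriptors are stored row-wise in an (N, D) array; the exact data layout, initialization strategy, and stopping threshold in your solution may differ.

```python
import numpy as np

def label(data, means):
    # data: (N, D) descriptors, means: (K, D) cluster centers
    # returns: (N,) index of the nearest mean for each point,
    # using the expansion ||x - m||^2 = ||x||^2 - 2 x.m + ||m||^2
    d2 = (np.sum(data**2, axis=1)[:, None]
          - 2.0 * data @ means.T
          + np.sum(means**2, axis=1)[None, :])
    return np.argmin(d2, axis=1)

def kmeans(data, K, max_iter=100, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # step 1: initialize the means with K randomly chosen data points
    means = data[rng.choice(len(data), K, replace=False)].astype(np.float64)
    prev_cost = np.inf
    for _ in range(max_iter):
        # step 2: assign every point to its nearest mean
        idx = label(data, means)
        # step 3: recompute each mean as the centroid of its assigned points
        for k in range(K):
            pts = data[idx == k]
            if len(pts) > 0:
                means[k] = pts.mean(axis=0)
        # stop when the total point-to-mean distance no longer improves
        cost = np.sum((data - means[idx])**2)
        if prev_cost - cost < tol * prev_cost:
            break
        prev_cost = cost
    return means, idx
```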
The data has been prepared for you on the virtual machine. You will use the single-precision SIFT descriptors extracted from the Paris6K dataset. A function is provided to postprocess the SIFTs so that you achieve better retrieval performance. Depending on your memory constraints, you will have to implement the nearest-neighbour computation cleverly, so that you do not exhaust the RAM.
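For the full Paris6K descriptor set, the N × K distance matrix does not fit in RAM, so the assignment step has to be done in chunks. A possible sketch, assuming the same row-wise layout as above; chunk_size is a hypothetical parameter to tune to your memory budget, not a value prescribed by the assignment.

```python
import numpy as np

def label_chunked(data, means, chunk_size=100_000):
    # Assign descriptors to the nearest visual word without building the full
    # N x K distance matrix at once; only a chunk_size x K block is kept in memory.
    means = means.astype(np.float32)
    means_sq = np.sum(means**2, axis=1)                  # (K,)
    idx = np.empty(len(data), dtype=np.int32)
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size].astype(np.float32)
        d2 = (np.sum(chunk**2, axis=1)[:, None]
              - 2.0 * chunk @ means.T
              + means_sq[None, :])
        idx[start:start + chunk_size] = np.argmin(d2, axis=1)
    return idx
```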