====== XP33VTP -- Computer Vision -- Theory and Practice ======

Meeting time: Tuesdays 16:15

Location: G102A

==== Slides ====

First and second paper: [[https://cmp.felk.cvut.cz/~chum/XP33VTP/ANN-search.pdf|Partitioning vs Embedding]]

===== Kick-off meeting 01 Mar 2022 =====

===== First meeting 16:15 on 22 Mar =====

=== Paper ===

Marius Muja and David G. Lowe: "Scalable Nearest Neighbor Algorithms for High Dimensional Data". Pattern Analysis and Machine Intelligence (PAMI), Vol. 36, 2014. [[http://cmp.felk.cvut.cz/~chum/XEP33SAM/flann_pami2014.pdf|PDF]] [[https://github.com/mariusmuja/flann|GitHub page]]

== Task ==

  * Implement the approximate k-means algorithm: use approximate NN search instead of exact NN (on the SIFT dataset).
  * Construct a k-NN graph: a directed graph whose vertices are the feature vectors, with k edges from each vertex to its k nearest data vectors (on the Oxford 5k dataset).

Oxford 5k dataset: image [[http://cmp.felk.cvut.cz/~chum/XEP33SAM/images.zip|thumbnails]], [[http://cmp.felk.cvut.cz/~chum/XEP33SAM/imagedesc.dat|descriptors]] (128-D vectors, single precision, stored one after another), and the corresponding image [[http://cmp.felk.cvut.cz/~chum/XEP33SAM/imagenames.txt|names]] (one name per line, the i-th name corresponds to the i-th descriptor). The following lines will read the descriptors and the image names in MATLAB:

fid = fopen('imagedesc.dat', 'r');\\
X = fread(fid, [128,inf], 'single=>single');\\
fclose(fid);\\
Names = textread('imagenames.txt', '%s');

SIFT dataset: 2M SIFT descriptors are available [[http://cmp.felk.cvut.cz/~chum/XEP33SAM/sift.zip|here]]. The descriptors are 128-D, unsigned byte precision; the following MATLAB lines will read them:

fid = fopen('SIFT.dat', 'r');\\
X = fread(fid, [128,inf], 'uint8=>uint8');\\
fclose(fid);

Use the SIFT dataset for the approximate k-means. Use 32k (2^15) cluster centers. Compare three different assignments to the nearest cluster (kd-forest, k-means tree, exact assignment).
For all three cases, start from an identical initialization. Compare the final results (up to 30 iterations of k-means) in terms of the distortion, i.e. the sum of squared distances Σ ||X - f(X)||^2, where f(X) is the cluster center assigned to X. That is, you need to run the k-means three times, each for 30 iterations.

You are supposed to deliver a short PDF report at least 2 days prior to the meeting. The report should include, but is not limited to: a graph of the evolution of the distortion over the iterations for all methods, the parameters used, the reasoning behind the choice of parameters, observations on the differences between the methods, and conclusions. Looking forward to results on your own data too.

===== First point five meeting 16:15 on 29 Mar 2022 =====

Fixing the assignment from the first meeting.

===== Second meeting 16:15 on 19 Apr 2022 =====

=== Paper ===

Herve Jegou, Matthijs Douze, Cordelia Schmid: "Product quantization for nearest neighbor search", PAMI 2011. [[http://hal.inria.fr/docs/00/82/50/85/PDF/jegou_pq_postprint.pdf|PDF]] [[http://www.irisa.fr/texmex/people/jegou/ann.php|software page]]

== Task ==

Oxford 105k dataset: image [[http://cmp.felk.cvut.cz/~chum/XEP33SAM/oxc-complete.zip|thumbnails]], [[http://cmp.felk.cvut.cz/~chum/XEP33SAM/imgdesc105k.dat|descriptors]] (128-D vectors, single precision, stored one after another), and the corresponding image [[http://cmp.felk.cvut.cz/~chum/XEP33SAM/imagenames105k.txt|names]] (one name per line, the i-th name corresponds to the i-th descriptor; Oxford 5k image names are given without a directory, the remaining filenames contain a directory corresponding to the distributed directory structure).

Additional descriptors for Oxford 105k are available [[http://cmp.felk.cvut.cz/~chum/XEP33SAM/CNNdesc105k.dat|here]]. These descriptors are based on deep convolutional neural networks and should in general give better visual similarity for nearest neighbours than the previous ones; please compare.
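For those working outside MATLAB, the descriptor files can be read the same way in Python. A minimal sketch, assuming numpy is available; the helper names are my own, not part of the assignment:

```python
import numpy as np

def read_descriptors(desc_path, dim=128, dtype=np.float32):
    """Read descriptors stored one after another, `dim` values each.
    Returns an (n, dim) array; its rows correspond to the columns of
    the MATLAB matrix X."""
    return np.fromfile(desc_path, dtype=dtype).reshape(-1, dim)

def read_names(names_path):
    """One image name per line; the i-th name matches the i-th descriptor."""
    with open(names_path) as f:
        return [line.strip() for line in f]

# e.g. X = read_descriptors('imgdesc105k.dat')           # Oxford 105k, float32
#      S = read_descriptors('SIFT.dat', dtype=np.uint8)  # SIFT, unsigned byte
#      names = read_names('imagenames105k.txt')
```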
  * For each image (use all 105k images) find the k-NN and visually verify the quality -- e.g. a script that, for a selected image, shows its k neighbours.
  * Compare the quality (on a subset) and the running time (on all 105k images) of product quantization and FLANN. Select 1000 images at random, find their exact k-NN, and for each of the algorithms compute an estimate of its precision.

===== Third meeting 16:15 on 17 May 2022 =====

=== Paper ===

Wei Dong, Moses Charikar, Kai Li: "Efficient K-Nearest Neighbor Graph Construction for Generic Similarity Measures." In Proceedings of the 20th International Conference on World Wide Web (WWW). New York, NY. 2011. [[http://www.cs.princeton.edu/cass/papers/www11.pdf|PDF]] [[http://code.google.com/p/nndes/|software page]]

Use the new [[https://github.com/aaalgo/kgraph|KGraph]] implementation.

=== Task ===

Create a k-NN graph on the Oxford 105k dataset (CNN descriptors) and compare the precision and the run-time with the previous methods (make sure that you compute the NN for all 105k vectors). Also perform a visual inspection of the results (are the nearest images similar?).

**Important:** You will need to extract the constructed data structure of the k-NN graph. **Do not use the function provided to search for nearest neighbours!**

===== Fourth meeting 16:15 on 14 June 2022 =====

=== Paper ===

Jeff Johnson, Matthijs Douze, Hervé Jégou: "Billion-scale similarity search with GPUs", 2017. [[https://arxiv.org/abs/1702.08734|arXiv]]

=== Task ===

Install the code from the GitHub [[https://github.com/facebookresearch/faiss|repository]]. Compare the implemented methods (using the GPU) with the methods evaluated so far. Specifically: use the exhaustive GPU implementation in the k-means task and compare the timings. Use both exact and approximate NN for the k-NN graph construction. Compare timings and precision.
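The precision comparisons requested throughout (an approximate neighbour list measured against the exact one on a random subset of queries) boil down to intersecting the two index lists per query. A minimal numpy sketch; the function name and the (n_queries, k) array layout are assumptions:

```python
import numpy as np

def knn_precision(approx_ids, exact_ids):
    """Average fraction of the true k nearest neighbours that the
    approximate method recovers.

    approx_ids, exact_ids: (n_queries, k) integer arrays of neighbour
    indices returned by the approximate and the exact search."""
    n, k = exact_ids.shape
    hits = sum(len(np.intersect1d(a, e))
               for a, e in zip(approx_ids, exact_ids))
    return hits / (n * k)
```

For the evaluation above, one would draw e.g. 1000 random query images, run both the exact and the approximate search on them, and report `knn_precision` together with the timings.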
Further, implement an algorithm that finds a path from image A to image B in the k-NN graph such that the longest edge on the path is as short as possible, as in Section 6.6 of the paper, equation (12). Among all paths with the same cost (12), select the shortest one (the smallest sum of edge weights). Have code ready that computes the path and displays the images along it (I will provide some image pairs A and B).
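One way to implement this criterion (not necessarily the paper's own algorithm) is two Dijkstra passes: first find the minimal achievable bottleneck (the longest edge, minimized), then run an ordinary shortest-path search restricted to edges no longer than that bottleneck, which realizes the sum-of-weights tie-break. A sketch; the adjacency-dict representation of the k-NN graph is an assumption:

```python
import heapq

def minimax_path(adj, src, dst):
    """Path from src to dst whose longest edge is as short as possible;
    among those, the one with the smallest total edge weight.

    adj: dict mapping vertex -> list of (neighbour, weight) pairs.
    Returns (bottleneck, length, path) or None if dst is unreachable."""
    # Pass 1: minimal bottleneck via Dijkstra with a max-edge key.
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        b, u = heapq.heappop(heap)
        if b > best.get(u, float('inf')):
            continue
        for v, w in adj.get(u, ()):
            nb = max(b, w)
            if nb < best.get(v, float('inf')):
                best[v] = nb
                heapq.heappush(heap, (nb, v))
    if dst not in best:
        return None
    bottleneck = best[dst]
    # Pass 2: ordinary shortest path using only edges <= bottleneck.
    # Every src-dst path in this subgraph has bottleneck exactly equal
    # to the minimum, so the shortest such path is the answer.
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue
        for v, w in adj.get(u, ()):
            if w > bottleneck:
                continue
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return bottleneck, dist[dst], path
```

For the display step, the vertex indices on the returned path can be mapped back to image names via the imagenames file and shown side by side.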