====== 2012 XEP33SAM -- Understanding State of the Art Methods, Algorithms, and Implementations ======

Tuesdays 16:15

===== First meeting 29/2/2012 =====

=== Paper ===

Marius Muja and David G. Lowe: "Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration", in International Conference on Computer Vision Theory and Applications (VISAPP'09), 2009 [[http://people.cs.ubc.ca/~mariusm/uploads/FLANN/flann_visapp09.pdf|PDF]] [[http://www.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN|software page]]

== Task ==

  * Implement the approximate k-means algorithm: use approximate NN instead of exact NN.
  * Construct a k-NN graph: a directed graph in which each vertex is a feature vector and has k edges to its k nearest data vectors.

Oxford 5k dataset: image [[http://cmp.felk.cvut.cz/~chum/XEP33SAM/images.zip|thumbnails]], [[http://cmp.felk.cvut.cz/~chum/XEP33SAM/imagedesc.dat|descriptors]] (128-D vectors, single precision, stored one by one), and corresponding image [[http://cmp.felk.cvut.cz/~chum/XEP33SAM/imagenames.txt|names]] (one name per line, the i-th name corresponds to the i-th descriptor). The following lines will read the descriptors and the image names in MATLAB:

fid = fopen('imagedesc.dat', 'r');\\
X = fread(fid, [128,inf], 'single=>single');\\
fclose(fid);\\
Names = textread('imagenames.txt', '%s');

SIFT dataset: 2M SIFT descriptors are available [[http://cmp.felk.cvut.cz/~chum/XEP33SAM/sift.zip|here]]. The descriptors are 128-D, unsigned byte precision; the following MATLAB lines will read them:

fid = fopen('SIFT.dat', 'r');\\
X = fread(fid, [128,inf], 'uint8=>uint8');\\
fclose(fid);

Use the SIFT dataset for the approximate k-means with 32k cluster centers. Compare three different assignments to the nearest cluster (kd-forest, k-means tree, exact assignment). For all three cases, start from an identical initialization. Compare the final results (after, say, 15 iterations) in terms of the sum of squared distances, that is Σ_x ||x - f(x)||^2, where f(x) is the cluster center assigned to descriptor x.

Looking forward to results on your own data too.
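Below is a minimal MATLAB sketch of the approximate k-means loop for this task. It assumes the FLANN MATLAB bindings from the software page above (flann_build_index, flann_search, flann_free_index) are on the path; the kd-forest parameters, the initialization, and the naive center update are illustrative choices, not a tuned implementation.

<code matlab>
% Sketch: approximate k-means on the 2M SIFT descriptors.
% Assumes the FLANN MATLAB bindings; all parameter values are illustrative.
fid = fopen('SIFT.dat', 'r');
X = single(fread(fid, [128,inf], 'uint8=>uint8'));   % FLANN works in single precision
fclose(fid);

K = 32768;                               % number of cluster centers (32k)
n = size(X, 2);
perm = randperm(n);
C = X(:, perm(1:K));                     % shared initialization: K random descriptors
params = struct('algorithm', 'kdtree', 'trees', 8, 'checks', 64);   % kd-forest

for iter = 1:15
    % approximate assignment: 1-NN of every descriptor among the current centers
    index  = flann_build_index(C, params);
    assign = flann_search(index, X, 1, params);
    flann_free_index(index);

    % sum of squared distances to the assigned centers, sum_x ||x - f(x)||^2
    % (the temporary C(:,assign) is ~1 GB for 2M points; chunk it if memory is tight)
    ssd = sum(sum((X - C(:, assign)).^2, 'double'));
    fprintf('iteration %2d: SSD = %g\n', iter, ssd);

    % recompute each center as the mean of its assigned descriptors
    % (naive loop kept for clarity; empty clusters keep their previous center)
    for k = 1:K
        members = (assign == k);
        if any(members)
            C(:, k) = mean(X(:, members), 2);
        end
    end
end
</code>

The k-means tree variant is obtained by changing the algorithm field of the FLANN parameters (e.g. to 'kmeans' with a chosen branching factor), and an exhaustive distance computation over the 32k centers gives the exact baseline; running all three from the same initialization C makes the final SSD values directly comparable.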
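For the k-NN graph task, the same bindings can be used to query the data against itself; a short sketch, again with illustrative parameters and an arbitrary choice of k:

<code matlab>
% Sketch: k-NN graph over the Oxford 5k descriptors X (128 x n, single).
% Column i of knn holds the k out-edges of vertex i, i.e. the indices of its
% k (approximate) nearest descriptors.  k and the parameters are arbitrary.
k = 10;
params = struct('algorithm', 'kdtree', 'trees', 8, 'checks', 128);
index  = flann_build_index(X, params);
nn     = flann_search(index, X, k + 1, params);   % the query itself is usually returned first
flann_free_index(index);
knn = nn(2:end, :);                               % drop the self-match
</code>

Since the search is approximate, a point is not guaranteed to appear as its own first neighbour; a more careful version filters the index i out of nn(:, i) instead of simply dropping the first row.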
===== Second meeting 20/3/2012 =====

=== Paper ===

Wei Dong, Moses Charikar, Kai Li: "Efficient K-Nearest Neighbor Graph Construction for Generic Similarity Measures." In Proceedings of the 20th International Conference on World Wide Web (WWW). New York, NY. 2011. [[http://www.cs.princeton.edu/cass/papers/www11.pdf|PDF]] [[http://code.google.com/p/nndes/|software page]]

== Task ==

Oxford 105k dataset: image thumbnails were distributed during the previous meeting, [[http://cmp.felk.cvut.cz/~chum/XEP33SAM/imgdesc105k.dat|descriptors]] (128-D vectors, single precision, stored one by one), and corresponding image [[http://cmp.felk.cvut.cz/~chum/XEP33SAM/imagenames105k.txt|names]] (one name per line, the i-th name corresponds to the i-th descriptor; Oxford 5k image names are given without a directory, the remaining filenames contain a directory corresponding to the distributed directory structure).

  * For each image find its k-NN and visually verify the quality -- e.g. a script that shows the k neighbours of a selected image.
  * Compare the quality and running time of Dong's algorithm to FLANN (by taking each image in turn as a query).

===== Third meeting 10/4/2012 =====

=== Paper ===

Herve Jegou, Matthijs Douze, Cordelia Schmid: "Product quantization for nearest neighbor search", PAMI 2011. [[http://lear.inrialpes.fr/pubs/2011/JDS11/jegou_searching_with_quantization.pdf|PDF]] [[http://www.irisa.fr/texmex/people/jegou/ann.php|software page]] (Do not get confused by the text on the page. The mex version is in the package.)

== Task ==

Use the Oxford 105k dataset as in the previous task.

  * For each image find its k-NN.
  * Compare the quality and running time of product quantization, Dong's algorithm, and FLANN. Select 1000 images at random, find their exact k-NN, and use them to estimate the precision of each algorithm.

===== Fourth meeting 10/5/2012 =====

Even though 10/5 is a Thursday, the schedule is the same as on Tuesdays (to compensate for 1/5 and 8/5).

=== Paper ===

Thomas Hofmann: "Probabilistic Latent Semantic Analysis", Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 1999 [[http://www.cs.brown.edu/~th/papers/Hofmann-UAI99.pdf|PDF]], [[http://lear.inrialpes.fr/~verbeek/code/plsa.tar.gz|toolbox]]

Slides from the tutorial at ICCV 2005 are available [[http://people.csail.mit.edu/torralba/shortCourseRLOC/slides/part_1.ppt|here]], as well as a demo [[http://people.csail.mit.edu/fergus/iccv2005/bagwords.html|code]].

== Task ==

Study the ICCV demo carefully. Apply pLSA to the following dataset (6164 images, visual vocabulary of 50k words) for different numbers of topics. Find clusters of images generated by the same topic(s). Display the images as well as the features belonging to the topic(s).

Download the [[http://ptak/personal/perdom1/mpv/2011/mpvimgs.zip|images]] (2 GB!) and the [[http://ptak/personal/perdom1/mpv/2011/mpvdb_haff2.mat|data]]: the variable VW is a cell array with the list of visual words for each image, GEOM is a cell array with the feature geometries (first row x, second row y), NAMES holds the image names, and SIZES the original image sizes.

===== Fifth meeting 5/6/2012 =====

===== Exam =====

Exam date: 7.8.2012 (Tuesday)

The evaluation of XEP33SAM consists of 30% oral exam, 30% lab assignments, and 40% activity during the reading groups.

The oral exam will take approximately 1 hour. Prior to registering for the exam, prepare a zip file with all the tasks, one directory per task, containing all necessary source codes. I should be able to re-run the tasks using the scripts you provide, named by task number (e.g. task_1, or task_1a, task_1b if there are subtasks).

Make sure that during the exam you will be able to compute / show your results for *any* of the tasks. Either bring your laptop, or make sure you can present your results remotely from a CMP machine (please contact the system administrator if necessary). Important: it is your responsibility to resolve any presentation issues prior to the exam. Not being able to present results during the exam will be considered an unfinished task (no excuses accepted)!