====== Exercise 7 ======

Program:
  * K-means algorithm
  * Clustering of pixels

Downloads:
  * Some pictures to play with: {{:courses:y33aui:cviceni:lekniny.jpg?50|}}, {{:courses:y33aui:cviceni:modrevrcholky.jpg?50|}}, {{:courses:y33aui:cviceni:obklady.jpg?50|}}, {{:courses:y33aui:cviceni:zapadslunce.jpg?50|}}, {{:courses:y33aui:cviceni:zima.jpg?50|}}

**You will probably not manage to finish implementing all the functions during the exercise in the lab. Finish them as homework.**

===== K-means =====

Today we will practice clustering using the well-known //k-means// algorithm.
  * Revise the basic principles of the algorithm.
  * Study the help for the ''kmeans'' function of the MATLAB Statistics toolbox.

**IMPORTANT**: The k-means algorithm is very popular, and as such it is implemented in virtually all statistical and machine learning libraries. We have already used the NETLAB toolbox and the STPR toolbox. Both have a function named ''kmeans''. In this exercise we will work with the original MATLAB function; remove the paths to the toolboxes from the MATLAB path if needed.

Answer the following questions:
  * What data format does the ''kmeans'' function require?
  * After you get the centroid positions from the ''kmeans'' function, what function can you use to assign new data points to the individual centroids?

===== Helper functions =====

In this exercise, we shall cluster pixels of various pictures. Each pixel is described by
  * the color (given by the RGB triple) and
  * the position in the picture (given by a pair of the //x// and //y// coordinates).

Create 2 helper functions that will facilitate easy transformation of the image into a dataset that can be processed by ''kmeans'', and back:

  function [pts, nx, ny] = image2dataset(img)
  function img = dataset2image(pts, nx, ny)

**''img''**: 3D matrix representing a picture loaded by the ''imread'' function.
It is a [//ny// x //nx// x 3] matrix containing //ny// rows of //nx// pixels each, with 3 values (R,G,B) for each pixel.

**''pts''**: 2D dataset representing the picture in tabular form. It is a [(//ny//*//nx//) x 5] matrix containing the tuple (r,g,b,x,y) for each pixel.

**''nx''** and **''ny''**: the picture size

===== Experiments =====

Create a MATLAB script that will allow you to perform the following experiments easily.

==== Quantization of color levels ====

Let's perform lossy image compression by limiting the number of colors displayed in the picture. In other words, let the ''kmeans'' algorithm find the ''k'' colors that are most useful for representing the true-color image.
  * Compute the file size of the uncompressed original image.
  * Compute the file size of the uncompressed image with the number of colors limited to (e.g.) k = 16.
  * Perform clustering based only on the colors of the pixels.
  * Replace the colors of all pixels in each cluster with the color of that cluster's centroid.
  * Display the original and the quantized image. (See the ''imshow'' function.)

==== Clustering of coordinates ====

Repeat the analysis, but cluster only the //x// and //y// coordinates and ignore colors. This time the centroids do not define a color. Use the average color of all the pixels in a cluster as the color of that cluster.

==== Image segmentation ====

Now, apply the clustering algorithm to all 5 coordinates. You should arrive at a result lying somewhere between the two above. The clusters are now formed by pixels that lie close to each other and have a similar color. Pixels with these properties usually belong to the same object depicted in the picture. Try to tune the clustering process so that it gives a good segmentation of your chosen image.
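
One possible way to tune the segmentation is to bring colors and positions to a comparable scale and then weight the position columns. The following is only a minimal sketch under stated assumptions, not a reference solution: the values of ''k'' and ''w'', the scaling to the unit range, and the variable names ''rgb'', ''xy'' and ''seg'' are illustrative choices that do not come from the assignment. The (r,g,b,x,y) table is built inline here; your ''image2dataset'' is expected to produce the same layout.

<code matlab>
% Sketch only: k, w and the [0,1] scaling are assumed, illustrative choices.
img = imread('zima.jpg');            % one of the pictures offered in Downloads
[ny, nx, ~] = size(img);
k = 8;                               % number of segments (assumed value)
w = 0.5;                             % weight of position relative to color (assumed value)

% Build the (r,g,b,x,y) table, one row per pixel, in column-major pixel order.
[x, y] = meshgrid(1:nx, 1:ny);
rgb = reshape(double(img), ny*nx, 3) / 255;   % colors scaled to [0,1]
xy  = [x(:) / nx, y(:) / ny];                 % positions scaled to (0,1]
pts = [rgb, w * xy];                          % weighted 5-column dataset

% Cluster all 5 columns; several replicates reduce the risk of a poor local optimum.
[idx, C] = kmeans(pts, k, 'Replicates', 3);

% Paint each pixel with the mean color of its cluster (first 3 centroid columns).
seg = reshape(C(idx, 1:3), ny, nx, 3);
imshow(seg);
</code>

With ''w = 0'' the position columns are all zero, so the result reduces to the color quantization experiment; a large ''w'' makes positions dominate and approaches the clustering of coordinates. Values in between are the ones worth exploring for segmentation.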