Warning
This page is located in archive.

Exercise 7

Program:

  • K-means algorithm
  • Clustering of pixels

Downloads:

  • Some pictures to play with:

You will probably not manage to implement all the functions during the exercise in the lab. Finish them as homework.

K-means

Today we will practice clustering using the well-known k-means algorithm.

  • Revise the basic principles of the algorithm.
  • Study the help for the kmeans function of the MATLAB Statistics toolbox. IMPORTANT: The k-means algorithm is so popular that virtually every statistical or machine learning library implements it. The NETLAB and STPR toolboxes, which we have already used, both contain a function named kmeans. In this exercise we will work with the original MATLAB function, so remove those toolboxes from the MATLAB path if needed.
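To check which kmeans MATLAB will actually call, something like the following sketch may help. The two folder names are hypothetical placeholders; replace them with the locations where NETLAB and STPR are installed on your machine.

```matlab
% List every kmeans found on the MATLAB path; the first entry is the one that gets called.
which kmeans -all

% Hypothetical install folders (assumption): adjust to your own setup.
toolboxDirs = {'C:\toolboxes\netlab', 'C:\toolboxes\stprtool'};

p = strsplit(path, pathsep);                 % current path as a cell array of folders
for i = 1:numel(toolboxDirs)
    d = toolboxDirs{i};
    onPath = p(strncmp(p, d, numel(d)));     % the folder itself and any subfolders on the path
    if ~isempty(onPath)
        rmpath(onPath{:});                   % affects the current session only
    end
end

% Should now point to the Statistics toolbox implementation.
which kmeans
```

The change made by rmpath lasts only for the current session unless the path is saved.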

Answer the following questions:

  • What is the data format required by the kmeans function?
  • After you get the centroid positions from the kmeans function, what function can you use to assign new data points to the individual centroids?
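As a starting point for both questions, here is a minimal sketch on toy data. It assumes the Statistics toolbox versions of kmeans and pdist2; the toy data, the variable names, and the choice of pdist2 with min for the assignment step are illustrative assumptions, and other functions can do the same job.

```matlab
% Requires the Statistics toolbox (kmeans, pdist2).
rng(0);                                   % repeatable random initialization

% Toy data: one observation per row, one variable per column.
X = [randn(50, 2); randn(50, 2) + 10];    % two well-separated blobs
k = 2;

% idx(i) is the cluster index of row i, C(j, :) is the j-th centroid.
[idx, C] = kmeans(X, k, 'Replicates', 5);

% One possible way to assign new points: pick the nearest centroid.
Xnew = [0.5 -0.5; 9 11];
[~, newIdx] = min(pdist2(Xnew, C), [], 2);
disp(newIdx);
```

The cluster numbers themselves are arbitrary, so compare newIdx with the indices of nearby training points rather than with fixed values.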

Helper functions

In this exercise, we shall cluster pixels of various pictures. Each pixel is described by

  • the color (given by the RGB triple) and
  • the position in the picture (given by a pair of the x and y coordinates).

Create two helper functions that convert an image into a dataset that kmeans can process, and back:

function [pts, nx, ny] = image2dataset(img)
function img = dataset2image(pts, nx, ny)

img: 3D array representing a picture loaded by the imread function. It is an [ny x nx x 3] array containing ny rows of nx pixels, with 3 values (R, G, B) for each pixel.

pts: 2D dataset representing the picture in tabular form. It is an [(ny*nx) x 5] matrix containing the tuple (r, g, b, x, y) for each pixel, one pixel per row.

nx and ny: the picture width and height in pixels
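One possible shape for the two helpers is sketched below. It assumes an 8-bit RGB image (uint8) and that the rows of pts stay in MATLAB's column-major pixel order; neither assumption is stated in the assignment, so treat this as a sketch and not as the reference solution.

```matlab
% Each function would normally live in its own file
% (image2dataset.m and dataset2image.m).

function [pts, nx, ny] = image2dataset(img)
    % Convert an [ny x nx x 3] image to an [(ny*nx) x 5] table (r, g, b, x, y).
    [ny, nx, ~] = size(img);
    [x, y] = meshgrid(1:nx, 1:ny);            % x = column index, y = row index
    rgb = reshape(double(img), ny*nx, 3);     % one pixel per row, column-major order
    pts = [rgb, x(:), y(:)];
end

function img = dataset2image(pts, nx, ny)
    % Rebuild the image from the color columns of pts.
    % Assumes the rows of pts are still in the order produced by image2dataset.
    img = uint8(reshape(pts(:, 1:3), ny, nx, 3));
end
```

A quick check of the round trip: dataset2image(image2dataset(img)) should reproduce img exactly, because the conversion to double and back to uint8 loses nothing for 8-bit data.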

Experiments

Create a MATLAB script that would allow you to perform the following experiments easily.

Quantization of color levels

Let's perform lossy image compression by limiting the number of colors displayed in the picture. In other words, let the kmeans algorithm find the k colors that best represent the true-color image.

  • Compute the file size of the uncompressed original image.
  • Compute the file size of an uncompressed image with the number of colors limited to, e.g., k = 16.
  • Perform clustering based only on the colors of the pixels.
  • Replace the color of every pixel in a cluster by the color of that cluster's centroid.
  • Display the original and the quantized image. (See the imshow function.)
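The steps above could be sketched as follows. The sample image peppers.png is an assumption (it is distributed with MATLAB; substitute any of your own pictures), and the size estimate assumes 8 bits per color channel, no file header, and a palette of k RGB entries stored alongside the indexed pixels.

```matlab
% Requires the Statistics toolbox for kmeans.
img = imread('peppers.png');              % assumed sample image; use your own file
[ny, nx, ~] = size(img);
rgb = reshape(double(img), ny*nx, 3);     % one pixel per row: (r, g, b)

k = 16;

% Size estimates under the assumptions stated above.
bytesOriginal  = ny * nx * 3;                                  % 24 bits per pixel
bytesQuantized = ceil(ny * nx * ceil(log2(k)) / 8) + k * 3;    % indices + palette
fprintf('original: %d bytes, %d colors: %d bytes\n', ...
        bytesOriginal, k, bytesQuantized);

% Cluster on color only and repaint each pixel with its centroid color.
rng(0);                                   % repeatable initialization
[idx, C] = kmeans(rgb, k, 'MaxIter', 200);
quantized = uint8(reshape(C(idx, :), ny, nx, 3));

figure;
subplot(1, 2, 1); imshow(img);       title('original');
subplot(1, 2, 2); imshow(quantized); title(sprintf('%d colors', k));
```

With k = 16 each pixel needs 4 bits instead of 24, so the pixel data shrinks to one sixth of its original size, plus 48 bytes for the palette.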

Clustering of coordinates

Repeat the analysis, but cluster only the x and y coordinates and ignore the colors. This time the centroids carry no color information, so use the average color of all the pixels in a cluster as the color of that cluster.
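A minimal sketch of this variant follows. The sample image peppers.png and the use of accumarray to average the colors per cluster are assumptions made for illustration.

```matlab
% Requires the Statistics toolbox for kmeans.
img = imread('peppers.png');              % assumed sample image; use your own file
[ny, nx, ~] = size(img);
rgb = reshape(double(img), ny*nx, 3);     % colors, one pixel per row
[x, y] = meshgrid(1:nx, 1:ny);
xy = [x(:), y(:)];                        % positions, same pixel order as rgb

k = 16;
rng(0);                                   % repeatable initialization
idx = kmeans(xy, k, 'MaxIter', 200);      % cluster on position only

% Average color of the pixels in each cluster, one channel at a time.
meanColor = zeros(k, 3);
for c = 1:3
    meanColor(:, c) = accumarray(idx, rgb(:, c), [k 1], @mean);
end

tiles = uint8(reshape(meanColor(idx, :), ny, nx, 3));
figure; imshow(tiles); title('clusters of coordinates');
```

Because color plays no role in the clustering, the result is a set of compact patches, each painted with the mean color of the pixels it covers.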

Image segmentation

Now, apply the clustering algorithm to all 5 coordinates. You should arrive at a result lying somewhere between the above two. The clusters are now formed by pixels that lie close to each other and have a similar color. Such pixels usually belong to the same object depicted in the picture. Try to tune the clustering process so that it gives a good segmentation of your chosen image.
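One way to tune the process, offered here only as a suggestion, is to rescale the features so that color and position have comparable ranges and then vary their relative weight together with k. In the sketch below the weight w, the normalization, and the sample image peppers.png are all assumptions, not part of the assignment.

```matlab
% Requires the Statistics toolbox for kmeans.
img = imread('peppers.png');              % assumed sample image; use your own file
[ny, nx, ~] = size(img);
rgb = reshape(double(img), ny*nx, 3);
[x, y] = meshgrid(1:nx, 1:ny);

k = 8;                                    % number of segments to try
w = 0.5;                                  % hypothetical weight of position relative to color

% Bring colors and coordinates to the range [0, 1], then weight the position.
features = [rgb / 255, w * [x(:) / nx, y(:) / ny]];

rng(0);                                   % repeatable initialization
[idx, C] = kmeans(features, k, 'MaxIter', 200);

% Paint each segment with the color part of its centroid.
segmented = uint8(reshape(C(idx, 1:3) * 255, ny, nx, 3));
figure;
subplot(1, 2, 1); imshow(img);       title('original');
subplot(1, 2, 2); imshow(segmented); title(sprintf('k = %d, w = %.2f', k, w));
```

With w = 0 this reduces to the color quantization experiment, while a large w approaches the clustering of coordinates, so intermediate values move the result between the two.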

courses/y33aui/cviceni/cviceni07.txt · Last modified: 2013/10/04 13:02 (external edit)