Correspondence Problem

Searching for correspondences is a fundamental task in computer vision. The goal is to find corresponding parts (patches) of a scene in two or more images. Finding the correspondences for each pixel is computationally difficult. Therefore, methods based on correspondences of local patches first detects, in each image, the so called local features (a.k.a. interest points, distinguished regions, covariant regions) i.e. regions detectable in other images of the same scene regardless of geometric and photometric transformations. Correspondences are then found between local features.

For each feature the exact position and a description of the feature neighborhood is saved. The description can be a vector of pixel intensities, histogram of intensities, histogram of gradients, or other invariant. The choice of the descriptor influences the discrimination and invariance of the feature. Roughly speaking, local feature descriptor is a vector, which characterises the image function in a small, well localized neighborhood of a feature point.

When establishing correspondences, for each feature in one image we look for features from the second image that have a similar description. We call these correspondences “tentative”, as some of them are usually wrong – they do not associate the same points in the scene. This happens because of noise, repeated structures in the scene, occlusions or inaccuracies of models of geometric and photometric transformation.

As a next step we need a robust algorithm, which (based on the knowledge of the image-to-image transformation model) will choose the largest subset of correspondences which are all consistent with the model. Such correspondences are called „verified“ or „inliers“ (with the model). An algorithm widely used for this purpose is called RANSAC (RANdom Sampling Consensus) and will be introduced it in the last section.

1. Feature detection

Feature detection is the first step in the process. We will implement two often used feature detectors. Detector of Hessian maxima and Harris detector.

Hessian detector

The goal of the detector is to repeatedly find the feature points which are well localized and there is enough information in their neighborhood for creating a good description. Detector of the local extrema of Hessians – determinant of Hessian matrix (matrix of second order derivatives)

<latex>\mathbf{H} = \left[\begin{array}{cc} D_{xx}(x,y;\sigma) & D_{xy}(x,y;\sigma)
D_{xy}(x,y;\sigma) & D_{yy}(x,y;\sigma) \end{array}\right],</latex>

finds centers of the so called blobs, local extrema of the intensity function, which are well localized in a neighborhood given scale <latex>\sigma</latex>. Interest points are strong local maxima with value (response) above a threshold t, which is set according to the level of noise in the image.

Implement a function response=hessian_response(img,sigma), which computes Hessian for each pixel of the given image img according to formula: <latex>\mathrm{det}(\mathbf{H}) = D_{xx} D_{yy} - D_{xy}^2</latex>. For computation of the derivatives <latex>D_{xx},D_{yy},D_{xy}</latex> use the derivative of Gaussian with variance <latex>\sigma^2</latex>.

For the localisation of Hessian extrema we will need a function to finds out if a point is a local extremum. This is done by comparing the pixel value with pixels in its neighborhood.

Implement a function maximg=nonmaxsup2d(response,thresh) for non-maxima suppression. From input image response (matrix of size HxW, where H stands for image height and W for image width) computes a new matrix maximg of the same size, in which all local maxima with response value greater than thresh will have value 1 and non-maxima points or local maxima with response below thresh will have value 0. Function checks for local maxima in 8-neighbourhood of the point.
Connect these two functions into a detector [x,y]=hessian(img,sigma,thresh), which will find Hessian maxima in the image img with scale of gradient sigma. Return only points (x, y) which were thresholded to 1 - you can use function find [y,x]=find(maximg) for finding these points in the matrix. Return vectors x a y as row vectors of the same size, where x(1), y(1) are points with coordinates of the first point, x(2), y(2) coordinates of the second, etc.. Pay attention to indices, which are one-based in MATLAB, while we want to have coordinates (0,0) for the top-left pixel of our image.

Harris detector

Harris detector detects points, in which the gradient changes in two orthogonal directions. For this reason it is sometimes called a corner detector. It uses properties of the so called autocorrelation matrix of gradients <latex>\mathbf{C}(x,y;\sigma_d;\sigma_i)</latex>.

<latex> \mathbf{C}(x,y;\sigma_d,\sigma_i)=G(x,y;\sigma_i)*\left[\begin{array}{cc} D^2_{x}(x,y;\sigma_d) & D_x D_y(x,y;\sigma_d)
D_x D_y(x,y;\sigma_d) & D^2_y(x,y;\sigma_d) \end{array}\right]</latex>

where * is the convolution operator. Using the windowing function <latex>G(x,y;\sigma_i)</latex> matrices of outer products of gradients in the <latex>\sigma_i</latex> neighborhood are accumulated. The matrix of outer product has an eigenvector corresponding to the higher eigenvalue in direction of gradient. By accumulating the outer product using convolution and finding the eigenvalues of auto-correlation matrix we can find two dominant orientations of gradient and its size in a point neighborhood. Harris and Stephens realized this and designed an elegant function.

<latex> R(x,y) = \mathrm{det}(\mathbf{C})-\alpha\mathrm{trace}^2(\mathbf{C}). </latex>

which avoids the computation of eigenvalues <latex>\lambda_1,\lambda_2</latex> of matrix <latex>\mathbf{C}</latex> thanks to the equivalency

<latex> \begin{array}{rcl} \mathrm{det}(\mathbf{C})\!&=&\!\lambda_1\lambda_2
\mathrm{trace}(\mathbf{C})\!&=&\!\lambda_1 + \lambda_2 \end{array} </latex>

to find properties of their ratio.

Obrázek ukazuje průběh funkce R(x,y) pro alpha=0.04, bílé usečky označují izokontury s hodnotami 0, 0.02, 0.04, 0.06, ..., 0.18.

Graph of <latex>$R(x,y)$</latex> for <latex>$\alpha=0.04$</latex>. White curves are isocontours for values <latex>$0, 0.02, 0.04, 0.06, \dots, 0.18$</latex>

The image shows the function <latex>R(x,y)</latex> for <latex>\alpha=0.04</latex>, where white lines shows iso-contours with values 0, 0.02, 0.04, 0.06, …, 0.18. When we want to have values greater than a threshold, we want the smaller of the two eigenvalues to be greater than the threshold. We also want to have the value of the greater eigenvalue greater if the smaller eigenvalue is smaller. In the case, when the both eigenvalues are similar, they can be smaller.

Write a function response=harris_response(img,<latex>\sigma_d</latex>,<latex>\sigma_i</latex>), which computes the response of Harris function for each pixel of input image img. For computation, use the derivative of Gaussian <latex>D_{x},D_{y}</latex> with standard deviation <latex>\sigma_d</latex> and a Gaussian as window function with standard deviation <latex>\sigma_i</latex>.
In the same way as with the detector of Hessian maxima, write a function [x,y]=harris(img,<latex>\sigma_d</latex>,<latex>\sigma_i</latex>,thresh), which detects the maxima of Harris function which are greater than thresh, using standard deviation <latex>\sigma^2_d</latex> for derivatives and <latex>\sigma_i^2</latex> for the window function.

n

Scale-space - automatic scale estimation

The basic version of Harris or Hessian detector needs one parameter (standard deviation), the scale on which it estimates gradients in the image and detects “blobs”. It can be shown that the scale can be estimated automatically for each image. For this purpose, it is necessary to build the “scale-space” – a three dimensional space, in which two dimensions are x,y-coordinates in the image and the third dimension is the scale. The image is filtered to acquire its new versions corresponding to increasing scale. Suppressing the details simulates the situation when we are looking at the scene from greater distance.

With increasing suppression of details the variance of intensities is decreasing. Therefor selection of characteristic scale is necessary to have comparable gradients between two scales. The term “normalized derivative” (considering the “distance” between pixels) was introduced for this purpose to obtain a “scaleless” gradient.

$ \dfrac{\partial f}{\partial \xi} = \dfrac{\partial f}{\partial (x/\sigma)} = \sigma \dfrac{\partial f}{\partial x}\mbox{,}\quad N_x(x,y;\sigma) = \sigma D_x(x,y;\sigma)\mbox{,}\quad N_{xx}(x,y;\sigma) = \sigma^2 D_{xx}(x,y;\sigma)\mbox{,} $

The normalized Hessian matrix is thus: $$ \mathbf{H}_{norm}(x,y,\sigma_i) = \sigma^2 \left[\begin{array}{cc} D_{xx}(x,y;\sigma) & D_{xy}(x,y;\sigma)\\ D_{xy}(x,y;\sigma) & D_{yy}(x,y;\sigma) \end{array}\right]. $$

Lindeberg in his work showed that for such normalized derivatives it is possible to compute the response of differential Laplacian operators

$\mathrm{trace}(\mathbf{H}_{norm}(x,y,\sigma_i))=N_{xx}(x,y;\sigma_i)+N_{yy}(x,y;\sigma_i)$

and Hessians

$\mathrm{det}(\mathbf{H}_{norm}(x,y,\sigma_i)) = N_{xx}(x,y;\sigma_i) N_{yy}(x,y;\sigma_i) - N_{xy}^2(x,y;\sigma_i)$

and use it for automatic learning of characteristic scale of ideal blob. After applying these operators, the local maxima of the image will give us x,y-coordinades and a scale $\sigma$ of the blobs.

The response of differential operators in scale space in the middle of Gaussians.

Image of the Gaussian with standard deviations=8,16,24 and 32. The response det(H_norm) and trace(H_norm) in the middle of Gaussians. Note the shift of the extreme proportional to the size of blob.

Write a function [ss,sigma]=scalespace(img,levels,step), which returns a 3D scale-space (matrix HxWxL, where H is hight, W is width and L is number of levels), where ss(:,:,1) will be input image img, for which we will assume $\sigma_1 = 1$ and ss(:,:,i) will be it's filtered version $S(x,y,\sigma_{i+1})$, where $\sigma_{i+1}=\mathrm{step}\cdot\sigma_i $ for $i \in \{1,\ldots,\mathrm{levels}\}$. Vector sigma will contain values $\sigma_i$.
Write a function maximg=nonmaxsup3d(response, threshold) for non-maxima suppression in a 3D matrix response (i.e. take to consideration all 26 points in the neigborhood). The function returns a matrix of the same size as input matrix response, in which there will be 1 in places of local maxima which are greater than thresh and 0 elsewhere.

Write a function [hes,sigma]=sshessian_response(img), which will use function scalespace and compute Hessian response for normalized derivation of Gaussian. Choose the parameters step and levels of function scalespace wisely (e.g. step=1.1, levels=40). For computation of the response use estimation of the second derivatives which comes from applying of differential filters to the result of function scalespace. Normalize the derivatives with a standard deviation for each order of derivation (as described earlier). The result will be a 3D matrix hes with elements corresponding to normalized response of Hessian in scale-space.
Join functions nonmaxsup3d and sshessian_response to detector of Hessian maxima with automatic scale estimation [x,y,s]=sshessian(img, thresh). Return only points with response greater than thresh. The result will be vectors of the same length for position x, y and scale s of thresholded Hessian maxima in the space-space. Top left point of the image has coordinates (0,0). You can use the following code for localization of points:

% get the matrix of maxima
maximg=nonmaxsup3d(response, thresh);
% find positions
[y x u]=ind2sub(size(maximg), find(maximg)); 
% change coordinates system to zero-based
x=x-1; y=y-1;
% change u to scale s
...

Test your functions on this image:

For visualization of detected points use the function showpts.m:

% read an image
img=imread('sunflowers.png');
% finds Hessian maxima in scale-space, use threshold 0.02
[x,y,s]=sshessian(im2double(rgb2gray(img)), 0.02);
% show result
imshow(img);
p.linewidth=2; p.color='red'; 
showpts([x;y;s], p);

You should obtain an output similar to this:

Optional Task 1: Affine covariance, Baumberg iteration

The detection of similarity-covariant points, as maxima of Hessian in scale space, can be extended to affine-covariant points to an unknown rotation. It is based on the so called Baumberg iteration, the idea is to observe the distribution of gradients in the vicinity of the detected point. Assuming that there is a transformation, which “flattens” the intesity in the point's neighborhood in such a way that the distribution of gradients would be isotropic (with gradients distributed equally in all directions), the transformation can be found using the second moment matrix of gradients, i.e. the autocorrelation matrix of gradients that we already know.

$$ \mathbf{C}(x,y;\sigma_d,\sigma_i)=G(x,y;\sigma_i)*\left[\begin{array}{cc} D^2_{x}(x,y;\sigma_d) & D_x D_y(x,y;\sigma_d)\\ D_x D_y(x,y;\sigma_d) & D^2_y(x,y;\sigma_d) \end{array}\right]$$

As we have said earlier, this matrix reflects the local distribution of gradients, we can show that when we find a transformation $\mathbf{\mu} = \mathbf{C}^{-1/2}$ that translates the coordinate system given by eigenvectors and eigenvalues of matrix $\mathbf{C}$ to a canonical one, the distribution of gradients in the transformed matrix $\mathbf{\mu}$ will be “more isotropic”. However, since we used a circular window function it is possible that for some gradients around, the weight will be calculated wrongly, or not at all. Therefore, it is necessary to repeat this procedure until the eigenvalues of matrix $\mathbf{C}_i$ from the i-th iteration of the point's neighborhood will be close to identity (a multiplication of identity). We'll find out by comparing the ratio of eigenvalue of matrix $\mathbf{C}_i$. The resulting array of local affine deformation is obtained as the total deformation of the neighborhood needed to “become” isotropic:

$$N = \prod_i \mu_i$$

This transformation leads (after the addition of translation and scaling) from the image coordinates to the canonical coordinate system. In practice, we are mostly interested in the inverse transformation $\mathbf{A}=\mathbf{N}^{-1}$.

Write a function [x, y, A11, A12, A21, A22] = affinehessian (img, Threshold), which extends the detector of Hessian maxima from the previous step, and for each maximum in the scale space it calculates one step of the Baumberg iteration. The function returns the position of a point and the 2×2 submatrix of matrix A: $$A=\sigma_i \, inv\left(C^{-1/2}/\sqrt{det(C^{-1/2})})\right)$$

Maximal Stable Extremal Region (MSER)

The detector of Maximal Stable Extremal Regions is based a different idea. Detection is based on growing regions in a binary thresholded image while increasing the threshold for image intensity. While increasing intensity, new regions appear in the image, they are merging together and at the end there is one region with area of the whole image. During the region growing, the region statistics (area and the border length) are monitored. The detector finds such intensity ranges where the ratio of the area and border length changes the least. The MSER detector is implemented as a MEX module. It is used as follows:

  p.min_margin = 10;
  p.min_size   = 30;
  mser = extrema(img, p, [1 2]);

MSERs are affine co-variant regions. It is possible to visualize their covariance matrix as an ellipse with the longer axis corresponding to the maximal variance of elements. We can use following code for conversion into elliptical regions:

  regs=[mser{1}{2,:} mser{2}{2,:}];
  x=[regs.cx]; y=[regs.cy];
  a11=sqrt([regs.sxx]); a12=zeros(size(a11));
  a21=[regs.sxy]./a11;  a22=sqrt([regs.syy] - a21.*a21);
  imshow(img); showpts([x;y;a11;a12;a21;a22]);

For visualisation download showpts.m.

The Dataset

To try your new detectors, download pictures of some objects from different viewpoints with increasing complexity. Pictures should contain a simple rotation, with and without scale change. The other pictures should be taken from different viewpoints. If necessary (due to long runtime of your code) you can shrink the images to half.

What should you complete?

You are supposed to complete functions hessian_response.m, hessian.m, harris_response.m, harris.m, nonmaxsup2d.m, scalespace.m, nonmaxsup3d.m, sshessian_response.m a sshessian.m together with all used non-standard functions you have created. You are supposed to test on the the dataset.

Testing

We use a script and the MATLAB function 'publish' to test your code. Download detect_test.zip and unpack it to a direcotry which is in MATLAB paths (or put it into the directory with your code) and execute. Compare your results with ours.

Table of Contents