Search
Searching for correspondences is a fundamental task in computer vision. The goal is to find corresponding parts (patches) of a scene in two or more images. Finding the correspondences for each pixel is computationally difficult. Therefore, methods based on correspondences of local patches first detects, in each image, the so called local features (a.k.a. interest points, distinguished regions, covariant regions) i.e. regions detectable in other images of the same scene regardless of geometric and photometric transformations. Correspondences are then found between local features.
For each feature the exact position and a description of the feature neighborhood is saved. The description can be a vector of pixel intensities, histogram of intensities, histogram of gradients, or other invariant. The choice of the descriptor influences the discrimination and invariance of the feature. Roughly speaking, local feature descriptor is a vector, which characterises the image function in a small, well localized neighborhood of a feature point.
When establishing correspondences, for each feature in one image we look for features from the second image that have a similar description. We call these correspondences “tentative”, as some of them are usually wrong – they do not associate the same points in the scene. This happens because of noise, repeated structures in the scene, occlusions or inaccuracies of models of geometric and photometric transformation.
As a next step we need a robust algorithm, which (based on the knowledge of the image-to-image transformation model) will choose the largest subset of correspondences which are all consistent with the model. Such correspondences are called „verified“ or „inliers“ (with the model). An algorithm widely used for this purpose is called RANSAC (RANdom Sampling Consensus) and will be introduced it in the last section.
Feature detection is the first step in the process. We will implement two often used feature detectors. Detector of Hessian maxima and Harris detector.
The goal of the detector is to repeatedly find the feature points which are well localized and there is enough information in their neighborhood for creating a good description. Detector of the local extrema of Hessians – determinant of Hessian matrix (matrix of second order derivatives)
<latex>\mathbf{H} = \left[\begin{array}{cc} D_{xx}(x,y;\sigma) & D_{xy}(x,y;\sigma) D_{xy}(x,y;\sigma) & D_{yy}(x,y;\sigma) \end{array}\right],</latex>
finds centers of the so called blobs, local extrema of the intensity function, which are well localized in a neighborhood given scale <latex>\sigma</latex>. Interest points are strong local maxima with value (response) above a threshold t, which is set according to the level of noise in the image.
response=hessian_response(img,sigma)
For the localisation of Hessian extrema we will need a function to finds out if a point is a local extremum. This is done by comparing the pixel value with pixels in its neighborhood.
maximg=nonmaxsup2d(response,thresh)
[x,y]=hessian(img,sigma,thresh)
[y,x]=find(maximg)
Harris detector detects points, in which the gradient changes in two orthogonal directions. For this reason it is sometimes called a corner detector. It uses properties of the so called autocorrelation matrix of gradients <latex>\mathbf{C}(x,y;\sigma_d;\sigma_i)</latex>.
<latex> \mathbf{C}(x,y;\sigma_d,\sigma_i)=G(x,y;\sigma_i)*\left[\begin{array}{cc} D^2_{x}(x,y;\sigma_d) & D_x D_y(x,y;\sigma_d) D_x D_y(x,y;\sigma_d) & D^2_y(x,y;\sigma_d) \end{array}\right]</latex>
where * is the convolution operator. Using the windowing function <latex>G(x,y;\sigma_i)</latex> matrices of outer products of gradients in the <latex>\sigma_i</latex> neighborhood are accumulated. The matrix of outer product has an eigenvector corresponding to the higher eigenvalue in direction of gradient. By accumulating the outer product using convolution and finding the eigenvalues of auto-correlation matrix we can find two dominant orientations of gradient and its size in a point neighborhood. Harris and Stephens realized this and designed an elegant function.
<latex> R(x,y) = \mathrm{det}(\mathbf{C})-\alpha\mathrm{trace}^2(\mathbf{C}). </latex>
which avoids the computation of eigenvalues <latex>\lambda_1,\lambda_2</latex> of matrix <latex>\mathbf{C}</latex> thanks to the equivalency
<latex> \begin{array}{rcl} \mathrm{det}(\mathbf{C})\!&=&\!\lambda_1\lambda_2 \mathrm{trace}(\mathbf{C})\!&=&\!\lambda_1 + \lambda_2 \end{array} </latex>
to find properties of their ratio.
The image shows the function <latex>R(x,y)</latex> for <latex>\alpha=0.04</latex>, where white lines shows iso-contours with values 0, 0.02, 0.04, 0.06, …, 0.18. When we want to have values greater than a threshold, we want the smaller of the two eigenvalues to be greater than the threshold. We also want to have the value of the greater eigenvalue greater if the smaller eigenvalue is smaller. In the case, when the both eigenvalues are similar, they can be smaller.
Write
response=harris_response(img,
,
)
[x,y]=harris(img,
,thresh)
n
The basic version of Harris or Hessian detector needs one parameter (standard deviation), the scale on which it estimates gradients in the image and detects “blobs”. It can be shown that the scale can be estimated automatically for each image. For this purpose, it is necessary to build the “scale-space” – a three dimensional space, in which two dimensions are x,y-coordinates in the image and the third dimension is the scale. The image is filtered to acquire its new versions corresponding to increasing scale. Suppressing the details simulates the situation when we are looking at the scene from greater distance.
With increasing suppression of details the variance of intensities is decreasing. Therefor selection of characteristic scale is necessary to have comparable gradients between two scales. The term “normalized derivative” (considering the “distance” between pixels) was introduced for this purpose to obtain a “scaleless” gradient.
$ \dfrac{\partial f}{\partial \xi} = \dfrac{\partial f}{\partial (x/\sigma)} = \sigma \dfrac{\partial f}{\partial x}\mbox{,}\quad N_x(x,y;\sigma) = \sigma D_x(x,y;\sigma)\mbox{,}\quad N_{xx}(x,y;\sigma) = \sigma^2 D_{xx}(x,y;\sigma)\mbox{,} $
The normalized Hessian matrix is thus: $$ \mathbf{H}_{norm}(x,y,\sigma_i) = \sigma^2 \left[\begin{array}{cc} D_{xx}(x,y;\sigma) & D_{xy}(x,y;\sigma)\\ D_{xy}(x,y;\sigma) & D_{yy}(x,y;\sigma) \end{array}\right]. $$
Lindeberg in his work showed that for such normalized derivatives it is possible to compute the response of differential Laplacian operators $\mathrm{trace}(\mathbf{H}_{norm}(x,y,\sigma_i))=N_{xx}(x,y;\sigma_i)+N_{yy}(x,y;\sigma_i)$ and Hessians $\mathrm{det}(\mathbf{H}_{norm}(x,y,\sigma_i)) = N_{xx}(x,y;\sigma_i) N_{yy}(x,y;\sigma_i) - N_{xy}^2(x,y;\sigma_i)$ and use it for automatic learning of characteristic scale of ideal blob. After applying these operators, the local maxima of the image will give us x,y-coordinades and a scale $\sigma$ of the blobs.
[ss,sigma]=scalespace(img,levels,step)
ss(:,:,1)
ss(:,:,i)
maximg=nonmaxsup3d(response, threshold)
[hes,sigma]=sshessian_response(img)
scalespace
nonmaxsup3d
sshessian_response
[x,y,s]=sshessian(img, thresh)
% get the matrix of maxima maximg=nonmaxsup3d(response, thresh); % find positions [y x u]=ind2sub(size(maximg), find(maximg)); % change coordinates system to zero-based x=x-1; y=y-1; % change u to scale s ...
Test your functions on this image:
For visualization of detected points use the function showpts.m:
% read an image img=imread('sunflowers.png'); % finds Hessian maxima in scale-space, use threshold 0.02 [x,y,s]=sshessian(im2double(rgb2gray(img)), 0.02); % show result imshow(img); p.linewidth=2; p.color='red'; showpts([x;y;s], p);
You should obtain an output similar to this:
The detection of similarity-covariant points, as maxima of Hessian in scale space, can be extended to affine-covariant points to an unknown rotation. It is based on the so called Baumberg iteration, the idea is to observe the distribution of gradients in the vicinity of the detected point. Assuming that there is a transformation, which “flattens” the intesity in the point's neighborhood in such a way that the distribution of gradients would be isotropic (with gradients distributed equally in all directions), the transformation can be found using the second moment matrix of gradients, i.e. the autocorrelation matrix of gradients that we already know.
$$ \mathbf{C}(x,y;\sigma_d,\sigma_i)=G(x,y;\sigma_i)*\left[\begin{array}{cc} D^2_{x}(x,y;\sigma_d) & D_x D_y(x,y;\sigma_d)\\ D_x D_y(x,y;\sigma_d) & D^2_y(x,y;\sigma_d) \end{array}\right]$$
As we have said earlier, this matrix reflects the local distribution of gradients, we can show that when we find a transformation $\mathbf{\mu} = \mathbf{C}^{-1/2}$ that translates the coordinate system given by eigenvectors and eigenvalues of matrix $\mathbf{C}$ to a canonical one, the distribution of gradients in the transformed matrix $\mathbf{\mu}$ will be “more isotropic”. However, since we used a circular window function it is possible that for some gradients around, the weight will be calculated wrongly, or not at all. Therefore, it is necessary to repeat this procedure until the eigenvalues of matrix $\mathbf{C}_i$ from the i-th iteration of the point's neighborhood will be close to identity (a multiplication of identity). We'll find out by comparing the ratio of eigenvalue of matrix $\mathbf{C}_i$. The resulting array of local affine deformation is obtained as the total deformation of the neighborhood needed to “become” isotropic:
$$N = \prod_i \mu_i$$
This transformation leads (after the addition of translation and scaling) from the image coordinates to the canonical coordinate system. In practice, we are mostly interested in the inverse transformation $\mathbf{A}=\mathbf{N}^{-1}$.
The detector of Maximal Stable Extremal Regions is based a different idea. Detection is based on growing regions in a binary thresholded image while increasing the threshold for image intensity. While increasing intensity, new regions appear in the image, they are merging together and at the end there is one region with area of the whole image. During the region growing, the region statistics (area and the border length) are monitored. The detector finds such intensity ranges where the ratio of the area and border length changes the least. The MSER detector is implemented as a MEX module. It is used as follows:
p.min_margin = 10; p.min_size = 30; mser = extrema(img, p, [1 2]);
regs=[mser{1}{2,:} mser{2}{2,:}]; x=[regs.cx]; y=[regs.cy]; a11=sqrt([regs.sxx]); a12=zeros(size(a11)); a21=[regs.sxy]./a11; a22=sqrt([regs.syy] - a21.*a21); imshow(img); showpts([x;y;a11;a12;a21;a22]);
To try your new detectors, download pictures of some objects from different viewpoints with increasing complexity. Pictures should contain a simple rotation, with and without scale change. The other pictures should be taken from different viewpoints. If necessary (due to long runtime of your code) you can shrink the images to half.
You are supposed to complete functions hessian_response.m, hessian.m, harris_response.m, harris.m, nonmaxsup2d.m, scalespace.m, nonmaxsup3d.m, sshessian_response.m a sshessian.m together with all used non-standard functions you have created. You are supposed to test on the the dataset.
We use a script and the MATLAB function 'publish' to test your code. Download detect_test.zip and unpack it to a direcotry which is in MATLAB paths (or put it into the directory with your code) and execute. Compare your results with ours.