To compute a local description invariant to geometric and photometric transformations, the neighborhood of each feature point has to be normalized to undo the effects of these transformations. After these deformations are eliminated, an invariant description can be computed.
As we already know, the task of the detector is to find the shape of feature-point neighborhoods repeatably, so that the detected points are covariant with a desired class of transformations (geometric or photometric). For instance, the basic version of the Harris or Hessian detector finds points which are covariant with translation (if you shift the image, the points shift too). The Hessian in scale-space or the DoG detector adds information about the scale of the region, so the points are covariant with a similarity transformation up to an unknown rotation (the detected points have no orientation assigned). Affine covariant detectors like MSER, Hessian-Affine or Harris-Affine are covariant up to an affine transformation. The following text describes how to handle the geometric information from the point neighborhood and how to use it for normalization. Depending on the amount of information a detector gives about a feature, the description can be invariant to translation, similarity, affine or perspective transformation.
Geometric normalization is the process of geometrically transforming a feature neighborhood into a canonical coordinate system. The geometric transformation is stored in the form of a “frame” – a projection of the canonical coordinate system into the neighborhood of a feature point (or region) in the image. The frame is represented by a 3×3 matrix A, the same as in the first lab. The transformation matrix A is used to obtain a “patch” – a small feature neighborhood in the canonical coordinate system. All further measurements on this square patch of the original image are then invariant to the desired geometric transformation.
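For instance, a similarity frame with centre $(x_0, y_0)$, scale $s$ and orientation $\alpha$ corresponds to the matrix $A = \begin{pmatrix} s\cos\alpha & -s\sin\alpha & x_0 \\ s\sin\alpha & s\cos\alpha & y_0 \\ 0 & 0 & 1 \end{pmatrix}$, which maps homogeneous coordinates of the canonical patch into the image.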
For the construction of the transformation A, we use the geometric information about the position and shape of the point neighborhood.
Many detectors (all from the last lab) detect points (frames) which are similarity or affine covariant up to an unknown rotation (angle $\alpha$). For instance, the position and scale give us a similarity covariant point up to an unknown rotation. Similarly, the matrix of second moments and the centre of gravity give five constraints for an affine transformation, and only the rotation remains unknown. To obtain a fully similarity or affine covariant point, we need to estimate the orientation $\alpha$ from the partially geometrically normalized point neighborhood.
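As a minimal sketch, the matrix A (still without the unknown rotation) can be assembled as follows for the three cases; the variable names x, y, s, a11..a22 follow the detector outputs and the function arguments used below, and the unit scale in the translational case is an assumption (the actual neighborhood extent is controlled by opt.ext):

% translational frame: position only
A = [1 0 x; 0 1 y; 0 0 1];
% similarity frame up to rotation: position and scale s
A = [s 0 x; 0 s y; 0 0 1];
% affine frame up to rotation: position and the 2x2 shape matrix from the detector
A = [a11 a12 x; a21 a22 y; 0 0 1];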
The normalization including orientation estimation is therefore done in two steps. In the first step, a patch invariant to translation, scale and the partial affine transformation is created. On this normalized patch, the dominant gradient orientation is estimated. Both transformations are then combined in the transformation matrix A.
To estimate the orientation of the dominant gradient on a partially normalized region (obtained with $A_{norm}$), the gradient magnitudes and orientations are computed over the region and a histogram of orientations is built. Since we assume an arbitrary rotation of the region, gradients have to be computed only in a circular neighborhood. This can be done by weighting with a window function (e.g. a Gaussian) or with a condition like $(x-x_{center})^2+(y-y_{center})^2 < (ps/2)^2$, where $ps$ is the patch edge length. Otherwise the content of the corners would influence the orientation estimate undesirably. The orientations of the gradients are weighted by their magnitudes, so a “stronger” gradient contributes more to the corresponding bin of the histogram. For improved robustness, we can use linear interpolation to vote into the two nearest bins. At the end, the histogram is filtered with a 1D Gaussian and its maximum is found. For a more precise localization, it is possible to fit a parabola to the neighborhood of the maximum and find the orientation with a precision better than 360/(number of bins) degrees.
Write a function angle=dom_orientation(img) that returns the dominant orientation (in radians) of the normalized patch img. Use atan2(y,x) to obtain gradient orientations over the full range $(-\pi,\pi]$.
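A possible implementation of dom_orientation following the procedure above is sketched here; the number of bins, the smoothing window and the omission of the final parabola fit are assumptions, not prescribed values:

function angle = dom_orientation(img)
% dominant gradient orientation of a (partially) normalized patch
nbins = 36;                                  % assumed number of histogram bins
ps = size(img, 1);
[dx, dy] = gradient(img);                    % image gradients
mag = sqrt(dx.^2 + dy.^2);
ori = atan2(dy, dx);                         % orientations in (-pi, pi]
% keep only the circular neighborhood so that the corners do not vote
[xg, yg] = meshgrid(1:ps, 1:ps);
c = (ps + 1) / 2;
mag((xg - c).^2 + (yg - c).^2 >= (ps / 2)^2) = 0;
% magnitude-weighted histogram with linear interpolation into the two nearest bins
h = zeros(1, nbins);
pos = (ori(:) + pi) / (2 * pi) * nbins - 0.5;   % continuous bin-centre position
b0 = floor(pos);  f = pos - b0;
i0 = mod(b0, nbins) + 1;  i1 = mod(b0 + 1, nbins) + 1;
m = mag(:);
for k = 1:numel(m)
    h(i0(k)) = h(i0(k)) + (1 - f(k)) * m(k);
    h(i1(k)) = h(i1(k)) + f(k) * m(k);
end
% circular smoothing with a 1D Gaussian, then maximum search
g = exp(-(-2:2).^2 / 2);  g = g / sum(g);
h = conv([h(end-1:end) h h(1:2)], g, 'same');
h = h(3:end-2);
[~, imax] = max(h);
angle = (imax - 0.5) / nbins * 2 * pi - pi;  % orientation of the winning bin centre
end

For sub-bin precision, a parabola can additionally be fitted through the three histogram values around the maximum, as described above.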
Now we are able to write the functions for geometric normalization of feature point neighborhoods with dominant orientations:
pts=transnorm(img,x,y,s,opt)
pts=simnorm(img,x,y,s,opt)
pts=affnorm(img,x,y,a11,a12,a21,a22,opt)
Input parameter opt is a structure containing normalization parameters:
opt.ps - size of the patch (edge length)
opt.ext - size of the neighborhood in the canonical coordinate system
The output pts is a one-dimensional array of structures containing, for each point: the coordinates x, y; the elements of the 2×2 partial affine transformation submatrix a11, a12, a21, a22; and the normalized image patch patch (a ps×ps matrix of type double). Use the hierarchy of these functions in your implementation – a simpler function can be reused by the more complex ones. The resulting patch, as well as the patch for dominant orientation estimation, is computed using the function affinetr.
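For illustration, a call for affine covariant points might look as follows; the values of opt.ps and opt.ext are example choices, not prescribed ones:

opt.ps  = 41;                 % patch edge length in pixels (example value)
opt.ext = 3.0;                % neighborhood size in canonical coordinates (example value)
pts = affnorm(img, x, y, a11, a12, a21, a22, opt);
imshow(pts(1).patch, []);     % inspect the first normalized patch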
Here is pseudocode for the normalization including orientation estimation:
% ... for each point ...
A = create_matrix_A_without_rotation(x(id), y(id), ...);
% orientation estimation
tmp = affinetr(img, A, opt.ps, opt.ext);
angle = dom_orientation(tmp);
% final A, dominant orientation to angle zero
R = rotation_2x2(-angle);
A(1:2,1:2) = A(1:2,1:2) * R;
% create output
pts(id).x = x(id);
pts(id).y = y(id);
pts(id).a11 = A(1,1);
...
pts(id).patch = affinetr(img, A, opt.ps, opt.ext);
The goal of photometric normalization is to suppress scene changes caused by changes of illumination. The easiest normalization is to stretch the intensity channel over the whole intensity range. We can also stretch the intensity channel so that (after the transformation) the mean intensity lies at half of the intensity range and the standard deviation of intensities $\sigma$ corresponds to its range (or better $2\times\sigma$). In case we want to use all three color channels, we can normalize each channel separately.
ptsn=photonorm(pts)
The function photometrically normalizes the patch of each point to the chosen mean and standard deviation; the MATLAB functions mean and std can be used.
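A sketch of photonorm under these conventions might look as follows; the target mean 0.5 and standard deviation 0.2 on a [0,1] intensity range are assumptions and can be adjusted to the variant described above:

function ptsn = photonorm(pts)
% photometric normalization of the patches of all points
target_mean = 0.5;     % assumed: half of a [0,1] intensity range
target_std  = 0.2;     % assumed target standard deviation
ptsn = pts;
for id = 1:numel(pts)
    p = pts(id).patch;
    m = mean(p(:));
    s = std(p(:));
    if s > 0
        p = (p - m) / s * target_std + target_mean;
    else
        p(:) = target_mean;                % constant patch, nothing to stretch
    end
    ptsn(id).patch = min(max(p, 0), 1);    % clip to the valid intensity range
end
end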
On the geometrically and photometrically normalized image patch from the previous step, any low-dimensional description can be computed. It is clear that the normalization process is not perfect, and therefore it is desirable to compute a description which is not overly sensitive to the residual inaccuracies. The simplest description of the patch is the patch itself. Its disadvantages are high dimensionality and sensitivity to small translations and intensity changes. Therefore we will try better descriptions too.
dct=dctdesc(img,num_coeffs)
dxdy=ghistodesc(img,num_bins)
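As an illustration, dctdesc can be sketched as below; this assumes the Image Processing Toolbox function dct2 and orders the coefficients by increasing frequency (sum of row and column indices), which is an assumption rather than a prescribed ordering:

function dct = dctdesc(img, num_coeffs)
% DCT-based patch descriptor: the first num_coeffs low-frequency coefficients
C = dct2(img);                        % 2D discrete cosine transform of the patch
[n, m] = size(C);
[jj, ii] = meshgrid(1:m, 1:n);
[~, order] = sort(ii(:) + jj(:));     % low frequencies (small index sums) first
dct = C(order(1:num_coeffs)).';       % row vector of descriptor values
end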
You are supposed to complete the functions transnorm.m, simnorm.m, affnorm.m, photonorm.m, dom_orientation.m and dctdesc.m, together with all non-standard functions you have created and used.
To test your code, copy desc_test.zip, unpack it into a directory on the MATLAB path (or put it into the directory with your code) and run it. Compare your results with ours.