To compute a local description invariant to geometric and photometric transformations, the neighborhood of the feature points needs to be normalized to undo the effects of transformations. After elimination of these deformations, the invariant description can be computed.
As we already know, the task of the detector is to repeatably find the shape of the neighborhood of feature points in such a way that the detected points are co-variant with a desired class of transformations (geometric or photometric). For instance, the basic version of the Harris or Hessian detector detects points which are co-variant with translation (if you shift the image, the points are shifted too). The Hessian in scale-space or the DoG detector adds information about the scale of the region, and therefore the points are co-variant with a similarity transformation up to an unknown rotation (detected points have no orientation assigned). Affine co-variant detectors like MSER, Hessian-Affine or Harris-Affine are co-variant up to an affine transformation. The following text describes how to handle geometric information from the point neighborhood and how to use it for normalization. Depending on the amount of information that a detector gives about a feature, the descriptions can be invariant to translation, similarity, affine or perspective transformation.
Geometric normalization is the process of geometrically transforming a feature neighborhood into a canonical coordinate system. Information about the geometric transformation is stored in the form of a “frame”: a projection of the canonical coordinate system into the neighborhood of a feature point (or region) in the image. The frame is represented by a 3×3 matrix A, the same as in the first lab. The transformation matrix A is used to obtain a “patch”, a small feature neighborhood in the canonical coordinate system. All subsequent measurements on this square patch of the original image are then invariant to the desired class of geometric transformations.
Figure: Example of geometric normalization with different classes of transformation (author of images: Štepán Obdržálek). Panels, left to right: original image, translation, translation and rotation, similarity, affine transformation.
For the construction of a transformation A, we use the geometric information about position and shape of point neighborhood:
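As a rough illustration, a frame matrix might be assembled from the detector output and then used to sample a patch as sketched below. The convention that A maps canonical coordinates to image coordinates follows the description of the frame above; the numeric values, the variable names and the sampling code (a simplified stand-in for affinetr, whose actual behaviour may differ) are assumptions made for this example only.

```matlab
% Assumed detector output for one point (values are made up for illustration).
x = 10; y = 12;            % position in image coordinates
s = 2;                     % scale
a = [1.5 0.5; 0.0 2.0];    % hypothetical partial affine shape [a11 a12; a21 a22]

% Frames map canonical coordinates to image coordinates (assumed convention).
A_sim = [s 0 x; 0 s y; 0 0 1];   % translation + scale, rotation still unknown
A_aff = [a, [x; y]; 0 0 1];      % translation + affine shape, rotation still unknown

% Simplified stand-in for affinetr (an assumption, not the course function):
% sample a ps x ps patch covering canonical coordinates [-ext, ext] in both axes.
[X, Y] = meshgrid(1:20, 1:20);
img = X;                         % synthetic image: intensity equals the column index
ps = 5; ext = 1;
[u, v] = meshgrid(linspace(-ext, ext, ps));
P = A_sim * [u(:)'; v(:)'; ones(1, ps * ps)];   % canonical -> image coordinates
xq = reshape(P(1, :), ps, ps);
yq = reshape(P(2, :), ps, ps);
out = interp2(img, xq, yq, 'linear', 0);        % bilinear sampling, 0 outside the image
```

With these values the patch covers image columns 8 to 12 around the point, so every row of out is the same ramp.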
Many detectors (all from the last lab) detect points (frames) which are similarity- or affine-covariant up to an unknown rotation (angle $\alpha$). For instance, the position and scale give us a similarity-covariant point up to an unknown rotation. Similarly, the matrix of second moments and the centre of gravity give us five constraints for an affine transformation, and only the rotation remains unknown. To obtain a fully similarity- or affine-covariant point, we need to define the orientation $\alpha$ from a partially geometrically normalized point neighborhood.
The normalization including orientation estimation has to be done in two steps. In the first step, a patch invariant to translation, scale and partial affine transformation is created. On this normalized patch, the dominant gradient orientation is estimated. Both transformations are then combined in the transformation matrix A.
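The two steps can be checked on a synthetic example. The sketch below evaluates a linear intensity ramp analytically instead of sampling a real image, and uses the mean gradient as a simple stand-in for the histogram-based orientation estimate; both simplifications, as well as all variable names, are assumptions of this example. With the rotation matrix written as [cos -sin; sin cos], the estimated angle enters with a plus sign; a helper that defines the rotation the other way round would take the negated angle instead.

```matlab
theta0 = pi / 6;                                  % gradient direction of the synthetic image
f = @(x, y) cos(theta0) * x + sin(theta0) * y;    % linear ramp with gradient along theta0

A = [2 0 10; 0 2 12; 0 0 1];                      % frame without rotation (assumed convention)
ps = 9; ext = 1;
[u, v] = meshgrid(linspace(-ext, ext, ps));
canon = [u(:)'; v(:)'; ones(1, ps * ps)];         % canonical sample positions

% Step 1: partially normalized patch and its dominant gradient direction.
P = A * canon;
tmp = reshape(f(P(1, :), P(2, :)), ps, ps);
[gx, gy] = gradient(tmp);
alpha = atan2(mean(gy(:)), mean(gx(:)));          % stand-in orientation estimate

% Step 2: fold the rotation into the frame and resample.
R = [cos(alpha) -sin(alpha); sin(alpha) cos(alpha)];
A_final = A;
A_final(1:2, 1:2) = A(1:2, 1:2) * R;
P = A_final * canon;
patch_final = reshape(f(P(1, :), P(2, :)), ps, ps);
[gx, gy] = gradient(patch_final);
alpha_after = atan2(mean(gy(:)), mean(gx(:)));    % should be close to zero
```

After the second step the ramp in the patch runs along the patch x-axis, which is what "dominant orientation at angle zero" means here.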
To estimate the orientations of the dominant gradients on the partially normalized regions (using $A_{norm}$), the gradient magnitudes and orientations are estimated for each region and a histogram of the gradient orientations is computed. Since we assume an arbitrary rotation of the region, gradients must be taken only from a circular neighborhood. This can be done by weighting with a window function (e.g. a Gaussian) or with a condition like $(x-x_{center})^2+(y-y_{center})^2 < (ps/2)^2$, where $ps$ is the patch edge length. Otherwise the content of the corners would influence the orientation estimate undesirably. The orientations of the gradients are weighted by their magnitudes, so a “stronger” gradient contributes more to the corresponding bin of the histogram. For improved robustness, we can use linear interpolation to vote into the two closest neighboring bins. Finally, the histogram is filtered with a 1D Gaussian and the maximum is found. For a more precise localization, it is possible to fit a parabola to the neighborhood of the maximum and so find the orientation with a precision finer than 360/(number of bins) degrees.
Write a function angle=dom_orientation(img) for dominant orientation estimation in a normalized patch. The input img is a partially normalized patch of the image (a ps x ps matrix of doubles); the output is the angle measured from the x-axis of the image in the clockwise direction (the angle of a vector (x,y) can be computed with the function atan2(y,x)).
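The steps described above (gradients, circular support, magnitude-weighted soft voting, smoothing, parabola refinement) might be put together roughly as follows. This is only a sketch under stated assumptions: the number of bins, the smoothing kernel, the hard circular mask and the synthetic test patch are choices made for this example, not values prescribed by the assignment.

```matlab
% Synthetic patch: intensity grows along x, so the dominant gradient points along +x.
ps = 41;
[X, Y] = meshgrid(1:ps, 1:ps);
img = X / ps;

nbins = 36;                            % assumed number of histogram bins
[dx, dy] = gradient(img);              % dx along columns (x), dy along rows (y)
mag = sqrt(dx.^2 + dy.^2);
ang = atan2(dy, dx);                   % image y points down, so positive angles are clockwise

% Keep only gradients inside the inscribed circle.
c = (ps + 1) / 2;
inside = (X - c).^2 + (Y - c).^2 < (ps / 2)^2;
w = mag(inside);
a = ang(inside);

% Soft voting: split each vote linearly between the two nearest bin centres.
binw = 2 * pi / nbins;
pos = mod(a, 2 * pi) / binw;           % continuous bin position, bin k centred at k*binw
b0 = mod(floor(pos), nbins);           % lower bin (0-based)
b1 = mod(b0 + 1, nbins);               % upper bin, wrapping around
frac = pos - floor(pos);
h = accumarray([b0 + 1; b1 + 1], [w .* (1 - frac); w .* frac], [nbins 1]);

% Circular smoothing with a small 1D Gaussian (assumed sigma of one bin).
g = exp(-(-2:2).^2 / 2);
g = g(:) / sum(g);
sm = conv([h(end-1:end); h; h(1:2)], g, 'valid');

% Peak with parabolic refinement over the circular neighbours.
[~, k] = max(sm);
km = mod(k - 2, nbins) + 1;
kp = mod(k, nbins) + 1;
den = sm(km) - 2 * sm(k) + sm(kp);
delta = 0;
if den ~= 0
    delta = 0.5 * (sm(km) - sm(kp)) / den;
end
angle_est = (k - 1 + delta) * binw;
angle_est = atan2(sin(angle_est), cos(angle_est));   % wrap to (-pi, pi]
```

For this ramp the estimate is zero, i.e. the dominant gradient lies along the x-axis. A Gaussian window could replace the hard circular mask by multiplying mag with the window before voting.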
Now we are able to write the functions for geometric normalization of feature point neighborhoods with dominant orientations:
pts=transnorm(img,x,y,s,opt), where img is the input image, x,y are row vectors of point coordinates and s is a scalar with the fixed point scale (the same value for all feature points).
pts=simnorm(img,x,y,s,opt), where img is the input image, x,y are row vectors of point coordinates and s is a row vector with the scales of all points (see the output of the sshesian function).
pts=affnorm(img,x,y,a11,a12,a21,a22,opt), where img is the input image, x,y are row vectors of point coordinates and a11,a12,a21,a22 are row vectors with the partial affine transformation elements.
Input parameter opt is a structure containing normalization parameters:
opt.ps - size of the patch (edge length)
opt.ext - size of the neighborhood in the canonical coordinate system
The output pts is a one-dimensional array of structures, containing for each point: the coordinates x, y; the elements of the 2×2 partial affine transformation submatrix a11, a12, a21, a22; and the normalized image patch patch (a ps x ps matrix of type double).
Use the hierarchy of these functions in your implementation: a simpler function can be called by the more complex ones. The resulting patch and the patch for dominant orientation estimation are computed using the function affinetr.
Here is pseudocode for normalization including orientation estimation:
% ... for each point ...
A = create_matrix_A_without_rotation(x(id),y(id),...);
% orientation estimation
tmp = affinetr(img, A, opt.ps, opt.ext);
angle = dom_orientation(tmp);
% final A, dominant orientation to angle zero
R = rotation_2x2(-angle);
A(1:2,1:2) = A(1:2,1:2)*R;
% create output
pts(id).x=x(id); pts(id).y=y(id);
pts(id).a11=A(1,1); ...
pts(id).patch=affinetr(img, A, opt.ps, opt.ext);

For the sake of clarity, only the orientation of the strongest gradient is used for normalization in this task.
The goal of photometric normalization is to suppress the changes in the image caused by illumination changes. The easiest way to normalize is to stretch the intensity channel to the whole intensity range. We can also rescale the intensity channel such that (after transformation) the mean intensity is at half of the intensity range and the standard deviation of intensities $\sigma$ corresponds to its range (or better, $2\times\sigma$). In case we want to use all three color channels, we can normalize each channel separately.
Write a function ptsn=photonorm(pts), which for each point in the array of structures pts photometrically normalizes the image patch (field patch) such that the mean intensity value will be 0.5 and the standard deviation will be 0.2. Values outside the range <0,1> will be set to 0 or 1 respectively. The function returns an array of structures ptsn similar to the input array of structures pts, with modified patches in the field patch and with the original mean values and standard deviations in the fields mean and std.
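A minimal sketch of this normalization, run on one synthetic patch, could look as follows. The handling of flat patches (zero standard deviation) is not specified in the text above and is an assumption of this example, as is the test patch itself.

```matlab
% One synthetic patch with values in (0, 1]; a real input would come from the
% geometric normalization step.
pts = struct('patch', {magic(5) / 25});

ptsn = pts;
for i = 1:numel(pts)
    p = pts(i).patch;
    m = mean(p(:));
    s = std(p(:));
    ptsn(i).mean = m;                      % keep the original statistics
    ptsn(i).std = s;
    if s > 0
        q = (p - m) / s * 0.2 + 0.5;       % target mean 0.5, standard deviation 0.2
    else
        q = 0.5 * ones(size(p));           % assumption: flat patches map to mid-gray
    end
    ptsn(i).patch = min(max(q, 0), 1);     % clip to <0,1>
end
```

For this particular patch no value is clipped, so the normalized patch has mean 0.5 and standard deviation 0.2 up to rounding; once clipping occurs, the statistics of the result deviate slightly from these targets.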
On the geometrically and photometrically normalized image patch from the last step, any low-dimensional description can be computed. The process of normalization is clearly not perfect, and therefore it is desirable to compute a description which is not very sensitive to the residual inaccuracies. The easiest description of the patch is the patch itself. Its disadvantages are high dimensionality and sensitivity to small translations and intensity changes. Therefore, we will try better descriptions too.
Write a function dct=dctdesc(img,num_coeffs) that computes the DCT of the input image patch img and returns num_coeffs coefficients in zig-zag order in the variable 'dct'. Normalize the values such that all possible DCT coefficients belong to <0,1>. You can make use of the MATLAB function dct2.
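The zig-zag ordering and one possible normalization are sketched below. To keep the example free of toolbox dependencies, the orthonormal DCT-II basis is built explicitly; it is expected to agree with dct2 up to rounding. The normalization uses the loose bound |C| <= N, which holds for an N x N patch with values in <0,1> under this orthonormal transform; it is one possible choice and an assumption of this sketch, as are the variable names and the test patch.

```matlab
img = 0.5 * ones(8);                 % synthetic square patch with values in <0,1>
num_coeffs = 10;
N = size(img, 1);

% Orthonormal DCT-II basis: rows index frequency k, columns index sample n.
[n, k] = meshgrid(0:N-1, 0:N-1);
D = sqrt(2 / N) * cos(pi * (2 * n + 1) .* k / (2 * N));
D(1, :) = sqrt(1 / N);
C = D * img * D';                    % 2D DCT of the patch

% Zig-zag order: walk the anti-diagonals, alternating direction on each one.
[c, r] = meshgrid(1:N, 1:N);
d = r + c - 2;                       % anti-diagonal index, 0 for the DC term
key = r .* (2 * mod(d, 2) - 1);      % odd diagonals go down, even diagonals go up
[~, order] = sortrows([d(:), key(:)]);

zz = C(order);                       % all coefficients in zig-zag order
desc = (zz(1:num_coeffs) / N + 1) / 2;   % map [-N, N] to <0,1>
```

For the flat test patch only the DC coefficient is non-zero, so the first element of desc is 0.75 and the remaining ones sit at 0.5, the image of zero under this mapping.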
Write a function dxdy=ghistodesc(img,num_bins) that computes the histogram of gradients. num_bins is the number of bins per channel, so the resulting description dxdy is a column vector with num_bins^2 elements of the histogram (the gradient size in dx indexes the row, the gradient size in dy the column), concatenated column by column (MATLAB operator (:)). Normalize the elements into the range <0,1> (0 - no occurrence, 1 - occurs in all points of the patch).
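A possible shape of such a descriptor is sketched below. The gradient operator (MATLAB's gradient), the assumed gradient range of [-1, 1] for intensities in <0,1>, and the uniform binning are assumptions of this example; the assignment may expect different choices.

```matlab
img = 0.3 * ones(16);        % flat synthetic patch: every gradient is zero
num_bins = 4;

[dx, dy] = gradient(img);    % dx along columns, dy along rows
lo = -1; hi = 1;             % assumed gradient range for intensities in <0,1>

% Map each gradient component to a bin index in 1..num_bins.
bx = min(floor((dx(:) - lo) / (hi - lo) * num_bins) + 1, num_bins);
by = min(floor((dy(:) - lo) / (hi - lo) * num_bins) + 1, num_bins);

% 2D histogram: dx selects the row, dy selects the column.
H = accumarray([bx, by], 1, [num_bins, num_bins]);

% Column-wise concatenation, normalized by the number of pixels so that a bin
% hit by every pixel has value 1.
dxdy = H(:) / numel(img);
```

For the flat patch all pixels fall into the bin that contains zero for both components, so exactly one element of dxdy equals 1 and the others are 0.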
You are supposed to complete the functions transnorm.m, simnorm.m, affnorm.m, photonorm.m, dom_orientation.m, dctdesc.m, together with all non-standard functions you have created and used.
To test your code, copy desc_test.zip, unpack it to a directory on the MATLAB path (or put it into the directory with your code) and execute it. Compare your results with ours.