Searching for correspondences is a fundamental task in computer vision. The goal is to find corresponding parts (patches) of a scene in two or more images. Finding correspondences on a pixel level is often ill-posed, and correspondences are instead sought for selected larger image portions (patches). The first step is to detect, in each image, the local features (a.k.a. interest points, distinguished regions, covariant regions), i.e. regions detectable in other images of the same scene with a high degree of geometric and photometric invariance. Correspondences are then sought between these local features.
For each feature, the exact position and a description of the feature neighborhood are saved. The description can be a vector of pixel intensities, a histogram of intensities, a histogram of gradients, etc. The choice of the descriptor influences the discriminability and invariance of the feature. Roughly speaking, a local feature descriptor is a vector which characterises the image function in a small, well-localized neighborhood of a feature point.
When establishing correspondences, for each feature in one image we look for features in the second image that have a similar description. We call these correspondences "tentative", as some of them are usually wrong – they do not associate the same points in the scene. This happens because of noise, repeated structures in the scene, occlusions, or inaccuracies of the models of geometric and photometric transformation.
As the next step we need a robust algorithm which (based on the knowledge of the image-to-image transformation model) chooses the largest subset of correspondences that are all consistent with the model. Such correspondences are called "verified" or "inliers" (to the model). An algorithm widely used for this purpose is called RANSAC (RANdom SAmple Consensus) and will be introduced in the last section.
Feature detection is the first step in the process. We will implement two commonly used feature detectors: the detector of Hessian maxima and the Harris detector.
The goal of a detector is to repeatably find feature points which are well localized and whose neighborhood carries enough information for creating a good description. The detector of local extrema of the Hessian – the determinant of the Hessian matrix (the matrix of second-order derivatives)
$$\mathbf{H} = \left[\begin{array}{cc} D_{xx}(x,y;\sigma) & D_{xy}(x,y;\sigma)\\ D_{xy}(x,y;\sigma) & D_{yy}(x,y;\sigma) \end{array}\right],$$
finds centers of the so-called blobs, local extrema of the intensity function which are well localized in a neighborhood given by the scale $\sigma$. Interest points are strong local maxima with a value (response) above a threshold t, which is set according to the level of noise in the image.
response=hessian_response(img,sigma)
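A minimal sketch of one possible implementation is shown below; the use of separable Gaussian derivative filters (with a 3·sigma support) is an assumption of this example, not a prescribed choice.

% Sketch of hessian_response: second derivatives via separable Gaussian
% derivative filters, then the determinant of the Hessian.
function response = hessian_response(img, sigma)
  r = ceil(3*sigma);                             % filter support ~ 3 sigma
  x = -r:r;
  G   = exp(-x.^2/(2*sigma^2)); G = G/sum(G);    % 1D Gaussian
  Gd  = -x/(sigma^2).*G;                         % 1st derivative of Gaussian
  Gdd = (x.^2/(sigma^4) - 1/(sigma^2)).*G;       % 2nd derivative of Gaussian
  Dxx = conv2(G', Gdd, img, 'same');             % second derivative in x
  Dyy = conv2(Gdd', G, img, 'same');             % second derivative in y
  Dxy = conv2(Gd', Gd, img, 'same');             % mixed derivative
  response = Dxx.*Dyy - Dxy.^2;                  % determinant of the Hessian
end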
For the localisation of the Hessian extrema we will need a function that finds out whether a point is a local extremum. This is done by comparing the pixel value with the pixels in its neighborhood.
maximg=nonmaxsup2d(response,thresh)
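One possible sketch, comparing the response with its eight shifted copies (other implementations are equally valid):

% Sketch of nonmaxsup2d: a pixel survives if it is above the threshold and
% strictly greater than all of its 8 neighbors.
function maximg = nonmaxsup2d(response, thresh)
  maximg = response > thresh;
  for dy = -1:1
    for dx = -1:1
      if dy == 0 && dx == 0, continue; end
      shifted = circshift(response, [dy dx]);
      maximg = maximg & (response > shifted);
    end
  end
  maximg([1 end],:) = false;   % ignore the image border
  maximg(:,[1 end]) = false;
end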
[x,y]=hessian(img,sigma,thresh)
To obtain the coordinates of the remaining local maxima you can use [y,x]=find(maximg) – note that find returns the row (y) indices first.
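Putting the pieces together, hessian may look like the following sketch (whether you return zero-based or one-based coordinates is up to you; zero-based is used here to match the sshessian pseudocode further below):

% Sketch of hessian: response, thresholded non-maxima suppression, coordinates.
function [x, y] = hessian(img, sigma, thresh)
  response = hessian_response(img, sigma);
  maximg = nonmaxsup2d(response, thresh);
  [y, x] = find(maximg);       % find returns row (y) and column (x) indices
  x = x - 1; y = y - 1;        % zero-based coordinates (a convention choice)
end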
The Harris detector detects locations in which the local autocorrelation function changes in all directions (therefore, the image gradient direction changes there). For this reason it is often called a corner detector. It computes the autocorrelation matrix $\mathbf{C}(x,y;\sigma_d,\sigma_i)$:
$$ \mathbf{C}(x,y;\sigma_d,\sigma_i)=G(x,y;\sigma_i)*\left[\begin{array}{cc} D^2_{x}(x,y;\sigma_d) & D_x D_y(x,y;\sigma_d)\\ D_x D_y(x,y;\sigma_d) & D^2_y(x,y;\sigma_d) \end{array}\right]$$
where $*$ is the convolution operator. This is a weighted averaging (with the windowing function $G(x,y;\sigma_i)$) of the outer products of gradients. If both eigenvalues of $\mathbf{C}(x,y;\sigma_d,\sigma_i)$ are large and of comparable value, then the pattern inside the window has a stable position and we want to use it for matching. Harris and Stephens select such points based on the corner response:
$$ R(x,y) = \mathrm{det}(\mathbf{C})-\alpha\mathrm{trace}^2(\mathbf{C}). $$
where the advantage is that the eigenvalues $\lambda_1,\lambda_2$ of the matrix $\mathbf{C}$ do not have to be explicitly computed, because
$$ \begin{array}{rcl} \mathrm{det}(\mathbf{C})\!&=&\!\lambda_1\lambda_2\\ \mathrm{trace}(\mathbf{C})\!&=&\!\lambda_1 + \lambda_2. \end{array} $$ Note, however, that explicit computation of the eigenvalues would not be a problem with today's computers (how would you do it?).
The figure shows the function $R(x,y)$ for $\alpha=0.04$; the white lines show iso-contours with values 0, 0.02, 0.04, 0.06, …, 0.18. Thresholding $R$ effectively requires the smaller of the two eigenvalues to be sufficiently large: the smaller one eigenvalue is, the larger the other one has to be to exceed the threshold, and when both eigenvalues are similar, both can be somewhat smaller.
Write functions
response=harris_response(img,sigmad,sigmai)
[x,y]=harris(img,sigmad,sigmai,thresh)
where sigmad is the scale used for the derivatives and sigmai the scale of the integration (windowing) Gaussian.
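A possible sketch of harris_response; the Gaussian derivative filters and $\alpha=0.04$ are assumptions of this example, not requirements:

% Sketch of harris_response: gradients at scale sigmad, their outer products
% windowed by a Gaussian of scale sigmai, then the Harris corner response.
function response = harris_response(img, sigmad, sigmai)
  alpha = 0.04;
  xd = -ceil(3*sigmad):ceil(3*sigmad);
  Gd = exp(-xd.^2/(2*sigmad^2)); Gd = Gd/sum(Gd);
  Gd1 = -xd/(sigmad^2).*Gd;                      % derivative of Gaussian
  Dx = conv2(Gd', Gd1, img, 'same');             % gradient in x
  Dy = conv2(Gd1', Gd, img, 'same');             % gradient in y
  xi = -ceil(3*sigmai):ceil(3*sigmai);
  Gi = exp(-xi.^2/(2*sigmai^2)); Gi = Gi/sum(Gi);
  Cxx = conv2(Gi', Gi, Dx.*Dx, 'same');          % windowed outer products
  Cyy = conv2(Gi', Gi, Dy.*Dy, 'same');
  Cxy = conv2(Gi', Gi, Dx.*Dy, 'same');
  response = (Cxx.*Cyy - Cxy.^2) - alpha*(Cxx + Cyy).^2;
end

harris then mirrors hessian: threshold the response, apply nonmaxsup2d and read out the coordinates with find.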
The basic version of the Harris or Hessian detector needs one important parameter: the scale at which it estimates gradients in the image and detects "blobs". It can be shown that the characteristic scale can be estimated automatically. For this purpose, it is necessary to build the "scale-space" – a three-dimensional space in which two dimensions are the x,y-coordinates in the image and the third dimension is the scale. The image is filtered to acquire its new versions corresponding to an increasing scale. Suppressing the details simulates the situation when we look at the scene from a greater distance.
With increasing suppression of details, the variance of intensities (and hence the magnitude of derivatives) decreases. Therefore, to be able to select the characteristic scale, the gradients must be made comparable between scales. The term "normalized derivative" (accounting for the "distance" between pixels) was introduced for this purpose to obtain a "scaleless" gradient.
$$ \dfrac{\partial f}{\partial \xi} = \dfrac{\partial f}{\partial (x/\sigma)} = \sigma \dfrac{\partial f}{\partial x}\mbox{,}\quad N_x(x,y;\sigma) = \sigma D_x(x,y;\sigma)\mbox{,}\quad N_{xx}(x,y;\sigma) = \sigma^2 D_{xx}(x,y;\sigma)\mbox{,} $$
The normalized Hessian matrix is thus: $$ \mathbf{H}_{norm}(x,y;\sigma) = \sigma^2 \left[\begin{array}{cc} D_{xx}(x,y;\sigma) & D_{xy}(x,y;\sigma)\\ D_{xy}(x,y;\sigma) & D_{yy}(x,y;\sigma) \end{array}\right]. $$
Lindeberg showed in his work that for such normalized derivatives it is possible to compute the responses of differential operators – the Laplacian $\mathrm{trace}(\mathbf{H}_{norm}(x,y;\sigma))=N_{xx}(x,y;\sigma)+N_{yy}(x,y;\sigma)$ and the determinant of the Hessian $\mathrm{det}(\mathbf{H}_{norm}(x,y;\sigma)) = N_{xx}(x,y;\sigma) N_{yy}(x,y;\sigma) - N_{xy}^2(x,y;\sigma)$ – and use them for automatic selection of the characteristic scale of an ideal blob. The local maxima of these responses over the scale space give us the x,y-coordinates and the scale $\sigma$ of the blobs.
[ss,sigma]=scalespace(img,levels,step)
where ss is a 3D matrix holding the individual levels of the scale space – ss(:,:,1) is the input image and ss(:,:,i) is the image smoothed to the i-th level – and sigma is a vector with the corresponding scales.
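A sketch under the assumption that level i is obtained by smoothing the input image with a Gaussian of sigma(i) = step^(i-1) (incrementally smoothing the previous level is a common, more efficient alternative):

% Sketch of scalespace: each level is the input image smoothed with an
% increasingly large Gaussian, sigma(i) = step^(i-1).
function [ss, sigma] = scalespace(img, levels, step)
  ss = zeros(size(img,1), size(img,2), levels);
  sigma = step.^(0:levels-1);
  ss(:,:,1) = img;
  for i = 2:levels
    s = sigma(i);
    x = -ceil(3*s):ceil(3*s);
    G = exp(-x.^2/(2*s^2)); G = G/sum(G);
    ss(:,:,i) = conv2(G', G, img, 'same');   % separable Gaussian smoothing
  end
end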
maximg=nonmaxsup3d(response, threshold)
[hes,sigma]=sshessian_response(img)
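A rough sketch; the scale-space sampling (levels, step) and the finite-difference approximation of the second derivatives are choices of this example:

% Sketch of sshessian_response: scale-normalized determinant of the Hessian
% on every level of the scale space.
function [hes, sigma] = sshessian_response(img)
  levels = 20; step = 1.1;                   % assumed scale-space sampling
  [ss, sigma] = scalespace(img, levels, step);
  hes = zeros(size(ss));
  dxx = [1 -2 1]; dyy = dxx'; dxy = [1 0 -1]'*[1 0 -1]/4;
  for i = 1:levels
    L = ss(:,:,i);
    Dxx = conv2(L, dxx, 'same');
    Dyy = conv2(L, dyy, 'same');
    Dxy = conv2(L, dxy, 'same');
    % normalization by sigma^4, i.e. (sigma^2)^2 for the determinant
    hes(:,:,i) = sigma(i)^4 * (Dxx.*Dyy - Dxy.^2);
  end
end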
Use the functions scalespace, nonmaxsup3d and sshessian_response to write the detector of the Hessian maxima in the scale space:
[x,y,s]=sshessian(img, thresh)
% get the matrix of maxima
maximg=nonmaxsup3d(response, thresh);
% find positions
[y x u]=ind2sub(size(maximg), find(maximg));
% change coordinate system to zero-based
x=x-1; y=y-1;
% change u to scale s
...
Test your functions on this image:
For visualization of detected points use the function showpts.m:
% read an image
img=imread('sunflowers.png');
% find Hessian maxima in the scale space, use threshold 0.02
[x,y,s]=sshessian(im2double(rgb2gray(img)), 0.02);
% show the result
imshow(img);
p.linewidth=2; p.color='red';
showpts([x;y;s], p);
You should obtain an output similar to this:
(optional study material for interested students)
The detection of similarity-covariant points, such as the maxima of the Hessian in the scale space, can be extended to affine-covariant points (up to an unknown rotation). It is based on the so-called Baumberg iteration; the idea is to observe the distribution of gradients in the vicinity of the detected point. Assuming that there exists a transformation which "flattens" the intensity in the point's neighborhood in such a way that the distribution of gradients becomes isotropic (with gradients distributed equally in all directions), this transformation can be found from the second moment matrix of gradients, i.e. the autocorrelation matrix of gradients that we already know.
As we have said earlier, this matrix reflects the local distribution of gradients. It can be shown that if we find a transformation $\mathbf{\mu} = \mathbf{C}^{-1/2}$ that maps the coordinate system given by the eigenvectors and eigenvalues of the matrix $\mathbf{C}$ to the canonical one, the distribution of gradients in the neighborhood transformed by $\mathbf{\mu}$ will be "more isotropic". However, since we used a circular window function, the weights of some of the surrounding gradients may have been computed wrongly, or not at all. Therefore, it is necessary to repeat this procedure until the matrix $\mathbf{C}_i$, computed in the i-th iteration on the transformed neighborhood of the point, is close to a multiple of the identity; we can check this by comparing the ratio of the eigenvalues of $\mathbf{C}_i$. The resulting local affine deformation is obtained as the composition of all the deformations needed to make the neighborhood isotropic:
$$N = \prod_i \mu_i$$
This transformation leads (after the addition of translation and scaling) from the image coordinates to the canonical coordinate system. In practice, we are mostly interested in the inverse transformation $\mathbf{A}=\mathbf{N}^{-1}$.
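To illustrate the idea on a single point, one iteration of the scheme could look like the following sketch (patch stands for the current, partially normalized neighborhood of the point; the eigenvalue-ratio test of 1.05 is an arbitrary choice of this example):

% One Baumberg iteration (illustrative sketch only).
% patch ... current (partially normalized) neighborhood of the point
N = eye(2);                                  % accumulated shape deformation
[Dx, Dy] = gradient(patch);                  % gradients in the patch
C = [sum(Dx(:).*Dx(:)) sum(Dx(:).*Dy(:));    % second moment matrix
     sum(Dx(:).*Dy(:)) sum(Dy(:).*Dy(:))];
mu = inv(sqrtm(C));                          % C^(-1/2)
mu = mu / sqrt(det(mu));                     % keep the area unchanged
N = N * mu;                                  % accumulate the deformation
e = eig(C);
isotropic = max(e)/min(e) < 1.05;            % close to a multiple of identity?
% if ~isotropic, re-sample the neighborhood with N and repeat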
The detector of Maximal Stable Extremal Regions (MSER) is based on a different idea. Detection is based on growing regions in a binary thresholded image while the intensity threshold is increased. As the threshold increases, new regions appear in the image, they merge together, and at the end there is a single region covering the whole image. During the region growing, the region statistics (area and border length) are monitored. The detector finds those intensity ranges where the ratio of the area to the border length changes the least. The MSER detector is implemented as a MEX module. It is used as follows:
p.min_margin = 10;
p.min_size = 30;
mser = extrema(img, p, [1 2]);
regs=[mser{1}{2,:} mser{2}{2,:}];
x=[regs.cx]; y=[regs.cy];
a11=sqrt([regs.sxx]);
a12=zeros(size(a11));
a21=[regs.sxy]./a11;
a22=sqrt([regs.syy] - a21.*a21);
imshow(img);
showpts([x;y;a11;a12;a21;a22]);
To try your new detectors, take pictures of three objects, each from five different viewpoints of increasing complexity. Two pictures should contain a simple rotation, with and without a scale change. The other pictures should be taken from different viewpoints. Name the images, with 1024 px on the longer side, as obj(0-2)_view(0-4). Pack them into a .zip archive and upload it to the upload system (task 02_data). If necessary (due to a long runtime of your code) you can shrink the images to half size.
You are supposed to upload the functions hessian_response.m, hessian.m, harris_response.m, harris.m, nonmaxsup2d.m, scalespace.m, nonmaxsup3d.m, sshessian_response.m and sshessian.m together with all non-standard functions you have created. Do not forget the dataset.
We use a script and the MATLAB function 'publish' to test your code. Download detect_test.zip, unpack it to a directory which is in the MATLAB path (or put it into the directory with your code) and execute it. Compare your results with ours.
The task of a detector is to reliably find feature points and their neighborhoods such that the detected points are covariant with a desired class of transformations (geometric or photometric). For instance, the basic version of the Harris or Hessian detector detects points which are covariant with translation (if you shift the image, the detected points shift by the same amount). The Hessian in the scale space or the DoG detector adds information about the scale of the region, and the points are therefore covariant with the similarity transformation up to an unknown rotation (detected points have no orientation assigned). Affine-covariant detectors like MSER, Hessian-Affine or Harris-Affine are covariant with affine transformations. The following text describes how to handle the geometric information from the point neighbourhood and how to use it for normalization. Depending on the amount of information that a detector gives about a feature, the description can be invariant to translation, similarity, affine or perspective transformation.
Geometric normalization is the process of geometric transformation of a feature neighborhood into a canonical coordinate system. The information about the geometric transformation will be stored in the form of a "frame" – a projection of the canonical coordinate system into the neighborhood of a feature point (or region) in the image. The frame will be represented by a 3×3 matrix A, the same as in the first lab. The transformation matrix A is used to obtain a "patch" – a small feature neighborhood in the canonical coordinate system. All further measurements on this square patch of the original image are then invariant to the desired geometric transformation.
For the construction of the transformation A, we use the geometric information about the position and shape of the point neighborhood.
Many detectors (all from the last lab) detect points (frames) which are similarity- or affine-covariant up to an unknown rotation (angle $\alpha$). For instance, the position and scale give us a similarity-covariant point up to an unknown rotation. Similarly, the matrix of second moments and the centre of gravity give us five constraints for an affine transformation, and only the rotation remains unknown. To obtain a fully similarity- or affine-covariant point, we need to estimate the orientation $\alpha$ from a partially geometrically normalized point neighborhood.
The normalization including the orientation estimation has to be done in two steps. In the first step, a patch invariant to translation, scale and the partial affine transformation is created. On this normalized patch, the dominant gradient orientation is estimated. Both transformations are then combined in the transformation matrix A.
To estimate the orientation of the dominant gradient on a partially normalized region (using $A_{norm}$), the gradient magnitudes and orientations are computed for each pixel of the region and a histogram of the orientations is built. Since we assume an arbitrary rotation of the region, it is necessary to compute the gradients only in a circular neighborhood. This can be done by weighting with a window function (e.g. a Gaussian) or with a condition like $(x-x_{center})^2+(y-y_{center})^2 < (ps/2)^2$, where $ps$ is the patch edge length; otherwise, the content of the corners would influence the orientation estimate undesirably. The gradient orientations are weighted by their magnitudes, which means that a "stronger" gradient contributes more to the corresponding bin of the histogram. For improved robustness, we can use linear interpolation to vote into the closest neighboring bins. At the end, the histogram is filtered with a 1D Gaussian and the maximum is found. For a more precise localization it is possible to fit a parabola to the neighborhood of the maximum and find the orientation with a precision better than 360/(number of bins) degrees.
angle=dom_orientation(img)
Use atan2(y,x) to compute the gradient orientations (note the order of the arguments).
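A sketch of dom_orientation with 36 bins; the parabolic refinement of the maximum is omitted and the circular window is implemented by the condition above (a square ps×ps patch is assumed):

% Sketch of dom_orientation: histogram of gradient orientations weighted by
% gradient magnitude inside a circular window, smoothed, maximum taken.
function angle = dom_orientation(img)
  nbins = 36;
  ps = size(img,1);                              % patch is ps x ps
  [Dx, Dy] = gradient(img);
  mag = sqrt(Dx.^2 + Dy.^2);
  ori = atan2(Dy, Dx);                           % orientation in (-pi, pi]
  [xg, yg] = meshgrid(1:ps, 1:ps);
  c = (ps+1)/2;
  mask = (xg-c).^2 + (yg-c).^2 < (ps/2)^2;       % circular window
  w = mag .* mask;
  bin = mod(round((ori+pi)/(2*pi)*nbins), nbins) + 1;  % bin index 1..nbins
  h = accumarray(bin(:), w(:), [nbins 1]);       % weighted histogram
  g = exp(-((-2:2).^2)/2); g = g/sum(g);         % 1D Gaussian, 5 taps
  h = conv([h(end-1:end); h; h(1:2)], g', 'valid');    % circular smoothing
  [~, imax] = max(h);
  angle = (imax-0.5)/nbins*2*pi - pi;            % bin center back to radians
end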
Now we are able to write the functions for geometric normalization of feature point neighborhoods with dominant orientations:
pts=transnorm(img,x,y,s,opt)
pts=simnorm(img,x,y,s,opt)
pts=affnorm(img,x,y,a11,a12,a21,a22,opt)
Input parameter opt is a structure containing normalization parameters:
opt.ps  - size of the patch (edge length)
opt.ext - size of the neighborhood in the canonical coordinate system
The output pts is a one-dimensional array of structures containing, for each point: the coordinates x,y; the elements of the 2×2 partial affine transformation submatrix a11,a12,a21,a22; and the normalized image patch (a ps×ps matrix of type double) patch. Exploit the hierarchy of these functions in your implementation – a simpler function can be used by the more complex ones. The resulting patch, as well as the patch for the dominant orientation estimation, is computed using the function affinetr.
Here's a pseudocode for normalization including orientation estimation:
% ... for each point ...
A = create_matrix_A_without_rotation(x(id),y(id),...);
% orientation estimation
tmp = affinetr(img, A, opt.ps, opt.ext);
angle = dom_orientation(tmp);
% final A, dominant orientation to angle zero
R = rotation_2x2(-angle);
A(1:2,1:2) = A(1:2,1:2)*R;
% create output
pts(id).x=x(id); pts(id).y=y(id);
pts(id).a11=A(1,1); ...
pts(id).patch=affinetr(img, A, opt.ps, opt.ext);
The goal of photometric normalization is to suppress the changes in the image caused by illumination changes. The easiest way of normalization is to stretch the intensity channel over the whole intensity range. We can also transform the intensity channel such that (after the transformation) the mean intensity is at half of the intensity range and the standard deviation of intensities $\sigma$ (or better $2\times\sigma$) corresponds to its range. In case we want to use all three color channels, we can normalize each channel separately.
ptsn=photonorm(pts)
You can use the MATLAB functions mean and std.
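A sketch of photonorm; the target mean 0.5 and target standard deviation 0.2 (so that roughly $2\times\sigma$ fits into the intensity range [0,1]) are assumptions of this example:

% Sketch of photonorm: shift/scale each patch to mean 0.5 and std 0.2,
% clip to [0,1], and store the original mean and std for each point.
function ptsn = photonorm(pts)
  ptsn = pts;
  for i = 1:numel(pts)
    p = pts(i).patch;
    m = mean(p(:));
    s = std(p(:));
    ptsn(i).mean = m;
    ptsn(i).std  = s;
    if s > 0
      p = (p - m) / s * 0.2 + 0.5;     % normalize intensities
    else
      p = p - m + 0.5;                 % flat patch, just recenter
    end
    ptsn(i).patch = min(max(p, 0), 1); % clip to the intensity range
  end
end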
On the geometrically and photometrically normalized image patch from the last step, any low-dimensional description can be computed. It is clear that the process of normalization is not perfect, and it is therefore desirable to compute a description which is not too sensitive to the residual inaccuracies. The simplest description of the patch is the patch itself. Its disadvantages are high dimensionality and sensitivity to small translations and intensity changes. Therefore, we will also try better descriptions.
dct=dctdesc(img,num_coeffs)
dxdy=ghistodesc(img,num_bins)
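For illustration, a sketch of dctdesc only: it takes the first num_coeffs coefficients of the 2D discrete cosine transform in zig-zag order (dct2 comes from the Image Processing Toolbox; the zig-zag construction below is just one way to obtain the ordering):

% Sketch of dctdesc: 2D DCT of the (square) patch, coefficients in zig-zag order.
function dct = dctdesc(img, num_coeffs)
  D = dct2(img);                          % 2D DCT of the patch
  n = size(D,1);
  [j, i] = meshgrid(1:n, 1:n);            % i = row, j = column indices
  s = i + j;                              % anti-diagonal index
  d = i - j;
  d(mod(s,2)==0) = -d(mod(s,2)==0);       % alternate direction on diagonals
  [~, order] = sortrows([s(:) d(:)]);     % zig-zag ordering of coefficients
  dct = D(order(1:num_coeffs));
  dct = dct(:);                           % description as a column vector
end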
You are supposed to upload functions transnorm.m, simnorm.m, affnorm.m, photonorm.m, dom_orientation.m, dctdesc.m together with all used non-standard functions you have created. Do not forget the dataset.
To test your code, download desc_test.zip, unpack it to a directory in the MATLAB path (or put it into the directory with your code) and execute it. Compare your results with ours.
Before we start, we'll need a function pts=detect_and_describe(img,detpar,descpar) connecting the previously implemented functions together, so that it detects, geometrically and photometrically normalizes, and describes feature points in the input image img. The detector parameters will be passed in the structure detpar:
pts=detect_and_describe(img,detpar,descpar)
detpar.type='hessian';
detpar.sigma      % sigma for derivatives
detpar.threshold
detpar.type='harris';
detpar.sigmad     % sigma for derivatives
detpar.sigmai     % sigma for integration
detpar.threshold
detpar.type='sshessian';
detpar.threshold
detpar.type='mser';
detpar.min_margin % requested stability
detpar.min_size   % area of the smallest region (in pixels)
detpar.max_area   % area of the biggest region (relative to the image area, in range <0, 1>)
Parameters for the normalization and description will be passed in the structure descpar:
descpar.type       % type of the description ('dct', 'ghisto' or 'sift')
descpar.ps         % size of the patch
descpar.ext        % size of the feature point neighborhood in the canonical coordinate system
descpar.num_coeffs % number of coefficients in the zig-zag order (for 'dct')
descpar.num_bins   % number of bins in one axis of the histogram (for 'ghisto')
The output structure pts is in the format of the functions affnorm (fields x,y,a11,a12,a21,a22,patch) and photonorm (mean,std) from the last labs; for each point $i$, the field pts(i).desc is added with the description stored as a column vector. You can find the function detect_and_describe here.
After the detection and description, which run on each image of the scene separately, we need to find correspondences between the images. The first phase is looking for tentative correspondences (sometimes called "matches"). A tentative correspondence is a pair of features with descriptions similar in some metric. From now on, we will refer to our images of the scene as the left one and the right one. The easiest way to establish tentative correspondences is to compute all distances between the descriptions of the features detected in the left and the right image. We call this table the "distance matrix". We will compute the distances in Euclidean space. Let's try several methods for correspondence selection – mutual nearest neighbors, stable pairing and the "first/second closest" criterion, as specified below.
The output of the search for tentative correspondences is a set of pairs of indices of features from the left and the right image with the closest descriptions.
corrs=match(pts1, pts2, par)
Here pts1 and pts2 are the arrays of feature points detected in the left and the right image, with descriptions stored in pts1(i).desc and pts2(i).desc; par is a structure with the matching parameters par.method and par.threshold:
par.method = 'mutual';   % for the mutual nearest neighbors method
             'stable';   % for stable pairing
             'sclosest'; % for the "first/second closest" method
par.threshold            % threshold, pairs with a distance greater than this will be ignored
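For illustration, a sketch of match for par.method='mutual' only; the output format (a 2×N matrix of index pairs) and the brute-force distance computation are choices of this example, and the other two methods differ only in the selection rule:

% Sketch of match (mutual nearest neighbors): a pair is accepted when the two
% descriptions are each other's nearest neighbor and are close enough.
function corrs = match(pts1, pts2, par)
  desc1 = double([pts1.desc]);                 % columns are descriptions
  desc2 = double([pts2.desc]);
  n1 = size(desc1,2); n2 = size(desc2,2);
  D = zeros(n1, n2);                           % Euclidean distance matrix
  for i = 1:n1
    diff = desc2 - repmat(desc1(:,i), 1, n2);
    D(i,:) = sqrt(sum(diff.^2, 1));
  end
  [d12, nn12] = min(D, [], 2);                 % nearest in the right image
  [~,  nn21] = min(D, [], 1);                  % nearest in the left image
  corrs = [];
  for i = 1:n1
    j = nn12(i);
    if nn21(j) == i && d12(i) <= par.threshold
      corrs = [corrs [i; j]];                  % pair of indices (left; right)
    end
  end
end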
The tentative correspondences from the previous step usually contain many non-corresponding points, the so-called outliers, i.e. tentative correspondences with similar descriptions that do not correspond to the same physical object in the scene. The last phase of finding correspondences aims to get rid of the outliers. We will assume images of the same scene and search for a model of the underlying transformation. The result will be a model of the transformation and a set of the so-called inliers, the tentative correspondences that are consistent with the model.
The relation between two images of a plane in the scene is discussed in the course Digital Image. In short:
The transformation between two images of a plane observed from two views (the left and the right image) under the perspective camera model is a linear transformation (called a homography) of 3-dimensional homogeneous vectors (of the corresponding points in homogeneous coordinates), represented by a regular $3\times 3$ matrix $\mathbf{H}$:
$$ \lambda\left[\!\begin{array}{c}x'\\y'\\1\end{array}\!\right]= \left[\begin{array}{ccc}h_{11}&h_{12}&h_{13}\\h_{21}&h_{22}&h_{23}\\h_{31}&h_{32}&h_{33}\end{array}\right] \left[\!\begin{array}{c}x\\y\\1\end{array}\!\right]% $$
For each pair of corresponding points $(x_i,y_i)^{\top} \leftrightarrow (x'_i,y'_i)^{\top}$ it holds:
$$\begin{array}{lcr} x' (h_{31}\,x + h_{32}\,y + h_{33}) - (h_{11}\,x + h_{12}\,y + h_{13}) & = & 0 \\ y' (h_{31}\,x + h_{32}\,y + h_{33}) - (h_{21}\,x + h_{22}\,y + h_{23}) & = & 0 \end{array}$$
Let us build a system of linear constraints for each pair:
$$ \underbrace{\left[\begin{array}{r@{\ }r@{\ }rr@{\ }r@{\ }rr@{\ \ }r@{\ \ }r} -x_1&-y_1&-1&0&0&0&x_1'x_1&x'_1y_1&x_1'\\0&0&0&-x_1&-y_1&-1&y_1'x_1&y'_1y_1&y_1'\\ -x_2&-y_2&-1&0&0&0&x_2'x_2&x'_2y_2&x_2'\\0&0&0&-x_2&-y_2&-1&y_2'x_2&y'_2y_2&y_2'\\ &\vdots&&&\vdots&&&\vdots&\\ -x_n&-y_n&-1&0&0&0&x_n'x_n&x'_ny_n&x_n'\\0&0&0&-x_n&-y_n&-1&y_n'x_n&y'_ny_n&y_n'\\ \end{array}\right]}_\mathbf{C} \left[\begin{array}{c}h_{11}\\h_{12}\\h_{13}\\h_{21}\\h_{22}\\h_{23}\\h_{31}\\h_{32}\\h_{33}\end{array}\right]=0 $$
This homogeneous system of equations has 9 unknowns (the solution is defined up to a common scale factor) and we get one non-trivial solution as the right nullspace of the matrix $\mathbf{C}$. For a unique solution we need at least 8 linearly independent rows of the matrix $\mathbf{C}$, i.e. 8 constraints from 4 corresponding points in a general position (no triplet of points may lie on a line!).
H=u2h(u)
$$ u = \left[\begin{array}{cccc} x_1 & x_2 & x_3 & x_4 \\ y_1 & y_2 & y_3 & y_4 \\ 1 & 1 & 1 & 1 \\ x_1' & x_2' & x_3' & x_4' \\ y_1' & y_2' & y_3' & y_4' \\ 1 & 1 & 1 & 1 \end{array}\right] $$
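A sketch of u2h following the construction of the matrix C above (note the column-major reshape in MATLAB):

% Sketch of u2h: homography from 4 point correspondences by the DLT.
function H = u2h(u)
  x = u(1,:);  y = u(2,:);                % points in the left image
  xp = u(4,:); yp = u(5,:);               % corresponding points in the right image
  C = zeros(8, 9);
  for i = 1:4
    C(2*i-1,:) = [-x(i) -y(i) -1  0 0 0  xp(i)*x(i) xp(i)*y(i) xp(i)];
    C(2*i,  :) = [ 0 0 0  -x(i) -y(i) -1  yp(i)*x(i) yp(i)*y(i) yp(i)];
  end
  [~, ~, V] = svd(C);
  h = V(:, end);                          % right nullspace of C
  H = reshape(h, 3, 3)';                  % h was ordered row-wise
end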
dist=hdist(H,u)
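A sketch of hdist computing the one-way transfer error; whether you use a one-way or a symmetric error, and distances or squared distances, is a choice that must be consistent with your threshold:

% Sketch of hdist: distance between the point transferred by H and its
% measured correspondence in the right image (one-way transfer error).
function dist = hdist(H, u)
  xl = u(1:3,:);                          % left points, homogeneous
  xr = u(4:5,:);                          % right points, Euclidean
  p = H * xl;                             % transfer to the right image
  p = p(1:2,:) ./ repmat(p(3,:), 2, 1);   % back to Euclidean coordinates
  dist = sqrt(sum((p - xr).^2, 1));       % Euclidean reprojection error
end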
To find a correct model of the scene – a projective transformation between two planes – we need at least 4 inlying point pairs (not on a line). One way of robustly picking such points from all the acquired tentative correspondences is the RANSAC algorithm of M. Fischler and R. C. Bolles.
In our task, the RANSAC algorithm will have the usual hypothesize-and-verify form; use the function sample.m for random sampling of the correspondences and hdist for measuring the distances of the correspondences from the hypothesized model (see the sketch below the ransac_h specification).
Set the number of iterations to 1000 at the beginning and debug the RANSAC algorithm on a simple pair of images. In practice, a more sophisticated stopping criterion is used. It is based on the probability estimate of finding an all-inlier sample (of size 4 in our case) under the assumption of an iteratively estimated fraction of inliers. The stopping criterion is implemented in the function nsamples.m. It computes the total number of iterations required for a given confidence conf.
num_samples = nsamples(number_of_inliers, number_tentative_correspondences, sample_size, conf);
[Hbest,inl]=ransac_h(u,threshold,confidence)
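A minimal sketch of ransac_h, assuming that sample(n,k) returns indices of a random k-tuple (the exact interface of sample.m is not specified here) and using nsamples for the adaptive stopping criterion:

% Sketch of ransac_h: repeatedly hypothesize H from a random 4-tuple and keep
% the hypothesis with the most inliers; adaptive number of iterations.
function [Hbest, inl] = ransac_h(u, threshold, confidence)
  n = size(u, 2);
  Hbest = eye(3); inl = false(1, n);
  best = 0;
  max_iter = 1000; iter = 0;              % start with 1000 iterations
  while iter < max_iter
    iter = iter + 1;
    idx = sample(n, 4);                   % assumed: 4 random indices
    H = u2h(u(:, idx));
    dist = hdist(H, u);
    inliers = dist < threshold;
    if sum(inliers) > best
      best = sum(inliers);
      Hbest = H; inl = inliers;
      % re-estimate how many samples are needed for the given confidence
      max_iter = min(max_iter, nsamples(best, n, 4, confidence));
    end
  end
end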
To verify the tentative correspondences for other than planar scenes, we will need a more general relation between two points seen by two cameras. The geometric relation of a pair of uncalibrated cameras is called the epipolar geometry. The epipolar geometry is discussed in the course Geometry of Computer Vision and Graphics; for initial information you can read the corresponding Wikipedia page or listen to a dedicated song :). For our problem, it is important that the epipolar geometry is a geometric relation between corresponding points $x_L$ and $x_R$, images of a physical point $X$ in the scene observed by a pair of perspective cameras:
This relation can be written as:
$$ x^\top_L \mathbf{F}\, x_R = 0 $$
The epipolar geometry is represented by a matrix $\mathbf{F}$, called the fundamental matrix. To find the fundamental matrix we need at least 7 corresponding points. Our RANSAC implementation from the previous section has to be modified in two ways: we need a function u2f7.m that replaces the estimation of the homography by the estimation of the fundamental matrix $\mathbf{F}$, and a function fds.m that replaces the function hdist; instead of the distance of reprojected points, it computes the distance of a corresponding point from the epipolar line (the image of the ray through the point in the other camera). These functions have the same parameters as u2h and hdist, so you can just replace the matrix $\mathbf{H}$ by the matrix $\mathbf{F}$. The function u2f7 returns from one (the result is a 3×3 matrix) up to three solutions (the result is a 3×3×number_of_solutions matrix), so it is necessary to modify the verification of the model to pick the best of the returned solutions.
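A sketch of fds measuring the distance of each point from the epipolar line induced by its counterpart, with the convention $x^\top_L \mathbf{F}\, x_R = 0$ from above; taking the maximum of the two distances is a choice of this example:

% Sketch of fds: point-to-epipolar-line distances under x_L' * F * x_R = 0.
function dist = fds(F, u)
  xl = u(1:3,:);                            % left points (homogeneous)
  xr = u(4:6,:);                            % right points (homogeneous)
  lr = F' * xl;                             % epipolar lines in the right image
  ll = F  * xr;                             % epipolar lines in the left image
  dr = abs(sum(lr .* xr, 1)) ./ sqrt(lr(1,:).^2 + lr(2,:).^2);
  dl = abs(sum(ll .* xl, 1)) ./ sqrt(ll(1,:).^2 + ll(2,:).^2);
  dist = max(dl, dr);                       % symmetric epipolar distance
end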
[Fbest,inl]=ransac_f(u,threshold,confidence)
Implement functions match.m, u2h.m, hdist.m, ransac_h.m, ransac_f.m and submit them in task 04_corr together with all non-standard functions you have created.
Choose a combination of detector and description for each object in your dataset. Run the detectors and generate the descriptions for all images using the function detect_and_describe. Save the results (structure pts) into a file obj%d_view%d.mat together with your configuration of the detector (detpar) and of the normalization and description (descpar). Objects and views are numbered from 0 (e.g. for object 1, image 3, save the structures pts, detpar and descpar into the file obj1_view3.mat; for object 0, image 0, into the file obj0_view0.mat, etc.).
Find tentative correspondences for each object and all pairs of its views. The result will be a 5×5 cell matrix TC, where the cells contain the arrays corrs from the function match. We assume that the process is symmetric (2→1 is the same as 1→2) and generate only the upper triangle (index of the first image as row, of the second image as column) without the diagonal. Leave the other cells empty. Save the matrix TC together with the configuration par of the method match into a file obj%d_tc.mat.
At the end, run both RANSACs on the generated tentative correspondences for each object and save the results into 5×5 cell matrices H, F, inlH and inlF. Save the cell matrices H, F, inlH and inlF into a file obj%d_results.mat together with the parameters thresh_h, thresh_f, conf_h, conf_f of the functions ransac_h and ransac_f.
Pack all the results into an archive together with your images from task 02_data named (username)_obj(0-2)_view(0-4).jpg and submit it into task 04_results. To save some space you can remove the field patch from the structure pts using the function rmfield.
To test your code, download corr_test.zip, unpack it into a directory in the MATLAB path (or put it into the directory with your code) and execute it. Compare your results with ours.