====== KLT tracking ====== To fulfill this assignment, you need to submit these files (all packed in one ''.zip'' file) into the [[https://cw.felk.cvut.cz/brute/|upload system]]: * **''klt.py''** - file with the following methods implemented: * **''klt_compute_updates''** - core function of KLT tracking * **''get_patches_subpix''** - function for extracting square patches from an image (you will employ your previously implemented function ''extract_affine_patches'') * **''compute_and_show_homography''** - function that will estimate homography from tracked points in the 1st and last frames of an image sequence, and transforms the bounding box of an object from from the 1st image to the last image. * **''tracked_points.pt''** - file saved by ''main.py'', storing tracked points in the 1st and last frames * **''result.pdf''** - a figure saved by function ''compute_and_show_homography'', showing the tracked points for the 1st and last frames, inliers, and transformed bbox. * **''imagefiltering.py, ransac.py, local_detector.py''** : these are the files which you have already implemented in the previous assignments and which are employed in this assignment. ** Use [[https://gitlab.fel.cvut.cz/mishkdmy/mpv-python-assignment-templates#Assignment Templates|template]] of the assignment.** When preparing a zip file for the upload system, **do not include any directories**, the files have to be in the zip file root. Note that all places where your implementation is required in the file ''klt.py'' are marked by ''YOUR_IMPLEMENTATION_START'' and ''YOUR_IMPLEMENTATION_END'' blocks. ===== The Task ===== Given a sequence of frames, we will define the points to track in the first frame, and then track the points across the sequence. {{youtube>-sKH6gzWGmo?size=640x360}} The points to be tracked are detected in the first frame by a Harris detector. Their positions in the next frame are estimated by the KLT tracking algorithm. The KLT algorithm is iterative, and may not converge. If it does not converge in a given number of iterations for a point, the point is removed from further tracking. This is why the number of tracked points is decreasing in the video as the tracking progresses within the sequence. The scene used in this example is approximately planar. This is why the points in the first and any subsequent frames are approximately related by a homography. We will robustly estimate the homography between the first and last frames (employing previously implemented function ''ransac_h''). To demonstrate the homography relationship, we define the bounding box of the leaflet in the first frame, and transform it by the homography to the last frame: {{:courses:mpv:labs:4_tracking:result.jpg?1024|}} ===== Kanade-Lucas-Tomasi Tracking (KLT tracker) ===== KLT minimizes the sum of squared differences of image intensities between patches in subsequent frames. The minimum is found iteratively. We are given a patch template $T(\mathbf{x})$ in one frame. In the next frame $I$, the patch contents moves to a new position. This is described by coordinate transformation $\mathbf{W}(\mathbf{x;p})$. We will assume that the movement is a translation, that is, $\mathbf{W}(\mathbf{x;p}) =[x+p_x;y+p_y]^T$. We want to estimate the shift $\mathbf{p} = [p_x, p_y]^T$ which minimizes $\sum_{\mathbf{x\in\text{patch}}}(I(\mathbf{x}+\mathbf{p})) - T(\mathbf{x}))^2.$ This minimization does not have a closed-form solution (because the image function $I$ is general, not e.g. linear) Let $\mathbf{p}$ be the current estimate of the shift. We will improve this estimate by computing an update $\delta\mathbf{p}$ of it such that $\mathbf{p}+\delta\mathbf{p}$ produces lower difference of image intensities between $T$ and $I$. Let us linearize the squared term by first order Taylor expansion: $ (1)\;\;\sum_{\mathbf{x}\in\text{patch}}(I(\mathbf{x}+\mathbf{p}) + \nabla I(\mathbf{x}+\mathbf{p})\delta \mathbf{p}- T(\mathbf{x}))^2, $ where $ \nabla I(\mathbf{x}+\mathbf{p}) = [\frac{\partial I(\mathbf{x}+\mathbf{p})}{\partial x},\frac{\partial I(\mathbf{x}+\mathbf{p})}{\partial y}] $ is gradient of $I$ evaluated at $\mathbf{x}+\mathbf{p}$. We want to minimize (1) w.r.t. $\delta\mathbf{p}$, in the least-squares sense. We thus want to achieve the following minimization: $(2)\;\;\sum_{\mathbf{x}\in\text{patch}}(\nabla I(\mathbf{x}+\mathbf{p})\delta\mathbf{p} - [ T(\mathbf{x}) - I(\mathbf{x}+\mathbf{p}) ] )^2 \rightarrow \min$ which can be rewritten as a standard least-square minimization problem $(3)\;\; \mathbf{A}\mathbf{z} = \mathbf{b}$ where the matrix $\mathbf{A}$ is $N$-by-$2$ ($N$ is the number of pixels in the patch); rows contain the gradient $\nabla I(\mathbf{x}+\mathbf{p})$ for given $\mathbf{x}$. Vector $\mathbf{b}$ has length $N$ and its elements store the difference of $T$ and $I$ for given $\mathbf{x}$. The least-square solution $\mathbf{z}$ corresponds to $\delta\mathbf{p}$. ===== Template files, support files and suggested progress of your implementation ===== The template contains the following files: * ''klt.py'' : all functions for this task, including templates for your implementation (blocks requiring your work are marked by YOUR_IMPLEMENTATION_START and YOUR_IMPLEMENTATION_END blocks). * ''test.py'' : a short script which makes it easy for you to test your implementation of the KLT tracking. It loads a small image and its artificially translated version. It prints out the true translation and the translation estimated by your implementation. When correctly implemented, these differ by only several hundredths of a pixel. E.g. (for test_idx=3): In [1]: %run test True translation: 0.75, -0.20 Estimated translation: 0.73, -0.20 * ''main.py'' : a script which runs the main task in this assignment, i.e. going over the entire sequence, running the KLT algorithm on it, storing the points in the 1st and last frames, saving ''tracked_points.pt'' and also calling the function ''compute_and_show_homography'' which saves an image of the first and last frames, together with transformed bbox, to ''result.pdf'' * ''klt_params.py'' a file which defines the parameters for the task and allows you to easily define your own sets of parameters. === Suggested plan for your implementation: === * implement ''get_patches_subpix'' * implement ''klt_compute_updates'', use ''test.py'' to check that it works correctly * run ''main.py'' and have the points tracked from the first to the last frame (and saved to ''tracked_points.pt''). * implement the required part of the ''compute_and_show_homography'' and see how well it performs. This function loads the previously saved ''tracked_points.pt'', so that there is no need to run the whole tracking from scratch again. Calling this function several times also gives you an idea of variances in the estimated homography (we use RANSAC without local optimization, and the estimated homographies and thus the transformed bboxes are likely to be different for individual runs.) ===== Homography estimation ===== This is implemented in function ''compute_and_show_homography''. Your task is to construct correspondences from the tracked points which are then used in your previously implemented ''ransac_h'' function. ===== Implementation Overview ===== def track_klt(img_prev, img_next, xs, point_ids, pars): # img_prev: template image # img_next: next image # xs: coordinates of points in template image # points_idx : identities (indices) of points in template image # pars # pre-compute stuff which stays constant over iterations, for saving time: Ts = ... # extract all patches in the template image img_next_gradient = ... # compute gradient in the next image ps = ... # set initial estimate of all shifts (usually to zero) # iterate while iterations not finished: dps = klt_compute_updates(Ts, img_next, img_next_gradient, xs, ps, window_hsize) ps += dps # return only those which have converged def klt_compute_updates(Ts, img_next, img_next_gradient, xs, ps, window_hsize): # Is = ... # extract all patches, under current shift, from img_next # Gx = ... # extract all patches, under current shift, from x-component of gradient of img_next # Gy = ... # extract all patches, under current shift, from y-component of gradient of img_next # go over all points and compute the updates dps: for all points: compute shift update # note: np.linalg.lstsq can be useful for this return dps ===== References ===== [[http://www.ri.cmu.edu/pub_files/pub3/baker_simon_2002_3/baker_simon_2002_3.pdf|Lucas-Kanade 20 Years On: A Unifying Framework]]