KLT tracking

Please note that tracking assignments use the numpy package. If it is not installed yet in your environment please install it (usually using pip.)

To fulfill this assignment, you need to submit these files (all packed in one .zip file) into the upload system:

klt.py - file with the following methods implemented:
- klt_compute_updates - core function of KLT tracking
- get_patches_subpix - function for extracting square patches from an image (you will employ your previously implemented function extract_affine_patches)
- compute_and_show_homography - function that will estimate homography from tracked points in the 1st and last frames of an image sequence, and transforms the bounding box of an object from from the 1st image to the last image.
tracked_points.pt - file saved by main.py, storing tracked points in the 1st and last frames
result.pdf - a figure saved by function compute_and_show_homography, showing the tracked points for the 1st and last frames, inliers, and transformed bbox.
imagefiltering.py, ransac.py, local_detector.py : these are the files which you have already implemented in the previous assignments and which are employed in this assignment.

Use template of the assignment. When preparing a zip file for the upload system, do not include any directories, the files have to be in the zip file root. Note that all places where your implementation is required in the file klt.py are marked by YOUR_IMPLEMENTATION_START and YOUR_IMPLEMENTATION_END blocks.

The Task

Given a sequence of frames, we will define the points to track in the first frame, and then track the points across the sequence.

The points to be tracked are detected in the first frame by a Harris detector. Their positions in the next frame are estimated by the KLT tracking algorithm. The KLT algorithm is iterative, and may not converge. If it does not converge in a given number of iterations for a point, the point is removed from further tracking. This is why the number of tracked points is decreasing in the video as the tracking progresses within the sequence.

The scene used in this example is approximately planar. This is why the points in the first and any subsequent frames are approximately related by a homography. We will robustly estimate the homography between the first and last frames (employing previously implemented function ransac_h). To demonstrate the homography relationship, we define the bounding box of the leaflet in the first frame, and transform it by the homography to the last frame:

Kanade-Lucas-Tomasi Tracking (KLT tracker)

KLT minimizes the sum of squared differences of image intensities between patches in subsequent frames. The minimum is found iteratively.

We are given a patch template $T(\mathbf{x})$ in one frame. In the next frame $I$, the patch contents moves to a new position. This is described by coordinate transformation $\mathbf{W}(\mathbf{x;p})$. We will assume that the movement is a translation, that is, $\mathbf{W}(\mathbf{x;p}) =[x+p_x;y+p_y]^T$. We want to estimate the shift $\mathbf{p} = [p_x, p_y]^T$ which minimizes $\sum_{\mathbf{x\in\text{patch}}}(I(\mathbf{x}+\mathbf{p})) - T(\mathbf{x}))^2.$ This minimization does not have a closed-form solution (because the image function $I$ is general, not e.g. linear)

Let $\mathbf{p}$ be the current estimate of the shift. We will improve this estimate by computing an update $\delta\mathbf{p}$ of it such that $\mathbf{p}+\delta\mathbf{p}$ produces lower difference of image intensities between $T$ and $I$. Let us linearize the squared term by first order Taylor expansion:

$ (1)\;\;\sum_{\mathbf{x}\in\text{patch}}(I(\mathbf{x}+\mathbf{p}) + \nabla I(\mathbf{x}+\mathbf{p})\delta \mathbf{p}- T(\mathbf{x}))^2, $

where $ \nabla I(\mathbf{x}+\mathbf{p}) = [\frac{\partial I(\mathbf{x}+\mathbf{p})}{\partial x},\frac{\partial I(\mathbf{x}+\mathbf{p})}{\partial y}] $ is gradient of $I$ evaluated at $\mathbf{x}+\mathbf{p}$.

We want to minimize (1) w.r.t. $\delta\mathbf{p}$, in the least-squares sense. We thus want to achieve the following minimization:

$(2)\;\;\sum_{\mathbf{x}\in\text{patch}}(\nabla I(\mathbf{x}+\mathbf{p})\delta\mathbf{p} - [ T(\mathbf{x}) - I(\mathbf{x}+\mathbf{p}) ] )^2 \rightarrow \min$

which can be rewritten as a standard least-square minimization problem

$(3)\;\; \mathbf{A}\mathbf{z} = \mathbf{b}$

where the matrix $\mathbf{A}$ is $N$-by-$2$ ($N$ is the number of pixels in the patch); rows contain the gradient $\nabla I(\mathbf{x}+\mathbf{p})$ for given $\mathbf{x}$. Vector $\mathbf{b}$ has length $N$ and its elements store the difference of $T$ and $I$ for given $\mathbf{x}$. The least-square solution $\mathbf{z}$ corresponds to $\delta\mathbf{p}$.

Template files, support files and suggested progress of your implementation

The template contains the following files:

klt.py : all functions for this task, including templates for your implementation (blocks requiring your work are marked by YOUR_IMPLEMENTATION_START and YOUR_IMPLEMENTATION_END blocks).
test.py : a short script which makes it easy for you to test your implementation of the KLT tracking. It loads a small image and its artificially translated version. It prints out the true translation and the translation estimated by your implementation. When correctly implemented, these differ by only several hundredths of a pixel. E.g. (for test_idx=3):

    In [1]: %run test                                                               
    True translation:      0.75, -0.20
    Estimated translation: 0.73, -0.20

main.py : a script which runs the main task in this assignment, i.e. going over the entire sequence, running the KLT algorithm on it, storing the points in the 1st and last frames, saving tracked_points.pt and also calling the function compute_and_show_homography which saves an image of the first and last frames, together with transformed bbox, to result.pdf
klt_params.py a file which defines the parameters for the task and allows you to easily define your own sets of parameters.

Suggested plan for your implementation:

implement get_patches_subpix
implement klt_compute_updates, use test.py to check that it works correctly
run main.py and have the points tracked from the first to the last frame (and saved to tracked_points.pt).
implement the required part of the compute_and_show_homography and see how well it performs. This function loads the previously saved tracked_points.pt, so that there is no need to run the whole tracking from scratch again. Calling this function several times also gives you an idea of variances in the estimated homography (we use RANSAC without local optimization, and the estimated homographies and thus the transformed bboxes are likely to be different for individual runs.)

Homography estimation

This is implemented in function compute_and_show_homography. Your task is to construct correspondences from the tracked points which are then used in your previously implemented ransac_h function.

Implementation Overview

KLT pseudocode (this is fully implemented in the template)

def track_klt(img_prev, img_next, xs, point_ids, pars): 
    # img_prev: template image
    # img_next: next image
    # xs: coordinates of points in template image
    # points_idx : identities (indices) of points in template image
    # pars
 
    # pre-compute stuff which stays constant over iterations, for saving time: 
    Ts = ... # extract all patches in the template image 
    img_next_gradient = ... # compute gradient in the next image
 
    ps = ... # set initial estimate of all shifts (usually to zero) 
 
    # iterate
    while iterations not finished:
        dps = klt_compute_updates(Ts, img_next, img_next_gradient, xs, ps, window_hsize)
        ps += dps 
 
    # return only those which have converged

klt_compute_updates (you need to implement this)

def klt_compute_updates(Ts, img_next, img_next_gradient, xs, ps, window_hsize): 
 
    # Is = ... # extract all patches, under current shift, from img_next 
    # Gx = ... # extract all patches, under current shift, from x-component of gradient of img_next 
    # Gy = ... # extract all patches, under current shift, from y-component of gradient of img_next
 
    # go over all points and compute the updates dps: 
    for all points: 
        compute shift update
        # note: np.linalg.lstsq can be useful for this
 
    return dps

References

Lucas-Kanade 20 Years On: A Unifying Framework

Table of Contents