Search
Please note that tracking assignments use the numpy package. If it is not installed yet in your environment please install it (usually using pip.)
numpy
pip
To fulfill this assignment, you need to submit these files (all packed in one .zip file) into the upload system:
.zip
klt.py
klt_compute_updates
get_patches_subpix
extract_affine_patches
compute_and_show_homography
tracked_points.pt
main.py
result.pdf
imagefiltering.py, ransac.py, local_detector.py
Use template of the assignment. When preparing a zip file for the upload system, do not include any directories, the files have to be in the zip file root. Note that all places where your implementation is required in the file klt.py are marked by YOUR_IMPLEMENTATION_START and YOUR_IMPLEMENTATION_END blocks.
YOUR_IMPLEMENTATION_START
YOUR_IMPLEMENTATION_END
Given a sequence of frames, we will define the points to track in the first frame, and then track the points across the sequence.
The points to be tracked are detected in the first frame by a Harris detector. Their positions in the next frame are estimated by the KLT tracking algorithm. The KLT algorithm is iterative, and may not converge. If it does not converge in a given number of iterations for a point, the point is removed from further tracking. This is why the number of tracked points is decreasing in the video as the tracking progresses within the sequence.
The scene used in this example is approximately planar. This is why the points in the first and any subsequent frames are approximately related by a homography. We will robustly estimate the homography between the first and last frames (employing previously implemented function ransac_h). To demonstrate the homography relationship, we define the bounding box of the leaflet in the first frame, and transform it by the homography to the last frame:
ransac_h
KLT minimizes the sum of squared differences of image intensities between patches in subsequent frames. The minimum is found iteratively.
We are given a patch template $T(\mathbf{x})$ in one frame. In the next frame $I$, the patch contents moves to a new position. This is described by coordinate transformation $\mathbf{W}(\mathbf{x;p})$. We will assume that the movement is a translation, that is, $\mathbf{W}(\mathbf{x;p}) =[x+p_x;y+p_y]^T$. We want to estimate the shift $\mathbf{p} = [p_x, p_y]^T$ which minimizes $\sum_{\mathbf{x\in\text{patch}}}(I(\mathbf{x}+\mathbf{p})) - T(\mathbf{x}))^2.$ This minimization does not have a closed-form solution (because the image function $I$ is general, not e.g. linear)
Let $\mathbf{p}$ be the current estimate of the shift. We will improve this estimate by computing an update $\delta\mathbf{p}$ of it such that $\mathbf{p}+\delta\mathbf{p}$ produces lower difference of image intensities between $T$ and $I$. Let us linearize the squared term by first order Taylor expansion:
$ (1)\;\;\sum_{\mathbf{x}\in\text{patch}}(I(\mathbf{x}+\mathbf{p}) + \nabla I(\mathbf{x}+\mathbf{p})\delta \mathbf{p}- T(\mathbf{x}))^2, $
where $ \nabla I(\mathbf{x}+\mathbf{p}) = [\frac{\partial I(\mathbf{x}+\mathbf{p})}{\partial x},\frac{\partial I(\mathbf{x}+\mathbf{p})}{\partial y}] $ is gradient of $I$ evaluated at $\mathbf{x}+\mathbf{p}$.
We want to minimize (1) w.r.t. $\delta\mathbf{p}$, in the least-squares sense. We thus want to achieve the following minimization:
$(2)\;\;\sum_{\mathbf{x}\in\text{patch}}(\nabla I(\mathbf{x}+\mathbf{p})\delta\mathbf{p} - [ T(\mathbf{x}) - I(\mathbf{x}+\mathbf{p}) ] )^2 \rightarrow \min$
which can be rewritten as a standard least-square minimization problem
$(3)\;\; \mathbf{A}\mathbf{z} = \mathbf{b}$
where the matrix $\mathbf{A}$ is $N$-by-$2$ ($N$ is the number of pixels in the patch); rows contain the gradient $\nabla I(\mathbf{x}+\mathbf{p})$ for given $\mathbf{x}$. Vector $\mathbf{b}$ has length $N$ and its elements store the difference of $T$ and $I$ for given $\mathbf{x}$. The least-square solution $\mathbf{z}$ corresponds to $\delta\mathbf{p}$.
The template contains the following files:
test.py
In [1]: %run test True translation: 0.75, -0.20 Estimated translation: 0.73, -0.20
klt_params.py
This is implemented in function compute_and_show_homography. Your task is to construct correspondences from the tracked points which are then used in your previously implemented ransac_h function.
KLT pseudocode (this is fully implemented in the template)
def track_klt(img_prev, img_next, xs, point_ids, pars): # img_prev: template image # img_next: next image # xs: coordinates of points in template image # points_idx : identities (indices) of points in template image # pars # pre-compute stuff which stays constant over iterations, for saving time: Ts = ... # extract all patches in the template image img_next_gradient = ... # compute gradient in the next image ps = ... # set initial estimate of all shifts (usually to zero) # iterate while iterations not finished: dps = klt_compute_updates(Ts, img_next, img_next_gradient, xs, ps, window_hsize) ps += dps # return only those which have converged
klt_compute_updates (you need to implement this)
def klt_compute_updates(Ts, img_next, img_next_gradient, xs, ps, window_hsize): # Is = ... # extract all patches, under current shift, from img_next # Gx = ... # extract all patches, under current shift, from x-component of gradient of img_next # Gy = ... # extract all patches, under current shift, from y-component of gradient of img_next # go over all points and compute the updates dps: for all points: compute shift update # note: np.linalg.lstsq can be useful for this return dps
Lucas-Kanade 20 Years On: A Unifying Framework