Video tracking is a process of object position estimation across video frames.
Simple models for tracking include:
Popular tracking algorithms are: correlation, mean-shift, Kanade-Lucas-Tomasi (KLT) tracking, Kalman tracking, particle filtering, (sequential) linear predictors and so on. In this lab, you will implement and use the KLT tracker.
Your task will be to hide a selected object in a video sequence, by tracking it and bluring it. You will select the object in the first frame and after that you will track and blur this object in the whole sequence (in a similar way as you know from a TV.)
We will assume that the whole scene is planar. The scene object is thus part of that planar scene. Therefore, the task can be acomplished by tracking points in the whole scene, estimating the motion, and then bluring the right place.
Homography estimation can be done in the same way as in second task.
Steps:
track_init_klt.m
track_klt.m
processmpvframe.m
Example of tracking (without blurring) export_billa_xvid.avi
Download the function processMpvVideo(filename,method,options)
(part of demo template), where filename
is name of the videofile (e.g. example.avi
), method
is a tracking method and options
is a structure with parameters for tracking function. The function creates a video sequence with tracked points shown and writes the output into the folder ./export/
(plus shows the preview.)
The script works well with nearly all Windows versions, but it is possible that you will encounter problems with codecs. Therefore, we have prepared the input video in several versions. On a linux system, or if any of the video version will not work, use the slightly changed function processmpvvideo_jpeg.m
for reading from video decomposed to sequence of jpeg images. You can download or create these images:
mplayer -vo png video.avi
Function processmpvvideo_jpeg.m
creates sequence of images in folder ./export/
. For joining images to video, you can use:
mencoder mf://*.png -mf w=640:h=480:fps=15:type=png -ovc lavc -lavcopts vcodec=mpeg4:vbitrate=2400 -oac copy -o video.avi
Download the demo template.
The function processMpvVideo
calls function data = processMpvInit(img,options)
, which selects the object for tracking in the first image img
of the sequence. options
is a structure defining options (see the code). The output is a bounding box:
data.xRect %x coordinates of bounding box corners (anticlockwise) data.yRect %y coordinates of bounding box corners (anticlockwise)
The function x = track_init_demo(img,options)
, returns points x
for tracking.
The function is xNew = track_demo(imgPrev, imgNew, xPrev, options)
: imgPrev
is a previous frame in the sequence, imgNew
is a new image, xPrev
are tracked points in the previous frame. Tracked points are represented as a structure:
x.x % COLUMN vector of ''x'' coordinates x.y % COLUMN vector of ''y'' coordinates x.ID % COLUMN vector unique identifier of the point (index to the array is not enough because some points will disappear during tracking) x.data % specific information for the tracking method
options
are dependent on the method (see structure fields for demo). For the KLT tracker, the options will be:
options.klt_window % size of patch W(x,p) in the sense of getPatchSubpix.m options.klt_stop_count % maximal number of iteration steps options.klt_stop_treshold % minimal change epsilon^2 for termination (squared for easier computation of the distance) options.klt_show_steps % 0/1 turning on and of drawing during tracking
The output xNew
are the tracked points found in the new frame.
The function processMpvFrame(data,imgPrev,imgNew,xPrev,xNew,options)
: function is called for each image. It uses found tracked points for estimating a homography between frames. Note that processMpvVideo
is run by the main script cv08.m
that also sets the options.
KLT minimizes sum of squared difference of image intensities between windows in subsequent frames. The minimum is found iteratively by Newton-Raphson method.
We are given a patch template $T(\mathbf{x})$ centered at pixel $\mathbf{x} = [x,y]^T$ in image frame at time $t$. In a subsequent frame, at time $t+1$, the target moves to a new position described coordinate transformation $\mathbf{W}(\mathbf{x;p})=[x+p_x;y+p_y]^T$. The task is to estimate displacement $\mathbf{p} = [p_x, p_y]^T$.
<latex> (1)\;\;\sum_{\mathbf{x}}(I(\mathbf{W}(\mathbf{x;p})) - T(\mathbf{x}))^2. </latex>
Consider the best shift is <latex>\ \Delta \mathbf{p} </latex>, from (1) we will get
<latex> (2)\;\;\sum_{\mathbf{x}}(I(\mathbf{W}(\mathbf{x;p}+\Delta \mathbf{p})) - T(\mathbf{x}))^2. </latex>
We minimize this expression with respect to <latex>\ \Delta \mathbf{p} </latex>. Nonlinear expression (2) is linearized by (first order) Taylor expansion
<latex> (3)\;\;\sum_{\mathbf{x}}(I(\mathbf{W}(\mathbf{x;p})) + \nabla I\frac{\partial \mathbf{W}}{\partial \mathbf{p}}\Delta \mathbf{p}- T(\mathbf{x}))^2, </latex>
where <latex> \nabla I = [\frac{\partial I}{\partial x},\frac{\partial I}{\partial y}] </latex> is gradient at <latex>\ \mathbf{W}(\mathbf{x;p}). </latex> Term <latex> \frac{\partial \mathbf{W}}{\partial \mathbf{p}} </latex> is Jacobian matrix of the coordinate transformation
<latex>(4)\;\; \frac{\partial \mathbf{W}}{\partial \mathbf{p}}= \frac{ \partial[x+p_x;y+p_y]^T}{\partial [p_x,p_y]^T}=
\left[\begin{array}{cc}
1 & 0
0 & 1
\end{array}\right].</latex>
The minimum of expresion (3) over $\Delta \mathbf{p}$ is
<latex> (5)\;\;\Delta \mathbf{p}=H^{-1}\sum_{\mathbf{x}}[\nabla I\frac{\partial \mathbf{W}}{\partial \mathbf{p}}]^T [T(\mathbf{x})-I(\mathbf{W}(\mathbf{x;p}))] </latex>
where <latex> H </latex> is approximation of the Hessian matrix used in Gauss-Newton gradient method. This is a nonlinear regression in fact (nonlinear least squares). Note that the approximation of the Hassian matrix in this case is equal to the autocorrelation matrix (Harris matrix), i.e. a dot product of the first partial derivatives. This suggests that a good idea is to track the points in near neighborhood around Harris points
<latex> (6)\;\; H = \sum_{\mathbf{x}}[\nabla I\frac{\partial \mathbf{W}}{\partial \mathbf{p}}]^T[\nabla I\frac{\partial \mathbf{W}}{\partial \mathbf{p}}]</latex>
Substituting (4) into (5) and (6) simplifies <latex> \nabla I\frac{\partial \mathbf{W}}{\partial \mathbf{p}}=\nabla I </latex>. The displacement correction $\Delta \mathbf{p}$ is computed in each iteration and the estimated shift is updated by
<latex> (7)\;\; \mathbf{p} \leftarrow \mathbf{p} + \Delta \mathbf{p} </latex>
The iterations are terminated by setting the maximum number of iteration steps and/or by convergence condition
<latex> (8)\;\; ||\Delta \mathbf{p}||\leq \epsilon. </latex>
For the KLT algorithm, it is necessary to implement all operations in a sub-pixel precision.
You will need a function for a patch selection patch = getPatchSubpix(img,x,y,win_x,win_y)
, where the patch
is a selection from image img
around center x,y
with size win_x*2+1 x win_y*2+1
. Assume that win_x,win_y
are integers, but x,y
are real. Function interp2.m
can be useful in your implementation. Tip: You will get much faster computation, if you crop the image before using interp2.m
.
The KLT algorithm can be summarized in a few steps:
For template <latex> T(\mathbf{x}) </latex> the neighborhood of Harris point <latex> \mathbf{x} </latex> in previous image xPrev
, set <latex> \mathbf{p} = [0 ; 0] </latex> and iterate:
imgNew
the neighborhood of Harris point <latex> \mathbf{x} </latex> with current shift <latex> \mathbf{p} </latex>
Set the new position of Harris point <latex> \mathbf{x} \leftarrow \mathbf{x} + \mathbf{p} </latex>
If the algorithm did not converge in the maximum number of steps, discard the point from further tracking.
A simplified illustration of the algorithm demonstrated on a car tracking is in the figure below:
Tip: Do not ignore warnings from Matlab and use operator \
rather than function inv()
To easily check your implementation, include a drawing function after each iteration (function showKltStep
available here).
if (options.klt_show_steps) showKltStep(step,T,I,E,Gx,Gy,aP); end % step - serial number of iteration (zero based) % T - template, (patch from imgPrev) % I - current sifted patch in imgNew % E - current error (I - T) % Gx,Gy - gradients % aP - size of current shift delta P
We will use our function [Hbest,inl]=ransac_h(u,threshold,confidence)
from the second task. Create the vector of points u
. The ID
s are known and you also know that xPrev
contains all points from xNew
. Keep the setting in fields options.rnsc_threshold
and options.rnsc_confidence
.
Discard the points which are homography outliers from further tracking. Advanced algorithms have methods for adding new tracking points, however we will not implement any in this lab.
Knowing the homography (matrix H
), you can transform (<latex>x_{n} = H_{best} x_{p}</latex>) the corners of bounding box which outlines the selected object from one frame to the next frame. Blur the interior region with function gaussfilter.m
with high enough sigma. Do not blur the image outside of the selection (you can use your knowledge from this course).
Join your functions into [dataOut xNewOut] = processMpvFrame(data,imgPrev,imgNew,xPrev,xNew,options)
. This function returns structure dataOut
containing
dataOut.xRect %transfomed x-coordinates of bounding box corners (anti-clockwise) dataOut.yRect %transfomed y-coordinates of bounding box corners (anti-clockwise) dataOut.H %estimated homographyand
xNewOut
containing tracked points in the current frame without outliers. Within this function, implement also a drawing of the current image frame into a figure with a blurred selection and show the transformed bounding box.
Implement KLT tracking algorithm, estimation of Harris point translations, and test it on the familiar sequence with promotional leaflet
track_init_klt.m
for Harris point detection in the image
getPatchSubpix.m
for sub-pixel selection in image
track_klt.m
cv08.m
Submit file cv08.m
; in this script, set structure options
and call function processMpvVideo.m
.
Include also completed functions track_init_klt.m
, track_klt.m
and getPatchSubpix.m
together with all used non-standard functions you have created. Submit the generated video file export_billa_xvid.avi
with blurred selection and a highlighted bouding box and the automatically generated file homography.mat
with all homographies.
Please do not submit the function procesMpvVideo.m
or showKltStep.m
. It interferes with the automatic evaluation.
For testing, download showkltstep.m for visualization of KLT iteration.
To test your code, you can use a matlab script and a function 'publish'. Copy klt_test.zip, unpack and add your files, which are requested for submission. Do not unpack it into your working folder, because it contains our version of processMpvVideo.m
and showKltSteps.m
. Compare your results with ours.