Object tracking

Video tracking is a process of object position estimation across video frames.

Simple models for tracking include:

Planar object with motion model of 2D image transformation (affine or homography).
Rigid 3D object with a motion model depending on its 3D position and orientation.
Object with feature points (e.g. Harris points) or regions (e.g. MSERs) in the image, which are tracked separately. This provides robustness to occlusions.
Non-rigid (deformable) object approximated with the surface wireframe. The motion of an object is defined by position of wireframe edges.

Popular tracking algorithms are: correlation, mean-shift, Kanade-Lucas-Tomasi (KLT) tracking, Kalman tracking, particle filtering, (sequential) linear predictors and so on. In this lab, you will implement and use the KLT tracker.

Problem introduction

Your task will be to hide a selected object in a video sequence, by tracking it and bluring it. You will select the object in the first frame and after that you will track and blur this object in the whole sequence (in a similar way as you know from a TV.)

We will assume that the whole scene is planar. The scene object is thus part of that planar scene. Therefore, the task can be acomplished by tracking points in the whole scene, estimating the motion, and then bluring the right place.

Homography estimation can be done in the same way as in second task.

Steps:

Familiarize yourself with tracking framework and testing sequence:
- xvid billa_xvid.avi
- jpeg billa_jpg.zip
- MPEG4 billa_mpeg4.avi
- WMV2 billa_wmv2.avi
Implement detection of tracking point in the whole image in function track_init_klt.m
Implement tracking of the points of interest in function track_klt.m
Try your algorithm on the testing video and plot tracked points
From tracked points estimate homographies between images
Transform the bounding box of the selected object
Blur the image inside the bounding box
Join steps 6-8 into function processmpvframe.m
Generate the video with the blurred object

Example of tracking (without blurring) export_billa_xvid.avi

Preliminaries: Reading and writing video

Download the function processMpvVideo(filename,method,options) (part of demo template), where filename is name of the videofile (e.g. example.avi), method is a tracking method and options is a structure with parameters for tracking function. The function creates a video sequence with tracked points shown and writes the output into the folder ./export/ (plus shows the preview.)

The script works well with nearly all Windows versions, but it is possible that you will encounter problems with codecs. Therefore, we have prepared the input video in several versions. On a linux system, or if any of the video version will not work, use the slightly changed function processmpvvideo_jpeg.m for reading from video decomposed to sequence of jpeg images. You can download or create these images:

Linux: mplayer -vo png video.avi
Windows: VirtualDub

Function processmpvvideo_jpeg.m creates sequence of images in folder ./export/. For joining images to video, you can use:

Linux: mencoder mf://*.png -mf w=640:h=480:fps=15:type=png -ovc lavc -lavcopts vcodec=mpeg4:vbitrate=2400 -oac copy -o video.avi
Windows: utility MakeAvi

Tracking Framework

Download the demo template.

The function processMpvVideo calls function data = processMpvInit(img,options), which selects the object for tracking in the first image img of the sequence. options is a structure defining options (see the code). The output is a bounding box:

  data.xRect    %x coordinates of bounding box corners (anticlockwise)
  data.yRect    %y coordinates of bounding box corners (anticlockwise)

The function x = track_init_demo(img,options), returns points x for tracking.

The function is xNew = track_demo(imgPrev, imgNew, xPrev, options): imgPrev is a previous frame in the sequence, imgNew is a new image, xPrev are tracked points in the previous frame. Tracked points are represented as a structure:

  x.x    % COLUMN vector of ''x'' coordinates
  x.y    % COLUMN vector of ''y'' coordinates
  x.ID   % COLUMN vector unique identifier of the point (index to the array is not enough because some points will disappear during tracking)
  x.data % specific information for the tracking method

options are dependent on the method (see structure fields for demo). For the KLT tracker, the options will be:

 options.klt_window         % size of patch W(x,p) in the sense of getPatchSubpix.m
 options.klt_stop_count     % maximal number of iteration steps
 options.klt_stop_treshold  % minimal change epsilon^2 for termination (squared for easier computation of the distance)
 options.klt_show_steps     % 0/1 turning on and of drawing during tracking

The output xNew are the tracked points found in the new frame.

The function processMpvFrame(data,imgPrev,imgNew,xPrev,xNew,options): function is called for each image. It uses found tracked points for estimating a homography between frames. Note that processMpvVideo is run by the main script cv08.m that also sets the options.

Individual components of the task

Kanade-Lucas-Tomasi Tracking (KLT tracker)

KLT minimizes sum of squared difference of image intensities between windows in subsequent frames. The minimum is found iteratively by Newton-Raphson method.

We are given a patch template $T(\mathbf{x})$ centered at pixel $\mathbf{x} = [x,y]^T$ in image frame at time $t$. In a subsequent frame, at time $t+1$, the target moves to a new position described coordinate transformation $\mathbf{W}(\mathbf{x;p})=[x+p_x;y+p_y]^T$. The task is to estimate displacement $\mathbf{p} = [p_x, p_y]^T$.

$ (1)\;\;\sum_{\mathbf{x}}(I(\mathbf{W}(\mathbf{x;p})) - T(\mathbf{x}))^2. $

Consider the best shift is $\ \Delta \mathbf{p} $, from (1) we will get

$ (2)\;\;\sum_{\mathbf{x}}(I(\mathbf{W}(\mathbf{x;p}+\Delta \mathbf{p})) - T(\mathbf{x}))^2. $

We minimize this expression with respect to $\ \Delta \mathbf{p} $. Nonlinear expression (2) is linearized by (first order) Taylor expansion

$ (3)\;\;\sum_{\mathbf{x}}(I(\mathbf{W}(\mathbf{x;p})) + \nabla I\frac{\partial \mathbf{W}}{\partial \mathbf{p}}\Delta \mathbf{p}- T(\mathbf{x}))^2, $

where $ \nabla I = [\frac{\partial I}{\partial x},\frac{\partial I}{\partial y}] $ is gradient at $\ \mathbf{W}(\mathbf{x;p}). $ Term $ \frac{\partial \mathbf{W}}{\partial \mathbf{p}} $ is Jacobian matrix of the coordinate transformation

$(4)\;\; \frac{\partial \mathbf{W}}{\partial \mathbf{p}}= \frac{ \partial[x+p_x;y+p_y]^T}{\partial [p_x,p_y]^T}= \left[\begin{array}{cc} 1 & 0
0 & 1 \end{array}\right].$

The minimum of expresion (3) over $\Delta \mathbf{p}$ is

$ (5)\;\;\Delta \mathbf{p}=H^{-1}\sum_{\mathbf{x}}[\nabla I\frac{\partial \mathbf{W}}{\partial \mathbf{p}}]^T [T(\mathbf{x})-I(\mathbf{W}(\mathbf{x;p}))] $

where $ H $ is approximation of the Hessian matrix used in Gauss-Newton gradient method. This is a nonlinear regression in fact (nonlinear least squares). Note that the approximation of the Hassian matrix in this case is equal to the autocorrelation matrix (Harris matrix), i.e. a dot product of the first partial derivatives. This suggests that a good idea is to track the points in near neighborhood around Harris points

$ (6)\;\; H = \sum_{\mathbf{x}}[\nabla I\frac{\partial \mathbf{W}}{\partial \mathbf{p}}]^T[\nabla I\frac{\partial \mathbf{W}}{\partial \mathbf{p}}]$

Substituting (4) into (5) and (6) simplifies $ \nabla I\frac{\partial \mathbf{W}}{\partial \mathbf{p}}=\nabla I $. The displacement correction $\Delta \mathbf{p}$ is computed in each iteration and the estimated shift is updated by

$ (7)\;\; \mathbf{p} \leftarrow \mathbf{p} + \Delta \mathbf{p} $

The iterations are terminated by setting the maximum number of iteration steps and/or by convergence condition

$ (8)\;\; ||\Delta \mathbf{p}||\leq \epsilon. $

KLT algorithm overview

For the KLT algorithm, it is necessary to implement all operations in a sub-pixel precision.

You will need a function for a patch selection patch = getPatchSubpix(img,x,y,win_x,win_y), where the patch is a selection from image img around center x,y with size win_x*2+1 x win_y*2+1. Assume that win_x,win_y are integers, but x,y are real. Function interp2.m can be useful in your implementation. Tip: You will get much faster computation, if you crop the image before using interp2.m.

The KLT algorithm can be summarized in a few steps:

For template $ T(\mathbf{x}) $ the neighborhood of Harris point $ \mathbf{x} $ in previous image xPrev, set $ \mathbf{p} = [0 ; 0] $ and iterate:

Take patch $ I ( \mathbf{W}(\mathbf{x;p})) $ from new image imgNew the neighborhood of Harris point $ \mathbf{x} $ with current shift $ \mathbf{p} $
Estimate the error $ E = I(\mathbf{W}(\mathbf{x;p})) - T(\mathbf{x}) $.
Computes gradients $ \nabla I $ at translated coordinates $ \mathbf{W}(\mathbf{x;p}) $.
Approximate Hessian matrix $ H $ with dot product $ H = \sum_{\mathbf{x}}[\nabla I]^T[\nabla I]$.
Estimate displacement $ \Delta \mathbf{p}= H^{-1}\sum_{\mathbf{x}}[\nabla I]^T [T(\mathbf{x})-I(\mathbf{W}(\mathbf{x;p}))] $
Update the translation estimate by $ \mathbf{p} \leftarrow \mathbf{p} + \Delta \mathbf{p} $
Test the convergence $ ||\Delta \mathbf{p}||\leq \epsilon. $

Set the new position of Harris point $ \mathbf{x} \leftarrow \mathbf{x} + \mathbf{p} $

If the algorithm did not converge in the maximum number of steps, discard the point from further tracking.

A simplified illustration of the algorithm demonstrated on a car tracking is in the figure below:

Tip: Do not ignore warnings from Matlab and use operator \ rather than function inv()

To easily check your implementation, include a drawing function after each iteration (function showKltStep available here).

 if (options.klt_show_steps)
  showKltStep(step,T,I,E,Gx,Gy,aP);
 end
 % step  - serial number of iteration (zero based)
 % T     - template, (patch from imgPrev)
 % I     - current sifted patch in imgNew
 % E     - current error (I - T)
 % Gx,Gy - gradients
 % aP    - size of current shift delta P

Homography estimation

We will use our function [Hbest,inl]=ransac_h(u,threshold,confidence) from the second task. Create the vector of points u. The IDs are known and you also know that xPrev contains all points from xNew. Keep the setting in fields options.rnsc_threshold and options.rnsc_confidence.

Discard the points which are homography outliers from further tracking. Advanced algorithms have methods for adding new tracking points, however we will not implement any in this lab.

Knowing the homography (matrix H), you can transform ($x_{n} = H_{best} x_{p}$) the corners of bounding box which outlines the selected object from one frame to the next frame. Blur the interior region with function gaussfilter.m with high enough sigma. Do not blur the image outside of the selection (you can use your knowledge from this course).

Join your functions into [dataOut xNewOut] = processMpvFrame(data,imgPrev,imgNew,xPrev,xNew,options). This function returns structure dataOut containing

  dataOut.xRect    %transfomed x-coordinates of bounding box corners (anti-clockwise) 
  dataOut.yRect    %transfomed y-coordinates of bounding box corners (anti-clockwise)
  dataOut.H        %estimated homography

and xNewOut containing tracked points in the current frame without outliers. Within this function, implement also a drawing of the current image frame into a figure with a blurred selection and show the transformed bounding box.

Your task

Implement KLT tracking algorithm, estimation of Harris point translations, and test it on the familiar sequence with promotional leaflet

Implement function track_init_klt.m for Harris point detection in the image
Implement function getPatchSubpix.m for sub-pixel selection in image
Implement KLT algorithm into function track_klt.m
Try an algorithm on the sequence from the first part of this task. Transform the selection and find homographies between frames in the same way as before. Integrate the whole process into cv08.m

What you should upload?

Submit file cv08.m; in this script, set structure options and call function processMpvVideo.m.

Include also completed functions track_init_klt.m, track_klt.m and getPatchSubpix.m together with all used non-standard functions you have created. Submit the generated video file export_billa_xvid.avi with blurred selection and a highlighted bouding box and the automatically generated file homography.mat with all homographies.

Please do not submit the function procesMpvVideo.m or showKltStep.m. It interferes with the automatic evaluation.

Testing

For testing, download showkltstep.m for visualization of KLT iteration.

To test your code, you can use a matlab script and a function 'publish'. Copy klt_test.zip, unpack and add your files, which are requested for submission. Do not unpack it into your working folder, because it contains our version of processMpvVideo.m and showKltSteps.m. Compare your results with ours.

References

Lucas-Kanade 20 Years On: A Unifying Framework

Predator: A Smart Camera that Learns

Table of Contents