====== Object tracking ======

Video tracking is the process of estimating the position of an object across video frames. Simple models for tracking include:
  * Planar object with a motion model of a 2D image transformation (affine or homography).
  * Rigid 3D object with a motion model depending on its 3D position and orientation.
  * Object with feature points (e.g. Harris points) or regions (e.g. MSERs) in the image, which are tracked separately. This provides robustness to occlusions.
  * Non-rigid (deformable) object approximated by a surface wireframe. The motion of the object is defined by the positions of the wireframe edges.

Popular tracking algorithms are: correlation, mean-shift, Kanade-Lucas-Tomasi (KLT) tracking, Kalman tracking, particle filtering, (sequential) linear predictors, and so on. In this lab, you will implement and use the KLT tracker.

====== Problem introduction ======

Your task will be to hide a selected object in a video sequence by tracking it and blurring it. You will select the object in the first frame and then track and blur it throughout the whole sequence (similar to what you know from TV). We will assume that the whole scene is planar; the selected object is thus part of that planar scene. Therefore, the task can be accomplished by tracking points in the whole scene, estimating the motion, and then blurring the right place. Homography estimation can be done in the same way as in the [[../2_correspondence_problem/start?#the_model_of_a_planar_scene| second task]].

Steps:
  - Familiarize yourself with the tracking framework and the testing sequence:
    * xvid {{courses:a4m33mpv:cviceni:4_sledovani_objektu:billa_xvid.avi|}}
    * jpeg {{courses:a4m33mpv:cviceni:4_sledovani_objektu:billa_jpg.zip|}}
    * MPEG4 {{courses:a4m33mpv:cviceni:4_sledovani_objektu:billa_mpeg4.avi|}}
    * WMV2 {{courses:a4m33mpv:cviceni:4_sledovani_objektu:billa_wmv2.avi|}}
  - Implement detection of tracking points in the whole image in function ''track_init_klt.m''
  - Implement tracking of the points of interest in function ''track_klt.m''
  - Try your algorithm on the testing video and plot the tracked points
  - From the tracked points, estimate homographies between images
  - Transform the bounding box of the selected object
  - Blur the image inside the bounding box
  - Join steps 5-7 into function ''processMpvFrame.m''
  - Generate the video with the blurred object

Example of tracking (without blurring): {{courses:a4m33mpv:cviceni:4_sledovani_objektu:export_billa_xvid.avi|}}

=== Preliminaries: Reading and writing video ===

Download the function ''processMpvVideo(filename,method,options)'' (part of the {{courses:a4m33mpv:cviceni:4_sledovani_objektu:cv08.zip|demo template}}), where ''filename'' is the name of the video file (e.g. ''example.avi''), ''method'' is a tracking method, and ''options'' is a structure with parameters for the tracking function. The function creates a video sequence with the tracked points shown, writes the output into the folder ''./export/'', and shows a preview. The script works with nearly all Windows versions, but you may encounter problems with codecs; we have therefore prepared the input video in several versions. On a Linux system, or if none of the video versions works, use the slightly changed function ''processmpvvideo_jpeg.m'', which reads a video decomposed into a sequence of JPEG images.
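For illustration, reading such a decomposed sequence frame by frame might look like this (a minimal sketch; the folder name and the grayscale conversion are assumptions, not part of the template):

  % read a decomposed JPEG sequence frame by frame (sketch; folder name assumed)
  files = dir('./sequence/*.jpg');
  for k = 1:numel(files)
    img = imread(fullfile('./sequence', files(k).name));
    if size(img,3) == 3, img = rgb2gray(img); end  % tracking works on grayscale intensities
    img = im2double(img);
    % ... hand img over to the tracking functions here ...
  end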
You can download or create these images:
  * Linux: ''mplayer -vo png video.avi''
  * Windows: [[http://virtualdub.sourceforge.net/|VirtualDub]]

The function ''processmpvvideo_jpeg.m'' creates a sequence of images in the folder ''./export/''. For joining the images into a video, you can use:
  * Linux: ''mencoder mf://*.png -mf w=640:h=480:fps=15:type=png -ovc lavc -lavcopts vcodec=mpeg4:vbitrate=2400 -oac copy -o video.avi''
  * Windows: the utility [[http://makeavi.sourceforge.net/|MakeAvi]]

====== Tracking Framework ======

{{:courses:ae4m33mpv:labs:4_tracking:struktura_programu_en.png|}}

Download the {{courses:a4m33mpv:cviceni:4_sledovani_objektu:cv08.zip|demo template}}. The function ''processMpvVideo'' calls the function ''data = processMpvInit(img,options)'', which selects the object for tracking in the first image ''img'' of the sequence. ''options'' is a structure defining options (see the code). The output is a bounding box:

  data.xRect  % x coordinates of bounding box corners (anticlockwise)
  data.yRect  % y coordinates of bounding box corners (anticlockwise)

The function ''x = track_init_demo(img,options)'' returns points ''x'' for tracking. In the function ''xNew = track_demo(imgPrev, imgNew, xPrev, options)'', ''imgPrev'' is the previous frame in the sequence, ''imgNew'' is the new frame, and ''xPrev'' are the tracked points in the previous frame. Tracked points are represented as a structure:

  x.x    % COLUMN vector of x coordinates
  x.y    % COLUMN vector of y coordinates
  x.ID   % COLUMN vector of unique point identifiers (an index into the array is not enough, because some points disappear during tracking)
  x.data % information specific to the tracking method

''options'' depends on the method (see the structure fields for the demo). For the KLT tracker, the options will be:

  options.klt_window        % size of patch W(x,p) in the sense of getPatchSubpix.m
  options.klt_stop_count    % maximum number of iteration steps
  options.klt_stop_treshold % minimal change epsilon^2 for termination (squared for easier computation of the distance)
  options.klt_show_steps    % 0/1, turns drawing during tracking off/on

The output ''xNew'' are the tracked points found in the new frame.

The function ''processMpvFrame(data,imgPrev,imgNew,xPrev,xNew,options)'' is called for each image. It uses the found tracked points for estimating a homography between frames. Note that ''processMpvVideo'' is run by the main script ''cv08.m'', which also sets the options.

===== Individual components of the task =====

=== Kanade-Lucas-Tomasi Tracking (KLT tracker) ===

KLT minimizes the sum of squared differences of image intensities between windows in subsequent frames. The minimum is found iteratively by the Newton-Raphson method. We are given a patch template $T(\mathbf{x})$ centered at pixel $\mathbf{x} = [x,y]^T$ in the image frame at time $t$. In the subsequent frame, at time $t+1$, the target moves to a new position described by the coordinate transformation $\mathbf{W}(\mathbf{x};\mathbf{p})=[x+p_x,\, y+p_y]^T$. The task is to estimate the displacement $\mathbf{p} = [p_x, p_y]^T$ minimizing

$$(1)\;\;\sum_{\mathbf{x}}\big(I(\mathbf{W}(\mathbf{x};\mathbf{p})) - T(\mathbf{x})\big)^2.$$

Assuming the current estimate $\mathbf{p}$ should be improved by a shift $\Delta \mathbf{p}$, from (1) we get

$$(2)\;\;\sum_{\mathbf{x}}\big(I(\mathbf{W}(\mathbf{x};\mathbf{p}+\Delta \mathbf{p})) - T(\mathbf{x})\big)^2.$$

We minimize this expression with respect to $\Delta \mathbf{p}$.
The nonlinear expression (2) is linearized by a (first-order) Taylor expansion,

$$(3)\;\;\sum_{\mathbf{x}}\big(I(\mathbf{W}(\mathbf{x};\mathbf{p})) + \nabla I\frac{\partial \mathbf{W}}{\partial \mathbf{p}}\Delta \mathbf{p} - T(\mathbf{x})\big)^2,$$

where $\nabla I = [\frac{\partial I}{\partial x},\frac{\partial I}{\partial y}]$ is the image gradient evaluated at $\mathbf{W}(\mathbf{x};\mathbf{p})$. The term $\frac{\partial \mathbf{W}}{\partial \mathbf{p}}$ is the Jacobian matrix of the coordinate transformation,

$$(4)\;\;\frac{\partial \mathbf{W}}{\partial \mathbf{p}}= \frac{\partial[x+p_x,\, y+p_y]^T}{\partial [p_x,p_y]^T}= \left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right].$$

The minimum of expression (3) over $\Delta \mathbf{p}$ is

$$(5)\;\;\Delta \mathbf{p}=H^{-1}\sum_{\mathbf{x}}\Big[\nabla I\frac{\partial \mathbf{W}}{\partial \mathbf{p}}\Big]^T \big[T(\mathbf{x})-I(\mathbf{W}(\mathbf{x};\mathbf{p}))\big],$$

where $H$ is the approximation of the Hessian matrix used in the Gauss-Newton method; this is in fact nonlinear regression (nonlinear least squares):

$$(6)\;\;H = \sum_{\mathbf{x}}\Big[\nabla I\frac{\partial \mathbf{W}}{\partial \mathbf{p}}\Big]^T\Big[\nabla I\frac{\partial \mathbf{W}}{\partial \mathbf{p}}\Big].$$

Note that this approximation of the Hessian matrix is equal to the autocorrelation matrix (Harris matrix), i.e. it is built from products of the first partial derivatives. This suggests that it is a good idea to track points in the close neighborhood of Harris points. Substituting (4) into (5) and (6) simplifies $\nabla I\frac{\partial \mathbf{W}}{\partial \mathbf{p}}$ to $\nabla I$. The displacement correction $\Delta \mathbf{p}$ is computed in each iteration, and the estimated shift is updated by

$$(7)\;\;\mathbf{p} \leftarrow \mathbf{p} + \Delta \mathbf{p}.$$

The iterations are terminated by a maximum number of iteration steps and/or by the convergence condition

$$(8)\;\;||\Delta \mathbf{p}||\leq \epsilon.$$

=== KLT algorithm overview ===

For the KLT algorithm, it is necessary to implement all operations with sub-pixel precision. You will need a function for patch extraction, ''patch = getPatchSubpix(img,x,y,win_x,win_y)'', where ''patch'' is the selection from image ''img'' around the center ''x,y'' with size ''win_x*2+1 x win_y*2+1''. Assume that ''win_x,win_y'' are integers, but ''x,y'' are real. The function ''interp2.m'' can be useful in your implementation. **Tip:** The computation will be much faster if you crop the image before calling ''interp2.m''; see the sketch after this section.

The KLT algorithm can be summarized in a few steps. For the template $T(\mathbf{x})$, the neighborhood of a Harris point $\mathbf{x}$ from ''xPrev'' in the previous image, set $\mathbf{p} = [0, 0]^T$ and iterate (a code sketch follows this list):
  - Take the patch $I(\mathbf{W}(\mathbf{x};\mathbf{p}))$ from the new image ''imgNew'': the neighborhood of the Harris point $\mathbf{x}$ shifted by the current $\mathbf{p}$.
  - Estimate the error $E = I(\mathbf{W}(\mathbf{x};\mathbf{p})) - T(\mathbf{x})$.
  - Compute the gradients $\nabla I$ at the translated coordinates $\mathbf{W}(\mathbf{x};\mathbf{p})$.
  - Approximate the Hessian matrix as $H = \sum_{\mathbf{x}}[\nabla I]^T[\nabla I]$.
  - Estimate the displacement $\Delta \mathbf{p}= H^{-1}\sum_{\mathbf{x}}[\nabla I]^T [T(\mathbf{x})-I(\mathbf{W}(\mathbf{x};\mathbf{p}))]$.
  - Update the translation estimate: $\mathbf{p} \leftarrow \mathbf{p} + \Delta \mathbf{p}$.
  - Test the convergence: $||\Delta \mathbf{p}||\leq \epsilon$.

After convergence, set the new position of the Harris point: $\mathbf{x} \leftarrow \mathbf{x} + \mathbf{p}$. If the algorithm did not converge within the maximum number of steps, discard the point from further tracking.
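A minimal sketch of ''getPatchSubpix'' under these assumptions (bilinear interpolation via ''interp2''; no special handling of patches reaching outside the image, where ''interp2'' returns NaN):

  function patch = getPatchSubpix(img, x, y, win_x, win_y)
    % crop a slightly larger integer-aligned sub-image first, so that
    % interp2 works on a small array (the speed tip mentioned above)
    x0 = max(floor(x) - win_x - 1, 1);
    y0 = max(floor(y) - win_y - 1, 1);
    x1 = min(ceil(x) + win_x + 1, size(img,2));
    y1 = min(ceil(y) + win_y + 1, size(img,1));
    sub = img(y0:y1, x0:x1);
    % sub-pixel sampling grid in the coordinates of the cropped sub-image
    [X, Y] = meshgrid((x-win_x : x+win_x) - x0 + 1, (y-win_y : y+win_y) - y0 + 1);
    patch = interp2(sub, X, Y, 'linear');   % (2*win_y+1) x (2*win_x+1)
  end

And a sketch of one point's iteration inside ''track_klt.m'', following steps 1-7 above (''wx,wy'' are assumed to be the half-window sizes taken from ''options.klt_window''; gradients are computed with Matlab's ''gradient'' on a patch one pixel larger, then trimmed):

  p = [0; 0];                                    % zero initial shift
  T = getPatchSubpix(imgPrev, x, y, wx, wy);     % template around the Harris point
  converged = false;
  for step = 0:options.klt_stop_count-1
    I = getPatchSubpix(imgNew, x+p(1), y+p(2), wx, wy);                      % step 1
    E = I - T;                                                               % step 2
    [Gx, Gy] = gradient(getPatchSubpix(imgNew, x+p(1), y+p(2), wx+1, wy+1)); % step 3
    Gx = Gx(2:end-1, 2:end-1);  Gy = Gy(2:end-1, 2:end-1);  % keep the central part
    H  = [sum(Gx(:).^2)     sum(Gx(:).*Gy(:));                               % step 4
          sum(Gx(:).*Gy(:)) sum(Gy(:).^2)];
    dp = H \ [-sum(Gx(:).*E(:)); -sum(Gy(:).*E(:))];        % step 5; T - I = -E
    p  = p + dp;                                                             % step 6
    % showKltStep can be called here for debugging (see below)
    if dp'*dp <= options.klt_stop_treshold                  % step 7, epsilon^2
      converged = true;  break;
    end
  end
  % if converged: x <- x + p(1), y <- y + p(2); otherwise discard the point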
A simplified illustration of the algorithm, demonstrated on tracking a car, is in the figure below: \\ {{courses:a4m33mpv:cviceni:4_sledovani_objektu:example.png}} \\

**Tip:** Do not ignore warnings from Matlab, and use the operator ''\'' rather than the function ''inv()''.

To easily check your implementation, include a drawing function after each iteration (function ''showKltStep'' available {{courses:a4m33mpv:cviceni:4_sledovani_objektu:showkltstep.m|here}}):

  if (options.klt_show_steps)
    showKltStep(step,T,I,E,Gx,Gy,aP);
  end
  % step  - serial number of the iteration (zero based)
  % T     - template (patch from imgPrev)
  % I     - current shifted patch in imgNew
  % E     - current error (I - T)
  % Gx,Gy - gradients
  % aP    - current shift delta P

=== Homography estimation ===

We will use our function ''[Hbest,inl]=ransac_h(u,threshold,confidence)'' from the second task. Create the vector of point correspondences ''u''. The ''ID''s are known, and you also know that ''xPrev'' contains all points from ''xNew''. Keep the settings in the fields ''options.rnsc_threshold'' and ''options.rnsc_confidence''. Discard the points which are homography outliers from further tracking. Advanced algorithms have methods for adding new tracking points; however, we will not implement any in this lab.

Knowing the homography (matrix ''H''), you can transform the corners of the bounding box which outlines the selected object from one frame to the next frame, $x_{n} = H_{best}\, x_{p}$. Blur the interior region with the function ''gaussfilter.m'' with a high enough sigma. Do not blur the image outside of the selection (you can use your knowledge from this [[http://cmp.felk.cvut.cz/cmp/courses/TZ/2010/HomeWorks/HW-06/|course]]); a minimal sketch of these two steps is shown below.

Join your functions into ''[dataOut xNewOut] = processMpvFrame(data,imgPrev,imgNew,xPrev,xNew,options)''. This function returns the structure ''dataOut'' containing

  dataOut.xRect % transformed x coordinates of bounding box corners (anticlockwise)
  dataOut.yRect % transformed y coordinates of bounding box corners (anticlockwise)
  dataOut.H     % estimated homography

and ''xNewOut'' containing the tracked points in the current frame without outliers. Within this function, also draw the current image frame into a figure with the blurred selection and show the transformed bounding box.

===== Your task =====

Implement the KLT tracking algorithm, estimating the translations of Harris points, and test it on the familiar sequence with the promotional leaflet:
  - Implement function ''track_init_klt.m'' for Harris point detection in the image
  - Implement function ''getPatchSubpix.m'' for sub-pixel patch selection in the image
  - Implement the KLT algorithm in function ''track_klt.m''
  - Try the algorithm on the sequence from the first part of this task. Transform the selection and find homographies between frames in the same way as before. Integrate the whole process into ''cv08.m''

===== What you should upload? =====

Submit the file ''cv08.m''; in this script, set the structure ''options'' and call the function ''processMpvVideo.m''. Include also the completed functions ''track_init_klt.m'', ''track_klt.m'' and ''getPatchSubpix.m'', together with all non-standard functions you have created and used. Submit the generated video file ''export_billa_xvid.avi'' with the blurred selection and a highlighted bounding box, and the automatically generated file ''homography.mat'' with all homographies. Please do **not** submit the functions ''processMpvVideo.m'' or ''showKltStep.m''; they interfere with the automatic evaluation.
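As referenced in the homography-estimation section above, the corner transformation and the masked blurring inside ''processMpvFrame'' might look roughly like this (a minimal sketch, assuming a grayscale frame; the use of ''roipoly'' for the mask is an assumption, and ''gaussfilter.m'' is your function from the earlier labs):

  % transform bounding-box corners with the estimated homography (homogeneous coordinates)
  c = Hbest * [data.xRect(:)'; data.yRect(:)'; ones(1,numel(data.xRect))];
  dataOut.xRect = c(1,:) ./ c(3,:);
  dataOut.yRect = c(2,:) ./ c(3,:);
  dataOut.H     = Hbest;

  % blur only the interior of the transformed bounding box
  blurred   = gaussfilter(imgNew, sigma);                    % sigma high enough to hide the object
  mask      = roipoly(imgNew, dataOut.xRect, dataOut.yRect); % logical mask of the box interior
  out       = imgNew;
  out(mask) = blurred(mask);                                 % the rest of the frame stays sharp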
===== Testing =====

For testing, download {{courses:a4m33mpv:cviceni:4_sledovani_objektu:showkltstep.m|}} for the visualization of the KLT iterations. To test your code, you can use a Matlab script and the function ''publish''. Copy {{courses:a4m33mpv:cviceni:4_sledovani_objektu:klt_test.zip|klt_test.zip}}, unpack it, and add the files requested for submission. Do not unpack it into your working folder, because it contains our version of ''processMpvVideo.m'' and ''showKltStep.m''. Compare your results with [[http://cmp.felk.cvut.cz/~cechj/mpv/track-09/Vysledky%20ulohy%20%2709_klt%27%20z%20testovaciho%20skriptu.htm|ours]].

===== References =====

[[http://www.ri.cmu.edu/pub_files/pub3/baker_simon_2002_3/baker_simon_2002_3.pdf|Lucas-Kanade 20 Years On: A Unifying Framework]]

[[http://www.youtube.com/watch?v=1GhNXHCQGsM|Predator: A Smart Camera that Learns]]