KCF Tracking

To fulfill this assignment, you need to submit these files (all packed in one .zip file) into the upload system:

kcf.py - file with the following methods implemented:
- track_kcf - core function of the KCF tracking
kcf_translations.pt - file saved by main_kcf.py, storing differential shifts of the tracked patch over the entire sequence
result_kcf.pdf - a figure saved by main_kcf.py, showing the coordinates of the tracked patch over the entire sequence.
result_kcf_no_adaptation.pdf - the same figure as above, but when the tracking is run with no adaptation (parameter $\gamma=1.0$, see below)

Use template of the assignment. When preparing a zip file for the upload system, do not include any directories, the files have to be in the zip file root.

The Task

Given a sequence of frames, we will define the bounding box of a patch to be tracked in the first frame, and then track the position of the patch across the entire sequence. We will only estimate the shifts of the patch. i.e. the width, height and rotation of the bounding box are all kept constant over the entire sequence.

Note: in the responses in the top-right image, the reference position (corresponding to no shift) is marked by a red cross marker.

The shifts of the patch are estimated by the KCF tracking algorithm. The path of the tracked patch over the whole sequence is then as follows (the reference coordinates for the first frame are (0, 0)):

Tracking with correlation filters

The basic idea of the correlation filter tracking is estimating an optimal filter such that when applied to an input image, the desired response is produced. The desired response is typically of a Gaussian shape centered at the target location, so the score decreases with the distance.

The filter is trained from translated (shifted) instances of the target patch. When testing, the response of the filter is evaluated and the maximum gives the new position of the target. The filter is trained on-line and updated successively with every frame in order the tracker adapts to moderate target changes.

Major advantage of the correlation filter tracker is the computation efficiency. The reason is that the computation can be performed efficiently in the Fourier domain. Thus the tracker runs super-realtime, several hundreds FPS.

There are both linear and non-linear (kernel) versions of the tracker that are derived from a unifying least square principle by Henriques et al. [1].

Linear Correlation Filter

The optimal linear filter $w$ is found by solving the regularized least squares problem

\begin{equation} \min_w \Bigl( || X w - y ||^2 + \lambda ||w||^2 \Bigr), \end{equation}

where $X$ is a circulant matrix of the target image patch. The rows of $X$ store all possible cyclic image shifts, $y$ is the desired response, $\lambda$ is a weight of the regularizer.

The solution of (1) is \begin{equation} w = (X^T X + \lambda I)^{-1} X^T y \end{equation}

Since $X$ is a circulant matrix, $w$ in (2) can be computed quickly using Fourier domain operations \begin{equation} \hat w = \frac{\hat x \odot \hat y}{\hat x^* \odot \hat x + \lambda}, \end{equation} where the symbols denote: $\hat ~$ the Fourier image, $\odot$ component-wise multiplication, $^*$ complex conjugate.

Moreover, the response $r$ of the filter $w$ on any image $z$ can also be computed efficiently in the Fourier domain: \begin{equation} \hat r = \hat z^* \odot \hat w. \end{equation}

Kernelized Correlation Filter

By using the "kernel-trick", i.e. mapping input data by a non-linear function $x -> \varphi(x)$, and expressing the solution as the linear combination $w = \sum_i \alpha_i \varphi(x_i)$, eq. (1) becomes \begin{equation} \min_{\alpha} \Bigl( || K \alpha - y ||^2 + \lambda \alpha^T K \alpha \Bigr), \end{equation} where the matrix $K$ is the kernel matrix with elements being dot products $k_{i,j} = \varphi(x_i)^T \varphi(x_j)$. The original problem thus becomes non-linear, a.k.a kernelized ridge regression.

The solution of problem (5) is \begin{equation} \alpha = (K + \lambda I)^{-1} y, \end{equation}

which can be computed efficiently by \begin{equation} \hat \alpha = \frac{\hat y}{\hat k^{xx} + \lambda}, \end{equation} together with the fast detection response \begin{equation} \hat r = \hat k^{xz} \odot \hat \alpha. \end{equation}

The above equation is not valid for all kernels, but it is for e.g. the RBF Gaussian kernel: $k(x, x') = \exp(-\frac{1}{\sigma^2}||x-x'||^2)$, which leads to \begin{equation} k^{xx'} = \exp\Bigl(-\frac{1}{\sigma^2}\bigl(||x||^2 + ||x'||^2 - 2{\cal{F}}^{-1}(\hat x^\ast \odot \hat x')\bigr) \Bigr). \end{equation}

Note that for a linear kernel $k(x, x') = x^T x'$, it leads to \begin{equation} k^{xx'} = {\cal{F}}^{-1}(\hat x^* \odot \hat x'). \end{equation} Then by substituting (10) into (7) and (8), the result of the linear correlation tracker (3) and (4) respectively is obtained.

KCF tracker in practice

Besides the efficiency, the tracker also possesses implementation simplicity.

def kcf_train(x: np.array, # training image
              y: np.array, # desired response
              pars: SimpleNamespace):
 
    k = kernel_correlation(x, x, pars)
    alphaf = fft2(y) / (fft2(k) + pars.lam)
    return alphaf 
 
def kcf_detect(alphaf: np.array, # tracker filter
               x: np.array, # training image
               z: np.array, # next image
               pars: SimpleNamespace):
 
    k = kernel_correlation(x, z, pars);
    responses = ifft2(alphaf * fft2(k));
    # return the response, remove imag components which 
    # are of order 1e-16 and are there only due to numerical 
    # imprecision: 
    return responses.real
 
def kernel_correlation(x1: np.array, 
                       x2: np.array, 
                       pars: SimpleNamespace): 
 
    c = ifft2( ( fft2(x1).conj() * fft2(x2) ).sum(axis=2) )
    c = c.real 
    d = (x1**2).sum() + (x2**2).sum() - 2 * c;
 
    if pars.kernel_type == 'rbf': 
        k = np.exp( (-1 / pars.rbf_sigma**2) * d / d.size);
    elif pars.kernel_type == 'linear': 
        k = c;
    else: 
        raise Exception('Unknown kernel type: {}'.format(pars.kernel_type))   
 
    return k

For the first frame, the model is trained with the initial position of the target object. The patch is larger than the tracked object to provide some context and the input patch is weighted by a cosine window envelope to mitigate the boundary artifacts of the circular shifts.

For a new frame, a test image patch is set by the current bounding box position. The target is detected over the test image patch as the maximum score location and the bounding box is updated. To provide the tracker with some memory, the training patch x is updated by a convex combination of the current ($x$) and the previous ($x^{(t-1)}$) states:

\begin{equation} x^{(t)} = \gamma x + (1-\gamma) x^{(t-1)}, \end{equation} where parameter $\gamma \in [0,1]$ controls the tracker adaptation. Default $\gamma$ is 0.075.

Finally, new model is trained on the updated training patch.

Template files, support files and suggested progress of your implementation

The template contains the following files:

kcf.py : all functions for this task, including templates for your implementation. The only function you need to implement is track_kcf.
test_kcf.py : a short script which makes it easy for you to test your implementation of the KCF tracking. It loads a small image and its artificially translated version. It prints out the true translation and the translation estimated by your implementation. When correctly implemented, these should match exactly, as the shift estimates are integer numbers (for test_idx=3):

    In [1]: %run test_kcf 
    True translation:      7.00, -3.00
    Estimated translation: 7.00, -3.00

main_kcf.py : a script which runs the main task in this assignment, i.e. going over the entire sequence, running the KCF algorithm on it, storing the shifts of the patch over the entire sequence, saving them to kcf_translations.pt and also saving an image of the path of the tracked patch to result_kcf.pdf
kcf_params.py a file which defines the parameters for the task and allows you to easily define your own sets of parameters.

Suggested plan for your implementation:

implement track_kcf
use test_kcf.py to check that it works correctly
run main_kcf.py and have the patch tracked over the whole sequence (plus the translations saved to kcf_translations.pt and a visualization of the translations saved to result_kcf.pdf).
change the parameter $\gamma$ such that there is no adaptation of the training patch. Then re-run main_kcf (set the filename for saving translations to other name and the filename for pdf export to result_kcf_no_adaptation.pdf)

References

Henriques J. F., Caseirio R., Martins P., Batista J. High-Speed Tracking with Kernelized Correlation Filters. IEEE Trans on PAMI 37(3):583-596. 2015

Table of Contents