Search
Please note that tracking assignments use the numpy package. If it is not installed yet in your environment please install it (usually using pip.)
numpy
pip
To fulfill this assignment, you need to submit these files (all packed in one .zip file) into the upload system:
.zip
kcf.py
track_kcf
kcf_translations.pt
main_kcf.py
result_kcf.pdf
result_kcf_no_adaptation.pdf
Use template of the assignment. When preparing a zip file for the upload system, do not include any directories, the files have to be in the zip file root.
Given a sequence of frames, we will define the bounding box of a patch to be tracked in the first frame, and then track the position of the patch across the entire sequence. We will only estimate the shifts of the patch. i.e. the width, height and rotation of the bounding box are all kept constant over the entire sequence.
Note: in the responses in the top-right image, the reference position (corresponding to no shift) is marked by a red cross marker.
The shifts of the patch are estimated by the KCF tracking algorithm. The path of the tracked patch over the whole sequence is then as follows (the reference coordinates for the first frame are (0, 0)):
The basic idea of the correlation filter tracking is estimating an optimal filter such that when applied to an input image, the desired response is produced. The desired response is typically of a Gaussian shape centered at the target location, so the score decreases with the distance.
The filter is trained from translated (shifted) instances of the target patch. When testing, the response of the filter is evaluated and the maximum gives the new position of the target. The filter is trained on-line and updated successively with every frame in order the tracker adapts to moderate target changes.
Major advantage of the correlation filter tracker is the computation efficiency. The reason is that the computation can be performed efficiently in the Fourier domain. Thus the tracker runs super-realtime, several hundreds FPS.
There are both linear and non-linear (kernel) versions of the tracker that are derived from a unifying least square principle by Henriques et al. [1].
The optimal linear filter $w$ is found by solving the regularized least squares problem
\begin{equation} \min_w \Bigl( || X w - y ||^2 + \lambda ||w||^2 \Bigr), \end{equation}
where $X$ is a circulant matrix of the target image patch. The rows of $X$ store all possible cyclic image shifts, $y$ is the desired response, $\lambda$ is a weight of the regularizer.
The solution of (1) is \begin{equation} w = (X^T X + \lambda I)^{-1} X^T y \end{equation}
Since $X$ is a circulant matrix, $w$ in (2) can be computed quickly using Fourier domain operations \begin{equation} \hat w = \frac{\hat x \odot \hat y}{\hat x^* \odot \hat x + \lambda}, \end{equation} where the symbols denote: $\hat ~$ the Fourier image, $\odot$ component-wise multiplication, $^*$ complex conjugate.
Moreover, the response $r$ of the filter $w$ on any image $z$ can also be computed efficiently in the Fourier domain: \begin{equation} \hat r = \hat z^* \odot \hat w. \end{equation}
By using the "kernel-trick", i.e. mapping input data by a non-linear function $x -> \varphi(x)$, and expressing the solution as the linear combination $w = \sum_i \alpha_i \varphi(x_i)$, eq. (1) becomes \begin{equation} \min_{\alpha} \Bigl( || K \alpha - y ||^2 + \lambda \alpha^T K \alpha \Bigr), \end{equation} where the matrix $K$ is the kernel matrix with elements being dot products $k_{i,j} = \varphi(x_i)^T \varphi(x_j)$. The original problem thus becomes non-linear, a.k.a kernelized ridge regression.
The solution of problem (5) is \begin{equation} \alpha = (K + \lambda I)^{-1} y, \end{equation}
which can be computed efficiently by \begin{equation} \hat \alpha = \frac{\hat y}{\hat k^{xx} + \lambda}, \end{equation} together with the fast detection response \begin{equation} \hat r = \hat k^{xz} \odot \hat \alpha. \end{equation}
The above equation is not valid for all kernels, but it is for e.g. the RBF Gaussian kernel: $k(x, x') = \exp(-\frac{1}{\sigma^2}||x-x'||^2)$, which leads to \begin{equation} k^{xx'} = \exp\Bigl(-\frac{1}{\sigma^2}\bigl(||x||^2 + ||x'||^2 - 2{\cal{F}}^{-1}(\hat x^\ast \odot \hat x')\bigr) \Bigr). \end{equation}
Note that for a linear kernel $k(x, x') = x^T x'$, it leads to \begin{equation} k^{xx'} = {\cal{F}}^{-1}(\hat x^* \odot \hat x'). \end{equation} Then by substituting (10) into (7) and (8), the result of the linear correlation tracker (3) and (4) respectively is obtained.
Besides the efficiency, the tracker also possesses implementation simplicity.
def kcf_train(x: np.array, # training image y: np.array, # desired response pars: SimpleNamespace): k = kernel_correlation(x, x, pars) alphaf = fft2(y) / (fft2(k) + pars.lam) return alphaf def kcf_detect(alphaf: np.array, # tracker filter x: np.array, # training image z: np.array, # next image pars: SimpleNamespace): k = kernel_correlation(x, z, pars); responses = ifft2(alphaf * fft2(k)); # return the response, remove imag components which # are of order 1e-16 and are there only due to numerical # imprecision: return responses.real def kernel_correlation(x1: np.array, x2: np.array, pars: SimpleNamespace): c = ifft2( ( fft2(x1).conj() * fft2(x2) ).sum(axis=2) ) c = c.real d = (x1**2).sum() + (x2**2).sum() - 2 * c; if pars.kernel_type == 'rbf': k = np.exp( (-1 / pars.rbf_sigma**2) * d / d.size); elif pars.kernel_type == 'linear': k = c; else: raise Exception('Unknown kernel type: {}'.format(pars.kernel_type)) return k
For the first frame, the model is trained with the initial position of the target object. The patch is larger than the tracked object to provide some context and the input patch is weighted by a cosine window envelope to mitigate the boundary artifacts of the circular shifts.
For a new frame, a test image patch is set by the current bounding box position. The target is detected over the test image patch as the maximum score location and the bounding box is updated. To provide the tracker with some memory, the training patch x is updated by a convex combination of the current ($x$) and the previous ($x^{(t-1)}$) states:
x
\begin{equation} x^{(t)} = \gamma x + (1-\gamma) x^{(t-1)}, \end{equation} where parameter $\gamma \in [0,1]$ controls the tracker adaptation. Default $\gamma$ is 0.075.
Finally, new model is trained on the updated training patch.
The template contains the following files:
test_kcf.py
In [1]: %run test_kcf True translation: 7.00, -3.00 Estimated translation: 7.00, -3.00
kcf_params.py