You will implement, in a drop-in manner, several key functions of MDNet, the VOT2015 Challenge winner. We will use MatConvNet as the deep learning library for this task.
Here are useful links explaining MatConvNet functions:
Manual, Classification tutorial, Regression tutorial
The MDNet tracker works roughly as follows.
For each new frame:
- draw candidate windows around the previous target location;
- score the candidates with the network and take the best-scoring window as the new target;
- collect positive and negative samples around the estimated target for later updates.
Every 10th frame:
- fine-tune the fully connected layers on the collected samples (long-term online update).
You will update some functions for this process.
Download code All-in-one-MDNet.tar.gz
1. Write a function layers = add_fc_layers_to_net(layers), which adds new layers on top of the existing net.
Layers to be added: fc4, fc5, fc6, loss
Parameters:
Example from the tutorial:

net.layers{1} = struct(...
    'name', 'conv1', ...
    'type', 'conv', ...
    'weights', {{randn(10,10,3,2,'single'), randn(2,1,'single')}}, ...
    'pad', 0, ...
    'stride', 1) ;
net.layers{2} = struct(...
    'name', 'relu1', ...
    'type', 'relu') ;
Initialize the new weights with Gaussian noise, std = 0.01.
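A minimal sketch of such a function, assuming the existing net ends with a 3x3x512 conv3 output (the filter shapes below follow the MDNet architecture and are assumptions about your checkpoint):

```matlab
function layers = add_fc_layers_to_net(layers)
% Append fc4, fc5, fc6 and a loss layer to an existing layer list.
% Filter shapes assume the last conv layer outputs a 3x3x512 map (MDNet-style).
    init = @(h,w,in,out) {{0.01*randn(h,w,in,out,'single'), zeros(out,1,'single')}};
    layers{end+1} = struct('name','fc4', 'type','conv', ...
        'weights', init(3,3,512,512), 'pad',0, 'stride',1);
    layers{end+1} = struct('name','relu4', 'type','relu');
    layers{end+1} = struct('name','fc5', 'type','conv', ...
        'weights', init(1,1,512,512), 'pad',0, 'stride',1);
    layers{end+1} = struct('name','relu5', 'type','relu');
    layers{end+1} = struct('name','fc6', 'type','conv', ...
        'weights', init(1,1,512,2), 'pad',0, 'stride',1);
    layers{end+1} = struct('name','loss', 'type','softmaxloss');
end
```

Note the double braces in init: struct() expands a cell argument into a struct array, so the {W, b} pair must be wrapped in an extra cell, exactly as in the tutorial example above.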
2. Write a function net = mdnet_add_domain_specific_head_with_high_lr(opts,K), which adds a new 2K-way fc6 layer on top of the existing net, followed by a softmaxloss_k loss layer.
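A sketch under two assumptions: the base network is loaded from a path stored in opts (opts.netFile is a hypothetical field name), and per-layer learning-rate multipliers live in a learningRate field, as in MatConvNet's simplenn examples:

```matlab
function net = mdnet_add_domain_specific_head_with_high_lr(opts, K)
% Add a 2K-way fc6 (two outputs per training domain) and a softmaxloss_k loss.
% opts.netFile is a hypothetical field holding the path to the base network.
    net = load(opts.netFile);
    if isfield(net, 'net'), net = net.net; end
    net.layers{end+1} = struct('name','fc6', 'type','conv', ...
        'weights', {{0.01*randn(1,1,512,2*K,'single'), zeros(2*K,1,'single')}}, ...
        'pad',0, 'stride',1, ...
        'learningRate', [10 20]);   % high learning rate for the new head
    net.layers{end+1} = struct('name','loss', 'type','softmaxloss_k');
end
```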
3. Write a function net = mdnet_finish_train(net), which sets the learning rates to (weights = 1, biases = 2) for all layers, and replaces the fc6 layer with a new 2-way classifier with 10x learning rates.
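A sketch, again assuming per-layer rates are stored in a learningRate field and that fc6 is found by its name field:

```matlab
function net = mdnet_finish_train(net)
% Set per-layer rates (weights x1, biases x2), then swap the multi-domain
% fc6 for a fresh 2-way classifier with 10x learning rates.
    for i = 1:numel(net.layers)
        if isfield(net.layers{i}, 'weights')
            net.layers{i}.learningRate = [1 2];
        end
        if isfield(net.layers{i}, 'name') && strcmp(net.layers{i}.name, 'fc6')
            net.layers{i}.weights = ...
                {0.01*randn(1,1,512,2,'single'), zeros(2,1,'single')};
            net.layers{i}.learningRate = [10 20];
        end
    end
end
```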
4. Write a function [net,res] = train_mdnet_on_batch(net,res,batch,labels,seq_id,opts), which runs a forward and backward pass with mdnet_simplenn; the result of the operation is stored in res. Layer parameters are stored in the layers structure. The gradient for the i-th layer is stored in res(i).dzdw{1} for the weights and res(i).dzdw{2} for the biases. To send labels to the network, use net.layers{end}.class = labels. dzdy is not needed.
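The step above can be sketched as a plain SGD update, assuming mdnet_simplenn follows the vl_simplenn calling convention (momentum and weight decay are omitted for brevity; how softmaxloss_k consumes seq_id is an assumption):

```matlab
function [net, res] = train_mdnet_on_batch(net, res, batch, labels, seq_id, opts)
% One SGD step on a mini-batch. Assumes a vl_simplenn-style mdnet_simplenn.
    net.layers{end}.class = labels;   % hand the targets to the loss layer
    net.layers{end}.seq_id = seq_id;  % assumption: softmaxloss_k selects the domain this way
    res = mdnet_simplenn(net, batch, single(1), res);  % forward + backward pass
    for l = 1:numel(net.layers)
        if ~isfield(net.layers{l}, 'weights'), continue; end
        for j = 1:2                   % j = 1: weights, j = 2: biases
            net.layers{l}.weights{j} = net.layers{l}.weights{j} - ...
                opts.learningRate * net.layers{l}.learningRate(j) * res(l).dzdw{j};
        end
    end
end
```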
5. Write a function [bb_samples] = gen_samples(type, bb, n, opts, max_shift, scale_std). The function should generate n samples around the center of the bounding box bb.
Bounding box format is [left top width height]
The 'type' parameter selects the sampling method and is one of the following strings:
- 'gaussian' -- generate samples from a Gaussian distribution centered at bb. Used for positive samples and target candidates.
- 'uniform' -- generate samples from a uniform distribution around bb. Used for negative samples.
- 'uniform_aspect' -- generate samples from a uniform distribution around bb with varying aspect ratios. Used for training samples for bounding-box regression.
- 'whole' -- generate samples from the whole image. Used for negative samples at the initial frame.
'max_shift' -- maximum shift of a generated sample window from the object center, in pixels.
'scale_std' -- std of the window scale; the final scale should be proportional to opts.scale_factor ^ scale_std.
The functions randsample and rand might be helpful here.
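A sketch of the 'gaussian' branch only; the exact way max_shift and scale_std are combined with the noise is an assumption consistent with the parameter descriptions above:

```matlab
function bb_samples = gen_samples(type, bb, n, opts, max_shift, scale_std)
% Generate n candidate boxes around bb = [left top width height].
% Only the 'gaussian' branch is sketched here.
    c = [bb(1) + bb(3)/2, bb(2) + bb(4)/2];      % box center
    samples = repmat([c, bb(3), bb(4)], n, 1);   % rows: [cx cy w h]
    switch type
        case 'gaussian'
            % shift centers by Gaussian noise, clipped to +-max_shift pixels
            shift = max(-max_shift, min(max_shift, max_shift/2 * randn(n,2)));
            samples(:,1:2) = samples(:,1:2) + shift;
            % scale proportional to opts.scale_factor ^ (scale_std * N(0,1))
            s = opts.scale_factor .^ (scale_std * randn(n,1));
            samples(:,3:4) = samples(:,3:4) .* [s, s];
        otherwise
            error('sampling type ''%s'' is not sketched here', type);
    end
    % convert back to [left top width height]
    bb_samples = [samples(:,1) - samples(:,3)/2, ...
                  samples(:,2) - samples(:,4)/2, samples(:,3:4)];
end
```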
6. Write a function [targetLoc,target_score] = estimate_target_location(targetLoc, img, net_conv, net_fc, opts, max_shift, scale_std).
Bounding box format is [left top width height]
The function should generate sample candidates around targetLoc, extract conv3 features for them, evaluate the classifier score of each candidate, and output the location of the best-scoring window. You could also try averaging the top-k predictions.
Hint: use the MDNet functions mdnet_features_convX and mdnet_features_fcX for network inference.
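A sketch of the candidate-scoring loop; the opts.nSamples field and the 1x1x2xN layout of the fc scores are assumptions about the provided code:

```matlab
function [targetLoc, target_score] = estimate_target_location( ...
        targetLoc, img, net_conv, net_fc, opts, max_shift, scale_std)
% Score Gaussian candidates around the previous location and keep the best.
% opts.nSamples and the score-tensor layout below are assumptions.
    samples = gen_samples('gaussian', targetLoc, opts.nSamples, opts, ...
                          max_shift, scale_std);
    feat_conv = mdnet_features_convX(net_conv, img, samples, opts);  % conv3 features
    feat_fc   = mdnet_features_fcX(net_fc, feat_conv, opts);         % classifier scores
    scores = squeeze(feat_fc(1,1,2,:));          % positive-class score per candidate
    [scores_sorted, order] = sort(scores, 'descend');
    k = min(5, numel(order));                    % average the top-k windows
    targetLoc = mean(samples(order(1:k), :), 1);
    target_score = mean(scores_sorted(1:k));
end
```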
Run demo_tracking.m on several sequences with the provided model mdnet_otb-vot15.mat. Check that everything works.
Run demo_pretraining.m (needs a GPU and several hours). Check that the training error decreases with each iteration. Store the resulting model as mdnet_otb-vot15_new.mat.
Run demo_tracking.m on several sequences with mdnet_otb-vot15_new.mat and compare to the provided model.