You will implement, in a drop-in manner, several key functions of MDNet, the winner of the VOT2015 Challenge. We will use MatConvNet as the deep learning library for this task.
Here are useful links explaining MatConvNet functions:
Manual, Classification tutorial, Regression tutorial
The MDNet tracker works as follows:
- On the first frame: draw positive and negative samples around the given bounding box and fine-tune the fully connected layers of the pretrained network.
- For each next frame: generate candidate windows around the previous target location, score them with the network, and take the best-scoring window as the new target location; collect new training samples.
- Each 10th frame: update (fine-tune) the network on the collected samples.
You will implement some of the functions used in this process; the loop is sketched below.
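In schematic form, the online loop built from the functions you will write might look like this. The sample counts, the update schedule details, the opts fields used, and the make_update_batch helper are all assumptions for illustration, not the exact provided code:

function track_sequence(frames, targetLoc, net_conv, net_fc, opts)
% Schematic MDNet tracking loop (a sketch, not the provided code).
res = [];
for t = 2:numel(frames)
    img = imread(frames{t});
    % estimate the new target location from scored candidate windows
    [targetLoc, score] = estimate_target_location(targetLoc, img, ...
        net_conv, net_fc, opts, opts.max_shift, opts.scale_std);
    % collect samples for the online update (counts are assumptions)
    pos = gen_samples('gaussian', targetLoc,  50, opts, opts.max_shift, 0.5);
    neg = gen_samples('uniform',  targetLoc, 200, opts, opts.max_shift, 1.0);
    if mod(t, 10) == 0   % each 10th frame: update the network
        [batch, labels] = make_update_batch(img, pos, neg);  % hypothetical helper
        [net_fc, res] = train_mdnet_on_batch(net_fc, res, batch, ...
                                             labels, 1, opts);
    end
end
end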
Download the code: All-in-one-MDNet.tar.gz
1. Write a function layers = add_fc_layers_to_net(layers), which adds new layers on top of the existing net.
Layers to be added: fc4, fc5, fc6, loss.
Parameters: use the struct format from this tutorial example:
net.layers{1} = struct(...
    'name', 'conv1', ...
    'type', 'conv', ...
    'weights', {{randn(10,10,3,2,'single'), randn(2,1,'single')}}, ...
    'pad', 0, ...
    'stride', 1) ;
net.layers{2} = struct(...
    'name', 'relu1', ...
    'type', 'relu') ;
Initialize the weights with Gaussian noise, std = 0.01.
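A minimal sketch of how the layers could be appended, following the struct format above. The filter shapes (assuming MDNet's conv3 outputs 3x3x512 feature maps), the extra ReLU layers, and the zero bias init are assumptions; verify them against the provided code:

function layers = add_fc_layers_to_net(layers)
% Sketch: append fc4, fc5, fc6 and a softmax loss to the conv layers.
% Filter shapes assume conv3 outputs 3x3x512 feature maps; dropout
% layers may also be needed. Biases are zeroed here; the Gaussian
% init required by the assignment may apply to them as well.
init = @(h,w,in,out) {{0.01*randn(h,w,in,out,'single'), ...
                       zeros(out,1,'single')}};  % Gaussian init, std 0.01
layers{end+1} = struct('name','fc4','type','conv', ...
    'weights', init(3,3,512,512), 'pad',0, 'stride',1);
layers{end+1} = struct('name','relu4','type','relu');
layers{end+1} = struct('name','fc5','type','conv', ...
    'weights', init(1,1,512,512), 'pad',0, 'stride',1);
layers{end+1} = struct('name','relu5','type','relu');
layers{end+1} = struct('name','fc6','type','conv', ...
    'weights', init(1,1,512,2), 'pad',0, 'stride',1);
layers{end+1} = struct('name','loss','type','softmaxloss');
end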
2. Write a function net = mdnet_add_domain_specific_head_with_high_lr(opts,K), which adds a new 2K-way fc6 layer on top of the existing net, followed by a softmaxloss_k loss layer.
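A sketch under assumptions: the base network with the shared layers is taken from opts (the opts.net field below is hypothetical), and the trainer honors MatConvNet-style per-layer 'learningRate' multipliers [weights biases]:

function net = mdnet_add_domain_specific_head_with_high_lr(opts, K)
% Sketch: add a 2K-way fc6 (one 2-way target/background classifier per
% training sequence) and a domain-specific softmaxloss_k loss layer.
net = opts.net;   % hypothetical field holding the shared layers
net.layers{end+1} = struct('name','fc6','type','conv', ...
    'weights', {{0.01*randn(1,1,512,2*K,'single'), zeros(2*K,1,'single')}}, ...
    'pad',0, 'stride',1, ...
    'learningRate', [10 20]);   % x10 the base (weights = 1, biases = 2) rates
net.layers{end+1} = struct('name','loss','type','softmaxloss_k');
end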
3. Write a function net = mdnet_finish_train(net), which sets the learning rates to (weights = 1, biases = 2) for all layers, and replaces the fc6 layer with a new 2-way classifier with x10 learning rates.
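A sketch of one possible implementation; the 'learningRate' field name follows MatConvNet's simplenn convention and is an assumption here:

function net = mdnet_finish_train(net)
% Sketch: reset per-layer learning rates and swap the multi-domain fc6
% for a single 2-way classifier with x10 learning rates.
for i = 1:numel(net.layers)
    if isfield(net.layers{i}, 'weights')
        net.layers{i}.learningRate = [1 2];   % weights = 1, biases = 2
    end
    if isfield(net.layers{i}, 'name') && strcmp(net.layers{i}.name, 'fc6')
        net.layers{i}.weights = {0.01*randn(1,1,512,2,'single'), ...
                                 zeros(2,1,'single')};
        net.layers{i}.learningRate = [10 20]; % x10 learning rates
    end
end
end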
4. Write a function [net,res] = train_mdnet_on_batch(net,res,batch,labels,seq_id,opts), which performs one training step: a forward-backward pass with mdnet_simplenn followed by a gradient update of the weights.
Layer parameters are stored in the layers structure. The gradient for the i-th layer is stored in res{i}.dzdw{1} for the weights and in res{i}.dzdw{2} for the biases.
To pass labels to the network, use net.layers{end}.class = labels. dzdy is not needed.
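A sketch of the training step under assumptions: mdnet_simplenn follows the vl_simplenn calling convention, gradients are indexed as stated above, every weighted layer carries a learningRate field (set in tasks 1-3), and opts.learningRate is the base learning rate (field name assumed). How seq_id reaches the loss layer is also an assumption:

function [net, res] = train_mdnet_on_batch(net, res, batch, labels, seq_id, opts)
% Sketch of one SGD step over a batch.
net.layers{end}.class  = labels;   % labels for the loss layer
net.layers{end}.seq_id = seq_id;   % hypothetical field selecting the k-th head
res = mdnet_simplenn(net, batch, [], res);   % dzdy is not needed
for i = 1:numel(net.layers)
    if ~isfield(net.layers{i}, 'weights'), continue; end
    % per-layer multipliers [weights biases] times the base rate
    lr = opts.learningRate * net.layers{i}.learningRate;
    for j = 1:2   % j = 1: weights, j = 2: biases
        % gradient indexing as stated in the assignment text
        net.layers{i}.weights{j} = net.layers{i}.weights{j} ...
            - lr(j) * res{i}.dzdw{j};
    end
end
end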
5. Write a function [ bb_samples ] = gen_samples(type, bb, n, opts, max_shift, scale_f). The function should generate n samples around the center of the bounding box bb. Bounding box format is [left top width height].
The 'type' parameter selects the sampling method and is one of the following strings:
- 'gaussian' -- generate samples from a Gaussian distribution centered at bb. Used for positive samples and target candidates.
- 'uniform' -- generate samples from a uniform distribution around bb. Used for negative samples.
- 'uniform_aspect' -- generate samples from a uniform distribution around bb with varying aspect ratios. Used for training samples for bounding-box regression.
- 'whole' -- generate samples from the whole image. Used for negative samples at the initial frame.
'max_shift' -- maximum shift of a generated window from the object center, in pixels.
'scale_f' -- std of the window scale; the final scale should be proportional to opts.scale_factor ^ scale_f.
The functions randsample and rand might be helpful.
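A sketch of the 'gaussian' branch only, with simplified shift/scale distributions (the clipping constants are assumptions); the other branches follow the same pattern:

function bb_samples = gen_samples(type, bb, n, opts, max_shift, scale_f)
% Sketch: generate n boxes around bb = [left top width height].
c = [bb(1) + bb(3)/2, bb(2) + bb(4)/2];        % box center
samples = repmat([c, bb(3:4)], n, 1);          % [cx cy w h] per row
switch type
  case 'gaussian'
    % shift centers by Gaussian noise clipped to +-max_shift pixels
    samples(:,1:2) = samples(:,1:2) + ...
        max_shift * max(-1, min(1, 0.5 * randn(n, 2)));
    % final scale proportional to opts.scale_factor ^ scale_f
    s = opts.scale_factor .^ (scale_f * max(-1, min(1, 0.5 * randn(n, 1))));
    samples(:,3:4) = samples(:,3:4) .* [s, s];
  otherwise
    error('gen_samples: branch ''%s'' is not sketched here', type);
end
% convert back to [left top width height]
bb_samples = [samples(:,1) - samples(:,3)/2, ...
              samples(:,2) - samples(:,4)/2, samples(:,3:4)];
end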
6. Write a function [targetLoc,target_score] = estimate_target_location(targetLoc, img, net_conv, net_fc, opts, max_shift, scale_std). Bounding box format is [left top width height].
The function should generate sample candidates around the current target location, extract their conv3 features, evaluate the classifier score for each candidate, and output the location of the best-scoring window. You could also try averaging the top-k predictions.
Hint: use the MDNet functions mdnet_features_convX and mdnet_features_fcX for network inference.
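A sketch assuming mdnet_features_convX(net, img, boxes, opts) returns conv3 features per candidate and mdnet_features_fcX(net, feat, opts) returns 1x1x2xN class scores; the exact signatures, the opts.nSamples field, and the channel layout are assumptions to check against the provided code:

function [targetLoc, target_score] = estimate_target_location( ...
        targetLoc, img, net_conv, net_fc, opts, max_shift, scale_std)
% Sketch: sample candidates, score them, keep the best window(s).
samples = gen_samples('gaussian', targetLoc, opts.nSamples, opts, ...
                      max_shift, scale_std);          % candidate windows
feat   = mdnet_features_convX(net_conv, img, samples, opts);  % conv3 features
scores = mdnet_features_fcX(net_fc, feat, opts);
scores = squeeze(scores(1,1,2,:));    % positive-class score (assumed layout)
% average the top-k candidates instead of taking a single argmax
[sorted_scores, idx] = sort(scores, 'descend');
k = min(5, numel(idx));
targetLoc    = mean(samples(idx(1:k), :), 1);
target_score = mean(sorted_scores(1:k));
end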
7. Run demo_tracking.m on several sequences with the provided model mdnet_otb-vot15.mat. Check that everything works.
8. Run demo_pretraining.m (requires a GPU and several hours). Check that the training error decreases at each iteration. Save the resulting model as mdnet_otb-vot15_new.mat.
9. Run demo_tracking.m on several sequences with mdnet_otb-vot15_new.mat. Compare the results to the provided model.