====== How to start work ======

  - Download the code: [[ http://cmp.felk.cvut.cz/~mishkdmy/ws_ucu_2017/video_lab_2_task.tar.gz | All-in-one-MDNet.tar.gz ]]
  - Unpack it to $MDNETDIR
  - ln -s /opt/cv/tracking/datasets/OTB $MDNETDIR/dataset/OTB
  - ln -s /opt/cv/tracking/datasets/vot-dataset/vot2015 $MDNETDIR/dataset/VOT/2015
  - ln -s /opt/cv/tracking/datasets/vot-dataset/vot2016 $MDNETDIR/dataset/VOT/2016
  - cd $MDNETDIR
  - Run MATLAB
  - Run ''setup_mdnet''
  - Run ''compile_matconvnet''

====== MDNet pretraining ======

You will implement, in a drop-in manner, several key functions of the VOT2015 Challenge winner, MDNet. We will use [[ http://www.vlfeat.org/matconvnet/ | MatConvNet ]] as the deep learning library for this task. Useful links explaining MatConvNet functions: [[http://www.vlfeat.org/matconvnet/matconvnet-manual.pdf | Manual]], [[http://www.robots.ox.ac.uk/~vgg/practicals/cnn/index.html | Classification tutorial ]], [[http://www.robots.ox.ac.uk/~vgg/practicals/cnn-reg/index.html#fn:goingdeeper | Regression tutorial ]].

Channel ordering in MatConvNet is Height × Width × Channels × Num.

The MDNet tracker works as follows:

===== Pretraining =====

  - Load the conv1--conv3 layers of a pretrained VGG-M network.
  - Randomly initialize the new fc4--fc6 layers. Note that, despite their names, all new layers are actually convolutional, not fully connected. Set the new layers' learning rate to 10 * lr. fc6 outputs the score for a window being centered on, and fully containing, the tracked object.
  - Fine-tune conv1--conv3 (and learn fc4--fc6 from scratch) on a subset of tracking sequences.

===== Running with online finetuning =====

  - Train a new fc6 layer on the initial frame.
  - For each next frame:
    - Sample object locations (windows) around the previous object position.
    - Estimate their scores; set the new object position to the mean of the top-5 scored windows.
    - Store positive and negative samples for the update.
  - Every 10th frame:
    - Mine hard negatives: the negative samples with the best detection scores.
    - Fine-tune the network.

You will implement some functions of this process.
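The per-frame part of the online loop above can be sketched language-agnostically. This is a NumPy sketch, not the MATLAB lab code: ''score_fn'' is a placeholder for the network's fc6 score, and the Gaussian perturbation stands in for the real ''gen_samples'' function.

```python
import numpy as np

def track_step(score_fn, prev_bb, n_samples=256, top_k=5):
    """One step of the online tracking loop sketched above:
    sample windows around the previous position, score them,
    and take the mean of the top-k windows as the new position."""
    prev_bb = np.asarray(prev_bb, dtype=float)  # [left, top, width, height]
    # Sample candidates by Gaussian perturbation of the position
    # (a stand-in for gen_samples with type = 'gaussian').
    cand = np.tile(prev_bb, (n_samples, 1))
    cand[:, :2] += np.random.randn(n_samples, 2) * 0.1 * prev_bb[2:]
    # Score every candidate window with the classifier (the fc6 score).
    scores = np.array([score_fn(c) for c in cand])
    # New target location = mean of the top-k scored windows.
    top = np.argsort(scores)[::-1][:top_k]
    return cand[top].mean(axis=0), scores[top].mean()
```

In the real tracker, the positive/negative sample storage and the hard-negative mining every 10th frame wrap around this step.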
====== New layers initialization and learning rate manipulations ======

1. Write a function ''layers = add_fc_layers_to_net(layers)'', which adds new layers on top of the existing net.

{{:courses:ucuws17:labs:mdnet_struct.png?600|}}

Layers which need to be added: fc4, fc5, fc6, loss.

Parameters:
  * learning rate: 10 for weights, 20 for biases
  * type: conv; softmaxloss for the loss layer
  * weight decay: 1 for weights, 0 for biases
  * stride: 1
  * pad: 0

Example from the [[ http://www.vlfeat.org/matconvnet/wrappers/ | tutorial ]]:

<code matlab>
net.layers{1} = struct(...
    'name', 'conv1', ...
    'type', 'conv', ...
    'weights', {{randn(10,10,3,2,'single'), randn(2,1,'single')}}, ...
    'pad', 0, ...
    'stride', 1) ;
net.layers{2} = struct(...
    'name', 'relu1', ...
    'type', 'relu') ;
</code>

Initialize the weights with Gaussian noise, std = 0.01.

2. Write a function ''net = mdnet_add_domain_specific_head_with_high_lr(opts, K)'', which adds a new, 2K-way fc6 layer on top of the existing net, followed by a softmaxloss_k loss layer.

3. Write a function ''net = mdnet_finish_train(net)'', which sets the learning rate to (weights = 1, biases = 2) for all layers, and replaces the fc6 layer with a new 2-way classifier with x10 learning rates.

====== Implementing gradient step update ======

Write a function ''[net, res] = train_mdnet_on_batch(net, res, batch, labels, seq_id, opts)'', which does the following:

  - Runs the forward-backward pass by calling ''mdnet_simplenn''; the result of the operation is stored in ''res''.
  - Updates the momentum value for weights and biases using the formula: ∇w = ∇w * mom - lr * (w_decay * w + gradient / batch_size), where lr = opts.learningRate * filter_lr and w_decay = opts.weightDecay * filter_wd.
  - Updates the filters and biases in the network using the formula: w = w + ∇w.

Layer parameters are stored in the ''layers'' structure.
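The two update formulas above can be checked against a small NumPy sketch. Variable names (''opts'' as a dict, ''filter_lr'', ''filter_wd'') follow the formulas, not the actual MDNet MATLAB code:

```python
import numpy as np

def momentum_step(w, dw, grad, opts, filter_lr=1.0, filter_wd=1.0, batch_size=1):
    """One SGD-with-momentum step, following the two formulas above."""
    lr = opts['learningRate'] * filter_lr      # per-filter learning rate
    w_decay = opts['weightDecay'] * filter_wd  # per-filter weight decay
    # Momentum update: dw = dw * mom - lr * (w_decay * w + grad / batch_size)
    dw = dw * opts['momentum'] - lr * (w_decay * w + grad / batch_size)
    # Parameter update: w = w + dw
    return w + dw, dw
```

The same step is applied separately to the weights and the biases of each layer, with their respective learning-rate and weight-decay multipliers.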
The gradients for the ''i''-th layer are stored in ''res{i}.dzdw{1}'' for weights and ''res{i}.dzdw{2}'' for biases.

To send labels to the network, use ''net.layers{end}.class = labels''. ''dydz'' is not needed.

====== Positive and negative examples sampling ======

{{:courses:ucuws17:labs:sampling.png?800|}}

Write a function ''[ bb_samples ] = gen_samples(type, bb, n, opts, max_shift, scale_std)''. The function should generate ''n'' samples around the center of the bounding box ''bb''. The bounding box format is ''[left top width height]''.

The ''type'' parameter selects the sampling method and is a string, which can be:
  - 'gaussian' -- generate samples from a Gaussian distribution centered at ''bb''. Used for positive samples and target candidates.
  - 'uniform' -- generate samples from a uniform distribution around ''bb''. Used for negative samples.
  - 'uniform_aspect' -- generate samples from a uniform distribution around ''bb'' with varying aspect ratios. Used for training samples for bounding box regression.
  - 'whole' -- generate samples from the whole image. Used for negative samples in the initial frame.

''max_shift'' -- the maximum shift of a generated window sample, in pixels, from the object center.

''scale_std'' -- the std of the window scale. The final scale should be proportional to ''opts.scale_factor'' ^ ''scale_std''.

The functions ''randsample'' and ''rand'' might be helpful.

====== Get target location estimate ======

{{:courses:ucuws17:labs:target_estimate.png?600|}}

Write a function ''[targetLoc, target_score] = estimate_target_location(targetLoc, img, net_conv, net_fc, opts, max_shift, scale_std)''. The bounding box format is ''[left top width height]''.

  - targetLoc -- the predicted bounding box. As an input parameter, it is the target location in the previous frame.
  - target_score -- the classifier score for the predicted location.
  - net_conv -- the first part of the net, containing layers up to ''conv3''.
  - net_fc -- the fc4--fc6 layers.
The function should generate sample candidates, evaluate the classifier score for each, and output the location of the best-scoring window. You could also try averaging the top-k predictions.

Hint: use the ''mdnet_features_convX'' and ''mdnet_features_fcX'' MDNet functions for network inference.

====== Run tracking ======

Run ''demo_tracking.m'' on several sequences with the provided ''mdnet_otb-vot15.mat''. Check that everything works.

====== Run MDNet pretraining ======

Run ''demo_pretraining.m'' (needs a GPU and several hours). Check that the training error decreases with each iteration. Store the resulting model as ''mdnet_otb-vot15_new.mat''.

====== Compare the authors' model with yours ======

Run ''demo_tracking.m'' on several sequences with ''mdnet_otb-vot15_new.mat''. Compare to the provided model.
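For reference, the sampling behaviour expected from ''gen_samples'' (the 'gaussian' and 'uniform' modes) can be sketched in NumPy. This is an illustration of the geometry only; parameter names mirror the MATLAB signature, and ''scale_factor'' stands in for ''opts.scale_factor'':

```python
import numpy as np

def gen_samples_sketch(kind, bb, n, max_shift, scale_std, scale_factor=1.05):
    """Sketch of gen_samples for the 'gaussian' and 'uniform' modes:
    perturb the center and scale of bb = [left, top, width, height]."""
    cx, cy = bb[0] + bb[2] / 2, bb[1] + bb[3] / 2
    if kind == 'gaussian':
        shift = np.random.randn(n, 2) * max_shift        # centered at bb
        s = np.random.randn(n) * scale_std
    elif kind == 'uniform':
        shift = (np.random.rand(n, 2) * 2 - 1) * max_shift
        s = (np.random.rand(n) * 2 - 1) * scale_std
    else:
        raise ValueError(kind)
    # Final scale is proportional to scale_factor ** s.
    w = bb[2] * scale_factor ** s
    h = bb[3] * scale_factor ** s
    left = cx + shift[:, 0] - w / 2
    top = cy + shift[:, 1] - h / 2
    return np.stack([left, top, w, h], axis=1)
```

Feeding such candidates through the network and keeping the best-scoring window (or the mean of the top-k) is exactly the target-location estimate described above.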