====== How to start work ======

  - Download the code: [[ http://cmp.felk.cvut.cz/~mishkdmy/ws_ucu_2017/video_lab_2_task.tar.gz | All-in-one-MDNet.tar.gz ]]
  - Unpack it to $MDNETDIR
  - ln -s /opt/cv/tracking/datasets/OTB $MDNETDIR/dataset/OTB
  - ln -s /opt/cv/tracking/datasets/vot-dataset/vot2015 $MDNETDIR/dataset/VOT/2015
  - ln -s /opt/cv/tracking/datasets/vot-dataset/vot2016 $MDNETDIR/dataset/VOT/2016
  - cd $MDNETDIR
  - Run MATLAB
  - Run ''setup_mdnet''
  - Run ''compile_matconvnet''

====== MDNet pretraining ======

You will implement, in a drop-in manner, several key functions of the VOT2015 Challenge winner, MDNet. We will use [[ http://www.vlfeat.org/matconvnet/ | MatConvNet ]] as the deep learning library for this task. Useful links explaining MatConvNet functions: [[http://www.vlfeat.org/matconvnet/matconvnet-manual.pdf | Manual]], [[http://www.robots.ox.ac.uk/~vgg/practicals/cnn/index.html | Classification tutorial ]], [[http://www.robots.ox.ac.uk/~vgg/practicals/cnn-reg/index.html#fn:goingdeeper | Regression tutorial ]].

Channel ordering in MatConvNet is Height × Width × Channels × Num.

The MDNet tracker works as follows:

===== Pretraining =====

  - Load the conv1--conv3 layers of a pretrained VGG-M network.
  - Randomly initialize the new fc4--fc6 layers. Note that, despite their names, all new layers are actually convolutional, not fully connected. Set the new layers' learning rate to 10 * lr. fc6 outputs the score for a window being centered on, and fully containing, the tracked object.
  - Fine-tune conv1--conv3 (and learn fc4--fc6 from scratch) on a subset of tracking sequences.

===== Running with online finetuning =====

  - Train a new fc6 layer on the initial frame.
  - For each next frame:
    - Sample object locations (windows) around the previous object position.
    - Estimate their scores; set the new object position to the mean of the top-5 scored windows.
    - Store positive and negative samples for the update.
  - Every 10th frame:
    - Mine hard negatives: the negative samples with the best detection scores.
    - Fine-tune the network.

You will implement some functions of this process.
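The per-frame part of the online loop above can be sketched language-agnostically. This is a NumPy sketch, not the MATLAB lab code: ''score_fn'' is a placeholder for the network's fc6 score, and the Gaussian perturbation stands in for the real ''gen_samples'' function.

```python
import numpy as np

def track_step(score_fn, prev_bb, n_samples=256, top_k=5):
    """One step of the online tracking loop sketched above:
    sample windows around the previous position, score them,
    and take the mean of the top-k windows as the new position."""
    prev_bb = np.asarray(prev_bb, dtype=float)  # [left, top, width, height]
    # Sample candidates by Gaussian perturbation of the position
    # (a stand-in for gen_samples with type = 'gaussian').
    cand = np.tile(prev_bb, (n_samples, 1))
    cand[:, :2] += np.random.randn(n_samples, 2) * 0.1 * prev_bb[2:]
    # Score every candidate window with the classifier (the fc6 score).
    scores = np.array([score_fn(c) for c in cand])
    # New target location = mean of the top-k scored windows.
    top = np.argsort(scores)[::-1][:top_k]
    return cand[top].mean(axis=0), scores[top].mean()
```

In the real tracker, the positive/negative sample storage and the hard-negative mining every 10th frame wrap around this step.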
====== New layers initialization and learning rate manipulations ======

1. Write a function ''layers = add_fc_layers_to_net(layers)'', which adds new layers on top of the existing net.

{{:courses:ucuws17:labs:mdnet_struct.png?600|}}

Layers which need to be added: fc4, fc5, fc6, loss.

Parameters:
  * learning rate: 10 for weights, 20 for biases
  * type: conv; softmaxloss for the loss layer
  * weight decay: 1 for weights, 0 for biases
  * stride: 1
  * pad: 0

Example from the [[ http://www.vlfeat.org/matconvnet/wrappers/ | tutorial ]]:

<code matlab>
net.layers{1} = struct(...
    'name', 'conv1', ...
    'type', 'conv', ...
    'weights', {{randn(10,10,3,2,'single'), randn(2,1,'single')}}, ...
    'pad', 0, ...
    'stride', 1) ;
net.layers{2} = struct(...
    'name', 'relu1', ...
    'type', 'relu') ;
</code>

Initialize the weights with Gaussian noise, std = 0.01.

2. Write a function ''net = mdnet_add_domain_specific_head_with_high_lr(opts, K)'', which adds a new, 2K-way fc6 layer on top of the existing net, followed by a softmaxloss_k loss layer.

3. Write a function ''net = mdnet_finish_train(net)'', which sets the learning rate to (weights = 1, biases = 2) for all layers, and replaces the fc6 layer with a new 2-way classifier with x10 learning rates.

====== Implementing gradient step update ======

Write a function ''[net, res] = train_mdnet_on_batch(net, res, batch, labels, seq_id, opts)'', which does the following:

  - Runs the forward-backward pass by calling ''mdnet_simplenn''; the result of the operation is stored in ''res''.
  - Updates the momentum value for weights and biases using the formula: ∇w = ∇w * mom - lr * (w_decay * w + gradient / batch_size), where lr = opts.learningRate * filter_lr and w_decay = opts.weightDecay * filter_wd.
  - Updates the filters and biases in the network using the formula: w = w + ∇w.

Layer parameters are stored in the ''layers'' structure.
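The two update formulas above can be checked against a small NumPy sketch. Variable names (''opts'' as a dict, ''filter_lr'', ''filter_wd'') follow the formulas, not the actual MDNet MATLAB code:

```python
import numpy as np

def momentum_step(w, dw, grad, opts, filter_lr=1.0, filter_wd=1.0, batch_size=1):
    """One SGD-with-momentum step, following the two formulas above."""
    lr = opts['learningRate'] * filter_lr      # per-filter learning rate
    w_decay = opts['weightDecay'] * filter_wd  # per-filter weight decay
    # Momentum update: dw = dw * mom - lr * (w_decay * w + grad / batch_size)
    dw = dw * opts['momentum'] - lr * (w_decay * w + grad / batch_size)
    # Parameter update: w = w + dw
    return w + dw, dw
```

The same step is applied separately to the weights and the biases of each layer, with their respective learning-rate and weight-decay multipliers.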
The gradients for the ''i''-th layer are stored in ''res{i}.dzdw{1}'' for weights and ''res{i}.dzdw{2}'' for biases.

To send labels to the network, use ''net.layers{end}.class = labels''. ''dydz'' is not needed.

====== Positive and negative examples sampling ======

{{:courses:ucuws17:labs:sampling.png?800|}}

Write a function ''[ bb_samples ] = gen_samples(type, bb, n, opts, max_shift, scale_std)''. The function should generate ''n'' samples around the center of the bounding box ''bb''. The bounding box format is ''[left top width height]''.

The ''type'' parameter selects the sampling method and is a string, which can be:
  - 'gaussian' -- generate samples from a Gaussian distribution centered at ''bb''. Used for positive samples and target candidates.
  - 'uniform' -- generate samples from a uniform distribution around ''bb''. Used for negative samples.
  - 'uniform_aspect' -- generate samples from a uniform distribution around ''bb'' with varying aspect ratios. Used for training samples for bounding box regression.
  - 'whole' -- generate samples from the whole image. Used for negative samples in the initial frame.

''max_shift'' -- the maximum shift of a generated window sample, in pixels, from the object center.

''scale_std'' -- the std of the window scale. The final scale should be proportional to ''opts.scale_factor'' ^ ''scale_std''.

The functions ''randsample'' and ''rand'' might be helpful.

====== Get target location estimate ======

{{:courses:ucuws17:labs:target_estimate.png?600|}}

Write a function ''[targetLoc, target_score] = estimate_target_location(targetLoc, img, net_conv, net_fc, opts, max_shift, scale_std)''. The bounding box format is ''[left top width height]''.

  - targetLoc -- the predicted bounding box. As an input parameter, it is the target location in the previous frame.
  - target_score -- the classifier score for the predicted location.
  - net_conv -- the first part of the net, containing layers up to ''conv3''.
  - net_fc -- the fc4--fc6 layers.
The function should generate sample candidates, evaluate the classifier score for each, and output the location of the best-scoring window. You could also try averaging the top-k predictions.

Hint: use the ''mdnet_features_convX'' and ''mdnet_features_fcX'' MDNet functions for network inference.

====== Run tracking ======

Run ''demo_tracking.m'' on several sequences with the provided ''mdnet_otb-vot15.mat''. Check that everything works.

====== Run MDNet pretraining ======

Run ''demo_pretraining.m'' (needs a GPU and several hours). Check that the training error decreases with each iteration. Store the resulting model as ''mdnet_otb-vot15_new.mat''.

====== Compare the authors' model with yours ======

Run ''demo_tracking.m'' on several sequences with ''mdnet_otb-vot15_new.mat''. Compare to the provided model.
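For reference, the sampling behaviour expected from ''gen_samples'' (the 'gaussian' and 'uniform' modes) can be sketched in NumPy. This is an illustration of the geometry only; parameter names mirror the MATLAB signature, and ''scale_factor'' stands in for ''opts.scale_factor'':

```python
import numpy as np

def gen_samples_sketch(kind, bb, n, max_shift, scale_std, scale_factor=1.05):
    """Sketch of gen_samples for the 'gaussian' and 'uniform' modes:
    perturb the center and scale of bb = [left, top, width, height]."""
    cx, cy = bb[0] + bb[2] / 2, bb[1] + bb[3] / 2
    if kind == 'gaussian':
        shift = np.random.randn(n, 2) * max_shift        # centered at bb
        s = np.random.randn(n) * scale_std
    elif kind == 'uniform':
        shift = (np.random.rand(n, 2) * 2 - 1) * max_shift
        s = (np.random.rand(n) * 2 - 1) * scale_std
    else:
        raise ValueError(kind)
    # Final scale is proportional to scale_factor ** s.
    w = bb[2] * scale_factor ** s
    h = bb[3] * scale_factor ** s
    left = cx + shift[:, 0] - w / 2
    top = cy + shift[:, 1] - h / 2
    return np.stack([left, top, w, h], axis=1)
```

Feeding such candidates through the network and keeping the best-scoring window (or the mean of the top-k) is exactly the target-location estimate described above.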