HW 04 - Object Detection

The lab presentation related to this homework is recorded online at this link.

Please watch the video before attending the labs. The labs will mainly be for consultation.

Due to limited computational resources on the GPU servers, please start working on the homework as soon as possible.

Goals

The goal of this homework is to implement, train, and evaluate an object detector. The input of the neural network is an RGB image, and the output is the position of each survivor's torso in the image, given as a bounding box.

Implement a neural network architecture in model.py.
Implement the loss function in loss.py.
Implement the training script in train.py.
Train the network to detect survivors with corresponding bounding boxes.
Evaluate the model with your weights on the validation dataset, plot the precision/recall curves for IoUs 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, and compute the mean average precision over these thresholds.

For this homework, please load the following modules on the servers:

PyTorch/1.9.0-fosscuda-2020b
matplotlib/3.3.3-fosscuda-2020b
tensorboardX/2.4-fosscuda-2020b-PyTorch-1.9.0

Model

The input of the neural network is a 640×640×3 (RGB) image. The output of the neural network should be a 10×10×5 tensor. You are not required to use exactly this architecture, but we recommend the following one, which is well suited to the survivor detection problem.

The following table shows the suggested architecture you should implement in model.py. This architecture is called YoloTiny (the numbers of channels differ slightly from the original, because only one type of object is detected).

Layer #	Type	Input size	Input channels	Output channels	Kernel size	Stride	Padding	Bias	Activation
1	Conv+BN	640×640	3	16	3×3	1	1	False	LeakyReLU(0.1)
2	MaxPool	640×640	16	16	2×2	2	0	-	-
3	Conv+BN	320×320	16	32	3×3	1	1	False	LeakyReLU(0.1)
4	MaxPool	320×320	32	32	2×2	2	0	-	-
5	Conv+BN	160×160	32	64	3×3	1	1	False	LeakyReLU(0.1)
6	MaxPool	160×160	64	64	2×2	2	0	-	-
7	Conv+BN	80×80	64	128	3×3	1	1	False	LeakyReLU(0.1)
8	MaxPool	80×80	128	128	2×2	2	0	-	-
9	Conv+BN	40×40	128	256	3×3	1	1	False	LeakyReLU(0.1)
10	MaxPool	40×40	256	256	2×2	2	0	-	-
11	Conv+BN	20×20	256	512	3×3	1	1	False	LeakyReLU(0.1)
12	MaxPool	20×20	512	512	2×2	2	0	-	-
13	Conv+BN	10×10	512	1024	3×3	1	1	False	LeakyReLU(0.1)
14	Conv+BN	10×10	1024	1024	3×3	1	1	False	LeakyReLU(0.1)
15	Conv+BN	10×10	1024	5	3×3	1	1	False	LeakyReLU(0.1)
16	Flatten	10×10	5	500	-	-	-	-	-
17	Linear	-	500	1024	-	-	-	True	LeakyReLU(0.1)
18	Linear	-	1024	500	-	-	-	True	Sigmoid
19	Reshape	to shape 10x10x5
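The table can be sketched in PyTorch as below. This is a minimal sketch, not the prepared model.py: the class name YoloTiny and the helper conv_bn are hypothetical, and the output is assumed to be channel-last, (batch, 10, 10, 5). Because the last layers are fully connected, any fixed reshape order works, as long as the loss and the evaluation code interpret the tensor the same way, so check what the provided code expects.

```python
import torch
import torch.nn as nn


def conv_bn(in_ch: int, out_ch: int) -> nn.Sequential:
    """3x3 Conv (stride 1, padding 1, no bias) + BatchNorm + LeakyReLU(0.1)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1),
    )


class YoloTiny(nn.Module):
    """Hypothetical YoloTiny following the suggested layer table."""

    def __init__(self) -> None:
        super().__init__()
        channels = [3, 16, 32, 64, 128, 256, 512]
        layers = []
        # Layers 1-12: six Conv+BN blocks, each followed by 2x2 max pooling (640 -> 10).
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [conv_bn(c_in, c_out), nn.MaxPool2d(kernel_size=2, stride=2)]
        # Layers 13-15: three Conv+BN blocks at 10x10 resolution.
        layers += [conv_bn(512, 1024), conv_bn(1024, 1024), conv_bn(1024, 5)]
        self.features = nn.Sequential(*layers)
        # Layers 16-18: flatten 5x10x10 = 500 values, then two fully connected layers.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(500, 1024),
            nn.LeakyReLU(0.1),
            nn.Linear(1024, 500),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.head(self.features(x))
        # Layer 19 (assumed channel-last layout): (batch, rows, cols, [C, x, y, w, h]).
        return x.view(-1, 10, 10, 5)
```

The Sigmoid at the end keeps all five outputs in (0, 1), which matches the ranges of the confidence, the in-cell shift, and the relative box size.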

The output of the network is a tensor of size 10×10×5. Each cell of the 10×10 grid stores whether the neural network detects an object center in that cell. For each cell, there are five channels: confidence, x, y, w, h.

The confidence expresses how strongly the model believes that an object center lies in the corresponding cell.

x and y correspond to the shift of the object center within the cell: (x = 0, y = 0) corresponds to the top-left corner of the cell and (x = 1, y = 1) to the bottom-right corner.

w and h depict the size of the bounding box relative to the size of the image.
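To make this encoding concrete, the sketch below converts one 10×10×5 output grid into bounding boxes in pixels. It assumes a channel-last layout indexed as output[row, col] = (C, x, y, w, h), with rows along the image height and columns along the width; the function name and the default confidence threshold of 0.5 are hypothetical.

```python
import torch


def decode_predictions(output: torch.Tensor, conf_threshold: float = 0.5, img_size: int = 640):
    """Convert one (10, 10, 5) grid into a list of (conf, x1, y1, x2, y2) pixel boxes.

    Assumption: output[row, col] = (C, x, y, w, h), row = vertical, col = horizontal.
    """
    grid_h, grid_w, _ = output.shape
    boxes = []
    for row in range(grid_h):
        for col in range(grid_w):
            conf, x, y, w, h = output[row, col].tolist()
            if conf < conf_threshold:
                continue  # the model does not believe an object center is in this cell
            # Cell index + in-cell shift gives the center relative to the image (0..1).
            cx = (col + x) / grid_w
            cy = (row + y) / grid_h
            # w and h are already relative to the image size.
            boxes.append((
                conf,
                (cx - w / 2) * img_size,
                (cy - h / 2) * img_size,
                (cx + w / 2) * img_size,
                (cy + h / 2) * img_size,
            ))
    return boxes
```

For example, confidence 0.9 with (x, y, w, h) = (0.5, 0.5, 0.1, 0.2) in row 2, column 3 is a box centered at (224, 160) px with corners (192, 96) and (256, 224).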

Loss function

The loss function is an essential part of the training. Your task is to implement it in the script loss.py. The loss is simplified in comparison to the original YOLO: classification and anchors are omitted. The loss has three parts:

Error for the coordinates and bounding-box size.
Error for missed detections (missing the object).
Error for false detections (detecting an object where there is none, or in the wrong cell).

The sum of all these losses gives the final loss we optimize. Even so, we suggest also returning the individual losses from the function, so that you can visualize how each of them evolves during training.

The final loss you should implement and optimize:

\begin{align*}
loss = \lambda_{\textbf{coord}} & \sum_{i = 1}^{M\times N} {{1}}_{i}^{\text{obj}} \left[ \left( x_i - \hat{x}_i \right)^2 + \left( y_i - \hat{y}_i \right)^2 \right] \\
+ \lambda_{\textbf{coord}} & \sum_{i = 1}^{M\times N} {{1}}_{i}^{\text{obj}} \left[ \left( \sqrt{w_i} - \sqrt{\hat{w}_i} \right)^2 + \left( \sqrt{h_i} - \sqrt{\hat{h}_i} \right)^2 \right] \\
+ & \sum_{i = 1}^{M\times N} {{1}}_{i}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
+ \lambda_{\textbf{noobj}} & \sum_{i = 1}^{M\times N} {{1}}_{i}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2
\end{align*}

where $M\times N$ is the size of the output grid (10×10 here).
${{1}}_{i}^{\text{obj}}$ is $1$ if an object center lies in cell $i$ and $0$ otherwise.
${{1}}_{i}^{\text{noobj}}$ is $0$ if an object center lies in cell $i$ and $1$ otherwise.
$\lambda_{\textbf{coord}}$ is the weight of the coordinate loss (we suggest 5).
$\lambda_{\textbf{noobj}}$ is the weight of the no-object loss (we suggest 0.5).
$C, x, y, w, h$ are the confidence, the position of the object center, and the size of the bounding box, respectively.

You can use torch.nn.MSELoss(reduction="sum") to compute the sum of squared errors.
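A minimal sketch of the four terms, built on torch.nn.MSELoss(reduction="sum"). It assumes prediction and target tensors of shape (batch, 10, 10, 5) with channels (C, x, y, w, h), where the target confidence is exactly 1 in cells containing an object center and 0 elsewhere; the function name yolo_loss and the small clamp before the square root are assumptions, not part of the assignment. The individual terms are returned in a dictionary so that they can be logged separately.

```python
import torch
import torch.nn as nn


def yolo_loss(pred: torch.Tensor, target: torch.Tensor,
              lambda_coord: float = 5.0, lambda_noobj: float = 0.5):
    """Hypothetical simplified YOLO loss for (batch, 10, 10, 5) tensors of (C, x, y, w, h)."""
    mse = nn.MSELoss(reduction="sum")
    obj = target[..., 0] == 1            # 1_i^obj as a boolean mask over the grid cells
    noobj = ~obj                         # 1_i^noobj
    p_obj, t_obj = pred[obj], target[obj]  # shape (num_object_cells, 5)

    # Coordinate term: shift of the center within the cell.
    loss_xy = lambda_coord * mse(p_obj[:, 1:3], t_obj[:, 1:3])
    # Size term on square roots; the clamp (an assumption) avoids the infinite gradient of sqrt at 0.
    loss_wh = lambda_coord * mse(torch.sqrt(p_obj[:, 3:5].clamp(min=1e-9)),
                                 torch.sqrt(t_obj[:, 3:5]))
    # Confidence in cells that contain an object (penalizes missed detections).
    loss_obj = mse(p_obj[:, 0], t_obj[:, 0])
    # Confidence in empty cells (penalizes false detections).
    loss_noobj = lambda_noobj * mse(pred[noobj][:, 0], target[noobj][:, 0])

    total = loss_xy + loss_wh + loss_obj + loss_noobj
    return total, {"xy": loss_xy, "wh": loss_wh, "obj": loss_obj, "noobj": loss_noobj}
```

Because the sums also run over the whole batch, the loss grows with the batch size; dividing the total by the batch size is one option if you want a scale that does not depend on it.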

Dataset

The dataset you will use contains 640×640 images and text files with the object annotations. Each object corresponds to one line with five numbers (c, x, y, w, h):

c is the class number (your dataset contains only one class; 1 corresponds to a survivor).
x, y are the position of the center of the object, relative to the image size (between 0 and 1).
w, h are the width and height of the human torso, relative to the image size (between 0 and 1).
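For training, each label file has to be turned into a 10×10×5 target grid matching the network output. The sketch below does this for the lines of one label file. The function name is hypothetical, it assumes a channel-last (row, col, [C, x, y, w, h]) layout, and if two object centers fall into the same cell the later one simply overwrites the earlier one (a simplification). The provided code may already contain an equivalent step.

```python
import torch


def labels_to_target(lines, grid: int = 10) -> torch.Tensor:
    """Build a hypothetical (grid, grid, 5) target from label lines "c x y w h"."""
    target = torch.zeros(grid, grid, 5)
    for line in lines:
        if not line.strip():
            continue  # skip empty lines
        _, x, y, w, h = (float(v) for v in line.split())
        # Cell of the object center; min() keeps x = 1.0 or y = 1.0 inside the grid.
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        # Confidence 1, center shift within the cell, box size relative to the image.
        # Simplification: a later object in the same cell overwrites an earlier one.
        target[row, col] = torch.tensor([1.0, x * grid - col, y * grid - row, w, h])
    return target
```

For instance, the line "1 0.35 0.25 0.1 0.2" puts (1, 0.5, 0.5, 0.1, 0.2) into row 2, column 3.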

Please download the dataset from this link. The dataset contains a training part and a validation part.

It is strictly prohibited to use the validation part for training.

You can turn on horizontal flipping of the training data in the training script. You may also apply any other data augmentation to the training data.
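Horizontal flipping has to be applied to the labels as well as to the image: with coordinates relative to the image, a flipped center has x' = 1 - x, while y, w and h stay the same. A minimal sketch, assuming a (3, H, W) image tensor and an (n, 5) label tensor with rows (c, x, y, w, h); the function name and the default flip probability of 0.5 are hypothetical.

```python
import random

import torch


def random_hflip(image: torch.Tensor, labels: torch.Tensor, p: float = 0.5):
    """Hypothetical augmentation: flip a (3, H, W) image and its (n, 5) labels with probability p."""
    if random.random() < p:
        image = torch.flip(image, dims=[2])   # reverse the width axis
        labels = labels.clone()               # do not modify the caller's tensor
        labels[:, 1] = 1.0 - labels[:, 1]     # mirror the relative x of each center
    return image, labels
```

Apply it to the raw (c, x, y, w, h) labels, before they are converted into the 10×10 target grid.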

Training

The main goal of this homework is to train the network. At this point, you should already have the model and the loss implemented. To train the network, fill in the missing parts of the training script. Only a few things are missing:

Set the paths for training and validation data.
Set the batch size and the number of epochs.
Initialize the optimizer with the learning_rate (for example, the torch.optim.Adam optimizer; you can also set weight_decay, which helps prevent overfitting to the training set).
Do the forward pass of the model.
Compute the loss.
Do the backward pass.
Update the weights (optimizer.step() and optimizer.zero_grad()).
Print or visualize the training losses to monitor training performance (we suggest using TensorBoard).
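The steps listed above fit into a loop like the following minimal sketch. The function name is hypothetical; it assumes that loss_fn(pred, target) returns a pair (total_loss, parts) and that the loader yields (images, targets) batches. The optional writer can be a tensorboardX SummaryWriter, whose add_scalar(tag, value, step) method logs one scalar for TensorBoard.

```python
import torch


def train_one_epoch(model, loader, loss_fn, optimizer, device, writer=None, epoch=0):
    """Hypothetical training epoch; returns the mean loss per batch.

    Assumes loss_fn(pred, target) -> (total_loss, parts) and a loader of (images, targets).
    """
    model.train()
    running = 0.0
    for step, (images, targets) in enumerate(loader):
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()              # clear gradients from the previous step
        pred = model(images)               # forward pass
        loss, _ = loss_fn(pred, targets)   # compute the loss
        loss.backward()                    # backward pass
        optimizer.step()                   # update the weights
        running += loss.item()
        if writer is not None:
            # Global step = number of batches seen so far.
            writer.add_scalar("train/loss", loss.item(), epoch * len(loader) + step)
    return running / max(len(loader), 1)
```

A possible setup is optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4), followed by one call per epoch; these hyperparameter values are only a starting guess, not values from the assignment.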

TensorboardX: we suggest using TensorBoard, which helps you visualize the training process. You can easily follow all the losses or the outputs of the network during training.

To use the tensorboard:

Please load the module tensorboardX/2.4-fosscuda-2020b-PyTorch-1.9.0 on the server.
Install TensorBoard on your computer (pip install tensorboard).
Mount the server disk and run tensorboard --logdir . in the folder containing the TensorBoard logs.
Open http://localhost:6006/ in your browser.
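Since the individual loss parts are worth watching separately, a small helper can write each of them as its own TensorBoard scalar. A minimal sketch: the helper name and the tag names are hypothetical, and the writer is assumed to be a tensorboardX SummaryWriter (for example SummaryWriter("runs/hw04"), with a hypothetical log directory), of which only the add_scalar(tag, value, step) method is used.

```python
def log_loss_parts(writer, total, parts: dict, step: int, prefix: str = "train") -> None:
    """Hypothetical helper: log the total loss and each named loss part as separate scalars."""
    writer.add_scalar(f"{prefix}/total", float(total), step)
    for name, value in parts.items():
        # float() accepts both Python numbers and one-element tensors.
        writer.add_scalar(f"{prefix}/{name}", float(value), step)
```

Tags sharing a prefix such as "train/" are grouped together in the TensorBoard interface.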

screen: To prevent your running scripts from being terminated when you lose the connection, use screen.

Evaluation

To show the performance of your neural network, you will evaluate the model on the validation part of the dataset and plot the precision-recall curves for different IoU thresholds. All you need to do is change the paths to the validation dataset and to your weights in eval.py. The script “eval.py” saves the image pr_curve.png, which you upload together with your code and final weights (upload only one set of weights!). The number in the graph title determines the points you get.
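The IoU thresholds refer to the intersection over union of a predicted and a ground-truth box: the area of their overlap divided by the area of their union. A minimal sketch for boxes given as (x1, y1, x2, y2) corners, for understanding only; eval.py must stay unchanged and may differ in details.

```python
def iou(box_a, box_b) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection typically counts as a true positive at threshold t when its IoU with a not-yet-matched ground-truth box reaches t, so the same detections give lower precision and recall at IoU 0.7 than at IoU 0.2. For example, two 2×2 boxes shifted by (1, 1) overlap in a 1×1 square and have IoU 1/7.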

It is forbidden to change anything else in the eval.py script (only the paths to the validation data and the weights).
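For orientation only: one common way to turn a precision-recall curve into a single average precision (AP) is the area under the curve after making precision non-increasing (all-point interpolation), and the mAP over IoU thresholds 0.2, 0.3, ..., 0.7 is the mean of the six APs. The sketch below assumes detections that are already matched to ground truth, given as (score, is_true_positive) pairs; the function names are hypothetical, eval.py may use a different AP definition, and its number is the one that counts.

```python
IOU_THRESHOLDS = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]  # the thresholds behind mAP@0.2:0.1:0.7


def average_precision(detections, num_gt: int) -> float:
    """AP from (score, is_true_positive) pairs using all-point interpolation."""
    if num_gt == 0:
        return 0.0
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [0.0], [1.0]
    for _, is_tp in detections:
        tp += int(is_tp)
        fp += int(not is_tp)
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing from right to left (the interpolation step).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas wherever recall increases.
    return sum((recalls[i] - recalls[i - 1]) * precisions[i] for i in range(1, len(recalls)))


def mean_ap(ap_per_threshold: dict) -> float:
    """mAP as the plain mean of the APs over the IoU thresholds."""
    return sum(ap_per_threshold.values()) / len(ap_per_threshold)
```

With two ground-truth boxes and detections [(0.9, True), (0.8, False), (0.7, True)], this gives AP = 0.5 · 1 + 0.5 · 2/3 = 5/6.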

Codes

The code templates for your homework are available at this link; please download them and fill in the missing parts.

Points

If the mAP@0.2:0.1:0.7 (the mean average precision over the IoU thresholds 0.2, 0.3, ..., 0.7) on the validation data is lower than 0.4, you get 4 points.
If the mAP@0.2:0.1:0.7 on the validation data is at least 0.4 and lower than 0.5, you get 6 points.
All students with an mAP@0.2:0.1:0.7 of at least 0.5 on the validation data will be ranked by their mAP, and points are assigned by rank: the first gets 10 points and the last 7.5, with the points decreasing linearly with the position in the ranking.
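The linear ranking rule can be written as a formula: with n qualifying students and rank r (1 = best), points = 10 - 2.5 (r - 1) / (n - 1). A sketch, with a hypothetical function name and the assumption that a single qualifying student gets 10 points, a case the rules do not state explicitly:

```python
def competitive_points(rank: int, num_students: int) -> float:
    """Points for rank 1..num_students, linear from 10 (first) down to 7.5 (last)."""
    if not 1 <= rank <= num_students:
        raise ValueError("rank must be between 1 and num_students")
    if num_students == 1:
        return 10.0  # assumption: a lone qualifying student gets the maximum
    return 10.0 - 2.5 * (rank - 1) / (num_students - 1)
```

With five qualifying students, for example, the ranks receive 10, 9.375, 8.75, 8.125, and 7.5 points.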

Submission

Please submit all your code, the final weights, and the image “pr_curve.pdf” generated by the script “eval.py”.

Deadline

The deadline is 28.11.2021 23:59.

Submissions after the deadline are penalized by one point per day. Late submissions cannot take part in the competitive part of the scoring (the maximum number of points before the late penalty is subtracted is 7.5).
