The lab presentation related to this homework is recorded online at this link.
Please watch the video before attending the labs, which will serve mainly for consultation.
The goal of this homework is to implement, train, and evaluate an object detector. The input of the neural network is an RGB image, and the output is the position of the survivor's torso in the image, given as a bounding box.
The input of the neural network is a 640x640x3 (RGB channels) image. The output of the neural network should be a 10x10x5 tensor. You are not required to use a particular architecture, but we recommend the following one, which handles the survivor detection problem well.
The following table shows the suggested architecture, which you should implement in model.py. This architecture is called YoloTiny (the number of channels is slightly changed compared with the original, because only one type of object is classified).
The output of the network is a tensor of size 10x10x5. Each cell of the 10×10 grid holds the information on whether the neural network detects the object center in that cell. For each cell, there are five channels: confidence, x, y, w, h.
Confidence expresses how strongly the model believes that the object center is in the corresponding cell.
x and y give the shift of the object center within the cell: (x = 0, y = 0) corresponds to the top-left corner of the cell and (x = 1, y = 1) to the bottom-right corner.
w and h give the size of the bounding box relative to the size of the image.
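The cell encoding described above can be sketched in plain Python. This is only an illustration under stated assumptions: the box center and size are taken to be fractions of the image size in [0, 1), the grid is indexed as `[row][col]`, and the helper names `encode_box`, `decode_box`, and `make_target` are hypothetical and not part of the provided homework code.

```python
GRID = 10  # the output grid is 10x10 cells


def encode_box(cx, cy, w, h, grid=GRID):
    """Map an image-relative box to (row, col, x, y, w, h).

    Assumption: cx, cy, w, h are fractions of the image size.
    x and y are the offsets of the center inside its cell, in [0, 1).
    """
    col = min(int(cx * grid), grid - 1)  # cell column containing the center
    row = min(int(cy * grid), grid - 1)  # cell row containing the center
    x = cx * grid - col  # horizontal shift from the cell's left edge
    y = cy * grid - row  # vertical shift from the cell's top edge
    return row, col, x, y, w, h


def decode_box(row, col, x, y, w, h, grid=GRID):
    """Inverse of encode_box: recover the image-relative center and size."""
    cx = (col + x) / grid
    cy = (row + y) / grid
    return cx, cy, w, h


def make_target(boxes, grid=GRID):
    """Build a grid x grid x 5 nested list with channels (confidence, x, y, w, h)."""
    target = [[[0.0] * 5 for _ in range(grid)] for _ in range(grid)]
    for cx, cy, w, h in boxes:
        row, col, x, y, w, h = encode_box(cx, cy, w, h, grid)
        target[row][col] = [1.0, x, y, w, h]  # confidence 1 marks the object cell
    return target
```

For a 640×640 image, multiplying the decoded `cx`, `cy`, `w`, `h` by 640 would give the box in pixels.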
The loss function is an essential part of the training. Your task is to implement the loss function in the script loss.py. The loss is simplified in comparison with the original Yolo: classification and anchors are omitted. There are 3 parts of the loss:
The sum of these losses is the final loss we will optimize. We suggest that your function also returns the individual losses separately, so that you can visualize how each of them evolves during training.
The final loss you should implement and optimize:
\begin{align*}
loss = \lambda_{\textbf{coord}}& \sum_{i = 0}^{M\times N} {{1}}_{i}^{\text{obj}} \left[ \left( {x_i} - {\hat{x}_i} \right)^2 + \left( {y_i} - {\hat{y}_i} \right)^2 \right] \\
+ \lambda_{\textbf{coord}}& \sum_{i = 0}^{M\times N} {{1}}_{i}^{\text{obj}}\left[ \left( \sqrt{w_i} - \sqrt{\hat{w}_i} \right)^2 + \left( \sqrt{h_i} - \sqrt{\hat{h}_i} \right)^2 \right] \\
+ &\sum_{i = 0}^{M\times N} {{1}}_{i}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
+ \lambda_{\textbf{noobj}}& \sum_{i = 0}^{M\times N} {{1}}_{i}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2
\end{align*}
Here $M\times N$ is the output dimension. ${{1}}_{i}^{\text{obj}}$ is equal to $1$ if the object center is in cell $i$ and $0$ otherwise. ${{1}}_{i}^{\text{noobj}}$ is equal to $0$ if the object center is in cell $i$ and $1$ otherwise. $\lambda_{\textbf{coord}}$ is the weight of the coordinate loss (we suggest 5). $\lambda_{\textbf{noobj}}$ is the weight of the no-object loss (we suggest 0.5). $C,x,y,w,h$ are the confidence, the position of the object center, and the size of the bounding box, respectively.
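As a cross-check for a vectorized implementation in loss.py, the formula can be written out cell by cell in plain Python. This is a minimal reference sketch, not the expected solution: the function name `yolo_loss`, the flat list-of-tuples input, and the rule that a cell counts as an object cell when its target confidence equals 1 are assumptions made for illustration. It also assumes all widths and heights are non-negative so that the square roots are defined.

```python
import math


def yolo_loss(pred, target, lambda_coord=5.0, lambda_noobj=0.5):
    """Reference loss over flat lists of cells.

    pred and target are lists of (C, x, y, w, h) tuples, one per grid cell.
    Assumption: a cell is an "obj" cell when its target confidence is 1.
    Assumption: every w and h is non-negative.
    Returns (total, coord, size, obj_conf, noobj_conf), one term per
    line of the formula plus their sum.
    """
    coord = size = obj_conf = noobj_conf = 0.0
    for (c, x, y, w, h), (c_t, x_t, y_t, w_t, h_t) in zip(pred, target):
        if c_t == 1:
            # terms weighted by the obj indicator
            coord += (x - x_t) ** 2 + (y - y_t) ** 2
            size += (math.sqrt(w) - math.sqrt(w_t)) ** 2
            size += (math.sqrt(h) - math.sqrt(h_t)) ** 2
            obj_conf += (c - c_t) ** 2
        else:
            # term weighted by the noobj indicator
            noobj_conf += (c - c_t) ** 2
    coord *= lambda_coord
    size *= lambda_coord
    noobj_conf *= lambda_noobj
    total = coord + size + obj_conf + noobj_conf
    return total, coord, size, obj_conf, noobj_conf
```

Returning the individual terms next to the total makes it easy to log each of them separately during training and to compare them against a tensor-based implementation on a small input.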
The dataset you will use contains images of size 640×640 and text files with the object specification. Each object corresponds to one line. There are five numbers for each object (c,x,y,w,h).
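Reading such a label file could look like the sketch below. The exact file layout is not specified here, so this assumes the five numbers on each line are separated by whitespace; the function name `parse_labels` is hypothetical and may not match the loader in the provided code.

```python
def parse_labels(text):
    """Parse label text with one object per line into (c, x, y, w, h) tuples.

    Assumption: the five numbers on each line are whitespace-separated.
    """
    objects = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(
                f"expected 5 numbers per line, got {len(fields)}: {line!r}"
            )
        c, x, y, w, h = (float(v) for v in fields)
        objects.append((c, x, y, w, h))
    return objects
```

In practice the text would come from something like `pathlib.Path(label_path).read_text()`, where `label_path` points to one of the dataset's text files.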
Please download the dataset from this link. The dataset contains the training and validation part.
The main goal of this homework is to train the network. At this point, you should already have the model and the loss implemented. To train the network, fill in the missing parts of the training script. Only a few things are missing:
To use TensorBoard:
To show the performance of your neural network, you will evaluate the model on the validation part of the dataset and plot the precision-recall curve for different IoU thresholds. All you need to do is change the paths to the validation dataset and to your weights in eval.py. The script “eval.py” will save the image pr_curve.png, which you will upload together with your code and final weights (upload only one set of weights!). The number in the graph title determines the points you will get.
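For orientation, the quantities behind the curve can be sketched as follows. The script eval.py already performs the evaluation, and its implementation may differ; the sketch assumes boxes in (cx, cy, w, h) format in common units, and the function names and the convention of returning 1.0 for an empty denominator are illustrative choices.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (cx, cy, w, h)."""
    # Convert center/size to corner coordinates.
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    # The overlap is clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall from detection counts.

    Assumption: an empty denominator yields 1.0 (conventions vary).
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 1.0
    recall = true_pos / actual if actual else 1.0
    return precision, recall
```

A detection is typically counted as a true positive when its IoU with a ground-truth box reaches the chosen threshold; sweeping the confidence threshold then traces out one precision-recall curve per IoU threshold.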
The code for your homework is prepared at this link; please download it and fill in the missing parts.
Please submit all the code, the final weights, and the image “pr_curve.pdf” generated by the script “eval.py”.
The deadline is 28.11.2021 23:59.
Submissions after the deadline will be penalized by one point per day. Late submissions cannot take part in the competitive part of the scoring (the maximum number of points without penalization is 7.5).