HW4: Image segmentation

Your second homework will be image segmentation. For this task, we have created our own minified version of the A2D2 dataset.

The homework will be introduced in the labs in the 8th week, where we will try to clear up any doubts. Helpful overview of Semantic Segmentation

Dataset

The dataset consists of 3127 training images, 460 validation and 408 testing images. Each image has a resolution of 512×800 and 3 color channels (R, G, B). For each image, there is also a corresponding label image.

The class mapping is as follows:

Class ID	Class name	Color
0	`Background & Buildings`	(128, 128, 128)
1	`Car`	(245, 130, 48)
2	`Humans & Bikes`	(255, 255, 25)
3	`Interest`	(240, 50, 230)
4	`Sky`	(0, 130, 200)
5	`Nature`	(60, 180, 75)

Images are downsized for better viewing

The dataset is available on the taylor and cantor servers in the directory /local/temporary/vir/hw04. However, beware that the training part of the dataset is huge (several GBs). The dataset is available as either images or NumPy files. In the image format, labels are specified by colors for easier viewing; in the NumPy format, labels are specified by their IDs. The NumPy format consists of 2 files:

rgbs.npy - Numpy array of dtype uint8 and shape Nx512x800x3 in range (0, 255)
labels.npy - Numpy array of dtype uint8 and shape Nx512x800 in range (0, 5)

Data across these two NumPy files are synchronized, i.e. RGB image at index i corresponds to label at index i. This index i also corresponds with filenames of individual label and RGB images, i.e. labels/label_{i:04d}.png for labels and RGB/rgb_{i:04d}.png for RGB image.
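
If you work with the image format, the color-coded labels have to be turned into class IDs before a loss can be computed. The following is a minimal sketch of that conversion, assuming the label image has already been loaded as an HxWx3 uint8 array whose pixels use exactly the colors from the class table. The names `CLASS_COLORS`, `color_to_ids` and `label_path` are hypothetical helpers, not part of the assignment.

```python
import numpy as np

# Colors from the class table above, indexed by class ID.
CLASS_COLORS = np.array(
    [
        (128, 128, 128),  # 0 Background & Buildings
        (245, 130, 48),   # 1 Car
        (255, 255, 25),   # 2 Humans & Bikes
        (240, 50, 230),   # 3 Interest
        (0, 130, 200),    # 4 Sky
        (60, 180, 75),    # 5 Nature
    ],
    dtype=np.uint8,
)


def color_to_ids(label_rgb):
    """Convert an HxWx3 uint8 color label image to an HxW uint8 array of class IDs."""
    # Compare every pixel with every class color; the result has shape HxWx6.
    matches = (label_rgb[:, :, None, :] == CLASS_COLORS[None, None, :, :]).all(axis=-1)
    if not matches.any(axis=-1).all():
        raise ValueError('label image contains a color that is not in the class table')
    return matches.argmax(axis=-1).astype(np.uint8)


def label_path(i):
    """File name of the label image with index i, following the naming scheme above."""
    return f'labels/label_{i:04d}.png'
```

The NumPy format already stores IDs, so this step is only needed when reading the PNG files.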

Your task

Create a neural network based on the UNet architecture, which uses a pretrained model as the encoder (VGG) and your own implementation of the decoder. Your implementation of the decoder must be dynamic, i.e. it needs to find out how deep the encoder is and create the decoder appropriately. The design of your decoder (exact order of layers) is up to you and it will not be checked by AE, but we recommend you stick with the idea of the UNet architecture.

Your neural network will be evaluated by mean Intersection over Union (mIOU, also known as the Jaccard index) on an unseen test part of the dataset (which comes from the same distribution as the training and validation parts). The IOU will be computed for each class separately; however, only the classes relevant for autonomous driving will be taken into account when computing mIOU. The relevant classes are 1, 2 and 3 (Car, Humans & Bikes, Interest). The encoder of your neural network is already trained, therefore do not optimize this part of the network. This can be set in the parameters of the optimizer.
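
One possible way to keep the encoder fixed is sketched below. The tiny `encoder` and `decoder` modules are stand-ins invented for this illustration (they are not the VGG encoder or a recommended decoder), and the choice of Adam and the learning rate are assumptions as well.

```python
import torch
from torch import nn

# Tiny stand-ins for the real encoder and decoder, for illustration only.
encoder = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
decoder = nn.Sequential(nn.Upsample(scale_factor=2), nn.Conv2d(8, 6, 3, padding=1))
model = nn.Sequential(encoder, decoder)

# Option 1: hand only the decoder parameters to the optimizer.
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-3)

# Option 2 (can be combined with option 1): do not compute gradients for the encoder.
for p in encoder.parameters():
    p.requires_grad = False

# One dummy training step to show that the encoder weights stay untouched.
before = [p.clone() for p in encoder.parameters()]
loss = model(torch.rand(1, 3, 16, 16)).sum()
loss.backward()
optimizer.step()
```

Note that batch-norm layers update their running statistics in training mode regardless of the optimizer, so with the `_bn` encoders you may also want to keep the encoder in eval mode during training.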

Available models:

VGG11 - (torchvision.models.vgg11_bn)
VGG13 - (torchvision.models.vgg13_bn)
VGG16 - (torchvision.models.vgg16_bn)
VGG19 - (torchvision.models.vgg19_bn)

use tips from HW3
you can implement Skip connections
classes are not equally represented in the dataset, therefore it could be useful to compute their weights and use them in the loss function
usage of augmentations can also improve performance of your model
Dynamic network implementation
- for easier dynamic implementation try Sequential to create repetitive parts of neural network
- channels in neural network do not have to always increase
- encoder does not have to end with Maxpool
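
For the class-imbalance tip above, one common choice is inverse-frequency weighting. The sketch below is only one possible scheme; the helper name `class_weights` and the normalization are assumptions, not something prescribed by the assignment.

```python
import numpy as np


def class_weights(labels, num_classes=6):
    """Inverse-frequency class weights from an array of class IDs (any shape)."""
    counts = np.bincount(np.asarray(labels).ravel(), minlength=num_classes).astype(np.float64)
    freqs = counts / counts.sum()
    weights = np.zeros(num_classes)
    present = counts > 0  # classes that never occur keep weight 0
    weights[present] = 1.0 / freqs[present]
    # Normalize so that the weights of the present classes average to 1.
    return weights / weights[present].mean()
```

The result can be converted to a float tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`. For the full labels.npy it is probably better to accumulate `np.bincount` over chunks of images than to process the whole array at once.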

Submission

Recommended method of submission:

test the correct implementation of the network in the brute
train the network and submit it to the brute

Submit a Python module/package, that is importable by name hw_4 and has a function load_model() and class UnetFromPretrained(torch.nn.Module). Function load_model() needs to return an instance of torch.nn.Module (or a subclass) which is

capable of living either on cpu or GPU. Evaluation will be run on a GPU server.
capable of accepting a tensor of size [Bx3x512x800] (B is a batch size), dtype torch.float32, ranging from 0 to 1
returning a tensor of size [Bx6x512x800] with dtype torch.float32, living on the same device as input data. This tensor will represent score for each class for each pixel.

Calling the class UnetFromPretrained(encoder, num_classes) needs to dynamically create a neural network based on the UNet architecture. This network needs to use the whole encoder (including its parameters). The class is tested with 5 “random” encoders, which are based on the VGG net. These encoders maintain the following rules:

all convolution layers maintain height and width of input tensor
at least one convolution layer is in every repetitive block
height and width of tensor in encoder is changed only by maxpool layer

This is the only portion of your code that will be automatically checked. However, please also submit all other code that you used for training. If you achieve an impossibly high score, we want to be amazed and to know how you did it.
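
Because the encoder rules listed above say that only max-pooling layers change height and width, one way to discover the depth of an encoder is to cut it into blocks at every max-pooling layer. The following is a sketch under that assumption; `split_into_blocks` and `out_channels` are hypothetical helper names, and your decoder may be organized quite differently.

```python
from torch import nn


def split_into_blocks(encoder):
    """Group the encoder layers into blocks; every MaxPool2d starts a new block."""
    blocks, current = [], []
    for layer in encoder.children():
        if isinstance(layer, nn.MaxPool2d) and current:
            blocks.append(nn.Sequential(*current))
            current = []
        current.append(layer)
    if current:
        blocks.append(nn.Sequential(*current))
    return blocks


def out_channels(block, in_channels):
    """Channels leaving a block: out_channels of its last convolution, if it has one."""
    convs = [m for m in block if isinstance(m, nn.Conv2d)]
    return convs[-1].out_channels if convs else in_channels
```

The blocks reuse the original layer objects, so the pretrained parameters are shared rather than copied. The output of each block is a natural candidate for a skip connection, and the number of blocks tells the decoder how many upsampling stages it needs.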

When using torch.load, always use map_location="cpu". It is the safest option in case your model is not able to live on a GPU.

In order to participate in the tournament part, your model also has to have a non-empty docstring briefly summarizing your model. We do not want you to spill your secrets; however, the main idea should be evident from the docstring. The only thing checked about the docstring is whether it is there and non-empty. It will be visible to other students. Please do not use diacritics in the docstring.

Evaluation takes a long time. Expect the evaluation to take longer than a minute for each submission. Moreover, the submissions to this task are processed serially, so it might happen you will be waiting in a queue. However, if the evaluation is not finished within an hour from your submission, email Petr Šebek

The file size limit imposed by BRUTE for your uploads is 350 MB. It should be more than enough. This is a hard limit that cannot be bypassed! Think of it as a real-time performance constraint for your own autonomous car.

The GPU used in the evaluation is actually GPU with ID 7 on the taylor student server. Because of this, it is strictly forbidden to use this GPU during training, as it will negatively impact running time of evaluation for everyone!

Points

Points for this assignment will be split into an implementation part and a performance part. For a successful implementation you will get 7 points. In the performance part you can get a maximum of 6 points.

The dataset is rather difficult, therefore in order to get any performance points, you only need an mIOU above 50 %. In order to get the full amount of points for the individual part of the assignment, you need an mIOU of 65 %. Anything in between will be linearly spaced. The maximum amount of points from the individual part of the assignment is 13. The equation is:

$$pts_{individual} = 6\times\text{clip}(\frac{mIOU - 0.5}{0.65 - 0.5}, 0, 1) + 7$$
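
The formula above can be checked with a few lines of Python. This is only a sketch; the function name `individual_points` is made up for the illustration, and it assumes the implementation part was successful.

```python
def individual_points(miou):
    """Points for the individual part, given mIOU as a fraction (e.g. 0.58)."""
    frac = (miou - 0.5) / (0.65 - 0.5)
    clipped = min(max(frac, 0.0), 1.0)  # clip(frac, 0, 1)
    return 6 * clipped + 7


for miou in (0.45, 0.575, 0.70):
    print(f'mIOU {miou:.3f} -> {individual_points(miou):.2f} points')
```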

Any submission with mIOU over 60 % is eligible for the tournament part of the assignment. In the tournament part, the maximum achievable number of points is 4, and anybody who has an mIOU over 60 % gets some points. The precise equation for calculating the points is:

$$ c = \begin{cases} \text{clip}(\frac{mIOU - 0.6}{\max(mIOU) - 0.6}, 0, 1)& \text{if}\ \max(mIOU) > 0.6\\ 1 & \text{if}\ \max(mIOU) = 0.6 \land \max(mIOU) = mIOU\\ 0 & \text{otherwise} \end{cases} $$

$$pts_{tournament} = 4\times \sqrt{c(2 - c)}$$
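
The tournament formula can be sketched in Python as follows. The function name `tournament_points` and the argument `best_miou` (standing for max(mIOU) over all submissions) are made up for the illustration.

```python
import math


def tournament_points(miou, best_miou):
    """Tournament points for one submission; best_miou is the best mIOU of all submissions."""
    if best_miou > 0.6:
        c = min(max((miou - 0.6) / (best_miou - 0.6), 0.0), 1.0)
    elif best_miou == 0.6 and miou == best_miou:
        c = 1.0
    else:
        c = 0.0
    return 4 * math.sqrt(c * (2 - c))
```

Because of the square root, the points grow quickly just above 60 % and flatten out near the best score: a submission halfway between 60 % and the best mIOU already receives roughly 3.46 of the 4 points.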

Deadline

Individual part: 27.11.2022, 23:59 CET, where D is a day of your tutorial where the homework is introduced (in the 8th week of your lab)
Tournament part: 27.11.2022 (Sunday), 23:59 CET (strict deadline)

Every 24 hours after the deadline, you will lose 1/3 point. However, you will not gain a negative number of points, so the minimum is 0.

Take into account, that training a neural network takes some non-trivial time. Do not start working on the homework at the last moments. We recommend allowing at least a full day for work on this homework.

This task is very resource-intensive. Beware, training a reasonably good network (with already fixed architecture and hyperparameters) takes at least a couple of hours on a GPU.

Because there are two separate deadlines for this task, there are also two homeworks in BRUTE. You need to submit your work to both of them. The evaluation script is the same in both of them. However, in the tournament part, BRUTE will always report 0 points for Automatic Evaluation and you will only gain points in the tournament. Do not be alarmed by this behavior, it is expected.

Additional training data

For the individual part, you may not use additional training data. For the tournament part, the use of additional data is allowed; however, you must follow a couple of rules:

The training of the decoder will be performed only by you and from scratch. That means you may not use already pre-trained networks for it
The usage of additional training data will be mentioned in the docstring, so that it is clearly visible to everyone. (It also has to be within first 70 characters of the docstring, since we truncate longer docstrings)
The source of the additional data has to be specified in the comments in your code. This will be read only by the teachers. We want to know, what data you used to obtain such a great score, however, it is not necessary to put it in the docstring for everyone else to see.

Code templates

Shell code to find out, which user is using which gpu card. Can be used to find the user, who occupies forbidden gpu reserved for evaluation

echo [`hostname`]
GPU=0
F=$(mktemp)
 
trap "rm -f $F" 0 2 3 15
 
nvidia-smi > $F
cat $F | grep " / " | while read line; do
	echo "├─ GPU $GPU:" `echo $line | cut -d "|" -f 3`
	cat $F | grep "|    $GPU     " | while read process; do
		PID=`echo $process | cut -d " " -f 3`
		CMD=`echo $process | cut -d " " -f 5`
		MEM=`echo $process | cut -d " " -f 6`
		USER=`ps -o user $PID | awk 'NR>1'`
		printf "│  ├─%8s" $MEM
		printf "%10s" $USER
		printf "%10s  " $PID
		echo "$CMD"
	done
	((GPU+=1))
done

Code snippets to control which gpu is used for computation.

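import os

import numpy as np
import torch


def get_device(gpu=0):   # Manually specify gpu
    if torch.cuda.is_available():
        device = torch.device(gpu)  # an integer is interpreted as a CUDA device index
    else:
        device = torch.device('cpu')

    return device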
 
def get_free_gpu():
    os.system('nvidia-smi -q -d Memory |grep -A5 GPU|grep Free >tmp')
    memory_available = [int(x.split()[2]) for x in open('tmp', 'r').readlines()]
    index = np.argmax(memory_available[:-1])  # Skip the 7th card --- it is reserved for evaluation!!!
 
    return index   # Returns index of the gpu with the most memory available

Downloading pretrained model (in this case VGG13)

import torchvision as tv
 
vgg13_bn = tv.models.vgg13_bn(True)

Creating, saving and loading your model

from typing import Tuple

import torch
import torchvision as tv
from torch import nn

class UnetFromPretrained(torch.nn.Module):
    '''This is my super cool, but super dumb module'''
 
    def __init__(self, encoder: nn.Module, num_classes: int):
        '''
        :param encoder: nn.Sequential, pretrained encoder
        :param num_classes: Python int, number of segmentation classes
        '''
        super(UnetFromPretrained, self).__init__()
        self.num_classes = num_classes
 
        # TODO
 
    def forward(self, x):
        shape = x.shape
 
        # TODO
 
        return torch.randn(shape[0], self.num_classes, *shape[2:], device=x.device)
 
def save_model(model, destination):
    torch.save(model.state_dict(), destination)
 
def load_model() -> Tuple[nn.Module, str]:
    '''
    :return: model: your trained NN; encoder_name: name of NN, which was used to create your NN
    '''
    vgg13_bn = tv.models.vgg13_bn(True)
    num_classes = 6
    model = UnetFromPretrained(vgg13_bn.features, num_classes)
    model.load_state_dict(torch.load('best_model.pth', map_location=torch.device('cpu')))
    encoder_name = 'vgg13_bn'
    return model, encoder_name

Code along these lines is used for evaluation in BRUTE. Feel free to use it.

#!/usr/bin/env python
 
import argparse
import os
import os.path as osp
 
import numpy as np
import scipy.sparse
import torch
import torch.utils.data as tdata
from PIL import Image
 
import hw_4
 
# Constants for drawing
BORDER = 10
COLORS_CLAZZ = (
    np.array(
        (
            (128, 128, 128, 100),
            (245, 130, 48, 100),
            (255, 255, 25, 100),
            (240, 50, 230, 100),
            (0, 130, 200, 100),
            (60, 180, 75, 100),
        )
    )
    / 255
)
 
COLORS_OK = np.array(((255, 0, 0, 100), (0, 255, 0, 100))) / 255
 
# Constants about problem
CLAZZ = ['Background & Buildings', 'Car', 'Humans & Bikes', 'Interest', 'Sky', 'Nature']
WEIGHTS = np.array([0, 1, 1, 1, 0, 0])
NUM_CLAZZ = len(CLAZZ)
 
 
class Dataset(tdata.Dataset):
    def __init__(self, rgb_file, label_file):
        super().__init__()
        self.rgbs = np.load(rgb_file, mmap_mode='r')  # mmap is way faster for these large data
        self.labels = np.load(label_file, mmap_mode='r')  # mmap is way faster for these large data
 
    def __len__(self):
        return self.rgbs.shape[0]
 
    def __getitem__(self, i):
        return {
            'labels': np.asarray(self.labels[i]).astype('i8'),  # torch wants labels to be of type LongTensor, in order to compute losses
            'rgbs': np.asarray(self.rgbs[i]).astype('f4').transpose((2, 0, 1)) / 255,
            'key': i,  # for saving of the data
            # due to mmap, it is necessary to wrap your data in np.asarray. It does not add almost any overhead as it does not copy anything
        }
 
 
def blend_img(background, overlay_rgba, gamma=2.2):
    alpha = overlay_rgba[:, :, 3]
    over_corr = np.float_power(overlay_rgba[:, :, :3], gamma)
    bg_corr = np.float_power(background, gamma)
    return np.float_power(over_corr * alpha[..., None] + (1 - alpha)[..., None] * bg_corr, 1 / gamma)  # dark magic
    # partially taken from https://en.wikipedia.org/wiki/Alpha_compositing#Composing_alpha_blending_with_gamma_correction
 
 
def create_vis(rgb, label, prediction):
    if rgb.shape[0] == 3:
        rgb = rgb.transpose(1, 2, 0)
    if len(prediction.shape) == 3:
        prediction = np.argmax(prediction, 0)
 
    h, w, _ = rgb.shape
 
    gt_map = blend_img(rgb, COLORS_CLAZZ[label])  # we can index colors, wohoo!
    pred_map = blend_img(rgb, COLORS_CLAZZ[prediction])
    ok_map = blend_img(rgb, COLORS_OK[(label == prediction).astype('u1')])  # but we cannot do it by boolean, otherwise it won't work
    canvas = np.ones((h * 2 + BORDER, w * 2 + BORDER, 3))
    canvas[:h, :w] = rgb
    canvas[:h, -w:] = gt_map
    canvas[-h:, :w] = pred_map
    canvas[-h:, -w:] = ok_map
 
    canvas = (np.clip(canvas, 0, 1) * 255).astype('u1')
    return Image.fromarray(canvas)
 
 
class Metrics:
    def __init__(self, num_classes, weights=None, clazz_names=None):
        self.num_classes = num_classes
        self.cm = np.zeros((num_classes, num_classes), 'u8')  # confusion matrix
        self.tps = np.zeros(num_classes, dtype='u8')  # true positives
        self.fps = np.zeros(num_classes, dtype='u8')  # false positives
        self.fns = np.zeros(num_classes, dtype='u8')  # false negatives
        self.weights = weights if weights is not None else np.ones(num_classes)  # Weights of each class for mean IOU
        self.clazz_names = clazz_names if clazz_names is not None else np.arange(num_classes)  # for nicer printing
 
    def update(self, labels, predictions, verbose=True):
        labels = labels.cpu().numpy()
        predictions = predictions.cpu().numpy()
 
        predictions = np.argmax(predictions, 1)  # first dimension are probabilities/scores
 
        tmp_cm = scipy.sparse.coo_matrix(
            (np.ones(np.prod(labels.shape), 'u8'), (labels.flatten(), predictions.flatten())), shape=(self.num_classes, self.num_classes)
        ).toarray()  # Fastest possible way to create confusion matrix. Speed is the necessity here, even then it takes quite too much
 
        tps = np.diag(tmp_cm)
        fps = tmp_cm.sum(0) - tps
        fns = tmp_cm.sum(1) - tps
        self.cm += tmp_cm
        self.tps += tps
        self.fps += fps
        self.fns += fns
 
        precisions, recalls, ious, weights, miou = self._compute_stats(tps, fps, fns)
 
        if verbose:
            self._print_stats(tmp_cm, precisions, recalls, ious, weights, miou)
 
    def _compute_stats(self, tps, fps, fns):
        with np.errstate(all='ignore'):  # any division could be by zero, we don't really care about these errors, we know about these
            precisions = tps / (tps + fps)
            recalls = tps / (tps + fns)
            ious = tps / (tps + fps + fns)
            weights = np.copy(self.weights)
            weights[np.isnan(ious)] = 0
            miou = np.ma.average(ious, weights=weights)
        return precisions, recalls, ious, weights, miou
 
    def _print_stats(self, cm, precisions, recalls, ious, weights, miou):
        print('Confusion matrix:')
        print(cm)
        print('\n---\n')
        for c in range(self.num_classes):
            print(
                f'Class: {str(self.clazz_names[c]):20s}\t'
                f'Precision: {precisions[c]:.3f}\t'
                f'Recall {recalls[c]:.3f}\t'
                f'IOU: {ious[c]:.3f}\t'
                f'mIOU weight: {weights[c]:.1f}'
            )
        print(f'Mean IOU: {miou}')
        print('\n---\n')
 
    def print_final(self):
        precisions, recalls, ious, weights, miou = self._compute_stats(self.tps, self.fps, self.fns)
        self._print_stats(self.cm, precisions, recalls, ious, weights, miou)
 
    def reset(self):
        self.cm = np.zeros((self.num_classes, self.num_classes), 'u8')
        self.tps = np.zeros(self.num_classes, dtype='u8')
        self.fps = np.zeros(self.num_classes, dtype='u8')
        self.fns = np.zeros(self.num_classes, dtype='u8')
 
 
def evaluate(model, metrics, dataset, device, batch_size=8, verbose=True, create_imgs=False, save_dir='.'):
    model = model.eval().to(device)
    loader = tdata.DataLoader(dataset, batch_size=batch_size, shuffle=True)
 
    with torch.no_grad():  # disable gradient computation
        for i, batch in enumerate(loader):
            data = batch['rgbs'].to(device)
 
            predictions = model(data)
            metrics.update(batch['labels'], predictions, verbose)
            if create_imgs:
                for j, img_id in enumerate(batch['key']):
                    img = create_vis(data[j].cpu().numpy(), batch['labels'][j].numpy(), predictions[j].cpu().numpy())
                    os.makedirs(save_dir, exist_ok=True)
                    img.save(osp.join(save_dir, f'{img_id:04d}.png'))
            print(f'Processed {i+1:02d}th batch')
 
    metrics.print_final()
    return metrics
 
 
def prepare(args, model=None):
    dataset = Dataset(args.dataset_rgbs, args.dataset_labels)
    if model is None:
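        loaded = hw_4.load_model()
        # the template's load_model() returns (model, encoder_name); accept a bare model as well
        model = loaded[0] if isinstance(loaded, tuple) else loaded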
    metrics = Metrics(NUM_CLAZZ, WEIGHTS, CLAZZ)
    return model, metrics, dataset
 
 
def run(args):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model, metrics, dataset = prepare(args)
    evaluate(model, metrics, dataset, device, args.batch_size, args.verbose, args.create_imgs, args.store_dir)
 
 
def parse_args():
    parser = argparse.ArgumentParser('Evaluation demo for HW04')
    parser.add_argument('dataset_rgbs', help='NPY file, where dataset RGB data is stored')
    parser.add_argument('dataset_labels', help='NPY file, where dataset labels are stored')
    parser.add_argument(
        '-ci', '--create_imgs', default=False, action='store_true', help='Whether to create images. Warning! It will take significantly longer!'
    )
    parser.add_argument('-sd', '--store_dir', default='.', help='Where to store images. Only valid, if create_imgs is set to True')
    parser.add_argument('-bs', '--batch_size', default=8, type=int, help='Batch size')
    parser.add_argument('-v', '--verbose', default=False, action='store_true', help='Whether to print stats of each minibatch')
 
    return parser.parse_args()
 
 
def main():
    args = parse_args()
    print(args)
    run(args)
 
 
if __name__ == '__main__':
    main()
