====== HW 02 - Image segmentation ======

Your second homework is image segmentation. For this task, we have created a minified version of the [[https://www.audi-electronics-venture.de/aev/web/en/driving-dataset.html|A2D2 dataset]].

The homework will be introduced in the labs in the 5th week, where we will try to clear up any doubts.

==== Dataset ====

The dataset consists of 3127 training, 460 testing and 408 validation images. Each image has a resolution of ''512x800'' and 3 color channels (R, G, B). For each image, there is also a corresponding label image. The class mapping is as follows:

^ Class ID ^ Class name ^ Color ^
| 0 | ''Background & Buildings'' | (128, 128, 128) |
| 1 | ''Car'' | (245, 130, 48) |
| 2 | ''Humans & Bikes'' | (255, 255, 25) |
| 3 | ''Interest'' | (240, 50, 230) |
| 4 | ''Sky'' | (0, 130, 200) |
| 5 | ''Nature'' | (60, 180, 75) |

The images below are downsized for better viewing.

{{:courses:b3b33vir:hw:3115.png?480|}} {{:courses:b3b33vir:hw:3115.lab.png?480|}}

The dataset is available on the ''taylor'' and ''cantor'' [[https://cyber.felk.cvut.cz/cs/study/gpu-servers/|servers]] in the directory ''/local/temporary/vir/hw02''. It can also be downloaded [[http://cmp.felk.cvut.cz/~jasekota/vir/hw02 | here]]. However, beware that the training part of the dataset is huge (several GBs).

The dataset is available either as images or as NumPy files. In the image format, labels are specified by their //colors// for easier viewing; in the NumPy format, labels are specified by their //IDs//. The NumPy format consists of 3 files:

  * ''rgbs.npy'' - NumPy array of dtype ''uint8'' and shape ''Nx512x800x3'' with values from 0 to 255
  * ''labels.npy'' - NumPy array of dtype ''uint8'' and shape ''Nx512x800'' with values from 0 to 5 (class IDs)
  * ''filenames.npy'' - NumPy array of dtype

When using ''torch.load'', always use ''map_location="cpu"''. It is the safest option in case your model cannot be loaded onto a GPU.
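For the image format of the dataset, where labels are stored as colors, the class table above can be turned into a simple lookup between colors and class IDs. The following is a minimal NumPy sketch, assuming the label PNGs contain exactly the colors from the table (no anti-aliasing or resampling). The names ''CLASS_COLORS'', ''colors_to_ids'' and ''ids_to_colors'', as well as the ''255'' marker for unknown colors, are hypothetical and not part of the dataset or the evaluation code.

```python
import numpy as np

# Class colors from the class table, indexed by class ID (0-5).
CLASS_COLORS = np.array(
    [
        (128, 128, 128),  # 0 Background & Buildings
        (245, 130, 48),   # 1 Car
        (255, 255, 25),   # 2 Humans & Bikes
        (240, 50, 230),   # 3 Interest
        (0, 130, 200),    # 4 Sky
        (60, 180, 75),    # 5 Nature
    ],
    dtype=np.uint8,
)

UNKNOWN_ID = 255  # hypothetical marker for colors not in the table


def colors_to_ids(label_rgb):
    """Map an HxWx3 uint8 color label image to an HxW uint8 array of class IDs."""
    ids = np.full(label_rgb.shape[:2], UNKNOWN_ID, dtype=np.uint8)
    for class_id, color in enumerate(CLASS_COLORS):
        # (H, W, 3) == (3,) broadcasts; a pixel matches only if all 3 channels match
        mask = np.all(label_rgb == color, axis=-1)
        ids[mask] = class_id
    return ids


def ids_to_colors(ids):
    """Inverse mapping: HxW class IDs (0-5 only) -> HxWx3 uint8 color image."""
    return CLASS_COLORS[ids]
```

For instance, a label image loaded with PIL as ''np.asarray(Image.open(path).convert("RGB"))'' (''path'' being a placeholder) can be passed to ''colors_to_ids'' to obtain an array shaped like one entry of ''labels.npy''. Looping over the 6 classes keeps the code simple while each comparison is still vectorized over the whole image.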
In order to participate in the tournament part, your model also has to have a **non-empty docstring** briefly summarizing your model. We do not want you to spill your secrets; however, the main idea should be evident from the docstring. The only thing checked about the docstring is whether it is present and non-empty. It will be visible to other students. **Please, do not use diacritics in the docstring.**

Evaluation takes a long time. Expect it to take longer than a minute for each submission. Moreover, submissions to this task are processed serially, so you might end up waiting in a queue. However, if the evaluation has not finished within an hour of your submission, email [[mailto:jasekota@fel.cvut.cz|Otakar Jašek]].

The file size limit imposed by BRUTE for your uploads is 350 MB. It should be more than enough. This is a hard limit that cannot be bypassed!

The GPU used in the evaluation is the GPU with ID 7 on the ''taylor'' student server. Because of this, it is **strictly** forbidden to use this GPU during training, as doing so would slow down the evaluation for everyone!

==== Points ====

The dataset is rather difficult; therefore, to get any points, you only need an mIOU of 50 %. To get the full amount of points for the individual part of the assignment, you need an mIOU of 60 %. Points for anything in between are interpolated linearly. The maximum amount of points for the individual part of the assignment is 8. The equation is:

$$pts_{individual} = 8\times\text{clip}\left(\frac{mIOU - 0.5}{0.6 - 0.5}, 0, 1\right)$$

Any submission with an mIOU over 60 % is eligible for the tournament part of the assignment. In the tournament part, the maximum achievable amount of points is 4, and anybody who has an mIOU over 60 % gets //some// points.
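The individual-part equation can be checked with a few lines of Python. This is a minimal sketch: ''individual_points'' is a hypothetical helper, the mIOU is taken as a fraction (0.55 for 55 %), and the late-submission penalty is not included.

```python
def individual_points(miou: float) -> float:
    """8 * clip((mIOU - 0.5) / (0.6 - 0.5), 0, 1), with mIOU given as a fraction."""
    ratio = (miou - 0.5) / (0.6 - 0.5)
    clipped = min(max(ratio, 0.0), 1.0)  # clip to [0, 1]
    return 8 * clipped


if __name__ == "__main__":
    for miou in (0.45, 0.55, 0.60, 0.65):
        print(f"mIOU {miou:.2f} -> {individual_points(miou):.2f} points")
```

By the formula, an mIOU of 55 % lies halfway between the two thresholds and yields 4 points, while anything at or above 60 % is capped at 8.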
The precise equation for calculating the tournament points is

$$ c = \begin{cases} \text{clip}\left(\frac{mIOU - 0.6}{\max(mIOU) - 0.6}, 0, 1\right) & \text{if}\ \max(mIOU) > 0.6\\ 1 & \text{if}\ \max(mIOU) = 0.6 \land \max(mIOU) = mIOU\\ 0 & \text{otherwise} \end{cases} $$

$$pts_{tournament} = 4\times \sqrt{c(2 - c)}$$

where $\max(mIOU)$ is the best mIOU among all tournament submissions.

==== Deadline ====

  * Individual part: D + 21 days, 23:59 CET, where D is the day of your tutorial in which the homework is introduced (in the 5th week)
  * Tournament part: 15.11.2019, 23:59 CET

For every 24 hours after the deadline, you lose 1 point. However, you cannot get a negative number of points, so the minimum is 0.

Take into account that training a neural network takes non-trivial time. Do not start working on the homework at the last moment. We recommend setting aside at least a full day for this homework. This task is even more resource-intensive than HW01. Beware, training a reasonably good network (with an already fixed architecture and hyperparameters) takes at least a couple of hours.

Because there are two separate deadlines for this task, there are also two homeworks in BRUTE. You need to submit your work to **both** of them. The evaluation script is the same in both. However, in the tournament part, BRUTE will always report 0 points for Automatic Evaluation, and you will only gain points in the tournament. Do **not** be alarmed by this behavior; it is expected.

==== Additional training data ====

For the individual part, you **may not** use additional training data. For the tournament part, the use of additional data is allowed; however, you must follow a couple of rules:

  * The training of the network will be performed **only by you and from scratch**. That means you **may not** use already pre-trained networks.
  * The usage of additional training data must be mentioned in the docstring, so that it is clearly visible to everyone.
    * The mention has to be within the first 70 characters of the docstring, since we truncate longer docstrings.
  * The source of the additional data has to be specified in the comments in your code. These will be read only by the teachers. We want to know what data you used to obtain such a great score; however, it is not necessary to put it in the docstring for everyone else to see.

==== Code templates ====

The simplest submission (which won't earn any points) can look like this. We strongly recommend submitting substantially more work.

<code python>
import torch


class Model(torch.nn.Module):
    '''This is my super cool, but super dumb module'''

    def __init__(self, num_classes=6):
        super().__init__()
        self.num_classes = num_classes

    def forward(self, x):
        # Random scores of shape (batch, num_classes, height, width)
        shape = x.shape
        return torch.randn(shape[0], self.num_classes, *shape[2:], device=x.device)


def load_model():
    return Model()
</code>

----

Code along these lines is used for evaluation in BRUTE; it obtains your model by calling ''hw_2.load_model()''. Feel free to use it.

<code python>
#!/usr/bin/env python
import argparse
import os
import os.path as osp

import numpy as np
import scipy.sparse
import torch
import torch.utils.data as tdata
from PIL import Image

import hw_2

# Constants for drawing
BORDER = 10
COLORS_CLAZZ = (
    np.array(
        (
            (128, 128, 128, 100),
            (245, 130, 48, 100),
            (255, 255, 25, 100),
            (240, 50, 230, 100),
            (0, 130, 200, 100),
            (60, 180, 75, 100),
        )
    )
    / 255
)
COLORS_OK = np.array(((255, 0, 0, 100), (0, 255, 0, 100))) / 255

# Constants about the problem
CLAZZ = ['Background & Buildings', 'Car', 'Humans & Bikes', 'Interest', 'Sky', 'Nature']
WEIGHTS = np.array([0, 1, 1, 1, 0, 0])  # only Car, Humans & Bikes and Interest count towards the mIOU
NUM_CLAZZ = len(CLAZZ)


class Dataset(tdata.Dataset):
    def __init__(self, rgb_file, label_file):
        super().__init__()
        self.rgbs = np.load(rgb_file, mmap_mode='r')  # mmap is way faster for these large data
        self.labels = np.load(label_file, mmap_mode='r')  # mmap is way faster for these large data

    def __len__(self):
        return self.rgbs.shape[0]

    def __getitem__(self, i):
        # Due to mmap, it is necessary to wrap your data in np.asarray.
        # It adds almost no overhead, as it does not copy anything.
        return {
            'labels': np.asarray(self.labels[i]).astype('i8'),  # torch wants labels to be of type LongTensor, in order to compute losses
            'rgbs': np.asarray(self.rgbs[i]).astype('f4').transpose((2, 0, 1)) / 255,
            'key': i,  # for saving of the data
        }


def blend_img(background, overlay_rgba, gamma=2.2):
    # Alpha blending with gamma correction ("dark magic"), partially taken from
    # https://en.wikipedia.org/wiki/Alpha_compositing#Composing_alpha_blending_with_gamma_correction
    alpha = overlay_rgba[:, :, 3]
    over_corr = np.float_power(overlay_rgba[:, :, :3], gamma)
    bg_corr = np.float_power(background, gamma)
    return np.float_power(over_corr * alpha[..., None] + (1 - alpha)[..., None] * bg_corr, 1 / gamma)


def create_vis(rgb, label, prediction):
    if rgb.shape[0] == 3:
        rgb = rgb.transpose(1, 2, 0)
    if len(prediction.shape) == 3:
        prediction = np.argmax(prediction, 0)
    h, w, _ = rgb.shape

    gt_map = blend_img(rgb, COLORS_CLAZZ[label])  # we can index colors, woohoo!
    pred_map = blend_img(rgb, COLORS_CLAZZ[prediction])
    ok_map = blend_img(rgb, COLORS_OK[(label == prediction).astype('u1')])  # but we cannot index by boolean, otherwise it won't work
    canvas = np.ones((h * 2 + BORDER, w * 2 + BORDER, 3))
    canvas[:h, :w] = rgb
    canvas[:h, -w:] = gt_map
    canvas[-h:, :w] = pred_map
    canvas[-h:, -w:] = ok_map

    canvas = (np.clip(canvas, 0, 1) * 255).astype('u1')
    return Image.fromarray(canvas)


class Metrics:
    def __init__(self, num_classes, weights=None, clazz_names=None):
        self.num_classes = num_classes
        self.cm = np.zeros((num_classes, num_classes), 'u8')  # confusion matrix
        self.tps = np.zeros(num_classes, dtype='u8')  # true positives
        self.fps = np.zeros(num_classes, dtype='u8')  # false positives
        self.fns = np.zeros(num_classes, dtype='u8')  # false negatives
        self.weights = weights if weights is not None else np.ones(num_classes)  # weights of each class for mean IOU
        self.clazz_names = clazz_names if clazz_names is not None else np.arange(num_classes)  # for nicer printing

    def update(self, labels, predictions, verbose=True):
        labels = labels.cpu().numpy()
        predictions = predictions.cpu().numpy()
        predictions = np.argmax(predictions, 1)  # dimension 1 holds the class probabilities/scores
        # Fastest possible way to create the confusion matrix (rows: labels, columns: predictions).
        # Speed is a necessity here; even so, it takes quite a lot of time.
        tmp_cm = scipy.sparse.coo_matrix(
            (np.ones(np.prod(labels.shape), 'u8'), (labels.flatten(), predictions.flatten())),
            shape=(self.num_classes, self.num_classes),
        ).toarray()
        tps = np.diag(tmp_cm)
        fps = tmp_cm.sum(0) - tps  # predicted as the class, but labelled otherwise (column sums)
        fns = tmp_cm.sum(1) - tps  # labelled as the class, but predicted otherwise (row sums)
        self.cm += tmp_cm
        self.tps += tps
        self.fps += fps
        self.fns += fns

        precisions, recalls, ious, weights, miou = self._compute_stats(tps, fps, fns)

        if verbose:
            self._print_stats(tmp_cm, precisions, recalls, ious, weights, miou)

    def _compute_stats(self, tps, fps, fns):
        with np.errstate(all='ignore'):  # any division could be by zero; we know about it and don't care
            precisions = tps / (tps + fps)
            recalls = tps / (tps + fns)
            ious = tps / (tps + fps + fns)
        weights = np.copy(self.weights)
        weights[np.isnan(ious)] = 0
        # Mask NaN IOUs, otherwise NaN * 0 would still turn the average into NaN
        miou = np.ma.average(np.ma.masked_invalid(ious), weights=weights)
        return precisions, recalls, ious, weights, miou

    def _print_stats(self, cm, precisions, recalls, ious, weights, miou):
        print('Confusion matrix:')
        print(cm)
        print('\n---\n')
        for c in range(self.num_classes):
            print(
                f'Class: {str(self.clazz_names[c]):20s}\t'
                f'Precision: {precisions[c]:.3f}\t'
                f'Recall: {recalls[c]:.3f}\t'
                f'IOU: {ious[c]:.3f}\t'
                f'mIOU weight: {weights[c]:.1f}'
            )
        print(f'Mean IOU: {miou}')
        print('\n---\n')

    def print_final(self):
        precisions, recalls, ious, weights, miou = self._compute_stats(self.tps, self.fps, self.fns)
        self._print_stats(self.cm, precisions, recalls, ious, weights, miou)

    def reset(self):
        self.cm = np.zeros((self.num_classes, self.num_classes), 'u8')
        self.tps = np.zeros(self.num_classes, dtype='u8')
        self.fps = np.zeros(self.num_classes, dtype='u8')
        self.fns = np.zeros(self.num_classes, dtype='u8')


def evaluate(model, metrics, dataset, device, batch_size=8, verbose=True, create_imgs=False, save_dir='.'):
    model = model.eval().to(device)
    loader = tdata.DataLoader(dataset, batch_size=batch_size, shuffle=True)
    with torch.no_grad():  # disable gradient computation
        for i, batch in enumerate(loader):
            data = batch['rgbs'].to(device)
            predictions = model(data)
            metrics.update(batch['labels'], predictions, verbose)
            if create_imgs:
                for j, img_id in enumerate(batch['key']):
                    img = create_vis(data[j].cpu().numpy(), batch['labels'][j].numpy(), predictions[j].cpu().numpy())
                    os.makedirs(save_dir, exist_ok=True)
                    img.save(osp.join(save_dir, f'{img_id:04d}.png'))
            print(f'Processed {i+1:02d}th batch')

    metrics.print_final()
    return metrics


def prepare(args, model=None):
    dataset = Dataset(args.dataset_rgbs, args.dataset_labels)
    if model is None:
        model = hw_2.load_model()
    metrics = Metrics(NUM_CLAZZ, WEIGHTS, CLAZZ)
    return model, metrics, dataset


def run(args):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model, metrics, dataset = prepare(args)
    evaluate(model, metrics, dataset, device, args.batch_size, args.verbose, args.create_imgs, args.store_dir)


def parse_args():
    parser = argparse.ArgumentParser('Evaluation demo for HW02')
    parser.add_argument('dataset_rgbs', help='NPY file, where dataset RGB data is stored')
    parser.add_argument('dataset_labels', help='NPY file, where dataset labels are stored')
    parser.add_argument(
        '-ci', '--create_imgs', default=False, action='store_true', help='Whether to create images. Warning! It will take significantly longer!'
    )
    parser.add_argument('-sd', '--store_dir', default='.', help='Where to store images. Only valid, if create_imgs is set to True')
    parser.add_argument('-bs', '--batch_size', default=8, type=int, help='Batch size')
    parser.add_argument('-v', '--verbose', default=False, action='store_true', help='Whether to print stats of each minibatch')

    return parser.parse_args()


def main():
    args = parse_args()
    print(args)
    run(args)


if __name__ == '__main__':
    main()
</code>
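The evaluation script averages the per-class IOUs with ''WEIGHTS = [0, 1, 1, 1, 0, 0]'', so only ''Car'', ''Humans & Bikes'' and ''Interest'' contribute to the reported mIOU. The sketch below recomputes that weighted mIOU from a confusion matrix (rows are labels, columns are predictions, as built in ''Metrics.update'') and then applies the tournament equation from the Points section. It is an illustration under these assumptions, not the code BRUTE uses; ''weighted_miou'' and ''tournament_points'' are hypothetical names, and ''best_miou'' stands for the best mIOU in the tournament.

```python
import math

import numpy as np

# Per-class weights as in WEIGHTS of the evaluation script:
# only Car (1), Humans & Bikes (2) and Interest (3) contribute to the mIOU.
WEIGHTS = np.array([0, 1, 1, 1, 0, 0], dtype=float)


def weighted_miou(cm, weights=WEIGHTS):
    """Weighted mean IOU from a confusion matrix cm[label, prediction]."""
    cm = np.asarray(cm, dtype=float)
    tps = np.diag(cm)
    fps = cm.sum(axis=0) - tps  # predicted as the class, labelled otherwise
    fns = cm.sum(axis=1) - tps  # labelled as the class, predicted otherwise
    with np.errstate(invalid="ignore", divide="ignore"):
        ious = tps / (tps + fps + fns)
    # Classes absent from both labels and predictions have IOU 0/0 = NaN; ignore them.
    w = np.where(np.isnan(ious), 0.0, weights)
    return float(np.sum(np.nan_to_num(ious) * w) / np.sum(w))


def tournament_points(miou, best_miou):
    """Tournament points 4 * sqrt(c * (2 - c)), with c as in the Points section."""
    if best_miou > 0.6:
        c = min(max((miou - 0.6) / (best_miou - 0.6), 0.0), 1.0)
    elif best_miou == 0.6 and miou == best_miou:
        c = 1.0
    else:
        c = 0.0
    return 4 * math.sqrt(c * (2 - c))
```

For example, with a best tournament mIOU of 0.8, a submission at 0.7 sits halfway (c = 0.5) and receives 4 * sqrt(0.75), roughly 3.46 points. Since sqrt(c(2 - c)) is at least c for every c between 0 and 1, this curve always gives at least as many points as a linear scale 4c would.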