CNN MNIST, PyTorch project workflow, training/validation/test set, hyperparameters, single loop learning rate selection; evaluation: accuracy, ranking, t-SNE embedding.
The basics are similar to the PyTorch tutorial.
We suggest the following steps to learn PyTorch:
    import torch
    import torchvision
    from torchvision import transforms

    class Data():
        def __init__(self, args):
            # transforms
            transform = transforms.Compose([transforms.ToTensor(),
                                            transforms.Normalize((0.5,), (0.5,))])
            self.train_set = torchvision.datasets.MNIST('../data', download=True, train=True, transform=transform)
            self.test_set = torchvision.datasets.MNIST('../data', download=True, train=False, transform=transform)
            # dataloaders
            self.train_loader = torch.utils.data.DataLoader(self.train_set, batch_size=args.batch_size, shuffle=True, num_workers=0)
            self.test_loader = torch.utils.data.DataLoader(self.test_set, batch_size=args.batch_size, shuffle=True, num_workers=0)
            # Task: split train_set into train_set and val_set and create loaders

The Dataset class is responsible for knowing how to access the data, and DataLoader is responsible for shuffling, (parallel) loading, and batching. Writing your own Dataset class is rather simple, e.g.

    class DataXY(torch.utils.data.Dataset):
        def __init__(self, X: torch.Tensor, Y: torch.Tensor, transform=None):
            # dev is the target torch.device, assumed to be defined elsewhere
            self.X = X.to(dev)
            self.Y = Y.to(dev)
            self.transform = transform

        def __getitem__(self, index):
            x, y = self.X[index], self.Y[index]
            if self.transform is not None:
                x = self.transform(x)
            return x, y

        def __len__(self):
            return self.X.size(0)
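One possible way to approach the split task is sketched below. It is only an illustration under stated assumptions: a random `TensorDataset` stands in for the MNIST training set so the snippet runs without a download, and the 10% validation share, the seed, and the batch size of 64 are arbitrary choices that do not come from the template.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in for the MNIST training set (assumption: random data, 1000 samples).
full_train = TensorDataset(torch.randn(1000, 1, 28, 28),
                           torch.randint(0, 10, (1000,)))

# Assumed split: 10% of the training data is held out for validation.
n_val = len(full_train) // 10
n_train = len(full_train) - n_val

# A seeded generator makes the split reproducible across runs.
generator = torch.Generator().manual_seed(0)
train_set, val_set = random_split(full_train, [n_train, n_val], generator=generator)

# Shuffle only the training loader; validation order does not matter.
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=0)
val_loader = DataLoader(val_set, batch_size=64, shuffle=False, num_workers=0)

x, y = next(iter(val_loader))
```

Because `random_split` returns `Subset` objects that index into the same underlying dataset, the two parts share storage and are disjoint by construction.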
    from torch import nn

    # network, expects 28 x 28 input images and 10 classes
    net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    loss = nn.CrossEntropyLoss(reduction='none')
    print(net)

Note that this is a very small dataset with small input images. Do not use a full-blown architecture such as VGG that we considered in the lecture. Invent a small convolutional architecture of your own or get inspiration from the famous LeNet5 model.
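As a rough illustration of what "small" could mean here, the following sketch defines a LeNet-like network for 28 x 28 single-channel inputs. The layer widths (8 and 16 channels, 64 hidden units) are assumptions chosen for the example, not values prescribed by the assignment.

```python
import torch
from torch import nn

# A small LeNet-inspired network (layer sizes are illustrative assumptions).
net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=5, padding=2),  # 1x28x28 -> 8x28x28
    nn.ReLU(),
    nn.MaxPool2d(2),                            # 8x28x28 -> 8x14x14
    nn.Conv2d(8, 16, kernel_size=5),            # 8x14x14 -> 16x10x10
    nn.ReLU(),
    nn.MaxPool2d(2),                            # 16x10x10 -> 16x5x5
    nn.Flatten(),                               # 16*5*5 = 400 features
    nn.Linear(400, 64),
    nn.ReLU(),
    nn.Linear(64, 10),                          # 10 class scores
)
loss = nn.CrossEntropyLoss(reduction='none')

# Shape check on a dummy batch of 4 images.
x = torch.zeros(4, 1, 28, 28)
y = torch.zeros(4, dtype=torch.long)
out = net(x)
l = loss(out, y)  # per-sample losses, because reduction='none'
```

With `reduction='none'` the loss is returned per sample, so it has to be averaged (for example with `l.mean()`) before calling `backward()`.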
Extend the template with the following:
- A history dictionary with numpy arrays train_loss_batch and train_acc_batch; log there the training loss and training accuracy for each batch. We will save these for further processing at visualization time. PyTorch tensors can be converted to numpy arrays using x.detach().cpu().numpy().
- val_loss and val_acc numpy arrays in the history dict. Unlike training metrics, validation metrics are measured and recorded once per epoch. Save the history dict using pickle.dump at the end of each epoch. The file name can be a string containing all arguments passed to the program, like "--lr 0.01 --optimizer Adam". This allows you to visualize the current learning progress and compare several runs.
- A view.ipynb notebook. Load your saved history there and make the following inline plots using matplotlib:
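The history bookkeeping described in the list above could look roughly like the sketch below. It makes several assumptions: random numbers stand in for the real losses and accuracies, the argument list `argv` is a hypothetical replacement for `sys.argv[1:]`, and the file is written to the system temporary directory with a `.pkl` suffix.

```python
import os
import pickle
import tempfile

import numpy as np

rng = np.random.default_rng(0)
n_epochs, n_batches = 3, 5  # tiny sizes, for illustration only

history = {
    "train_loss_batch": np.zeros(0),
    "train_acc_batch": np.zeros(0),
    "val_loss": np.zeros(0),
    "val_acc": np.zeros(0),
}

# Hypothetical stand-in for the command-line arguments (sys.argv[1:] in a script).
argv = ["--lr", "0.01", "--optimizer", "Adam"]
path = os.path.join(tempfile.gettempdir(), " ".join(argv) + ".pkl")

for epoch in range(n_epochs):
    for batch in range(n_batches):
        # In a real loop these values come from the model,
        # e.g. l.mean().detach().cpu().numpy() for the batch loss.
        batch_loss, batch_acc = rng.random(), rng.random()
        history["train_loss_batch"] = np.append(history["train_loss_batch"], batch_loss)
        history["train_acc_batch"] = np.append(history["train_acc_batch"], batch_acc)

    # Validation metrics are recorded once per epoch (random stand-ins here).
    history["val_loss"] = np.append(history["val_loss"], rng.random())
    history["val_acc"] = np.append(history["val_acc"], rng.random())

    # Overwrite the file after every epoch so progress can be inspected mid-run.
    with open(path, "wb") as f:
        pickle.dump(history, f)

# In the notebook: load the saved history back.
with open(path, "rb") as f:
    loaded = pickle.load(f)
```

Since the training arrays have one entry per batch and the validation arrays one entry per epoch, the two need different x-axes when plotted together.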
Extend your visualization notebook as follows. Load the model saved at epoch 30.
- Report the t-SNE plot
Often (especially when trained long enough), deep models have good classification accuracy but the predictive probabilities $p(y|x)$ are not well calibrated (typically overconfident); see On Calibration of Modern Neural Networks. Yet, more confident predictions tend to be more accurate. If this is the case, we will say that the classifier ranks well.
Consider that we want to use the classifier with the "reject from recognition" option. If the classifier picks the class $\hat y_i = {\rm argmax}_y p(y|x_i; \theta)$ on test point $i$, let us call $c_i = p(\hat y_i|x_i; \theta)$ its confidence. It is desirable to reject from recognition when the classifier is not sufficiently confident, i.e. when $c_i \leq \alpha$, where $\alpha$ is a confidence threshold. We will study the dependence of the performance on the confidence threshold. Note that ordering by confidences is not the same as ordering by scores (why?).
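The definitions of $\hat y_i$, $c_i$ and the rejection rule can be sketched in numpy as follows. The function names `confidences` and `accept` and the toy score matrix are hypothetical, introduced only for this illustration; the sketch assumes the network outputs unnormalized class scores that are turned into probabilities with a softmax.

```python
import numpy as np

def confidences(scores):
    """Return predicted classes and their confidences.

    scores: array of shape (n, K) with unnormalized class scores.
    """
    # Subtract the row maximum for numerical stability of the softmax.
    z = scores - scores.max(axis=1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    y_hat = p.argmax(axis=1)  # predicted class
    c = p.max(axis=1)         # confidence = probability of the predicted class
    return y_hat, c

def accept(c, alpha):
    """Points with c <= alpha are rejected, the rest are accepted."""
    return c > alpha

# Toy scores for two test points and three classes (made-up numbers).
scores = np.array([[2.0, 0.0, 0.0],
                   [5.0, 5.0, 0.0]])
y_hat, c = confidences(scores)
accepted = accept(c, alpha=0.5)
```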
a) Plot the absolute number of errors as a function of the threshold value $\alpha$. Since we work with a finite sample of test data, the test error rate will only change when $\alpha$ crosses one of the $c_i$ values we have. So instead of taking very small steps on the threshold and recomputing the error rate each time anew, here is a better way to do it. Sort all the confidences $c_i$ in ascending order. Let $e_i = 1$ if $\hat y_i \neq y^*_i$, i.e. we make an error, and $e_i = 0$ if $\hat y_i = y^*_i$. If $c_{(i)}$ is the sorted sequence of confidences with error indicators $e_{(i)}$, then we can compute the number of errors among accepted points with threshold $\alpha = c_{(i)}$ as the sum of values $e_{(i+1)},\dots, e_{(n)}$. With e being the array of error indicators in sorted order, you can compute this sum as

    np.sum(e) - np.cumsum(e)

Set the range of the threshold from the minimum to the maximum $c_i$. This plot is an intermediate result.
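A minimal sketch of the sorting trick from a), on made-up confidences and error indicators (the five values below are illustrative, not real results, and ties between confidences are assumed not to occur):

```python
import numpy as np

# Made-up confidences and error indicators for five test points.
c = np.array([0.90, 0.40, 0.60, 0.99, 0.55])
e = np.array([0, 1, 1, 0, 0])

# Sort by confidence in ascending order and reorder the errors accordingly.
order = np.argsort(c)
c_sorted = c[order]
e_sorted = e[order]

# n_err[i] = number of errors among points accepted at threshold alpha = c_sorted[i],
# i.e. among the points that come after position i in the sorted order.
n_err = np.sum(e_sorted) - np.cumsum(e_sorted)
```

Here `c_sorted` gives the threshold values and `n_err` the corresponding error counts; since the count only changes at the sorted confidences, a step plot such as `plt.step(c_sorted, n_err, where="post")` in matplotlib is one natural way to draw it.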
b) Plot the relative error rate of accepted points (the number of errors divided by the number of points accepted for recognition) versus the number of points rejected from recognition when the threshold is varied. If the relative error declines, the classifier is ranking well (we are rejecting erroneous points and keeping correct ones). Report the plots a) and b).
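The curve in b) can be computed along the same lines. The helper name `relative_error_vs_rejected` and the toy data are hypothetical; the sketch assumes no ties between confidences and stops before all points are rejected, since the relative error is undefined when nothing is accepted.

```python
import numpy as np

def relative_error_vs_rejected(c, e):
    """Relative error of accepted points when the k least confident are rejected.

    c: confidences, e: 0/1 error indicators; returns arrays for k = 0, ..., n-1.
    """
    order = np.argsort(c)
    e_sorted = np.asarray(e)[order]
    n = len(e_sorted)
    # Errors removed by rejecting the first k points: 0, e_(1), e_(1)+e_(2), ...
    removed = np.concatenate(([0], np.cumsum(e_sorted)[:-1]))
    n_rejected = np.arange(n)
    n_accepted = n - n_rejected
    rel_err = (e_sorted.sum() - removed) / n_accepted
    return n_rejected, rel_err

# Made-up confidences and error indicators for five test points.
c = np.array([0.90, 0.40, 0.60, 0.99, 0.55])
e = np.array([0, 1, 1, 0, 0])
n_rejected, rel_err = relative_error_vs_rejected(c, e)
```

On this toy data the curve is not monotone: rejecting the second point, which was classified correctly, raises the relative error before it drops to zero. On a small sample such bumps are expected, and the overall trend is what matters.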