CNN on MNIST, PyTorch project workflow, training/validation/test sets, hyperparameters, single-loop learning-rate selection; evaluation: accuracy, ranking, t-SNE embedding.
The basics are similar to the PyTorch tutorial.
We suggest the following steps to learn PyTorch:
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms

class Data():
    def __init__(self, args):
        # transforms
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.5,), (0.5,))])
        self.train_set = torchvision.datasets.MNIST('../data', download=True, train=True, transform=transform)
        self.test_set = torchvision.datasets.MNIST('../data', download=True, train=False, transform=transform)
        # dataloaders
        self.train_loader = torch.utils.data.DataLoader(self.train_set, batch_size=args.batch_size, shuffle=True, num_workers=0)
        self.test_loader = torch.utils.data.DataLoader(self.test_set, batch_size=args.batch_size, shuffle=True, num_workers=0)
        # Task: split train_set into train_set and val_set and create the corresponding loaders
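One possible way to do this split, inside Data.__init__ (a minimal sketch; the 54000/6000 sizes and the fixed seed are arbitrary choices, not prescribed by the assignment):

        # sketch: split the 60000 MNIST training images into train and validation subsets
        train_subset, val_subset = torch.utils.data.random_split(
            self.train_set, [54000, 6000],
            generator=torch.Generator().manual_seed(0))  # fixed seed for a reproducible split
        self.train_loader = torch.utils.data.DataLoader(train_subset, batch_size=args.batch_size, shuffle=True, num_workers=0)
        self.val_loader = torch.utils.data.DataLoader(val_subset, batch_size=args.batch_size, shuffle=False, num_workers=0)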
The two relevant PyTorch abstractions here are Dataset and DataLoader. For the split you may also implement your own Dataset holding tensors directly, for example:
class DataXY(torch.utils.data.Dataset):
    def __init__(self, X: torch.Tensor, Y: torch.Tensor, transform=None):
        self.X = X.to(dev)  # dev: the torch.device used elsewhere in the template
        self.Y = Y.to(dev)
        self.transform = transform

    def __getitem__(self, index):
        x, y = self.X[index], self.Y[index]
        if self.transform is not None:
            x = self.transform(x)
        return x, y

    def __len__(self):
        return self.X.size(0)
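For illustration, a hypothetical way to use it (mnist_train stands for a torchvision.datasets.MNIST instance; the names and the 6000-point validation split are assumptions, not part of the template):

# sketch: wrap the raw MNIST tensors in DataXY and split them into train/val parts
X = mnist_train.data.float().unsqueeze(1) / 255.0    # (60000, 1, 28, 28)
Y = mnist_train.targets                              # (60000,)
n_val = 6000
train_data = DataXY(X[:-n_val], Y[:-n_val])
val_data = DataXY(X[-n_val:], Y[-n_val:])
val_loader = torch.utils.data.DataLoader(val_data, batch_size=64, shuffle=False)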
# network: expects input images 28 x 28 and 10 classes
net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss = nn.CrossEntropyLoss(reduction='none')
print(net)
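For orientation, a single training step might look roughly like this (a sketch; data is assumed to be an instance of the Data class above, and the optimizer step is omitted). Note that with reduction='none' the loss returns one value per sample, so it has to be averaged before backpropagation:

x, y = next(iter(data.train_loader))
scores = net(x)                                   # (batch, 10) class scores (logits)
l = loss(scores, y)                               # (batch,) per-sample cross-entropy
l.mean().backward()                               # average over the batch before backprop
acc = (scores.argmax(dim=1) == y).float().mean()  # batch accuracy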
Extend the training template with the following (a minimal sketch of such a loop is given after the list):
- a history dict that accumulates the quantities logged during training
- per-batch training loss and accuracy, train_loss_batch and train_acc_batch
- tensors converted for logging via x.detach().cpu().numpy()
- validation loss and accuracy per epoch, val_loss and val_acc
- saving the collected history to disk with pickle.dump
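A minimal sketch of how these pieces could fit together (the optimizer choice, the args.lr / args.epochs parameters, and the file names are assumptions, not prescribed by the template):

import pickle
import numpy as np

optimizer = torch.optim.SGD(net.parameters(), lr=args.lr)
history = dict(train_loss_batch=[], train_acc_batch=[], val_loss=[], val_acc=[])

for epoch in range(args.epochs):
    net.train()
    for x, y in data.train_loader:
        scores = net(x)
        l = loss(scores, y)                          # per-sample losses (reduction='none')
        optimizer.zero_grad()
        l.mean().backward()
        optimizer.step()
        acc = (scores.argmax(dim=1) == y).float().mean()
        history['train_loss_batch'].append(l.mean().detach().cpu().numpy())
        history['train_acc_batch'].append(acc.detach().cpu().numpy())
    # validation pass
    net.eval()
    with torch.no_grad():
        vl, va, n = 0.0, 0.0, 0
        for x, y in data.val_loader:
            scores = net(x)
            vl += loss(scores, y).sum().item()
            va += (scores.argmax(dim=1) == y).sum().item()
            n += y.size(0)
    history['val_loss'].append(vl / n)
    history['val_acc'].append(va / n)
    torch.save(net.state_dict(), f'model_epoch_{epoch}.pt')  # one checkpoint per epoch

with open('history.pkl', 'wb') as f:
    pickle.dump(history, f)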
Extend your visualization notebook view.ipynb as follows. Load the model saved at epoch 30.
- Report a t-SNE plot of the test set embedding, colored by class label.
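A possible sketch for the notebook (the checkpoint file name, the subsample size, and the use of sklearn's TSNE are assumptions; for the linear model above the "feature" before the last layer is just the flattened image, while for a CNN you would take a deeper layer's output):

import pickle
import torch
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

net.load_state_dict(torch.load('model_epoch_30.pt'))  # checkpoint saved during training
net.eval()

# collect features (output of all layers but the last one) and labels on the test set
feats, labels = [], []
with torch.no_grad():
    for x, y in data.test_loader:
        feats.append(net[:-1](x).cpu().numpy())
        labels.append(y.cpu().numpy())
feats = np.concatenate(feats)
labels = np.concatenate(labels)

# t-SNE of a subsample (t-SNE is slow on all 10000 test points)
idx = np.random.choice(len(feats), 2000, replace=False)
emb = TSNE(n_components=2, init='pca').fit_transform(feats[idx])
plt.scatter(emb[:, 0], emb[:, 1], c=labels[idx], cmap='tab10', s=5)
plt.colorbar()
plt.show()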
Often (especially when trained long enough), deep models have good classification accuracy, but their predictive probabilities $p(y|x)$ are not well calibrated (typically overconfident); see On Calibration of Modern Neural Networks. Nevertheless, more confident predictions still tend to be more accurate. When this is the case, we say that the classifier ranks well.
Consider using the classifier with a "reject from recognition" option. If the classifier picks the class $\hat y_i = {\rm argmax}_y\, p(y|x_i; \theta)$ on test point $i$, let us call $c_i = p(\hat y_i|x_i; \theta)$ its confidence. It is desirable to reject from recognition when the classifier is not sufficiently confident, i.e. when $c_i \leq \alpha$, where $\alpha$ is a confidence threshold. We will study how the performance depends on the confidence threshold. Note that ordering by confidences is not the same as ordering by scores (why?).
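A small sketch of computing predictions, confidences, and error indicators on the test set (assuming net and data as above):

import numpy as np
import torch.nn.functional as F

confs, errs = [], []
with torch.no_grad():
    for x, y in data.test_loader:
        p = F.softmax(net(x), dim=1)       # predictive probabilities p(y|x)
        c, yhat = p.max(dim=1)             # confidence c_i and predicted class \hat y_i
        confs.append(c.cpu().numpy())
        errs.append((yhat != y).cpu().numpy())
confs = np.concatenate(confs)
errs = np.concatenate(errs).astype(int)    # e_i = 1 iff the prediction is wrong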
a) Plot the absolute number of errors as a function of the threshold value $\alpha$. Since we work with a finite sample of test data, the test error count only changes when $\alpha$ crosses one of the $c_i$ values we have. So instead of taking very small steps on the threshold and recomputing the error rate from scratch each time, here is a better way. Sort all the confidences $c_i$ in ascending order. Let $e_i = 1$ if $\hat y_i \neq y^*_i$, i.e. we make an error, and $e_i = 0$ if $\hat y_i = y^*_i$. If $c_{(i)}$ is the sorted sequence of confidences with error indicators $e_{(i)}$, then the number of errors among the accepted points with threshold $\alpha = c_{(i)}$ is the sum $e_{(i+1)} + \dots + e_{(n)}$. You can compute all these sums at once as
np.sum(e)-np.cumsum(e)
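Putting it together for plot a) (a sketch, continuing from the confs and errs arrays above):

import matplotlib.pyplot as plt

order = np.argsort(confs)                  # ascending confidences c_(1) <= ... <= c_(n)
c_sorted = confs[order]
e_sorted = errs[order]
errors_accepted = np.sum(e_sorted) - np.cumsum(e_sorted)  # errors among points with c_i > threshold

plt.plot(c_sorted, errors_accepted)
plt.xlabel('confidence threshold alpha')
plt.ylabel('number of errors among accepted points')
plt.show()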
b) Plot the relative error rate of the accepted points (the number of errors divided by the number of points accepted for recognition) versus the number of points rejected from recognition as the threshold is varied. If the relative error declines, the classifier ranks well (we are rejecting erroneous points and keeping correct ones). Report plots a) and b).
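A corresponding sketch for plot b), reusing the sorted arrays from a); note that the division needs care when all points are rejected:

n = len(e_sorted)
rejected = np.arange(1, n + 1)                          # rejecting the i least confident points
accepted = n - rejected
rel_error = errors_accepted / np.maximum(accepted, 1)   # avoid division by zero when everything is rejected

plt.plot(rejected, rel_error)
plt.xlabel('number of points rejected from recognition')
plt.ylabel('relative error on accepted points')
plt.show()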