PyTorch, project pipeline, training/validation/test sets, model selection (architecture), overfitting, early stopping, CNN on MNIST, visualizing ranking, t-SNE embedding.
What is PyTorch: a Python front end, C++ core libraries (ATen), and target-device libraries (e.g. cuDNN). These will be useful resources for this lab:
We suggest the following steps to learn PyTorch:
numpy.array
```python
import matplotlib.pyplot as plt
import numpy as np
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# transforms
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

# datasets
trainset = torchvision.datasets.MNIST('./data', download=True, train=True, transform=transform)

# dataloaders
train_loader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=0)

# let's verify how the loader packs the data
(data, target) = next(iter(train_loader))
# should be [batch_size x 1 x 28 x 28]
print('Input size:', data.size())
# should be [batch_size]
print('Labels size:', target.size())
# see the number of training data points
n_train_data = len(trainset)
print('Train data size:', n_train_data)
```
```python
# network, expects input images of 28 * 28 pixels and 10 classes
net = nn.Sequential(nn.Linear(28 * 28, 10))
# loss function
loss = nn.CrossEntropyLoss(reduction='none')
# optimizer
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(10):
    # will accumulate the total loss over the dataset
    L = 0
    # loop fetching a mini-batch of data at each iteration
    for i, (data, target) in enumerate(train_loader):
        # flatten the data to size [batch_size x 784]
        data_vectors = data.flatten(start_dim=1)
        # apply the network
        y = net(data_vectors)
        # calculate per-sample mini-batch losses
        l = loss(y, target)
        # accumulate the total loss as a regular float number (important to stop graph tracking)
        L += l.sum().item()
        # gradients accumulate by default, so clear them explicitly
        optimizer.zero_grad()
        # compute the gradient from the mini-batch loss
        l.mean().backward()
        # make the optimization step
        optimizer.step()
    print(f'Epoch: {epoch} mean loss: {L / n_train_data}')
```
print(net)
Extend the code above with the following:
- `history`
- `train_loss`
- `train_acc`
- `val_loss`
- `val_acc`
- `val_loss[:,i]`
- `pickle.dump`
torch.save(net.state_dict(), PATH)
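A minimal sketch of what the statistics logging and saving could look like; the exact structure of `history`, the placeholder values, and the file name are assumptions for illustration, not part of the assignment:

```python
import pickle

# hypothetical container for per-epoch statistics (key names taken from the list above)
history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}

# inside the epoch loop you would append each epoch's statistics, e.g.
for epoch in range(3):
    history['train_loss'].append(1.0 / (epoch + 1))   # placeholder values
    history['train_acc'].append(0.80 + 0.05 * epoch)
    history['val_loss'].append(1.2 / (epoch + 1))
    history['val_acc'].append(0.75 + 0.05 * epoch)

# save the history with pickle.dump so that plots can be redone later;
# the model itself would be saved separately with torch.save(net.state_dict(), PATH)
with open('history.pkl', 'wb') as f:
    pickle.dump(history, f)

# reload to verify the round trip
with open('history.pkl', 'rb') as f:
    loaded = pickle.load(f)
print(loaded['train_acc'])
```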
Please annotate the axes and add a legend.
```python
dev = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
...
net.to(dev)
...
data = data.to(dev)
```
Note the different syntax for moving a Tensor and a Model to a device: `net.to(dev)` moves the model's parameters in place, while `data.to(dev)` returns a new Tensor that must be assigned back.
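A short sketch illustrating the difference; the tiny `Linear` module and the zero tensor are just stand-ins:

```python
import torch

dev = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

# a Model: .to(dev) moves the parameters in place (the return value is the same module)
net = torch.nn.Linear(4, 2)
net.to(dev)

# a Tensor: .to(dev) returns a NEW tensor, so the result must be assigned back
data = torch.zeros(3, 4)
data = data.to(dev)

print(next(net.parameters()).device, data.device)
```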
Extend your visualization notebook as follows. Load the model at epoch 100.
```python
testset = torchvision.datasets.MNIST('./data', download=True, train=False, transform=transform)
test_loader = torch.utils.data.DataLoader(testset, batch_size=128, shuffle=False, num_workers=0)
```
For calculations in this and the next assignments, use numpy. PyTorch tensors can be converted to numpy arrays using `x.cpu().numpy()`.
- Report the t-SNE plot
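A hedged sketch of producing the t-SNE plot with scikit-learn; here random data stands in for the network outputs collected on the test set, and the figure file name is an assumption:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib
matplotlib.use('Agg')  # allow plotting without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# stand-in for the test-set network outputs: [n_points x n_features]
X = rng.normal(size=(200, 10))
labels = rng.integers(0, 10, size=200)

# embed into 2D with t-SNE
emb = TSNE(n_components=2, perplexity=30, init='random', random_state=0).fit_transform(X)

plt.figure(figsize=(6, 6))
sc = plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap='tab10', s=10)
plt.colorbar(sc, label='class')
plt.xlabel('t-SNE dim 1')
plt.ylabel('t-SNE dim 2')
plt.title('t-SNE embedding of test outputs')
plt.savefig('tsne.png')
print(emb.shape)
```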
Consider that we are allowed to use a "reject from recognition" option. If the classifier picks the class $\hat y_i = {\rm argmax}_y \, p(y|x_i; \theta)$ on test point $i$, let us call $c_i = p(\hat y_i|x_i; \theta)$ its confidence. We will want to reject from recognition when we are not confident, i.e. when $c_i \leq \alpha$, where $\alpha$ is a confidence threshold. We will not fix this threshold, but study the performance for all possible thresholds.
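As a sketch with made-up logits for 4 test points and 3 classes, the predictions $\hat y_i$ and confidences $c_i$ can be computed as:

```python
import numpy as np

# hypothetical network scores (logits), one row per test point
scores = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 0.0],
                   [0.0, 3.0, 0.5],
                   [1.0, 1.1, 0.9]])

# softmax over classes gives p(y | x_i; theta)
p = np.exp(scores - scores.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)

# predicted class and its confidence c_i = p(y_hat_i | x_i; theta)
y_hat = p.argmax(axis=1)
c = p.max(axis=1)
print(y_hat)  # → [0 0 1 1]
print(np.round(c, 3))
```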
a) Plot the number of errors as a function of the threshold value $\alpha$. Since we work with a finite sample of test data, the test error rate will only change when $\alpha$ crosses one of the $c_i$ values we have. So instead of taking very small steps on the threshold and recomputing the error rate anew each time, here is a better way to do it. Sort all the confidences $c_i$ in ascending order. Let $e_i = 1$ if $\hat y_i \neq y^*_i$, i.e. we make an error, and $e_i = 0$ if $\hat y_i = y^*_i$. If $c_{(i)}$ is the sorted sequence of confidences with error indicators $e_{(i)}$, then we can compute the number of errors among accepted points at threshold $\alpha = c_{(i)}$ as the sum of the values $e_{(i+1)}, \dots, e_{(n)}$. You can compute this sum as
`np.sum(e) - np.cumsum(e)`
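A small worked example of this trick, with hypothetical confidences and error indicators:

```python
import numpy as np

# hypothetical confidences and error indicators for 6 test points
c = np.array([0.9, 0.4, 0.7, 0.2, 0.95, 0.6])
e = np.array([0,   1,   0,   1,   0,    1])  # 1 = misclassified

# sort by confidence (ascending) and permute the error indicators accordingly
order = np.argsort(c)
c_sorted = c[order]
e_sorted = e[order]

# number of errors among points with confidence above c_(i), for each i
errors_above = np.sum(e_sorted) - np.cumsum(e_sorted)
print(c_sorted)       # → [0.2  0.4  0.6  0.7  0.9  0.95]
print(errors_above)   # → [2 1 0 0 0 0]
```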
b) Plot the number of points rejected from recognition as a function of the threshold value $\alpha$. For this we just need to plot the values $1$ to $n$ versus the sorted array $c_{(i)}$.
c) Plot the error rate of accepted points (number of errors versus number of points accepted for recognition). This just combines the data from a) and b). If the relative error declines, the classifier is ranking well (we are rejecting erroneous points and keeping correct ones).
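A sketch of assembling the three quantities on hypothetical sorted error indicators; the guard against dividing by zero accepted points is an added assumption:

```python
import numpy as np

# hypothetical error indicators, already sorted by ascending confidence
e_sorted = np.array([1, 1, 1, 0, 0, 0])
n = len(e_sorted)

# a) number of errors among accepted points, as the threshold sweeps over c_(i)
errors = np.sum(e_sorted) - np.cumsum(e_sorted)
# b) number of rejected points: at threshold c_(i), exactly i points are rejected
rejected = np.arange(1, n + 1)
accepted = n - rejected
# c) error rate of accepted points (guarding the division when everything is rejected)
error_rate = np.divide(errors, accepted, out=np.zeros(n, float), where=accepted > 0)
print(error_rate)  # → [0.4  0.25 0.   0.   0.   0.  ]
```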
Report plots a), b), c)