CNN visualization: deep features, attention maps. Adversarial patterns and attacks
In this lab we will consider a CNN classifier, visualise activations and attention maps for its hidden layers, search for input patterns that maximise the activations of specific neurons, and see how to craft adversarial attacks that fool the network. All of these tasks share very similar techniques. We recommend using Jupyter notebooks for this lab, as the computations are relatively light and we need a lot of visualisation.
We will use the pre-trained VGG11 CNN, which you already know from the previous lab. Load it like so:
# load the network
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net = torch.hub.load('pytorch/vision:v0.9.0', 'vgg11', pretrained=True)
net = net.eval().to(device)
# we are not changing the network weights/biases in this lab
for param in net.parameters():
    param.requires_grad = False
print(net)
For this lab we need just one image from ImageNet; we provide an image of a labrador retriever. We also need the class labels for the 1000 ImageNet categories, provided as the text file imagenet_classes.txt.
You will need to set up the standard image transformation pipeline for ImageNet like so:
# image to tensor transform
from torchvision import transforms
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225])
])
A convenient helper for displaying (batches of) tensor images is torchvision.utils.make_grid:

from torchvision.utils import make_grid
import matplotlib.pyplot as plt

grid = make_grid(x, nrow=10, normalize=True, padding=1)
image = grid.cpu().numpy().transpose(1, 2, 0)  # CHW -> HWC for matplotlib
plt.imshow(image)
Load the image, apply the classifier and report the top 10 classes. Visualise the input image and the transformed (resampled & normalised) tensor image. For the latter you may use the make_grid function mentioned above.
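For instance, the classification step could look like this minimal sketch (the file name labrador.jpg is an assumption):

# minimal sketch: classify the image and report the top-10 classes
from PIL import Image

img = Image.open('labrador.jpg').convert('RGB')  # file name is an assumption
img_t = transform(img).unsqueeze(0).to(device)   # add a batch dimension

with open('imagenet_classes.txt') as f:
    classes = [line.strip() for line in f]

with torch.no_grad():
    probs = torch.softmax(net(img_t), dim=1)
top_p, top_i = probs.topk(10, dim=1)
for p, i in zip(top_p[0], top_i[0]):
    print('{}: {:.3f}'.format(classes[int(i)], float(p)))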
Given an input image, your task is to compute the $l_2$ norms over the activation channels for each of the 21 feature maps produced by net.features and to display them. In the next step you will compute and visualise the "network attention" by computing the gradient of the loss w.r.t. these intermediate outputs.
x = img_t  # input image (tensor)
fig, axs = plt.subplots(nrows=6, ncols=4, figsize=(16, 20))
for (i, l) in enumerate(net.features):
    x = l.forward(x)
    # l2 norm over the channel dimension of the layer output
    f = (x.detach()**2).sum(dim=1).sqrt()[0]
    axs.flat[i].imshow(f.cpu().numpy(), cmap='jet')
    axs.flat[i].set_axis_off()
    axs.flat[i].set_title("{}:{}".format(i, l.__class__.__name__))
Hint: gradients of intermediate (non-leaf) tensors are not stored by default. Call x.retain_grad() on each intermediate output before invoking .backward() on the loss; the gradient is then available in x.grad.
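A minimal sketch of the attention computation, assuming img_t and net as above; here we take the score of an assumed target class index cls as the loss:

# minimal sketch: gradients of the class score w.r.t. intermediate outputs
x = img_t.clone().requires_grad_(True)
acts = []
for l in net.features:
    x = l(x)
    x.retain_grad()            # keep the gradient of this intermediate output
    acts.append(x)
out = net.classifier(torch.flatten(net.avgpool(x), 1))
out[0, cls].backward()         # cls is an assumed target class index

fig, axs = plt.subplots(nrows=6, ncols=4, figsize=(16, 20))
for i, a in enumerate(acts):
    g = a.grad[0].norm(dim=0)  # l2 norm of the gradient over channels
    axs.flat[i].imshow(g.cpu().numpy(), cmap='jet')
    axs.flat[i].set_axis_off()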
The goal of this assignment is to find input patterns that maximise the outputs of neurons in a given layer (recall the work of Hubel and Wiesel). To find these patterns, we will numerically optimise over a patch of the input image.
Hint: a possible structure is a helper receptive_field, which computes the receptive field size S of a neuron in the chosen layer, and a function activation_max, which runs the optimisation.
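A possible sketch of receptive_field, assuming the feature extractor contains only Conv2d, MaxPool2d and ReLU modules (true for VGG11):

def receptive_field(features, layer):
    # propagate receptive field size and effective stride through the layers
    size, stride = 1, 1
    for l in list(features)[:layer + 1]:
        if isinstance(l, (torch.nn.Conv2d, torch.nn.MaxPool2d)):
            k = l.kernel_size[0] if isinstance(l.kernel_size, tuple) else l.kernel_size
            s = l.stride[0] if isinstance(l.stride, tuple) else l.stride
            size = size + (k - 1) * stride
            stride = stride * s
    return size, stride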
Initialise one trainable S×S RGB patch per channel of the chosen layer. Note: create the tensor directly on the device, since calling .to(device) on a Parameter returns a non-leaf tensor that would not receive gradients:

# one trainable SxS RGB patch per channel, optimised jointly as a batch
x = torch.nn.Parameter(torch.zeros(channels, 3, S, S, device=device))
As the objective to maximise, take for each patch the activation of its associated channel at the spatial centre of the layer output and sum over the patches:

# f: layer output of shape (channels, channels, h, w); sz = f.shape
# patch i contributes the centre activation of channel i
obj = f[:, :, sz[2]//2, sz[3]//2].diag().sum()
To keep the patterns smooth, project after each gradient step: whenever a patch deviates from its locally averaged version by more than epsilon (mean absolute difference per pixel), replace it by the averaged version:

with torch.no_grad():
    xx = apool(apad(x))   # locally averaged patches
    diff = x - xx
    dn = torch.linalg.norm(diff.flatten(2), dim=2, ord=1.0) / (S * S)
    if dn.max() > epsilon:
        x.data[dn > epsilon] = xx[dn > epsilon]

where the averaging operators are

apool = torch.nn.AvgPool2d(3, padding=0, stride=1)
apad = torch.nn.ReplicationPad2d(1)
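Putting the pieces together, activation_max could look roughly like this minimal sketch (the optimiser, learning rate and iteration count are assumptions):

def activation_max(layer, channels, S, epsilon, iters=200):
    x = torch.nn.Parameter(torch.zeros(channels, 3, S, S, device=device))
    opt = torch.optim.Adam([x], lr=0.1)           # settings are assumptions
    for _ in range(iters):
        opt.zero_grad()
        f = x
        for l in list(net.features)[:layer + 1]:  # forward up to the layer
            f = l(f)
        sz = f.shape
        # maximise the centre activation of each patch's own channel
        loss = -f[:, :, sz[2]//2, sz[3]//2].diag().sum()
        loss.backward()
        opt.step()
        with torch.no_grad():                     # smoothness projection
            xx = apool(apad(x))
            dn = torch.linalg.norm((x - xx).flatten(2), dim=2, ord=1.0) / (S * S)
            x.data[dn > epsilon] = xx[dn > epsilon]
    return x.detach()

Displaying the returned batch with make_grid then shows one optimised pattern per channel.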
Your task is to implement a targeted iterative adversarial attack: starting from the labrador retriever image, find a perturbation with $l_2$ norm at most eps that makes the network predict a target class of your choice instead of labrador retriever.
After each gradient step, project the perturbed image back onto the $l_2$ ball of radius eps around the original image x0:

# project the perturbation back onto the eps-ball around x0
dx = x.detach() - x0
dn = dx.flatten().norm()
div = torch.clamp(dn / eps, min=1.0)  # > 1 only if we left the ball
dx = dx / div
x.data = x0 + dx
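The full attack loop could look like this minimal sketch (target_class, eps, the step size and the iteration count are assumptions to tune):

# minimal sketch of a targeted iterative attack
import torch.nn.functional as F

x0 = img_t.clone()
x = x0.clone().requires_grad_(True)
target = torch.tensor([target_class], device=device)

for _ in range(100):
    loss = F.cross_entropy(net(x), target)
    loss.backward()
    with torch.no_grad():
        x -= 0.01 * x.grad / x.grad.norm()  # step towards the target class
        x.grad = None
        dx = x - x0                         # project onto the eps-ball
        div = torch.clamp(dx.flatten().norm() / eps, min=1.0)
        x.copy_(x0 + dx / div)

Finally, run the perturbed image through the classifier again and report the new top classes to check that the attack succeeded.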