Lab 5: CNN visualization & adversarial patterns

CNN visualization: deep features, attention maps. Adversarial patterns and attacks

Introduction

In this lab we will consider a CNN classifier and visualize the activations and attention maps of its hidden layers, look for input patterns that maximize the activations of specific neurons, and see how to craft adversarial attacks that fool the network. All of these tasks rely on very similar techniques. We recommend using Jupyter notebooks for this lab, as the computations are relatively light and we need lots of visualization.

Setup

Model

In this lab we will use the pre-trained VGG11 CNN, which you already know from the previous lab. Load it like so

```
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# load network
net = torch.hub.load('pytorch/vision:v0.9.0', 'vgg11', pretrained=True)
net = net.eval().to(device)

# we are not changing the network weights/biases in this lab
for param in net.parameters():
    param.requires_grad = False
print(net)
```

Data & Visualisation

For this lab we need just one image from ImageNet; we provide an image of a labrador retriever. In addition, we need the class labels of the 1000 ImageNet categories, which we provide as the text file imagenet_classes.txt.

You will need to set up the standard image transformation pipeline for ImageNet like so

```
from torchvision import transforms

# image to tensor transform
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )])
```

You will need to visualise tensors containing images. For this you can use the function torchvision.utils.make_grid. For example, assuming that $x$ is a batch tensor of images with shape (B, 3, H, W):

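```
import matplotlib.pyplot as plt
from torchvision.utils import make_grid

grid = make_grid(x.detach(), nrow=10, normalize=True, padding=1)
image = grid.cpu().numpy().transpose(1, 2, 0)  # CHW -> HWC for matplotlib
plt.imshow(image)
```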
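With normalize=True, make_grid rescales the whole grid to [0, 1] using its minimum and maximum, which does not reproduce the original colours. To show the transformed tensor with its true colours, one can invert the normalisation first. A minimal sketch; the name denormalize is illustrative, and the statistics are the ImageNet values used in the transform:

```python
import torch

# ImageNet statistics used by transforms.Normalize, shaped for broadcasting.
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def denormalize(x: torch.Tensor) -> torch.Tensor:
    """Map a normalised (B, 3, H, W) batch back to RGB values in [0, 1]."""
    x = x.detach().cpu() * STD + MEAN   # invert (x - mean) / std
    return x.clamp(0.0, 1.0)            # clip values that fall outside [0, 1]
```

For example, `plt.imshow(make_grid(denormalize(img_t)).numpy().transpose(1, 2, 0))` displays the resampled image without the min/max rescaling.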

Assignment 1 (1p)

Load the image, apply the classifier and report the top 10 classes. Visualise the input image and the transformed (resampled & normalised) tensor image. For the latter you may use the make_grid function mentioned above.
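A minimal sketch of the top-10 report. It assumes that imagenet_classes.txt lists one class label per line in class-index order (check the provided file); load_class_names and top_k_classes are hypothetical helper names:

```python
from typing import List, Tuple

import torch

def load_class_names(path: str) -> List[str]:
    """Read one class label per line (assumed format of imagenet_classes.txt)."""
    with open(path) as f:
        return [line.strip() for line in f]

def top_k_classes(logits: torch.Tensor, class_names: List[str],
                  k: int = 10) -> List[Tuple[int, str, float]]:
    """Return (class index, label, probability) for the k most likely classes.

    `logits` is the raw network output of shape (1, num_classes) for one image.
    """
    probs = torch.softmax(logits[0], dim=0)   # scores -> probabilities
    p, idx = probs.topk(k)                    # sorted in descending order
    return [(int(i), class_names[int(i)], float(q)) for q, i in zip(p, idx)]

# Usage (assumes `net` and a batched input `img_t`):
# with torch.no_grad():
#     logits = net(img_t)
# for i, name, p in top_k_classes(logits, load_class_names("imagenet_classes.txt")):
#     print(f"{i:4d} {name:30s} {p:.4f}")
```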

Assignment 2 (2p)

Given an input image, your task is to compute the $l_2$ norms over the activation channels for each of the 21 feature maps (the outputs of the 21 layers in net.features) and to display them. In the next step you will compute and visualise the “network attention” by computing the gradient of the score of the predicted class w.r.t. these intermediate outputs.

For each layer $l=0,\ldots,20$: compute the feature map at that layer and the $l_2$ norms of the channel activations (per pixel) and display them in a tableau, e.g.:

```
import matplotlib.pyplot as plt

x = img_t  # input image tensor of shape (1, 3, H, W) on `device`
fig, axs = plt.subplots(nrows=6, ncols=4, figsize=(16, 20))
with torch.no_grad():
    for (i, l) in enumerate(net.features):
        x = l(x)
        # l2 norm over channels at every pixel -> (H_l, W_l) map
        f = (x ** 2).sum(dim=1).sqrt()[0]
        axs.flat[i].imshow(f.cpu().numpy(), cmap='jet')
        axs.flat[i].set_axis_off()
        axs.flat[i].set_title("{}:{}".format(i, l.__class__.__name__))
for ax in axs.flat[len(net.features):]:
    ax.set_axis_off()  # hide the unused panels (24 axes, 21 layers)
```

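Compute the gradient of the network's classification score for the predicted class w.r.t. the feature maps $l=0,\ldots,20$. You can achieve this by iterating forward through the feature layers as above, additionally calling x.retain_grad() on each intermediate output and appending it to a list. Since the network parameters are frozen, the input image itself must require gradients (img_t.requires_grad_()); otherwise no computation graph is built. Also note that the ReLU layers of torchvision's VGG operate in place (inplace=True) and would overwrite the stored outputs of the preceding convolutions, so set their inplace attribute to False first. Then forward propagate through the rest of the network (net.avgpool, flattening, net.classifier) to its final output (the class scores) and call .backward() on the score of the predicted class to compute the gradients w.r.t. the feature maps. Display a tableau with the results by showing the $l_2$ norms of the channel gradients (per pixel).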
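The procedure for the gradient maps can be sketched as follows. This is one possible implementation, assuming a torchvision-style VGG with the attributes features, avgpool and classifier; the function name feature_map_gradients is hypothetical, and as a side effect it switches off the in-place ReLUs of the given network:

```python
import torch
import torch.nn as nn

def feature_map_gradients(net: nn.Module, img_t: torch.Tensor):
    """Gradients of the predicted-class score w.r.t. every output of net.features.

    Returns (predicted class index, list of per-pixel gradient l2 norms).
    """
    # In-place ReLUs would overwrite the stored feature maps; disable them.
    for m in net.modules():
        if isinstance(m, nn.ReLU):
            m.inplace = False

    # Parameters are frozen, so the input must require grad to build a graph.
    x = img_t.detach().clone().requires_grad_(True)
    feats = []
    for layer in net.features:
        x = layer(x)
        x.retain_grad()          # keep .grad of this non-leaf tensor
        feats.append(x)

    # Rest of the network: pooling, flattening, fully connected classifier.
    scores = net.classifier(torch.flatten(net.avgpool(x), 1))
    c = int(scores.argmax(dim=1))
    scores[0, c].backward()      # gradient of the predicted class score

    # l2 norm over channels at every pixel, as for the activations.
    norms = [(f.grad ** 2).sum(dim=1).sqrt()[0] for f in feats]
    return c, norms
```

Each entry of the returned list can be shown with imshow in the same 6 x 4 tableau as the activation norms.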

Assignment 3 (4p)

The goal of this assignment is to find input patterns that maximise the outputs of neurons in a given layer (recall the work of Hubel and Wiesel). To find these patterns, we will numerically optimize over a patch of the input image.

Implement a function receptive_field that computes the size of the receptive field for a given layer (see seminar 3).
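A sketch of receptive_field based on the standard recursion $r_l = r_{l-1} + (k_l - 1)\,j_{l-1}$, $j_l = j_{l-1}\,s_l$ with $r_0 = j_0 = 1$, where $k_l$ is the kernel size, $s_l$ the stride and $j_l$ the cumulative stride of layer $l$. It assumes square kernels and no dilation, which holds for VGG11; padding does not change the size:

```python
import torch.nn as nn

def _side(v):
    """Kernel sizes and strides may be ints or (h, w) tuples; use one side."""
    return v[0] if isinstance(v, (tuple, list)) else v

def receptive_field(layers: nn.Sequential, l: int) -> int:
    """Receptive field size (in input pixels) of a unit in the output of layer l."""
    r, j = 1, 1                       # field size and cumulative stride
    for layer in list(layers)[: l + 1]:
        if isinstance(layer, (nn.Conv2d, nn.MaxPool2d, nn.AvgPool2d)):
            k, s = _side(layer.kernel_size), _side(layer.stride)
            r += (k - 1) * j          # each kernel tap spans j input pixels
            j *= s
        # element-wise layers such as ReLU leave r and j unchanged
    return r
```

For VGG11, receptive_field(net.features, 20) gives 150 pixels.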
Implement a function activation_max that computes the patterns maximising the activations of a given layer. Start from a small image x initialised with zeros, whose size equals the receptive field of a unit in the target layer. Forward propagate x through the network to compute the feature map y of the target layer. Select the centrally located pixel and the target channel in the feature map y. Use the Adam optimizer to maximize the selected feature (i.e. a forward-backward loop with optimizer steps). Find such an activating image x for each channel of the target layer and display them in a panel. Constrain the search to patterns with all components in the range $[-1.0, 1.0]$; you can achieve this simply by clipping the pattern after each gradient step. You can speed up the optimization by running it in parallel for all channels of the layer. This can be achieved by using a batch of activation images (one per target channel) along with the following “trick”:
```
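# create on `device` directly: Parameter(...).to(device) returns a non-leaf copy that Adam cannot optimize
x = torch.nn.Parameter(torch.zeros(channels, 3, S, S, device=device))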
```
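initialises a trainable zero tensor holding one S×S pattern per channel, where S is the size of the receptive field and channels is the number of channels of the target layer. If $f$ denotes the output of the considered layer, then the objective is simply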
```
f[:,:,sz[2]//2, sz[3]//2].diag().sum()
```
where sz is the shape of $f$. Since the optimizer minimizes, use the negative of this objective as the loss. Run the gradient ascent for a fixed number of steps. Finally, display the obtained patterns in a tableau. They do not resemble patches of natural images, so we will add a regularisation that enforces more realistic patterns.
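A minimal sketch of activation_max using the batched trick, without the smoothing regularisation. It assumes that the target layer l has exactly `channels` output channels (so that diag selects channel k for pattern k) and that S comes from receptive_field; the step count and learning rate are arbitrary defaults, not tuned values:

```python
import torch
import torch.nn as nn

def activation_max(features: nn.Sequential, l: int, S: int, channels: int,
                   steps: int = 200, lr: float = 0.1, device="cpu") -> torch.Tensor:
    """Find one S x S input pattern per channel of layer l.

    Pattern k maximises channel k at the central pixel of the layer-l output;
    values are clipped to [-1, 1] after every step.
    """
    sub = features[: l + 1].to(device)              # layers 0..l only
    x = nn.Parameter(torch.zeros(channels, 3, S, S, device=device))
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        f = sub(x)                                  # (channels, C, H, W)
        sz = f.shape
        objective = f[:, :, sz[2] // 2, sz[3] // 2].diag().sum()
        (-objective).backward()                     # Adam minimizes, so negate
        opt.step()
        with torch.no_grad():
            x.clamp_(-1.0, 1.0)                     # box constraint
    return x.detach()
```

Starting from zeros can stall when the selected unit receives zero gradient at the all-zero input (e.g. behind a ReLU that is inactive there).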
We use the following simple regularisation that enforces smooth patterns. Let $x_c$ denote a colour channel of the input pattern and $A$ denote a smoothing convolution. We want to enforce the constraint $$\frac{1}{S^2}\lVert x_c - A x_c\rVert_1 \leq \epsilon,$$ i.e. a bound on the mean absolute difference between $x_c$ and its smoothed version. To this end, we simply replace $x_c$ by $Ax_c$ after each iteration whenever the constraint is violated. This can be done with the following code snippet
```
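with torch.no_grad():
    xx = apool(apad(x))  # A x: 3x3 box filter, size preserved by padding
    diff = x - xx
    # per colour channel: ||x_c - A x_c||_1 / S^2, shape (channels, 3)
    dn = torch.linalg.norm(diff.flatten(2), dim=2, ord=1.0) / (S * S)
    if dn.max() > epsilon:
        x.data[dn > epsilon] = xx[dn > epsilon]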
```
where
```
apool = torch.nn.AvgPool2d(3, padding=0, stride=1)
apad = torch.nn.ReplicationPad2d(1)
```
Implement the smoothing operator $A$. Tune $\epsilon$ so that the optimal activation patterns resemble natural image patches. Finally, display the obtained patterns in a tableau.
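One way to package the smoothing operator $A$ and the replacement step is sketched below, assuming patterns of shape (N, 3, S, S); the function names are hypothetical. The per-channel value is computed as abs().sum() divided by $S^2$, which equals the normalised $\ell_1$ norm in the constraint. In the optimization loop it would be called after the clipping step:

```python
import torch
import torch.nn as nn

def smoothing_operator() -> nn.Module:
    """A: 3x3 box filter; replication padding keeps the S x S size."""
    return nn.Sequential(nn.ReplicationPad2d(1),
                         nn.AvgPool2d(3, stride=1, padding=0))

def smooth_if_violated(x: torch.Tensor, A: nn.Module, epsilon: float) -> torch.Tensor:
    """Replace x_c by A x_c (in place) for every colour channel violating
    ||x_c - A x_c||_1 / S^2 <= epsilon.

    Returns the per-channel values of shape (N, 3), computed before smoothing.
    """
    S = x.shape[-1]
    with torch.no_grad():
        xx = A(x)
        dn = (x - xx).abs().flatten(2).sum(dim=2) / (S * S)
        mask = dn > epsilon                     # (N, 3) violating channels
        x[mask] = xx[mask]
    return dn
```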

Assignment 4 (3p)

Your task is to implement a targeted iterative adversarial attack.

Choose a clean image which is correctly classified by the net (e.g. the image of the labrador retriever).
Choose a target class different from the true class (e.g. 892: wall clock) and fix an ε > 0 (measured in the units of the normalised input tensor). Implement a projected gradient ascent that aims to maximize the softmax output of the target class w.r.t. the input image, but constrains the search to the ε-ball of the $\ell_\infty$ norm around the clean image.
- Start the optimization from the clean image.
- You may use the Adam optimizer to perform the gradient steps. For this, the input image must require gradients (e.g. wrap it in torch.nn.Parameter).
- To enforce the constraint, you may e.g. use the following code after each gradient step
```
    dx = (x.detach() - x0)
    # elementwise clipping is the Euclidean projection onto the l_inf eps-ball
    dx = torch.clamp(dx, min=-eps, max=eps)
    x.data = x0 + dx
```
  where $x_0$ is the clean image (tensor) and $x$ is the current image (tensor).
Run the projected gradient ascent for a fixed number of steps.
Report a value of ε that admits a successful attack, show the obtained adversarial example alongside the clean image, and report the prediction probabilities for both.
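A minimal sketch of the targeted attack, assuming x0 is the normalised clean image of shape (1, 3, H, W) and eps is measured in the same normalised units. It maximizes the log of the softmax output of the target class, which has the same maximiser as the softmax itself but better-scaled gradients, and enforces the constraint by elementwise clipping. The name targeted_pgd, the step count and the learning rate are illustrative choices:

```python
import torch
import torch.nn as nn

def targeted_pgd(net: nn.Module, x0: torch.Tensor, target: int, eps: float,
                 steps: int = 100, lr: float = 0.01):
    """Maximise log p(target | x) subject to ||x - x0||_inf <= eps.

    Returns (adversarial image, softmax probability of the target class).
    """
    x0 = x0.detach()
    x = nn.Parameter(x0.clone())                # start from the clean image
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        log_p = torch.log_softmax(net(x), dim=1)[0, target]
        (-log_p).backward()                     # ascent on the log-probability
        opt.step()
        with torch.no_grad():                   # project onto the l_inf ball
            x.copy_(x0 + (x - x0).clamp(-eps, eps))
    with torch.no_grad():
        p = torch.softmax(net(x), dim=1)[0, target].item()
    return x.detach(), p

# Usage (assumes `net` in eval mode with frozen parameters and `img_t` as above):
# x_adv, p = targeted_pgd(net, img_t, target=892, eps=0.01)  # eps is a starting guess
```

Decreasing eps until the target class no longer wins gives an estimate of the smallest ε that admits a successful attack.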
