Lab 4: CNN visualization & adversarial patterns

CNN visualization: deep features, attention maps. Adversarial patterns and attacks

Introduction

In this lab we will consider a CNN classifier and visualize activations and attention maps for its hidden layers, look for input patterns that maximize the activations of specific neurons, and see how to craft adversarial attacks that fool the network. All of these tasks rely on very similar techniques. We recommend using Jupyter notebooks for this lab, as the computations are relatively light and we need a lot of visualization.

Setup

Model

In this lab we will use the pre-trained VGG11 CNN, which you already know from the previous lab. Load it like so

import torch

# load network
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net = torch.hub.load('pytorch/vision:v0.9.0', 'vgg11', pretrained=True)
net = net.eval().to(device)
 
# we are not changing the network weights/biases in this lab
for param in net.parameters():
    param.requires_grad = False
print(net)

Data

For this lab we need just one image from ImageNet. We provide an image of a labrador retriever. We also need the class codes for the 1000 ImageNet categories, which we provide as the text file imagenet_classes.txt.

Infrastructure

You will need to set up the standard image transformation pipeline for ImageNet like so

from torchvision import transforms

# image to tensor transform
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )])

You will need to visualise image tensors. For this you can use the function torchvision.utils.make_grid, e.g., assuming that $x$ is a tensor of images:

import matplotlib.pyplot as plt
from torchvision.utils import make_grid

grid = make_grid(x, nrow=10, normalize=True, padding=1)
image = grid.cpu().numpy().transpose(1, 2, 0)
plt.imshow(image)

Assignment 1 (1p)

Load the image, apply the classifier and report the top 10 classes. Visualise the input image and the transformed (resampled & normalised) tensor image. For the latter you may use the make_grid function mentioned above.
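
Reporting the top classes boils down to a softmax followed by topk. The following is a minimal sketch under stated assumptions: the random `logits` stand in for the output of `net` on the transformed image, and the `classes` list is a hypothetical placeholder for the labels read from imagenet_classes.txt.

```python
import torch

torch.manual_seed(0)

# Stand-in for net(img_t): random scores for one image and 1000 classes.
logits = torch.randn(1, 1000)

# Hypothetical label list; in the lab the labels come from imagenet_classes.txt.
classes = ["class_{}".format(i) for i in range(1000)]

# Softmax turns the scores into probabilities; topk returns them sorted
# in descending order together with their class indices.
probs = torch.softmax(logits, dim=1)
top_p, top_i = probs.topk(10, dim=1)

for p, i in zip(top_p[0].tolist(), top_i[0].tolist()):
    print("{:5d}  {:<12s} {:.4f}".format(i, classes[i], p))
```

With the real network you would replace `logits` by the network output for a batch containing the single input image.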

Assignment 2 (2p)

Given an input image, your task is to compute the $l_2$ norms over the activation channels for each of the 21 feature maps and to display them. In the next step you shall compute and visualise the “network attention” by computing the gradient of the classification score w.r.t. these intermediate outputs.

For a target layer $l=0,\ldots,20$ compute the feature map at that layer and the $l_2$ norms of the channel activations (per pixel) and display them in a tableau. E.g.

x = img_t # input image (tensor)
fig, axs = plt.subplots(nrows=6, ncols=4, figsize=(16, 20))
for (i, l) in enumerate(net.features):
    x = l(x)
    # per-pixel l2 norm over the channels, first image of the batch
    f = (x.detach()**2).sum(dim=1).sqrt()[0]
    axs.flat[i].imshow(f.cpu().numpy(), cmap='jet')
    axs.flat[i].set_axis_off()
    axs.flat[i].set_title("{}:{}".format(i, l.__class__.__name__))

Compute the gradient of the network classification score for the predicted class w.r.t. this feature map. You’ll need to detach the feature map from the graph and set requires_grad=True. Then you need to forward propagate it through the rest of the network to the final output (score). Finally, you may use .backward() to compute the gradient w.r.t. the feature map of interest. Display a tableau with the results by showing the $l_2$ norms of the channel gradients (per pixel).
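
The detach-and-backpropagate pattern described above can be sketched as follows. This is an illustration on a toy network, not the reference solution: `features`, `head` and `feature_gradient` are hypothetical names, and the small random layers stand in for the pre-trained VGG11.

```python
import torch

torch.manual_seed(0)

# Toy stand-ins (assumptions): `features` plays the role of net.features,
# `head` the role of the remaining layers that produce the class scores.
features = torch.nn.Sequential(
    torch.nn.Conv2d(3, 4, 3, padding=1), torch.nn.ReLU(inplace=True),
    torch.nn.MaxPool2d(2),
    torch.nn.Conv2d(4, 8, 3, padding=1), torch.nn.ReLU(inplace=True),
)
head = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(8 * 8 * 8, 5))

def feature_gradient(x, target_layer):
    """Gradient of the top class score w.r.t. the output of features[target_layer]."""
    with torch.no_grad():                       # no graph needed up to the target layer
        for layer in features[: target_layer + 1]:
            x = layer(x)
    fmap = x.detach().requires_grad_(True)      # new leaf tensor that collects the gradient
    y = fmap.clone()                            # copy, so in-place layers do not touch the leaf
    for layer in features[target_layer + 1 :]:  # continue to the final scores
        y = layer(y)
    scores = head(y)
    scores[0, scores.argmax()].backward()       # score of the predicted class
    return fmap.grad

img = torch.randn(1, 3, 16, 16)
g = feature_gradient(img, target_layer=2)
attention = (g ** 2).sum(dim=1).sqrt()[0]       # per-pixel l2 norm over the channels
print(g.shape, attention.shape)
```

The `clone()` matters because torchvision's VGG models use in-place ReLU layers, and PyTorch refuses in-place operations on a leaf tensor that requires gradients.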

Assignment 3 (4p)

The goal of this assignment is to find input patterns that maximise the outputs of a given layer (recall the work of Hubel and Wiesel). To find the input that maximises the activation of a unit in that layer, we will numerically optimize over a patch of the input image.

Implement a function activation_max that computes the patterns that maximise the activations of a given layer. Start from an image x of small size, initialized with zeros. Its size should be equal to the receptive field of a unit in the target layer (see the function receptive_field below). Forward propagate x through the network to compute the feature map y of the target layer. Select the centrally located pixel and the target channel in the feature map y. Use the Adam optimizer to maximize the selected feature (i.e. a forward-backward loop with optimizer steps). Find such an activating image x for each channel of the target layer and display them in a panel. Constrain the search to patterns with all components in the range $[-1.0, 1.0]$. You can achieve this simply by clipping the pattern after each gradient step.

If you would like to speed up the optimization, it can be done in parallel over a batch of activation images, one per target channel. This can be achieved by the following “trick”:
```
# create the tensor directly on the device so that x stays a leaf the optimizer can update
x = torch.nn.Parameter(torch.zeros(channels, 3, S, S, device=device))
```
This initialises a zero tensor holding one pattern per channel, where S is the size of the receptive field. If $f$ denotes the output of the considered layer, then the objective is simply
```
-f[:,:,sz[2]//2, sz[3]//2].diag().sum()
```
where sz is the shape of $f$. Run the gradient ascent for a fixed number of steps. Finally, display the obtained patterns in a tableau. They do not resemble patches of natural images. We will therefore add a regularisation that enforces more realistic patterns.
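
The batched optimisation loop (without regularisation) can be sketched as follows. The small random `layers` stack, the helper `central_activations`, the learning rate and the number of steps are all assumptions made for illustration; in the lab the stack would be the part of net.features up to the target layer.

```python
import torch

torch.manual_seed(0)
device = torch.device("cpu")

# Toy stand-in for net.features up to the target layer (assumption), weights frozen.
layers = torch.nn.Sequential(
    torch.nn.Conv2d(3, 4, 3, padding=1), torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    torch.nn.Conv2d(4, 6, 3, padding=1),
).to(device)
for p in layers.parameters():
    p.requires_grad = False

channels, S = 6, 8  # S is the receptive field of the last layer of this toy stack

# One pattern per target channel, created on the device so that it stays a leaf.
x = torch.nn.Parameter(torch.zeros(channels, 3, S, S, device=device))
opt = torch.optim.Adam([x], lr=0.1)

def central_activations(x):
    f = layers(x)
    sz = f.shape
    # Entry i of the diagonal is channel i of pattern i at the central pixel.
    return f[:, :, sz[2] // 2, sz[3] // 2].diag()

before = central_activations(x).detach().clone()
for step in range(50):
    opt.zero_grad()
    loss = -central_activations(x).sum()  # minimising -f maximises f
    loss.backward()
    opt.step()
    with torch.no_grad():
        x.clamp_(-1.0, 1.0)               # keep all components inside [-1, 1]
after = central_activations(x).detach()
print(before, after)
```

Summing the diagonal is safe because pattern i only influences row i of the batch, so each pattern receives the gradient of its own channel only.
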
We will add the following simple regularisation that enforces smooth patterns. Let $x_c$ denote a colour channel of the input pattern and $A$ denote a smoothing convolution. We want to enforce the constraint $$\lVert x_c - A x_c\rVert_1 \leq \epsilon$$ For this we simply replace $x_c$ by $Ax_c$ after each iteration whenever the constraint is violated. This can be done with the following code snippet, in which the norm is additionally divided by the number of pixels $S^2$
```
with torch.no_grad():
  xx = apool(apad(x))
  diff = x - xx
  dn = torch.linalg.norm(diff.flatten(2), dim=2, ord=1.0) / (S * S)
  if dn.max() > epsilon:
    x.data[dn > epsilon] = xx[dn > epsilon]
```
where
```
apool = torch.nn.AvgPool2d(3, padding=0, stride=1)
apad = torch.nn.ReplicationPad2d(1)
```
implement the smoothing operator $A$. Tune $\epsilon$ so that the optimal activation patterns resemble natural image patches. Finally, display the obtained patterns in a tableau.
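
To get a feeling for the smoothing step, it helps to apply it to a constant and to a noisy pattern. The sketch below wraps the snippet above in a hypothetical helper `smooth_if_rough`; the value of `epsilon` is an arbitrary choice for the demonstration, not a recommended setting.

```python
import torch

torch.manual_seed(0)

apool = torch.nn.AvgPool2d(3, padding=0, stride=1)
apad = torch.nn.ReplicationPad2d(1)

def smooth_if_rough(x, epsilon):
    """Replace each colour channel x_c by A x_c where mean |x_c - A x_c| > epsilon."""
    S = x.shape[-1]
    with torch.no_grad():
        xx = apool(apad(x))   # A x: 3x3 box filter, same shape as x
        # l1 norm per channel divided by the number of pixels
        dn = (x - xx).abs().flatten(2).sum(dim=2) / (S * S)
        mask = dn > epsilon   # shape (batch, 3)
        x[mask] = xx[mask]
    return mask

flat = torch.full((1, 3, 8, 8), 0.5)    # constant pattern: A leaves it unchanged
noisy = torch.rand(1, 3, 8, 8) * 2 - 1  # rough pattern in [-1, 1]
print(smooth_if_rough(flat, epsilon=0.05))
print(smooth_if_rough(noisy, epsilon=0.05))
```

The constant pattern is never replaced, while the uniform noise violates the constraint in every channel and is replaced by its smoothed version.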

Infrastructure

The function receptive_field computes the size of the receptive field for a given layer

def receptive_field(layer):
    rsize = 1
    rstride = 1
    for i in range(layer+1):
        l = net.features[i]
        if isinstance(l, torch.nn.Conv2d):
            rsize = rsize + (l.weight.size(2) - 1) * rstride
        if isinstance(l, torch.nn.MaxPool2d):
            rsize = rsize + rstride 
            rstride *= 2
    return rsize
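
The recursion above assumes convolutions with stride 1 and 2x2 max pooling with stride 2. One way to convince yourself that it is correct is to feed an input of exactly that size through a stack of unpadded convolutions, which must then produce a 1x1 output. The sketch below does this on a toy stack; `receptive_field_of` is a hypothetical variant of the function that takes the layer list as an argument, and the toy layers are not part of VGG11.

```python
import torch

def receptive_field_of(features, layer):
    """Same recursion as receptive_field, but for an explicit layer list."""
    rsize, rstride = 1, 1
    for l in list(features)[: layer + 1]:
        if isinstance(l, torch.nn.Conv2d):
            rsize += (l.kernel_size[0] - 1) * rstride
        if isinstance(l, torch.nn.MaxPool2d):
            rsize += rstride
            rstride *= 2
    return rsize

# Toy stack with unpadded convolutions (assumption, unlike VGG11 which pads by 1).
toy = torch.nn.Sequential(
    torch.nn.Conv2d(3, 4, 3), torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    torch.nn.Conv2d(4, 4, 3), torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
)

for layer in range(len(toy)):
    S = receptive_field_of(toy, layer)
    out = toy[: layer + 1](torch.zeros(1, 3, S, S))
    print(layer, S, tuple(out.shape[2:]))
```

Every row should report a 1x1 output. With the padded convolutions of VGG11 the output is larger than 1x1, which is why the assignment selects the central pixel of the feature map.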

Assignment 4 (3p)

Your task is to implement a targeted iterative adversarial attack.

Choose a clean image which is correctly classified by the net (e.g. the image of the labrador retriever).
Choose a target class different from the true class (e.g. 892: wall clock) and fix an ε > 0. Implement a projected gradient ascent that aims to maximize the softmax output of the target class w.r.t. the input image, but constrains the search to the ε-ball around the clean image.
- Start the optimization from the clean image.
- You may use the Adam optimizer to perform the gradient step; the gradient itself is computed by .backward(). For this, the input image tensor must require gradients.
- To enforce the constraint, you may e.g. use the following code after each gradient step
```
    dx = (x.detach() - x0)
    dn = dx.flatten().norm()
    div = torch.clamp(dn/eps, min=1.0)
    dx = dx / div
    x.data = x0 + dx
```
  where $x_0$ is the clean image (tensor) and $x$ is the current image (tensor).
Run the projected gradient ascent for a fixed number of steps.
Report the ε which admits a successful attack, show the obtained adversarial example along with the clean image and report the prediction probabilities for them.
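
The steps above can be sketched on a toy problem as follows. Everything here is an assumption made for illustration: the linear `model` stands in for the frozen network, the random `x0` for the clean image, and the values of `eps`, the learning rate and the number of steps are arbitrary.

```python
import torch

torch.manual_seed(0)

# Toy stand-in for the frozen classifier (assumption): 10 classes, 3x8x8 inputs.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10))
for p in model.parameters():
    p.requires_grad = False

x0 = torch.randn(1, 3, 8, 8)                        # "clean" image
target = (model(x0).argmax(dim=1).item() + 1) % 10  # any class other than the prediction
eps = 2.0                                           # radius of the l2 ball (arbitrary)

x = x0.clone().requires_grad_(True)                 # start from the clean image
opt = torch.optim.Adam([x], lr=0.05)

for step in range(200):
    opt.zero_grad()
    p_target = torch.softmax(model(x), dim=1)[0, target]
    (-p_target).backward()                          # minimising -p maximises p
    opt.step()
    # Project back onto the eps-ball around x0.
    dx = x.detach() - x0
    dn = dx.flatten().norm()
    x.data = x0 + dx / torch.clamp(dn / eps, min=1.0)

p_clean = torch.softmax(model(x0), dim=1)[0, target].item()
p_adv = torch.softmax(model(x), dim=1)[0, target].item()
print(p_clean, p_adv, (x.detach() - x0).norm().item())
```

The projection only rescales the perturbation when its norm exceeds `eps`, so the iterate always stays inside the ball while the target probability grows.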
