CNN deep features visualization. Attention maps. Adversarial patterns and attacks.
In this lab we will consider a CNN classifier and visualize activations and attention maps for its hidden layers. We will look for input patterns that maximize activations of specific neurons and see how to craft adversarial attacks that fool the network. All of these tasks share very similar techniques. We recommend using Jupyter notebooks for this lab, as the computations are relatively light and we need a lot of visualization.
In this lab we will use the pre-trained VGG11 CNN, which you already know from the previous lab. Load it as follows:
# load network
net = torch.hub.load('pytorch/vision:v0.9.0', 'vgg11', pretrained=True)
net = net.eval().to(device)
# we are not changing the network weights/biases in this lab
for param in net.parameters():
    param.requires_grad = False
print(net)
For this lab we need just a few images from ImageNet. We provide an image of a labrador retriever. Choose one or two additional images of your own such that they demonstrate the effects studied below well. We also need the class codes for the 1000 categories in ImageNet; we provide them as the text file imagenet_classes.txt
You will need to set up the standard image transformation pipeline for ImageNet:
# image to tensor transform
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225])])
We will first try to understand the trained network by visualising the learned representation:
layer.__class__.__name__
x.retain_grad()
.backward()
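The snippets above hint at the mechanics: iterate over the layers, select layers by `layer.__class__.__name__`, call `x.retain_grad()` on the intermediate activation you want to inspect, and `.backward()` from a scalar score to obtain a gradient-based attention map. A hedged, runnable sketch of these mechanics (the tiny `features` network is a toy stand-in for `net.features`, and the layer selection and channel aggregation are assumptions):

```python
import torch

# toy stand-in for net.features, just to show the mechanics
features = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
)

x = torch.randn(1, 3, 32, 32)
acts = {}
for i, layer in enumerate(features):
    x = layer(x)
    if layer.__class__.__name__ == 'ReLU':  # select layers by type name
        x.retain_grad()                     # keep gradients of this non-leaf tensor
        acts[i] = x

# backprop a scalar score; acts[i].grad then yields an attention map
score = x.sum()
score.backward()
for i, a in acts.items():
    attn = a.grad.abs().sum(dim=1)          # aggregate gradient magnitude over channels
    print(i, attn.shape)
```

In the lab you would replace the toy network by `net.features`, the random input by a transformed ImageNet image, and the score by the logit of the predicted class.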
What conclusions can you draw from these visualisations? How informative are the outputs? Can you think of a better visualisation? If you have spare time, you may also try some of the GradCAM visualization variants.
Next, we will try to find input patterns that maximise the outputs of a particular neuron in a given layer (recall the work of Hubel and Wiesel: part 1, part 2). Choose one of the following methods (or do both if you like):
Experiment with different layers and channels and try to find interesting relations to the input.
We will ask you the same question at the end, but, without knowing much yet: which approach do you think will produce better insight into the learned representation?
For both methods, you will need to implement a function receptive_field that computes the size of the receptive field for a given layer (see seminar 2). Use this function to find the relevant patch size.
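One possible sketch of such a `receptive_field` function, which walks the layers up to a given depth and accumulates the receptive-field size together with the cumulative stride (square kernels assumed; the toy layer list is an assumption, in the lab you would pass `net.features`):

```python
import torch

def receptive_field(layers, depth):
    rf, jump = 1, 1  # receptive-field size and cumulative stride in input pixels
    for layer in layers[:depth]:
        if isinstance(layer, (torch.nn.Conv2d, torch.nn.MaxPool2d)):
            k = layer.kernel_size if isinstance(layer.kernel_size, int) else layer.kernel_size[0]
            s = layer.stride if isinstance(layer.stride, int) else layer.stride[0]
            rf += (k - 1) * jump   # each new layer widens the field by (k-1) input-pixel jumps
            jump *= s              # strides multiply along the network
    return rf

# toy example
layers = [torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.MaxPool2d(2),
          torch.nn.Conv2d(8, 8, 3, padding=1)]
print(receptive_field(layers, 3))  # → 8
```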
As the first approach is a little technical, we provide the following basic template algorithm:
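A minimal sketch of such an optimization loop (the toy stand-in model, patch size `S`, channel index, and hyper-parameters are all assumptions, not the provided template): initialize a small input patch as a parameter and run gradient ascent on the centre activation of the chosen channel.

```python
import torch

torch.manual_seed(0)
# toy stand-in for net.features[:depth]; in the lab, slice VGG11 up to your chosen layer
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU(),
                            torch.nn.Conv2d(8, 8, 3))
S, channel = 9, 0  # S = receptive-field size of the chosen layer

x = torch.nn.Parameter(0.1 * torch.randn(1, 3, S, S))
opt = torch.optim.Adam([x], lr=0.1)

with torch.no_grad():
    f = model(x)
    start = f[0, channel, f.shape[2] // 2, f.shape[3] // 2].item()

for step in range(100):
    opt.zero_grad()
    f = model(x)
    loss = -f[0, channel, f.shape[2] // 2, f.shape[3] // 2]  # maximize centre activation
    loss.backward()
    opt.step()

with torch.no_grad():
    f = model(x)
    final = f[0, channel, f.shape[2] // 2, f.shape[3] // 2].item()
print(start, final)
```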
You can speed up the optimization by running it in parallel for all layer channels. This can be achieved by using a batch of activation images (one per target channel) along with the following “trick”:
# create the tensor directly on the device: calling .to(device) on a Parameter
# would return a new non-leaf tensor that the optimizer cannot update
x = torch.nn.Parameter(torch.zeros(channels, 3, S, S, device=device))
# centre activation of channel i taken from batch image i, summed over the batch
f[:, :, sz[2]//2, sz[3]//2].diag().sum()
You will most likely arrive at patches not resembling patches of natural images. Could you explain what is happening?
One way to enforce more realistic patterns is to add a smoothness regularisation (natural patches are smoother on average). Let $x_c$ denote a colour channel of the input pattern and $A$ denote a smoothing convolution. We want to enforce the constraint $$\lVert x_c - A x_c\rVert_1 \leq \epsilon\;.$$ For this we will simply replace $x_c$ by $Ax_c$ after each iteration whenever the constraint is violated. This can be done with the following code snippet:
apool = torch.nn.AvgPool2d(3, padding=0, stride=1)  # 3x3 box filter: the smoothing convolution A
apad = torch.nn.ReplicationPad2d(1)                 # pad so the output size matches the input

with torch.no_grad():
    xx = apool(apad(x))                             # A x
    diff = x - xx
    dn = torch.linalg.norm(diff.flatten(2), dim=2, ord=1.0) / (S * S)  # per-pixel L1 violation
    if dn.max() > epsilon:
        x.data[dn > epsilon] = xx[dn > epsilon]     # replace violating channels by the smoothed version
The images look better, but are still not very realistic. Can you explain what is happening? Does the method work well for all layers?
So, once again, which of the two approaches is better for studying the learned representation? Why?
Your task is to implement a targeted iterative adversarial attack.
labrador retriever
dx = x.detach() - x0                    # current perturbation
dn = dx.flatten().norm(p=float('inf'))  # its L-infinity norm
div = torch.clamp(dn / eps, min=1.0)
dx = dx / div                           # rescale so that ||dx||_inf <= eps
x.data = x0 + dx
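Putting the pieces together, the attack alternates a gradient step that decreases the loss for the *target* class with the projection above. A hedged sketch on a toy classifier (the tiny linear model, target class, `eps`, and step size are assumptions; in the lab you would use `net`, your transformed image as `x0`, and a target label of your choice):

```python
import torch

torch.manual_seed(0)
# toy stand-in for the classifier
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10))
x0 = torch.rand(1, 3, 8, 8)            # clean input
target = torch.tensor([3])             # hypothetical target class
eps, lr = 0.05, 0.01

x = torch.nn.Parameter(x0.clone())
opt = torch.optim.SGD([x], lr=lr)
for step in range(200):
    opt.zero_grad()
    # minimizing cross-entropy w.r.t. the target class drives the prediction towards it
    loss = torch.nn.functional.cross_entropy(model(x), target)
    loss.backward()
    opt.step()
    with torch.no_grad():              # keep ||x - x0||_inf <= eps
        dx = x.detach() - x0
        dn = dx.flatten().norm(p=float('inf'))
        dx = dx / torch.clamp(dn / eps, min=1.0)
        x.data = x0 + dx

print(model(x).argmax().item())
```

With a real network and a larger `eps` the prediction typically flips to the target class after a few hundred iterations; visualize both the perturbed image and the perturbation `x - x0` itself.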