Gaussian Variational Autoencoders
In this lab we will consider vanilla Gaussian VAEs (see lecture 11) and train them to generate MNIST images. The goal is to analyse whether the generative ability of VAEs increases with the complexity of the networks used for encoding and decoding. The baseline VAE will have both the decoder and encoder implemented by networks with one fully connected layer only (i.e. without hidden layers). The extended variant will have the decoder and encoder implemented as multilayer FFNs. The latent representation space will be the same for both variants.
For additional reading, we recommend the paper “Tutorial on Variational Autoencoders” by C. Doersch, arXiv:1606.05908.
1. The space of MNIST images is $\mathcal{X} = \mathbb{R}^{28\times 28}$. The latent space is denoted as $\mathcal{Z} = \mathbb{R}^m$.
2. The decoder $d_\theta(z)$ maps $z \mapsto \mu_\theta(z) \in \mathcal{X}$ and the related probability distribution $p_\theta(x | z)$ is $\mathcal{N}(\mu_\theta(z), \sigma^2\mathbb{I})$, where we assume that the scalar $\sigma$ is fixed.
3. The encoder $e_\varphi(x)$ maps $x \mapsto (\mu_\varphi(x), \sigma_\varphi(x)) \in \mathcal{Z} \times \mathcal{Z}$ and the related probability distribution $q_\varphi(z | x)$ is $\mathcal{N}\bigl(\mu_\varphi(x), \mathrm{diag}(\sigma_\varphi^2(x))\bigr)$.
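For reference, training maximises the evidence lower bound (ELBO). With the Gaussian decoder above and the standard normal prior $p(z) = \mathcal{N}(0, \mathbb{I})$ of the vanilla VAE, it reads
$$ \mathcal{L}(\theta, \varphi; x) \;=\; \mathbb{E}_{q_\varphi(z|x)}\bigl[\log p_\theta(x|z)\bigr] \;-\; \mathrm{KL}\bigl(q_\varphi(z|x)\,\|\,p(z)\bigr), \qquad \log p_\theta(x|z) \;=\; -\frac{\lVert x - \mu_\theta(z)\rVert^2}{2\sigma^2} + \mathrm{const}, $$
i.e. the reconstruction term is a scaled squared error plus a constant that does not depend on $\theta$.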
1. Implement the FFN encoder and decoder as PyTorch Module containers, e.g. the baseline encoder and decoder like so:
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, zdim):
        super(Encoder, self).__init__()
        self.zdim = zdim
        # construct the body
        body_list = []
        bl = nn.Linear(784, self.zdim * 2)
        body_list.append(bl)
        self.body = nn.Sequential(*body_list)

    def forward(self, x):
        scores = self.body(x)
        # the first zdim outputs parametrise the mean, the remaining zdim the (log of the) std
        mu, sigma = torch.split(scores, self.zdim, dim=1)
        sigma = torch.exp(sigma)
        return mu, sigma


class Decoder(nn.Module):
    def __init__(self, zdim):
        super(Decoder, self).__init__()
        # construct the body
        body_list = []
        bl = nn.Linear(zdim, 784)
        body_list.append(bl)
        self.body = nn.Sequential(*body_list)

    def forward(self, x):
        mu = self.body(x)
        return mu
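For the extended variant, the body can be made deeper, e.g. as in the sketch below. The hidden-layer widths (500, 250) and the ReLU nonlinearity are illustrative choices of ours, not prescribed by the assignment.

import torch
import torch.nn as nn

class DeepEncoder(nn.Module):
    def __init__(self, zdim):
        super(DeepEncoder, self).__init__()
        self.zdim = zdim
        # two hidden layers; the widths are illustrative
        self.body = nn.Sequential(
            nn.Linear(784, 500), nn.ReLU(),
            nn.Linear(500, 250), nn.ReLU(),
            nn.Linear(250, self.zdim * 2),
        )

    def forward(self, x):
        scores = self.body(x)
        mu, sigma = torch.split(scores, self.zdim, dim=1)
        return mu, torch.exp(sigma)    # exp keeps the std positive


class DeepDecoder(nn.Module):
    def __init__(self, zdim):
        super(DeepDecoder, self).__init__()
        # mirror architecture of the encoder
        self.body = nn.Sequential(
            nn.Linear(zdim, 250), nn.ReLU(),
            nn.Linear(250, 500), nn.ReLU(),
            nn.Linear(500, 784),
        )

    def forward(self, z):
        return self.body(z)            # mean image mu_theta(z)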
2. Implement the learning step for the VAE. Thanks to the PyTorch developer community, this is pretty easy if you use torch.distributions. Below we show useful code snippets.
# initialise a tensor of normal distributions from tensors z_mu, z_sigma
qz = torch.distributions.Normal(z_mu, z_sigma)
# compute log-probabilities for a tensor z
logz = qz.log_prob(z)
# sample from the distributions with re-parametrisation
zs = qz.rsample()
# compute KL divergence for two tensors with probability distributions
kl_div = torch.distributions.kl_divergence(qz, pz)
# one optimiser jointly updating the encoder and decoder parameters
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=stepsize)
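Putting the pieces together, one learning step could look as follows. This is a minimal sketch, assuming a standard normal prior $p(z) = \mathcal{N}(0, \mathbb{I})$ and a fixed decoder scale sigma_x; the function name vae_step and the default value of sigma_x are our own illustrative choices.

import torch

def vae_step(x, encoder, decoder, optimizer, sigma_x=0.1):
    # x: batch of flattened images, shape (B, 784)
    z_mu, z_sigma = encoder(x)
    qz = torch.distributions.Normal(z_mu, z_sigma)              # q_phi(z | x)
    pz = torch.distributions.Normal(torch.zeros_like(z_mu),
                                    torch.ones_like(z_sigma))   # prior p(z) = N(0, I)

    zs = qz.rsample()                                           # re-parametrised sample z ~ q_phi(z | x)
    x_mu = decoder(zs)                                          # mu_theta(z)
    px = torch.distributions.Normal(x_mu, sigma_x)              # p_theta(x | z) with fixed sigma

    recon = px.log_prob(x).sum(dim=1)                           # log p_theta(x | z), summed over pixels
    kl = torch.distributions.kl_divergence(qz, pz).sum(dim=1)   # KL(q_phi || p), summed over latent dims
    loss = (kl - recon).mean()                                  # negative ELBO, averaged over the batch

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()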
3. Train the baseline VAE and the deeper VAE on MNIST data. Recall that the dimension of the latent space should be the same for both models. For each of the models report the following:
# number of trainable parameters (here for the encoder)
sum(p.numel() for p in encoder.parameters() if p.requires_grad)
The goal of this assignment is to compare the performance of the two models. Unfortunately, it is not possible to quantify the performance of generative models like VAEs in terms of training-data log-likelihood, because its estimation is not tractable. The paper arXiv:1802.03446 lists and discusses 24 different surrogate metrics. Here, we will instead analyse the trained VAEs quantitatively and qualitatively.
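For the qualitative part, a simple first check is to sample latent codes from the prior and decode them into images. The sketch below assumes the trained decoder from above, CPU tensors, and matplotlib for display; the function name show_samples is our own.

import torch
import matplotlib.pyplot as plt

@torch.no_grad()
def show_samples(decoder, zdim, n=8):
    zs = torch.randn(n, zdim)               # latent codes z ~ N(0, I)
    imgs = decoder(zs).view(n, 28, 28)      # decoded mean images mu_theta(z)
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2))
    for ax, img in zip(axes, imgs):
        ax.imshow(img.numpy(), cmap='gray')
        ax.axis('off')
    plt.show()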