
Lab 7: Graph Neural Networks

Graph neural networks, graph convolutional networks (GCNs), Laplacian eigenvector positional encodings.

Templates

Use the provided template. The package contains:

  • data.py — helper functions for working with datasets.
  • lab.py — template to start with.

Introduction

In this lab, we will use graph convolutional networks to classify the nodes of a graph (parts 1-3) and whole graphs (part 4).

Part 1: Node classification using an MLP (2pt)

In this and the following two parts, we will use the Cora dataset, given as an undirected graph where each node represents a machine learning paper and edges represent citations between papers. Each node is associated with a feature vector containing the paper's bag-of-words representation. The task is to classify each node (paper) into one of seven categories according to the paper's topic.

The dataset can be accessed using the get_cora function from data.py, which returns an object of the Data class from the PyTorch Geometric library for working with graph data. The Data object has three attributes that will be useful during this lab:

  • x contains the features of all nodes as a matrix (one row per node)
  • edge_index contains the edges between nodes as a matrix where each column holds the source and destination node indices of one edge
  • y contains the ground-truth labels of all nodes.

Additionally, you will use the generate_cora_split function from data.py to generate a random train/val/test split of the Cora dataset.
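The layout of these attributes can be sketched on a tiny made-up graph (numpy arrays stand in here for the PyTorch tensors that the actual Data object holds):

```python
import numpy as np

# Tiny made-up undirected graph with 3 nodes: edges 0-1 and 1-2.
# Like PyTorch Geometric, each undirected edge is stored as two
# directed columns: source indices in the top row, destinations below.
edge_index = np.array([
    [0, 1, 1, 2],   # source nodes
    [1, 0, 2, 1],   # destination nodes
])

# x: one feature vector per node (4-dimensional here, made up).
x = np.zeros((3, 4))
# y: one ground-truth class label per node.
y = np.array([0, 1, 0])

num_nodes, num_features = x.shape
num_edges = edge_index.shape[1]  # 4 directed columns = 2 undirected edges
```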

  1. Implement an MLP model with num_layers of linear layers followed by a ReLU activation and dropout (do not put ReLU and dropout after the last layer).
  2. Train an MLP on the training set of the Cora dataset using the cross entropy loss. Use a batch size equal to the whole training set (gradient descent). Experiment with the hyper-parameters to obtain the best performance on the validation set. For the best trained model, report the performance on the test set averaged over 10 seeds, together with the standard deviation.

Note: For MLP training you should use only the node features while completely disregarding graph structure (edge_index).
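A minimal sketch of the MLP described in step 1, written with plain PyTorch (the layer sizes and hyper-parameter names are illustrative, not prescribed by the template):

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Stack of Linear -> ReLU -> Dropout blocks, with no ReLU or
    dropout after the final linear layer."""

    def __init__(self, in_dim, hidden_dim, out_dim, num_layers, dropout=0.5):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * (num_layers - 1) + [out_dim]
        layers = []
        for i in range(num_layers):
            layers.append(nn.Linear(dims[i], dims[i + 1]))
            if i < num_layers - 1:  # skip ReLU/dropout after the last layer
                layers.append(nn.ReLU())
                layers.append(nn.Dropout(dropout))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# Cora has 1433 bag-of-words features and 7 classes.
model = MLP(in_dim=1433, hidden_dim=64, out_dim=7, num_layers=2)
logits = model(torch.randn(8, 1433))  # 8 node feature vectors -> 8 x 7 logits
```

Since the batch size equals the whole training set, one optimizer step per epoch on all training rows is full-batch gradient descent.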

Part 2: Node classification using a GCN (2pt)

In the first part of the lab, we used an MLP that completely disregards the graph structure. We will now use a GCN model that respects it. A helpful class for implementing this is GCNConv from PyTorch Geometric.

  1. Implement a GCN model with num_layers of GCNConv layers followed by a ReLU activation and dropout (do not put ReLU and dropout after the last layer).
  2. Train a GCN on the training set of the Cora dataset using the cross entropy loss. Use a batch size equal to the whole training set (gradient descent). Experiment with the hyper-parameters to obtain the best performance on the validation set. For the best trained model, report the performance on the test set averaged over 10 seeds, together with the standard deviation.
  3. How does the performance compare with the trained MLP model?

Note: For GCN training you should pass the whole graph as the input (nodes from all splits and all the edges). Then loss calculation should only be done for the training nodes.
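To make the mechanics concrete, here is what a single GCN layer computes (up to the bias term of GCNConv), sketched with dense numpy on a toy graph; the real model uses GCNConv on the sparse edge_index instead:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN layer: X' = D^{-1/2} (A + I) D^{-1/2} X W,
    i.e. add self-loops, symmetrically normalize by node degree,
    aggregate neighbour features, then apply a linear projection."""
    A_hat = A + np.eye(A.shape[0])        # adjacency with self-loops
    deg = A_hat.sum(axis=1)               # degrees including self-loops
    D_inv_sqrt = np.diag(deg ** -0.5)
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return A_norm @ X @ W

# Toy path graph 0-1-2 with 2-dim features and an identity weight.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.eye(3, 2)
W = np.eye(2)
H = gcn_layer(A, X, W)  # each row mixes a node with its neighbours
```

At training time the whole graph is passed through such layers, and the cross-entropy loss is then evaluated only on the rows selected by the training mask, e.g. loss = F.cross_entropy(out[train_mask], y[train_mask]).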

Part 3: Positional encodings based on eigenvalues of a graph Laplacian (3pt)

Nodes of a graph do not have a meaningful absolute ordering; however, we can look for a relative ordering that looks into where a node is located relative to others. This positional encoding can help the model understand proximity, hierarchy, and connectivity. Since graphs vary widely in structure—ranging from simple social networks to complex molecular structures—designing effective positional encodings is a challenging and active area of research. You can find a short overview of the existing graph positional encodings at the following link.

In this lab, we will look into one positional encoding that describes a node's position with respect to the entire graph. In this case, nodes that are closer in the graph structure will have similar positional encodings. An example of such a positional encoding is to use eigenvectors of the Laplacian matrix of the graph. These eigenvectors capture the spectral properties of the graph. To do this, we first construct the normalized graph Laplacian as $L = I - D^{-1/2} A D^{-1/2}$, where $A$ and $D$ denote the graph adjacency matrix and the degree matrix (a diagonal matrix with values equal to the row/column sums of $A$), respectively. We then calculate the eigenvalues $\lambda_i$ and eigenvectors $f_i$ of $L$. Assuming the eigenvalues are sorted as $\lambda_0 \leq \lambda_1 \leq \cdots \leq \lambda_{N-1}$ (for a graph with $N$ nodes), we use the first $k$ eigenvectors, skipping $f_0$, as positional encodings: $\{f_1, \ldots, f_k\}$.
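The construction above can be sketched end-to-end with dense numpy on a small made-up graph (the lab itself uses sparse routines; the function name and toy graph here are illustrative only):

```python
import numpy as np

def laplacian_eigenvector_pe(A, k):
    """Build L = I - D^{-1/2} A D^{-1/2}, eigendecompose it, and
    return the k eigenvectors following f_0, sorted by ascending
    eigenvalue. Dense numpy for clarity only."""
    deg = A.sum(axis=1)
    D_inv_sqrt = np.diag(deg ** -0.5)
    L = np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L)   # eigh sorts eigenvalues ascending
    return eigvals, eigvecs[:, 1:k + 1]    # skip f_0, keep f_1 .. f_k

# Made-up graph: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

eigvals, pe = laplacian_eigenvector_pe(A, k=2)
# pe has one k-dimensional positional encoding per node; nodes inside
# the same triangle get nearby values, the two triangles get opposite
# signs in f_1.
```

For a connected graph, the smallest eigenvalue $\lambda_0$ is 0, which is why $f_0$ carries no positional information and is skipped.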

For example, in the following simple graph, computing positional encodings from the first 2 nontrivial eigenvectors ($f_1$ and $f_2$) gives something like the result shown below. We can see that the positional encodings capture the graph structure and assign similar values to nodes that are tightly connected.

  1. Implement add_laplacian_eigenvector_pe, which computes positional encodings based on eigenvectors of the Laplacian and concatenates them with the existing features. Useful functions might be get_laplacian and to_scipy_sparse_matrix from PyTorch Geometric. For efficient eigenvector computation, use eigsh from CuPy (cupyx.scipy.sparse.linalg).
  2. Reproduce the results for the simple toy graph shown in the previous images (use $k=2$). You can get the toy graph data using the get_toy function from data.py. Can you understand the meaning of values contained in the eigenvector $f_0$ corresponding to eigenvalue $\lambda_0$ that we are not using? Hint: Think if values are proportional to some property of a node.
  3. Add positional encoding to the Cora dataset and train an MLP and a GCN on these new features (positional encodings concatenated with original features). What is the performance? How much did the positional encodings impact the performance in each case?

Part 4: Graph classification (3pt)

In the previous parts, we classified each node of a single graph; however, we will now look into how to classify whole graphs. For this, we will work with the MolHIV dataset from Open Graph Benchmark. MolHIV is a molecular dataset for predicting whether a molecule inhibits HIV replication. Each graph represents a molecule, where nodes are atoms, and edges are chemical bonds. MolHIV uses the area under the ROC curve as a metric.

You can access the dataset using the get_molhiv function from data.py. The function will return dataloaders for train/validation/test splits and additional helpful properties. Objects returned by the dataloader are of the Batch class from PyTorch Geometric that implements batching for graphs (PyTorch Geometric combines the graphs into a single disconnected graph data object). Batch inherits from Data (used in previous parts) and contains an additional attribute called batch. The batch attribute is a vector mapping each node to the index of its corresponding graph within the mini-batch.
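The role of the batch vector can be illustrated with a numpy sketch of mean pooling (this mirrors what a global mean pooling operation does; the function name here is illustrative):

```python
import numpy as np

def mean_pool(x, batch, num_graphs):
    """Average the node embeddings of each graph in a mini-batch,
    using the batch vector that maps each node to its graph index."""
    out = np.zeros((num_graphs, x.shape[1]))
    for g in range(num_graphs):
        out[g] = x[batch == g].mean(axis=0)
    return out

# Two graphs batched as one disconnected graph:
# graph 0 has 2 nodes, graph 1 has 3 nodes.
x = np.array([[1.0, 0.0],
              [3.0, 0.0],
              [0.0, 2.0],
              [0.0, 4.0],
              [0.0, 6.0]])
batch = np.array([0, 0, 1, 1, 1])

pooled = mean_pool(x, batch, num_graphs=2)  # one embedding per graph
```

The pooled per-graph embeddings are then fed to a classifier head to produce one prediction per molecule.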

  1. Implement the forward pass of the GCN_Graph class. Note that the pooling function should be one of the global pooling operations from PyTorch Geometric.
  2. Use the provided train_graph and eval_graph functions to train a GCN_Graph model using MLP and GCN as the node embedding models. Good default hyper-parameters for training using the Adam optimizer are given in DEFAULT_ARGS.
  3. Try optimizing the hyper-parameters and using different pooling functions. Which model obtained the best performance? Which node embedding model? Which pooling function?

Necessary libraries

  • torch_geometric
  • cupy
  • ogb
courses/bev033dle/labs/lab7_gnn/start.txt · Last modified: 2025/05/15 10:55 by stojnvla