Graph Neural Networks with PyTorch

Graph Neural Networks (GNNs) are a class of machine learning models designed for graph-structured data. They are useful because many real-world systems are networks of interconnected elements, such as social networks, molecular structures, and communication systems. In this article, we will see how to use PyTorch to build graph neural networks.

Implementation of a Simple GNN Model using PyTorch

We will implement a GNN for the CORA dataset in PyTorch using PyTorch Geometric (PyG). The steps below walk through the process, with code snippets for each step.

Step 1: Loading the CORA Dataset

The CORA dataset is a citation graph where nodes represent documents, and edges represent citation links. Each document is classified into one of seven categories, making it a popular dataset for node classification tasks in GNNs.

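First, ensure you have PyTorch Geometric installed. If not, you can install it using pip. In recent PyG releases the `torch-scatter`, `torch-sparse`, `torch-cluster`, and `torch-spline-conv` extension packages are optional, so `torch-geometric` alone should be enough for this example: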

pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric

Then, load the CORA dataset:

import torch
from torch_geometric.datasets import Planetoid
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

# Loading the Cora dataset
dataset = Planetoid(root='data/Planetoid', name='Cora')
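
To make the node/edge description above concrete, here is a minimal sketch of how such a graph can be laid out as tensors. It uses a hypothetical 4-document toy graph rather than the real CORA data, and the variable names are chosen for illustration only; it follows the PyG convention of an `edge_index` tensor of shape `[2, num_edges]`.

```python
import torch

# Hypothetical toy citation graph with 4 documents (NOT the real CORA data).
# Each column of edge_index is one directed edge (source, target); an
# undirected citation link is stored as two directed edges.
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]], dtype=torch.long)

# One feature row per document (bag-of-words style indicators).
x = torch.tensor([[1., 0., 1.],
                  [0., 1., 0.],
                  [1., 1., 0.],
                  [0., 0., 1.]])

# One class label per document, plus a boolean mask marking training nodes.
y = torch.tensor([0, 1, 0, 1])
train_mask = torch.tensor([True, True, False, False])

num_nodes = x.size(0)
num_edges = edge_index.size(1)

# Degree of each node: how often it appears as the source of an edge.
degree = torch.bincount(edge_index[0], minlength=num_nodes)

print(num_nodes, num_edges)   # 4 6
print(degree.tolist())        # [1, 2, 2, 1]
print(y[train_mask].tolist()) # [0, 1]
```

The graph object returned by `dataset[0]` exposes the same kinds of fields (`x`, `edge_index`, `y`, `train_mask`), only at CORA's scale.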

Step 2: Defining a GNN Model Using PyTorch

We'll define a simple GNN model using one of the most straightforward types of GNN layers, the Graph Convolutional Network (GCN) layer, provided by PyTorch Geometric.

A custom Graph Neural Network (GNN) model is built by subclassing PyTorch's `torch.nn.Module`. The model consists of two Graph Convolutional Network (GCN) layers. The first layer is followed by a Rectified Linear Unit (ReLU) activation and dropout regularization; the second layer produces one score per class. The model's `forward` method takes the node features and the edge index as input, applies the layers in sequence, and returns log-softmax outputs for classification. After the model is created with 16 hidden units, an Adam optimizer is initialized with a learning rate of 0.01 and a weight decay of 5e-4.

class CustomGNN(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(CustomGNN, self).__init__()
        self.layer1 = GCNConv(input_dim, hidden_dim)
        self.layer2 = GCNConv(hidden_dim, output_dim)

    def forward(self, feature_data, edge_info):
        # First GCN layer
        x = self.layer1(feature_data, edge_info)
        x = F.relu(x)
        x = F.dropout(x, p=0.5, training=self.training)
        # Second GCN layer
        x = self.layer2(x, edge_info)
        return F.log_softmax(x, dim=1)

# Initialize the GNN model
input_features = dataset.num_node_features
num_classes = dataset.num_classes
model = CustomGNN(input_features, 16, num_classes)

optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

graph_data = dataset[0]  # Get the graph data
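
To give some intuition for what a GCN layer computes, the following is a rough sketch of the standard GCN propagation rule, D^-1/2 (A + I) D^-1/2 X W, written with dense tensors in plain PyTorch. The function `dense_gcn_layer` and the toy inputs are hypothetical and for illustration only; `GCNConv` itself works on sparse edge lists, adds a bias term, and is what the model above actually uses.

```python
import torch

def dense_gcn_layer(x, edge_index, weight):
    """Illustrative dense GCN propagation: D^-1/2 (A + I) D^-1/2 X W."""
    num_nodes = x.size(0)
    source, target = edge_index
    adj = torch.zeros(num_nodes, num_nodes)
    adj[target, source] = 1.0          # row = receiving node, column = sending node
    adj = adj + torch.eye(num_nodes)   # add self-loops (assumes none are present yet)
    deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)
    # Symmetric normalization: entry (i, j) is scaled by 1 / sqrt(deg_i * deg_j).
    norm_adj = deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)
    # Average neighbor features, then apply the learnable linear map.
    return norm_adj @ x @ weight

torch.manual_seed(0)
# Toy path graph 0 - 1 - 2, stored as directed edges in both directions.
toy_edge_index = torch.tensor([[0, 1, 1, 2],
                               [1, 0, 2, 1]])
toy_x = torch.randn(3, 4)       # 3 nodes, 4 input features
toy_weight = torch.randn(4, 2)  # maps 4 input features to 2 output features

hidden = dense_gcn_layer(toy_x, toy_edge_index, toy_weight)
print(hidden.shape)  # torch.Size([3, 2])
```

Stacking two such layers, as `CustomGNN` does, lets each node's output depend on nodes up to two hops away.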

Step 3: Training the GNN Model on the CORA Dataset

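Now, let's train the model. The model and optimizer were already initialized in Step 2, so each training step runs a forward pass over the whole graph, computes the negative log-likelihood loss on the training nodes only, backpropagates, and updates the weights. We repeat this for 200 epochs.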

def train_model():
    model.train()
    optimizer.zero_grad()
    output = model(graph_data.x, graph_data.edge_index)
    loss = F.nll_loss(output[graph_data.train_mask], graph_data.y[graph_data.train_mask])
    loss.backward()
    optimizer.step()
    return loss.item()

for epoch in range(200):
    loss_value = train_model()
    print(f'Epoch: {epoch+1:03d}, Loss: {loss_value:.4f}')
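
The masked loss used in `train_model` can be checked on a small example. The sketch below uses made-up scores and labels (not CORA) to show that the boolean mask restricts the loss to the selected nodes, and that `log_softmax` followed by `nll_loss` gives the same value as `cross_entropy` on the raw scores.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical raw scores for 5 nodes and 3 classes (NOT model output on CORA).
logits = torch.randn(5, 3)
labels = torch.tensor([0, 2, 1, 1, 0])
train_mask = torch.tensor([True, False, True, False, False])

# Same recipe as the training step: log-softmax, then NLL on masked nodes.
log_probs = F.log_softmax(logits, dim=1)
masked_loss = F.nll_loss(log_probs[train_mask], labels[train_mask])

# Equivalent formulation directly on the raw scores.
same_loss = F.cross_entropy(logits[train_mask], labels[train_mask])

# Manual check: average of -log p(true class) over the two training nodes (0 and 2).
manual_loss = -(log_probs[0, labels[0]] + log_probs[2, labels[2]]) / 2

print(torch.allclose(masked_loss, same_loss))    # True
print(torch.allclose(masked_loss, manual_loss))  # True
```

Nodes outside the mask still take part in the forward pass through their edges; they just contribute nothing to the loss.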

Step 4: Evaluating the Model's Performance

The evaluation function switches the model to evaluation mode, which disables dropout, predicts a class for every node, and computes the accuracy on the test nodes. The model architecture and training parameters are kept simple for demonstration purposes. In practice, you may want to experiment with deeper models, different types of GNN layers, and other optimization techniques to improve performance.

def evaluate_model():
    model.eval()
    with torch.no_grad():
        predictions = model(graph_data.x, graph_data.edge_index).argmax(dim=1)
        correct = (predictions[graph_data.test_mask] == graph_data.y[graph_data.test_mask]).sum()
        acc = int(correct) / int(graph_data.test_mask.sum())
    return acc

accuracy = evaluate_model()
print(f'Test Accuracy: {accuracy:.4f}')
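
The same accuracy computation can be applied to any node mask, for example the `val_mask` that the Planetoid data also provides. Below is a small sketch with a hypothetical helper, `masked_accuracy`, run on made-up predictions so the arithmetic is easy to follow.

```python
import torch

def masked_accuracy(predictions, labels, mask):
    """Fraction of correct predictions among the nodes selected by mask."""
    correct = (predictions[mask] == labels[mask]).sum()
    return int(correct) / int(mask.sum())

# Made-up predictions and labels for 6 nodes (NOT results on CORA).
predictions = torch.tensor([0, 1, 2, 1, 0, 2])
labels = torch.tensor([0, 1, 1, 1, 2, 2])
val_mask = torch.tensor([True, True, False, False, False, False])
test_mask = torch.tensor([False, False, True, True, True, True])

val_acc = masked_accuracy(predictions, labels, val_mask)
test_acc = masked_accuracy(predictions, labels, test_mask)

print(val_acc)   # 1.0
print(test_acc)  # 0.5
```

With the real model, `predictions` would be the `argmax` output computed inside `evaluate_model`, and the masks would come from `graph_data`.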

Complete Code to Implement Simple GNN using PyTorch

!pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric

import torch
from torch_geometric.datasets import Planetoid
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

# Loading the Cora dataset
dataset = Planetoid(root='data/Planetoid', name='Cora')

class CustomGNN(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(CustomGNN, self).__init__()
        self.layer1 = GCNConv(input_dim, hidden_dim)
        self.layer2 = GCNConv(hidden_dim, output_dim)

    def forward(self, feature_data, edge_info):
        # First GCN layer
        x = self.layer1(feature_data, edge_info)
        x = F.relu(x)
        x = F.dropout(x, p=0.5, training=self.training)
        # Second GCN layer
        x = self.layer2(x, edge_info)
        return F.log_softmax(x, dim=1)

# Initialize the GNN model
input_features = dataset.num_node_features
num_classes = dataset.num_classes
model = CustomGNN(input_features, 16, num_classes)

optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

graph_data = dataset[0]  # Get the graph data

def train_model():
    model.train()
    optimizer.zero_grad()
    output = model(graph_data.x, graph_data.edge_index)
    loss = F.nll_loss(output[graph_data.train_mask], graph_data.y[graph_data.train_mask])
    loss.backward()
    optimizer.step()
    return loss.item()

for epoch in range(200):
    loss_value = train_model()
    print(f'Epoch: {epoch+1:03d}, Loss: {loss_value:.4f}')

def evaluate_model():
    model.eval()
    with torch.no_grad():
        predictions = model(graph_data.x, graph_data.edge_index).argmax(dim=1)
        correct = (predictions[graph_data.test_mask] == graph_data.y[graph_data.test_mask]).sum()
        acc = int(correct) / int(graph_data.test_mask.sum())
    return acc

accuracy = evaluate_model()
print(f'Test Accuracy: {accuracy:.4f}')

Output:

Epoch: 001, Loss: 1.9575
Epoch: 002, Loss: 1.8465
Epoch: 003, Loss: 1.7241
...
Epoch: 197, Loss: 0.0165
Epoch: 198, Loss: 0.0211
Epoch: 199, Loss: 0.0199
Epoch: 200, Loss: 0.0297
Test Accuracy: 0.8110

The output shows the training loss falling from about 1.96 in the first epoch to roughly 0.02 to 0.03 by epoch 200, with small fluctuations near the end. This indicates that the model has fit the labeled training nodes. The final test accuracy of 0.811 means the trained GNN correctly predicts the class of about 81% of the nodes in the test set, which shows that even this simple two-layer model is effective for node classification on graph-structured data.
