Graph Neural Networks: (Graph Classification) Hands-on Implementation of a GNN on the MUTAG Dataset

Q天马A行空Q 2024-06-17 10:22:13

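Notes on this article:
1) Reference: the official PyG documentation (link).
2) I am not an expert, so corrections are welcome if you spot any mistakes.
3) I uploaded the Jupyter notebook for this article to Baidu Netdisk (link). Extraction code: 8848.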

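About the MUTAG dataset

MUTAG is a widely used molecular graph dataset. It contains 188 molecules, that is, 188 graphs, each undirected, unweighted, and without self-loops. Each molecule carries a binary label indicating whether the compound is mutagenic. The exact meaning of the node feature vectors is not spelled out, but it does not matter for what follows. PS: my guess is that each feature vector is a one-hot encoding of the atom type, which is consistent with the 7-dimensional node features seen in the batches below. The dataset holds more information that we ignore for simplicity, for example the type of each chemical bond (stored as edge attributes).
Imports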

from torch_geometric.datasets import TUDataset
import torch

Load the dataset

dataset=TUDataset(root='/DATA/TUDataset',name='MUTAG')

Shuffle the dataset and split it into training and test sets

dataset=dataset.shuffle()
train_dataset=dataset[:150]
test_dataset=dataset[150:]

Next we take a look at the data.

Mini-batching graphs

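To make full use of the GPU, graphs are batched as follows: 1) the individual graphs are combined into one giant graph made of isolated subgraphs, with their adjacency matrices stacked along the diagonal, and 2) the node feature matrices are simply concatenated along the node dimension. This has two advantages: 1) no messages are passed between different graphs, because they are not connected to each other, and 2) the adjacency matrix is stored in sparse form, keeping only the edges that actually exist, so batching adds no extra memory overhead.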
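To make the batching idea concrete, here is a minimal sketch in plain PyTorch. The helper name `collate_graphs` and the toy graphs are my own assumptions for illustration; this is not PyG's actual implementation, only the same idea in a few lines: shift the node indices of each graph, concatenate the features, and record which graph each node belongs to.

```python
import torch

def collate_graphs(graphs):
    """Merge a list of (x, edge_index) pairs into one big disconnected graph.

    Hypothetical helper for illustration only.
    """
    xs, edge_indices, batch = [], [], []
    node_offset = 0
    for graph_id, (x, edge_index) in enumerate(graphs):
        xs.append(x)
        # Shift node ids so they point into the concatenated feature matrix.
        edge_indices.append(edge_index + node_offset)
        # Record which graph every node belongs to.
        batch.append(torch.full((x.size(0),), graph_id, dtype=torch.long))
        node_offset += x.size(0)
    return torch.cat(xs, dim=0), torch.cat(edge_indices, dim=1), torch.cat(batch)

# Two toy graphs: a 3-node path and a single 2-node edge.
# Both are undirected, so every edge is stored in both directions.
g1 = (torch.ones(3, 7), torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]]))
g2 = (torch.zeros(2, 7), torch.tensor([[0, 1], [1, 0]]))

x, edge_index, batch = collate_graphs([g1, g2])
print(x.shape)      # 5 nodes in total, 7 features each
print(edge_index)   # the second graph's edges now refer to nodes 3 and 4
print(batch)        # graph id per node: 0, 0, 0, 1, 1
```

The `batch` vector built here plays the same role as the `batch` attribute that shows up in the `DataBatch` objects printed below.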
Imports

from torch_geometric.loader import DataLoader

Inspect the data

train_loader=DataLoader(train_dataset,batch_size=64,shuffle=True)
test_loader=DataLoader(test_dataset,batch_size=64,shuffle=False)
for step,data in enumerate(train_loader):
    print(f'Step {step + 1}:')
    print('=======')
    print(f'Number of graphs in the current batch: {data.num_graphs}')
    print(data)
    print()
# Output:
#Step 1:
#=======
#Number of graphs in the current batch: 64
#DataBatch(edge_index=[2, 2626], x=[1187, 7], edge_attr=[2626, 4], y=[64], batch=[1187], ptr=[65])

#Step 2:
#=======
#Number of graphs in the current batch: 64
#DataBatch(edge_index=[2, 2448], x=[1107, 7], edge_attr=[2448, 4], y=[64], batch=[1107], ptr=[65])

#Step 3:
#=======
#Number of graphs in the current batch: 22
#DataBatch(edge_index=[2, 978], x=[441, 7], edge_attr=[978, 4], y=[22], batch=[441], ptr=[23])

Basic workflow of graph classification

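1. Embed each node by performing several rounds of message passing.
2. Aggregate the node embeddings into a single graph embedding (the readout step).
3. Train a classifier on the graph embedding.
PS: the second step usually uses the following formula, the mean of the final-layer node embeddings: $\mathbf{x}_{\mathcal{G}}=\frac{1}{|\mathcal{V}|}\sum_{v \in \mathcal{V}}\mathbf{x}_{v}^{(L)}$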
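The mean readout formula above can be written out by hand in plain PyTorch. This is only a sketch under my own assumptions: the function name `mean_readout` is hypothetical, and in the model below the same step is done by PyG's `global_mean_pool`.

```python
import torch

def mean_readout(x, batch, num_graphs):
    """Average the node embeddings of each graph: x_G = (1/|V|) * sum_v x_v."""
    sums = torch.zeros(num_graphs, x.size(1), dtype=x.dtype)
    sums.index_add_(0, batch, x)                           # sum of embeddings per graph
    counts = torch.bincount(batch, minlength=num_graphs)   # |V| for each graph
    return sums / counts.clamp(min=1).unsqueeze(1).to(x.dtype)

# Four nodes: the first three belong to graph 0, the last one to graph 1.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [10.0, 20.0]])
batch = torch.tensor([0, 0, 0, 1])

graph_emb = mean_readout(x, batch, num_graphs=2)
print(graph_emb)  # row 0 is the mean of the first three nodes, row 1 is the fourth node
```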
Building the GCN model

from torch_geometric.nn import global_mean_pool
from torch_geometric.nn import GCNConv
import torch.nn.functional as F
from torch.nn import Linear

class GCN(torch.nn.Module):
    
    def __init__(self,hidden_channels):
        super(GCN,self).__init__()
        self.conv1=GCNConv(dataset.num_node_features,hidden_channels)
        self.conv2=GCNConv(hidden_channels,hidden_channels)
        self.conv3=GCNConv(hidden_channels,hidden_channels)
        self.lin=Linear(hidden_channels,dataset.num_classes)

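    def forward(self,x,edge_index,batch):
        # 1. Obtain node embeddings with three rounds of message passing
        x=self.conv1(x,edge_index)
        x=x.relu()
        x=self.conv2(x,edge_index)
        x=x.relu()
        x=self.conv3(x,edge_index)
        # 2. Readout: average the node embeddings of each graph
        x=global_mean_pool(x,batch)
        # 3. Dropout followed by the final linear classifier
        x=F.dropout(x,p=0.5,training=self.training)
        x=self.lin(x)
        return x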

model=GCN(hidden_channels=64)
print(model)
# Output:
#GCN(
#  (conv1): GCNConv(7, 64)
#  (conv2): GCNConv(64, 64)
#  (conv3): GCNConv(64, 64)
#  (lin): Linear(in_features=64, out_features=2, bias=True)
#)

Training the model and results

model=GCN(hidden_channels=64)
optimizer=torch.optim.Adam(model.parameters(),lr=0.01)
criterion=torch.nn.CrossEntropyLoss()

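def train():
    model.train()
    for data in train_loader:
        out=model(data.x,data.edge_index,data.batch)  # forward pass on one mini-batch
        loss=criterion(out,data.y)
        loss.backward()        # compute gradients
        optimizer.step()       # update parameters
        optimizer.zero_grad()  # clear gradients for the next batch

@torch.no_grad()  # gradients are not needed for evaluation
def test(loader):
    model.eval()
    correct=0
    for data in loader:
        out=model(data.x,data.edge_index,data.batch)
        pred=out.argmax(dim=1)  # class with the highest score
        correct+=int((pred==data.y).sum())
    return correct/len(loader.dataset)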

for epoch in range(1,171):
    train()
    train_acc=test(train_loader)
    test_acc=test(test_loader)
    print(f'Epoch: {epoch:03d}, Train Acc: {train_acc:.4f}, Test Acc: {test_acc:.4f}')
# Output (only the last line is shown):
#Epoch: 170, Train Acc: 0.7933, Test Acc: 0.7895

Improving the algorithm

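I uploaded the papers related to the improved algorithms, together with the Jupyter notebook, to Baidu Netdisk (link). Extraction code: 8848.
First paper: How Powerful are Graph Neural Networks?
Main contributions: 1) They prove that, when it comes to distinguishing graph structures, GNNs are at most as powerful as the Weisfeiler-Lehman (WL) test. 2) They state the conditions under which a GNN is exactly as powerful as the WL test. 3) They identify which graph structures GNNs and their variants can and cannot distinguish. 4) They develop a simple architecture, GIN, that is as powerful as the WL test.
Second paper: Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
Main contributions: 1) Same as 1) above. 2) They propose the 1-k-GNNs algorithm. 3) They show that higher-order graph properties matter a great deal for classification and regression.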
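To give a feel for the GIN update from the first paper, h_v' = MLP((1 + eps) * h_v + sum of the neighbors' features), here is a minimal sketch in plain PyTorch. The class name `SimpleGINLayer` and the two-layer MLP are my own assumptions for illustration. PyG ships its own `GINConv` layer, which is what you would normally use; this is only a simplified stand-in to show the sum aggregation.

```python
import torch
from torch import nn

class SimpleGINLayer(nn.Module):
    """Hypothetical GIN-style layer: h_v' = MLP((1 + eps) * h_v + sum_u h_u)."""

    def __init__(self, in_channels, out_channels, eps=0.0):
        super().__init__()
        self.eps = eps
        self.mlp = nn.Sequential(
            nn.Linear(in_channels, out_channels),
            nn.ReLU(),
            nn.Linear(out_channels, out_channels),
        )

    def forward(self, x, edge_index):
        src, dst = edge_index
        # Sum aggregation: add every source node's features onto its target node.
        agg = torch.zeros_like(x)
        agg.index_add_(0, dst, x[src])
        return self.mlp((1.0 + self.eps) * x + agg)

# Toy check on a 3-node path graph 0 - 1 - 2 (edges stored in both directions).
x = torch.eye(3)
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])
layer = SimpleGINLayer(in_channels=3, out_channels=8)
out = layer(x, edge_index)
print(out.shape)  # one 8-dimensional embedding per node
```

Unlike the mean used by `global_mean_pool` or the normalized aggregation in GCN, the sum keeps information about how many neighbors a node has, which is the property the paper relies on.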

1-WL pseudocode (I find this figure fairly easy to follow):
[Figure: 1-WL pseudocode]
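As a rough companion to the pseudocode, 1-WL color refinement can be sketched in plain Python. This is my own minimal version, not a transcription of the figure. The names `refine` and `wl_indistinguishable` are hypothetical, and graphs are assumed to be given as adjacency dictionaries without node features.

```python
from collections import Counter

def refine(adj, colors, palette):
    """One refinement round: new color = (own color, sorted neighbor colors)."""
    new_colors = {}
    for v, neighbors in adj.items():
        signature = (colors[v], tuple(sorted(colors[u] for u in neighbors)))
        # The palette is shared by both graphs so that color ids are comparable.
        new_colors[v] = palette.setdefault(signature, len(palette))
    return new_colors

def wl_indistinguishable(adj_a, adj_b, num_rounds=None):
    """Return False if 1-WL proves the graphs non-isomorphic, True otherwise."""
    if num_rounds is None:
        num_rounds = max(len(adj_a), len(adj_b))
    colors_a = {v: 0 for v in adj_a}  # every node starts with the same color
    colors_b = {v: 0 for v in adj_b}
    for _ in range(num_rounds):
        palette = {}
        colors_a = refine(adj_a, colors_a, palette)
        colors_b = refine(adj_b, colors_b, palette)
        # Different color histograms mean the graphs cannot be isomorphic.
        if Counter(colors_a.values()) != Counter(colors_b.values()):
            return False
    return True

# A path and a star on 4 nodes: 1-WL tells them apart.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star4 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(wl_indistinguishable(path4, star4))  # False

# A 6-cycle and two disjoint triangles: every node has degree 2,
# so 1-WL cannot tell them apart even though they are not isomorphic.
six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(wl_indistinguishable(six_cycle, two_triangles))  # True
```

The second example is the kind of case that motivates the higher-order GNNs of the second paper: a result of True only means that 1-WL found no difference, not that the graphs are actually isomorphic.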