Rethinking Graph Transformers with Spectral Attention

Positional encodings based on the spectrum of the graph Laplacian are proposed as a way to generalize transformers with global attention to graphs.

The transformer architecture has been highly influential in many application areas of machine learning, such as natural language processing and computer vision. It replaces computations along the structure of the input with a fully connected message-passing scheme. This idea seems well suited to overcome the inherent limitations of the local message passing usually performed by graph neural networks. However, to make the structure of the input available to the network, one needs to add positional encodings to the node features, and this is a problem because there is no canonical way of ordering the nodes of a graph.
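
As a concrete illustration (a minimal sketch, not the architecture of the paper), the following PyTorch snippet shows that a standard transformer encoder layer applied to node features is permutation-equivariant: it carries no information about the graph unless that information is injected into the features, for instance through a positional encoding. The layer sizes and the random placeholder encoding are assumptions made only for this example.

```python
import torch
import torch.nn as nn

# Minimal sketch: a vanilla transformer encoder layer attends over all nodes
# at once (fully connected message passing) and treats the node features as
# an unordered set, so it is blind to the graph structure.
d_model, n_nodes = 16, 5
layer = nn.TransformerEncoderLayer(d_model, nhead=4, dropout=0.0, batch_first=True)
layer.eval()

x = torch.randn(1, n_nodes, d_model)        # node features, shape (batch, nodes, d_model)

# Permutation equivariance: shuffling the nodes merely shuffles the output,
# so any two graphs with the same multiset of node features look identical.
perm = torch.randperm(n_nodes)
with torch.no_grad():
    assert torch.allclose(layer(x)[:, perm], layer(x[:, perm]), atol=1e-4)

# Structure must therefore be injected through the features themselves,
# e.g. by adding a per-node positional encoding (random placeholder here).
pos = torch.randn(1, n_nodes, d_model)
out = layer(x + pos)
```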

Still, there is a natural analogy between the sine- and cosine-based positional encodings used on linear structures such as text and the eigenvectors and eigenvalues of the graph Laplacian: they can be interpreted as the resonance frequencies of a graph (see [Van03W] for background on this matter).
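
This analogy can be made precise for a path graph, i.e. the graph underlying a text sequence: the eigenvectors of its (unnormalized) Laplacian are sampled cosines, and the eigenvalues encode the corresponding frequencies. The following NumPy snippet, a self-contained illustration rather than code from the paper, checks this against the known closed form.

```python
import numpy as np

# Path graph on n nodes: the eigenvectors of L = D - A are sampled cosines
# (a discrete cosine basis), with eigenvalues 4 * sin(pi * k / (2n))^2
# playing the role of frequencies.
n = 8
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)   # path graph adjacency
L = np.diag(A.sum(axis=1)) - A                                  # unnormalized Laplacian

eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues ascending, eigenvectors in columns

i = np.arange(n)
for k in range(n):
    analytic = np.cos(np.pi * k * (i + 0.5) / n)   # k-th cosine, sampled at the nodes
    analytic /= np.linalg.norm(analytic)
    numeric = eigvecs[:, k]
    if analytic @ numeric < 0:                     # eigenvectors are only defined up to sign
        numeric = -numeric
    assert np.allclose(numeric, analytic, atol=1e-6)
    assert np.isclose(eigvals[k], 4 * np.sin(np.pi * k / (2 * n)) ** 2)
```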

Eigenvectors

[Kre21R] proposes to use the Laplacian eigenvectors of a graph as the basis for the positional encoding. While this is not the first work to propose the idea, it is the first to handle the inherent sign ambiguity in the eigenvector selection (see [Vel20G] and [Dwi21G] for prior work), and the first to obtain good experimental results with global attention on graphs.
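
For illustration, here is a minimal sketch of the basic recipe of attaching Laplacian eigenvectors to the node features. The use of the normalized Laplacian, the choice of k, and the training-time random sign flip are assumptions made for this example and are not meant to reproduce the approach of [Kre21R].

```python
import numpy as np

def laplacian_pe(A: np.ndarray, k: int, rng=None) -> np.ndarray:
    """k-dimensional spectral positional encoding per node (one row per node of A)."""
    deg = A.sum(axis=1)
    d_inv_sqrt = deg ** -0.5                                   # assumes no isolated nodes
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # normalized Laplacian
    _, eigvecs = np.linalg.eigh(L)                             # eigenvalues in ascending order
    pe = eigvecs[:, 1:k + 1]                                   # k smallest non-trivial eigenvectors
    if rng is not None:                                        # the sign of each eigenvector is
        pe = pe * rng.choice([-1.0, 1.0], size=(1, k))         # arbitrary: flip it randomly in training
    return pe

# Example: a 6-cycle with a 3-dimensional encoding per node,
# concatenated to random placeholder node features.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
pe = laplacian_pe(A, k=3, rng=np.random.default_rng(0))
x = np.concatenate([np.random.randn(n, 8), pe], axis=1)        # node features with attached encoding
```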

References