Graph Attention Networks (GAT)

Attention

// e_ij = LeakyReLU(a^T [W h_i || W h_j])
// α_ij = softmax_j(e_ij) over N(i)
// h_i_new = σ(sum_j α_ij · W h_j)
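
The three formulas above can be sketched for a single head in NumPy. This is a minimal illustration, not a reference implementation: the function name `gat_layer`, the edge-list layout (a pair `(i, j)` means j is in N(i)), the row-vector convention `h @ W`, the LeakyReLU slope of 0.2, and `tanh` standing in for σ are all assumptions made here.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(h, edges, W, a, activation=np.tanh):
    """Single-head GAT layer (illustrative sketch).

    h:     (V, d_in) node features, one row per node
    edges: non-empty list of (i, j) pairs, meaning j is in N(i)
    W:     (d_in, d_out) shared weight matrix (rows of h @ W play the role of W h_i)
    a:     (2 * d_out,) attention vector
    """
    V, d_out = h.shape[0], W.shape[1]
    Wh = h @ W                                   # (V, d_out)
    src, dst = np.asarray(edges).T               # src = i, dst = j

    # e_ij = LeakyReLU(a^T [W h_i || W h_j]); the dot product with the
    # concatenation splits into one term for i and one term for j.
    e = leaky_relu(Wh[src] @ a[:d_out] + Wh[dst] @ a[d_out:])

    # alpha_ij = softmax over j in N(i), with the per-node max subtracted
    # for numerical stability.
    e_max = np.full(V, -np.inf)
    np.maximum.at(e_max, src, e)
    exp_e = np.exp(e - e_max[src])
    denom = np.zeros(V)
    np.add.at(denom, src, exp_e)
    alpha = exp_e / denom[src]

    # h_i_new = sigma(sum_j alpha_ij * W h_j)
    out = np.zeros_like(Wh)
    np.add.at(out, src, alpha[:, None] * Wh[dst])
    return activation(out), alpha
```

With `a` set to all zeros every e_ij is equal, so the layer reduces to a plain mean over each neighborhood, which is a convenient sanity check. Nodes that appear in no edge receive a zero vector in this sketch.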

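Multi-head: run K independent attention heads, each with its own W and a, and concatenate their outputs (the final layer averages them instead). Each head can learn to weight neighbors by a different aspect.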
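
A hedged sketch of the multi-head combination, using a dense boolean adjacency matrix for brevity. The names `attention_head` and `multi_head_gat`, the `concat` flag, and `tanh` as the nonlinearity are illustrative assumptions. The sketch also assumes `adj` contains self-loops, so every row has at least one neighbor.

```python
import numpy as np

def attention_head(h, adj, W, a, slope=0.2):
    """One head, before the nonlinearity. adj[i, j] is True when j is in N(i)."""
    d = W.shape[1]
    Wh = h @ W                                           # (V, d)
    # e[i, j] = LeakyReLU(a^T [Wh_i || Wh_j]) as a per-i term plus a per-j term
    e = (Wh @ a[:d])[:, None] + (Wh @ a[d:])[None, :]
    e = np.where(e > 0, e, slope * e)
    e = np.where(adj, e, -np.inf)                        # mask non-neighbors
    e = e - e.max(axis=1, keepdims=True)                 # stable softmax per row
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ Wh                                    # (V, d)

def multi_head_gat(h, adj, Ws, As, concat=True, activation=np.tanh):
    """Combine K heads: concatenate (hidden layers) or average (final layer)."""
    outs = [attention_head(h, adj, W, a) for W, a in zip(Ws, As)]
    if concat:
        # nonlinearity per head, then concatenate: output width is K * d
        return np.concatenate([activation(o) for o in outs], axis=1)
    # average the heads first, then apply the nonlinearity: output width is d
    return activation(np.mean(outs, axis=0))
```

Concatenation multiplies the output width by K, which is why averaging is the usual choice when the layer must emit a fixed number of class scores.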

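Learns which neighbors matter: each edge gets its own learned weight instead of a fixed degree-based one. Can down-weight noisy edges. Works in both transductive and inductive settings, because W and a are shared across all edges and need no global graph structure.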

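O(V · d² + E · d) per layer per head, for V nodes, E edges, and feature width d. Same asymptotic order as GCN, but heavier in practice: each of the K heads repeats the work, and attention coefficients are computed and stored per edge.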
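
To get a feel for the two terms, the count can be evaluated for a made-up graph. The helper `gat_layer_cost` and the sizes below are hypothetical, and the count ignores constant factors. It also assumes every head has the full width d, so K heads cost K times as much.

```python
def gat_layer_cost(V, E, d, heads=1):
    """Rough operation count for one GAT layer, constants dropped."""
    transform = V * d * d   # W h_i for every node
    attention = E * d       # scores and weighted sum along every edge
    return heads * (transform + attention)

# Hypothetical graph: 10,000 nodes, 100,000 edges, width 64
print(gat_layer_cost(10_000, 100_000, 64))           # 47360000
print(gat_layer_cost(10_000, 100_000, 64, heads=8))  # 378880000
```

In this example the V · d² term dominates (40,960,000 versus 6,400,000). The E · d term only takes over when the average degree E / V exceeds d.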