Attention

// e_ij = LeakyReLU(a^T [W h_i || W h_j])            (|| = concatenation, j ∈ N(i))
// α_ij = softmax_j(e_ij) = exp(e_ij) / Σ_{k ∈ N(i)} exp(e_ik)
// h_i' = σ(Σ_{j ∈ N(i)} α_ij · W h_j)                (N(i) usually includes i itself)
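A minimal pure-Python sketch of one single-head layer following the three formulas above. It assumes σ = ELU and a LeakyReLU slope of 0.2 (the choices in the original GAT paper), and that each neighbor list already contains the node itself. `gat_layer`, `matvec` and the toy inputs are hypothetical names for illustration, not a library API.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def elu(x):
    return x if x > 0 else math.exp(x) - 1.0


def matvec(W, h):
    # W is a d_out x d_in matrix stored as a list of rows
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(H, neighbors, W, a, sigma=elu):
    """One single-head GAT layer (hypothetical helper).

    H: list of node feature vectors, each of length d_in
    neighbors: dict mapping node i -> list of j in N(i), self-loop included
    W: d_out x d_in weights; a: attention vector of length 2 * d_out
    """
    Wh = [matvec(W, h) for h in H]       # shared linear transform W h_j
    d_out = len(W)
    a_src, a_dst = a[:d_out], a[d_out:]  # a^T [x || y] = a_src.x + a_dst.y
    out = []
    for i in range(len(H)):
        nbrs = neighbors[i]
        src = sum(p * q for p, q in zip(a_src, Wh[i]))
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]) for each j in N(i)
        e = [leaky_relu(src + sum(p * q for p, q in zip(a_dst, Wh[j]))) for j in nbrs]
        # alpha_ij = softmax over N(i); subtract the max for numerical stability
        m = max(e)
        exps = [math.exp(x - m) for x in e]
        z = sum(exps)
        alpha = [x / z for x in exps]
        # h_i' = sigma(sum_j alpha_ij * W h_j)
        agg = [sum(alpha[k] * Wh[j][c] for k, j in enumerate(nbrs)) for c in range(d_out)]
        out.append([sigma(x) for x in agg])
    return out


if __name__ == "__main__":
    H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    nbrs = {0: [0, 1], 1: [1, 0, 2], 2: [2, 1]}  # self-loops included
    W = [[1.0, 0.0], [0.0, 1.0]]                  # identity, for illustration only
    print(gat_layer(H, nbrs, W, a=[0.5, -0.5, 1.0, 0.0]))
```

With `a` set to all zeros every e_ij is 0, so attention is uniform and the layer reduces to plain mean aggregation over N(i), which makes a handy sanity check.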

Multi-head

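K independent attention heads, each with its own W and a. Hidden layers concatenate the head outputs; the output layer averages them instead. Ensembles different learned aspects of the neighborhood and stabilizes training.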
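A sketch of the two ways heads are combined, assuming the convention from the original GAT paper: apply σ and concatenate in hidden layers, average in the output layer. `gat_head` and `multi_head` are hypothetical helpers; σ is assumed to be ELU, and the output-layer nonlinearity is left to the caller.

```python
import math


def gat_head(H, neighbors, W, a, slope=0.2):
    """One attention head before the nonlinearity: sum_j alpha_ij * W h_j per node."""
    Wh = [[sum(w * x for w, x in zip(row, h)) for row in W] for h in H]
    d = len(W)
    out = []
    for i in range(len(H)):
        nbrs = neighbors[i]
        src = sum(p * q for p, q in zip(a[:d], Wh[i]))
        e = []
        for j in nbrs:
            s = src + sum(p * q for p, q in zip(a[d:], Wh[j]))
            e.append(s if s > 0 else slope * s)  # LeakyReLU
        m = max(e)
        exps = [math.exp(x - m) for x in e]
        z = sum(exps)
        out.append([sum(exps[k] / z * Wh[j][c] for k, j in enumerate(nbrs)) for c in range(d)])
    return out


def elu(x):
    return x if x > 0 else math.exp(x) - 1.0


def multi_head(H, neighbors, heads, final_layer=False):
    """heads: list of (W, a) pairs, one per head (K = len(heads)); hypothetical helper."""
    per_head = [gat_head(H, neighbors, W, a) for W, a in heads]
    K = len(per_head)
    if final_layer:
        # Output layer: average the K heads; the output nonlinearity
        # (e.g. softmax for classification) is left to the caller.
        return [[sum(ph[i][c] for ph in per_head) / K for c in range(len(per_head[0][i]))]
                for i in range(len(H))]
    # Hidden layer: apply sigma per head, then concatenate -> K * d_out features per node
    return [[elu(x) for ph in per_head for x in ph[i]] for i in range(len(H))]
```

Concatenation grows the feature width to K · d_out, which is fine between layers; averaging keeps the final output at d_out, matching the number of classes.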


Advantages

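Learns which neighbors matter: α_ij is computed from node features rather than fixed by node degrees as in GCN. Can down-weight noisy or irrelevant edges. Works in both transductive and inductive settings, since the shared W and a do not depend on the full graph structure.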
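To make "learns which neighbors matter" concrete: plain mean aggregation gives every neighbor weight 1/|N(i)|, while α_ij depends on the features at both ends of the edge. The sketch uses hand-picked (not learned) W and a, purely for illustration, to show a dissimilar neighbor being down-weighted, then reuses the same parameters on a second, unseen graph, which is what allows inductive use. Whether a trained model actually down-weights a given noisy edge depends on training. `attention_weights` is a hypothetical helper.

```python
import math


def attention_weights(H, i, nbrs, W, a, slope=0.2):
    """alpha_ij for j in N(i), computed from node features only (hypothetical helper)."""
    Wh = {j: [sum(w * x for w, x in zip(row, H[j])) for row in W] for j in set(nbrs) | {i}}
    d = len(W)
    src = sum(p * q for p, q in zip(a[:d], Wh[i]))
    e = []
    for j in nbrs:
        s = src + sum(p * q for p, q in zip(a[d:], Wh[j]))
        e.append(s if s > 0 else slope * s)  # LeakyReLU
    m = max(e)
    exps = [math.exp(x - m) for x in e]
    z = sum(exps)
    return {j: v / z for j, v in zip(nbrs, exps)}


# Hand-picked parameters (not learned): neighbors with a large second
# feature relative to the first receive a low score.
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.0, 0.0, 2.0, -2.0]

# Graph A: node 2 is a "noisy" neighbor of node 0 with dissimilar features.
H_a = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
alpha = attention_weights(H_a, 0, [0, 1, 2], W, a)
print({j: round(v, 3) for j, v in alpha.items()})  # node 2 gets far less than 1/3

# Graph B, never seen before: the same W and a apply unchanged.
H_b = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
print(attention_weights(H_b, 3, [3, 0, 1, 2], W, a))
```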

Complexity

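O(V · d² + E · d) per layer per head, with d the feature dimension: V · d² for the shared transform W h, E · d for scoring edges and aggregating neighbor features. Same asymptotic order as GCN, so only slightly heavier in practice; K heads multiply the cost by K.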
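A rough operation count showing where the two terms come from. It assumes the common trick of splitting a into a source half and a destination half, so each node's two score terms are computed once and each edge only adds them. E counts directed edges with self-loops included; `gat_layer_cost` and `gcn_layer_cost` are hypothetical helpers, and the constant factors are illustrative rather than exact FLOP counts.

```python
def gat_layer_cost(V, E, d_in, d_out, heads=1):
    """Approximate multiply-add count for one GAT layer."""
    transform = V * d_in * d_out  # W h_i for every node: the V·d² term
    node_scores = 2 * V * d_out   # a_src·(W h_i) and a_dst·(W h_i), once per node
    edge_scores = E               # e_ij = LeakyReLU(src_i + dst_j); softmax is also O(1) per edge
    aggregate = E * d_out         # sum_j alpha_ij * W h_j: the E·d term
    return heads * (transform + node_scores + edge_scores + aggregate)


def gcn_layer_cost(V, E, d_in, d_out):
    """Same count for a GCN layer: transform plus fixed-weight aggregation."""
    return V * d_in * d_out + E * d_out


print(gat_layer_cost(V=10, E=30, d_in=4, d_out=4))  # 390
print(gcn_layer_cost(V=10, E=30, d_in=4, d_out=4))  # 280
```

The gap is the per-node and per-edge scoring work, which adds only lower-order terms, so the asymptotic order matches GCN; the `heads` factor is where multi-head GAT gets noticeably more expensive.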