Attention
// e_ij = LeakyReLU(a^T [W h_i || W h_j])
// α_ij = softmax_j(e_ij) over N(i)
// h_i_new = σ(sum_j α_ij · W h_j)
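The three formulas above can be sketched in plain Python as a single attention head. This is a minimal illustration, not a reference implementation: the names `attention_weights` and `gat_layer`, the LeakyReLU slope of 0.2, the choice of ELU for σ, and the convention that each N(i) includes i itself (a self-loop) are all assumptions made for the example.

```python
import math

def leaky_relu(x, slope=0.2):          # slope 0.2 is an assumed default
    return x if x > 0 else slope * x

def elu(x):                            # assumed choice for the nonlinearity σ
    return x if x > 0 else math.exp(x) - 1.0

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def attention_weights(Wh, i, nbrs, a):
    """α_ij for j in nbrs, given transformed features Wh and attention vector a."""
    # e_ij = LeakyReLU(a^T [W h_i || W h_j]); list `+` is the concatenation
    scores = [leaky_relu(sum(ak * zk for ak, zk in zip(a, Wh[i] + Wh[j])))
              for j in nbrs]
    # softmax over N(i), shifted by the max score for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gat_layer(h, neighbors, W, a):
    """h: list of feature vectors, neighbors: dict i -> non-empty list N(i),
    W: d_out x d_in matrix, a: vector of length 2 * d_out."""
    Wh = [matvec(W, h_i) for h_i in h]
    out = []
    for i in range(len(h)):
        nbrs = neighbors[i]
        alpha = attention_weights(Wh, i, nbrs, a)
        # h_i_new = σ(sum_j α_ij · W h_j)
        agg = [sum(al * Wh[j][k] for al, j in zip(alpha, nbrs))
               for k in range(len(Wh[i]))]
        out.append([elu(x) for x in agg])
    return out
```

With an all-zero `a`, every score is equal, so the softmax becomes uniform and the layer reduces to a plain mean over each neighborhood. That makes a convenient sanity check.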
Multi-head
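Run K independent attention heads, each with its own W and a, and concatenate their outputs (the final layer typically averages them instead). Each head can learn a different aspect of the neighborhood, which acts like an ensemble.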
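A minimal sketch of combining heads, assuming each head is already available as a callable that maps the node features to one output vector per node. The name `multi_head` and the `concat` flag are hypothetical; the averaging branch reflects the common practice of averaging heads in the final layer rather than concatenating them.

```python
def multi_head(heads, h, concat=True):
    """heads: list of K callables, each mapping node features h to a list of
    per-node output vectors. Returns one combined vector per node."""
    outputs = [head(h) for head in heads]          # K lists of N vectors
    n_nodes = len(outputs[0])
    combined = []
    for i in range(n_nodes):
        per_head = [out[i] for out in outputs]     # K vectors for node i
        if concat:
            # hidden layers: concatenate, output width is K * d_out
            combined.append([x for vec in per_head for x in vec])
        else:
            # output layer: average, output width stays d_out
            k = len(per_head)
            combined.append([sum(vals) / k for vals in zip(*per_head)])
    return combined
```

Concatenation grows the feature width by a factor of K, which is why averaging is usually preferred where the output width must match the number of classes.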
Advantages
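Learns per-edge weights, so the model decides which neighbors matter. More robust to noisy edges, since they can be down-weighted. Works in both transductive and inductive settings, because the weights depend on node features rather than on a fixed graph structure.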
Complexity
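O(V · d² + E · d) per layer and per head, for V nodes, E edges, and feature width d. Asymptotically on par with GCN; the attention scores and multiple heads add a constant-factor overhead.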
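To get a feel for which term dominates, the bound can be evaluated for a concrete graph. The sketch below is only an order-of-magnitude operation count under the formula above; the function name `gat_layer_cost` and the graph sizes are made up for illustration.

```python
def gat_layer_cost(num_nodes, num_edges, d, heads=1):
    """Rough operation count per layer: heads * (V * d^2 + E * d)."""
    transform = num_nodes * d * d   # applying W to every node
    attention = num_edges * d       # scoring and aggregating along every edge
    return heads * (transform + attention)

# Hypothetical graph: 10,000 nodes, 50,000 edges, feature width 64.
single = gat_layer_cost(10_000, 50_000, 64)            # 44_160_000
eight_heads = gat_layer_cost(10_000, 50_000, 64, 8)    # 353_280_000
```

In this example the V · d² term (40,960,000) is far larger than the E · d term (3,200,000), so the linear transform dominates unless the average degree is comparable to d.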