Multi-Head Attention

Demonstrates how different "heads" in a multi-head attention mechanism can specialize in different linguistic features, such as syntactic structure, pronoun reference, or subject-verb relationships.

Multi-Head Attention View

Different "heads" learn different grammatical relationships.
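
To make the idea of separate heads concrete, here is a minimal sketch in plain Python of how each head computes its own attention map over the same tokens. It is an illustration under stated assumptions, not the code behind this view: the function name multi_head_attention_weights, the helper names, the toy tokens, and the random (untrained) projection matrices are all hypothetical. Because the weights are random, the maps only show that each head produces a distinct attention pattern; the linguistic specialization described above would emerge only through training.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def random_matrix(rng, rows, cols):
    """Hypothetical stand-in for learned weights: Gaussian random entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention_weights(x, num_heads, seed=0):
    """Return one (seq_len x seq_len) attention map per head.

    x is a list of token embeddings, each of length d_model.
    Each head gets its own query and key projections (random here,
    learned in a real model), so each head scores token pairs differently.
    """
    rng = random.Random(seed)
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    scale = math.sqrt(d_head)

    maps = []
    for _ in range(num_heads):
        w_q = random_matrix(rng, d_model, d_head)  # per-head query projection
        w_k = random_matrix(rng, d_model, d_head)  # per-head key projection
        q = matmul(x, w_q)
        k = matmul(x, w_k)
        k_t = [list(col) for col in zip(*k)]       # transpose of k
        scores = matmul(q, k_t)                    # scaled dot-product scores
        maps.append([softmax([s / scale for s in row]) for row in scores])
    return maps


# Toy usage: three tokens with random 4-dimensional embeddings, two heads.
tokens = ["the", "cat", "sat"]
embeddings = random_matrix(random.Random(1), len(tokens), 4)
for h, attn in enumerate(multi_head_attention_weights(embeddings, num_heads=2)):
    print(f"head {h}")
    for tok, row in zip(tokens, attn):
        print(f"  {tok:>4}: " + " ".join(f"{w:.2f}" for w in row))
```

Each printed row shows how strongly one token attends to every token in the sequence, and each row sums to 1. Comparing the two heads' maps side by side is the same comparison the view makes across heads in a trained model.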