▶ Interactive Lab

Embedding Lookup as Gather

Token IDs become rows of the embedding matrix.

Advertisement
Token ID indexes the embedding matrix → returns one row (the embedding vector).

What you're seeing

Logically: one_hot(id) · E. In practice: just index row id directly. O(d) memory access vs O(V·d) matmul.

For real models: V ≈ 32K-128K, d ≈ 768-4096. The embedding matrix is often the largest single weight tensor.

★ KEY TAKEAWAY
Token embedding lookup is a gather operation (logically a matmul with one-hot). The biggest single weight tensor in most LLMs.
▶ WHAT TO TRY
  • Change Token ID and click Lookup to see different rows get returned.
  • For real models: V=32K–128K and d=768–4096. This matrix is often 100M+ params.