Embedding Lookup as Gather



Token ID indexes the embedding matrix → returns one row (the embedding vector).

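Logically, the lookup is the product one_hot(id) · E, where E is the V × d embedding matrix. In practice, the implementation indexes row id directly: O(d) memory reads per token instead of the O(V·d) multiply-adds of the one-hot matmul.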
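The equivalence between the gather and the one-hot matmul can be checked numerically. The following is a minimal NumPy sketch, not any particular framework's implementation; the toy sizes and the helper names `embed_gather` and `embed_one_hot` are assumptions made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 8, 4                      # toy vocabulary size and embedding width (illustrative)
E = rng.standard_normal((V, d))  # embedding matrix: one row per token ID


def embed_gather(E, ids):
    """Gather: index rows of E directly; reads len(ids) * d values."""
    return E[ids]


def embed_one_hot(E, ids):
    """Reference path: build one-hot rows, then multiply by E (V * d work per token)."""
    one_hot = np.zeros((len(ids), E.shape[0]), dtype=E.dtype)
    one_hot[np.arange(len(ids)), ids] = 1.0
    return one_hot @ E


ids = np.array([3, 0, 3, 7])     # repeated IDs are fine: the same row is returned twice
gathered = embed_gather(E, ids)
via_matmul = embed_one_hot(E, ids)

assert gathered.shape == (len(ids), d)
assert np.allclose(gathered, via_matmul)
```

Both paths return the same rows; the gather simply skips multiplying by the V - 1 zeros in each one-hot vector.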

For real models: V ≈ 32K-128K, d ≈ 768-4096. The embedding matrix therefore holds V·d parameters and is often the largest single weight tensor.
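To get a feel for these sizes, the parameter count is just V·d. The sketch below evaluates it at the endpoints of the ranges quoted above; the specific (V, d) pairs and the 2 bytes per parameter (a 16-bit float format) are assumptions for illustration, not the configuration of any named model.

```python
def embedding_params(vocab_size, d_model):
    """Number of weights in a vocab_size x d_model embedding matrix."""
    return vocab_size * d_model


def embedding_bytes(vocab_size, d_model, bytes_per_param=2):
    """Storage in bytes, assuming bytes_per_param per weight (2 = 16-bit floats)."""
    return embedding_params(vocab_size, d_model) * bytes_per_param


# Illustrative endpoints of the ranges above, not real model configurations.
for V, d in [(32_000, 768), (32_000, 4096), (128_000, 4096)]:
    params = embedding_params(V, d)
    mib = embedding_bytes(V, d) / 2**20
    print(f"V={V:>7,} d={d:>5,} params={params:>12,} size={mib:8.1f} MiB")
```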

★ KEY TAKEAWAY

Token embedding lookup is a gather operation (logically a matmul with a one-hot vector). The embedding matrix is often the largest single weight tensor in an LLM.

▶ WHAT TO TRY