Tied Embeddings Savings — Belgavi.AI Lab


Inputs: vocabulary size V, model width d_model = 2048 (written d below).

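Tied embeddings: the output projection reuses the input embedding matrix, lm_head = embedding.T. The model then stores one V × d matrix instead of two, which saves V·d parameters.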
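To make the sharing concrete, here is a minimal, dependency-free sketch. The helper names (make_embedding, embed, tied_logits) and the toy sizes are assumptions chosen for illustration, not part of any library. It uses one table both to look up a token vector and to score a hidden state against every token.

```python
import random

def make_embedding(vocab_size, d_model, seed=0):
    """Build a V x d table of small random weights (hypothetical init)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
            for _ in range(vocab_size)]

def embed(table, token_id):
    """Input side: look up the row for a token id."""
    return table[token_id]

def tied_logits(table, hidden):
    """Output side: hidden @ embedding.T, reusing the same table.

    The logit for token v is the dot product of hidden with table[v],
    so no separate lm_head matrix is stored.
    """
    return [sum(h * w for h, w in zip(hidden, row)) for row in table]

V, D = 8, 4                      # toy sizes, for illustration only
table = make_embedding(V, D)
hidden = embed(table, 3)         # pretend the model body is the identity
logits = tied_logits(table, hidden)

params_tied = V * D              # one shared matrix
params_untied = 2 * V * D        # separate embedding and lm_head
saved = params_untied - params_tied   # equals V * D
```

In a deep learning framework the same idea is usually expressed by having the output layer and the embedding layer point at a single weight tensor, so gradients from both uses accumulate into the same parameters.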

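For SLMs (small d and few layers), V·d is a sizeable share of the total parameter count, so the saving is significant. For 70B-scale models the same V·d is a tiny fraction of the total, so the embeddings are often kept untied.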
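A rough calculation shows how the relative saving shrinks with model size. This sketch assumes about 12·d² parameters per transformer layer (4·d² for attention plus 8·d² for a 4x-wide MLP) and ignores biases, norms, and positional parameters. The vocabulary size and layer counts below are hypothetical, picked only to contrast a small and a large configuration.

```python
def tying_saving_fraction(vocab_size, d_model, n_layers):
    """Fraction of an untied model's parameters removed by tying.

    Assumes ~12 * d_model**2 parameters per layer (a common rough
    estimate) and counts only layers plus the two V x d matrices.
    """
    embedding = vocab_size * d_model
    layers = n_layers * 12 * d_model ** 2
    untied_total = layers + 2 * embedding
    return embedding / untied_total

# Hypothetical configurations, all with V = 32000.
tiny = tying_saving_fraction(32000, 512, 8)
small = tying_saving_fraction(32000, 2048, 24)
large = tying_saving_fraction(32000, 8192, 80)

for name, frac in [("tiny", tiny), ("small", small), ("large", large)]:
    print(f"{name}: tying saves {frac:.1%} of parameters")
```

Under these assumptions the fraction falls from tens of percent for the tiny configuration to well under one percent for the large one, since the saving scales with V·d while the layer parameters scale with the number of layers times d².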

★ KEY TAKEAWAY

Tied embeddings: lm_head = embedding.T. Saves V·d params (often ~10% of an SLM's total). Common in small models.

▶ WHAT TO TRY