▶ Interactive Lab

SLM Architecture Comparison

Phi-3, Qwen 2.5, and Gemma 2: hyperparameters side by side.

Each model strikes a different balance of depth (layers), width (hidden size), and attention heads.

What you're seeing

Phi-3-mini: dense, no GQA (every query head keeps its own key/value head). Qwen 2.5: deep with aggressive GQA. Gemma 2: huge vocab.
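To make the GQA trade-off concrete, here is a minimal sketch of how the number of key/value heads changes KV-cache size per token. The layer count, head count, and head dimension are hypothetical values chosen for illustration, not any specific model's config, and 2 bytes per value assumes fp16/bf16 storage.

```python
def kv_cache_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value=2):
    """Bytes cached per token: one K and one V vector per KV head, per layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


# Hypothetical model: 32 layers, 32 query heads, head dimension 96.
LAYERS, Q_HEADS, HEAD_DIM = 32, 32, 96

# No GQA: every query head has its own KV head.
mha = kv_cache_bytes_per_token(LAYERS, kv_heads=Q_HEADS, head_dim=HEAD_DIM)

# 8x GQA: eight query heads share one KV head.
gqa = kv_cache_bytes_per_token(LAYERS, kv_heads=Q_HEADS // 8, head_dim=HEAD_DIM)

print(f"no GQA: {mha} bytes/token")
print(f"8x GQA: {gqa} bytes/token")
print(f"cache reduction: {mha // gqa}x")
```

The cache shrinks by the group size, which is why aggressive GQA is attractive for long contexts; the cost is fewer distinct key/value projections.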

★ KEY TAKEAWAY
Phi-3-mini: dense, no GQA. Qwen 2.5: deep, with 8× GQA (eight query heads share each key/value head). Gemma 2: huge vocab (256K tokens). Same basic decoder-only transformer recipe; different bets.
▶ WHAT TO TRY
  • Compare hyperparams across the three models: depth, heads, vocab.
  • Notice how similar the overall designs are: architecture has stabilized, so data and post-training now differentiate the leaders.
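
The comparison above can also be sketched offline. This is a rough illustration, not the lab's own code: the `Config` class and its field names are hypothetical, the choice of model sizes (Phi-3-mini, Qwen2.5-3B, Gemma 2 2B) is an assumption, and the numbers are recalled from the publicly released model configs, so verify them against each model's `config.json` before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical container for a handful of architecture hyperparameters."""
    name: str
    layers: int
    hidden: int
    q_heads: int
    kv_heads: int
    vocab: int

    @property
    def gqa_group(self) -> int:
        # Query heads per key/value head; 1 means plain multi-head attention.
        return self.q_heads // self.kv_heads

    @property
    def width_per_layer(self) -> float:
        # Lower values mean a deeper, narrower model.
        return self.hidden / self.layers

    @property
    def embedding_params(self) -> int:
        # Size of the input embedding table alone.
        return self.vocab * self.hidden


# ASSUMPTION: values recalled from published configs; check before use.
CONFIGS = [
    Config("Phi-3-mini", layers=32, hidden=3072, q_heads=32, kv_heads=32, vocab=32064),
    Config("Qwen2.5-3B", layers=36, hidden=2048, q_heads=16, kv_heads=2, vocab=151936),
    Config("Gemma-2-2B", layers=26, hidden=2304, q_heads=8, kv_heads=4, vocab=256000),
]
by_name = {c.name: c for c in CONFIGS}

print(f"{'model':<12}{'layers':>7}{'hidden':>8}{'GQA':>5}{'width/layer':>13}{'embed params':>14}")
for c in CONFIGS:
    print(
        f"{c.name:<12}{c.layers:>7}{c.hidden:>8}{c.gqa_group:>4}x"
        f"{c.width_per_layer:>13.1f}{c.embedding_params / 1e6:>13.0f}M"
    )
```

Under these assumed values, the three bets show up directly: a GQA group size of 1 for Phi-3-mini versus 8 for Qwen2.5-3B, the lowest width-per-layer ratio for Qwen, and by far the largest embedding table for Gemma 2.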