SLM Architecture Comparison

Each model family picks a different balance of depth (layer count), width (hidden size), and attention-head configuration.

Phi: dense, no grouped-query attention (GQA). Qwen: deep with aggressive GQA. Gemma: huge vocabulary.

★ KEY TAKEAWAY

Phi: dense, no GQA. Qwen: deep + 8× GQA (eight query heads share each key/value head). Gemma: huge vocab (256K). Same training recipe; different bets.
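
To make the GQA bet concrete, here is a minimal sketch of how the key/value cache per token shrinks when query heads share KV heads. The function name kv_cache_bytes_per_token and every number below are illustrative assumptions, not values taken from any official model config.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of key/value cache stored for one token.

    Each layer caches one key vector and one value vector per KV head,
    hence the leading factor of 2. bytes_per_value=2 assumes fp16/bf16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


# Hypothetical configs: identical depth and head size, differing only in KV heads.
mha = kv_cache_bytes_per_token(num_layers=32, num_kv_heads=32, head_dim=128)  # no GQA
gqa = kv_cache_bytes_per_token(num_layers=32, num_kv_heads=4, head_dim=128)   # 8x GQA

print(mha, gqa, mha // gqa)
```

With 32 query heads in both cases, going from 32 KV heads to 4 cuts the cache by the same 8× factor as the sharing ratio, which is the main reason to accept GQA's modeling trade-off at long context lengths.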

▶ WHAT TO TRY

Compare the models' hyperparameters: depth, head counts, and vocabulary size.
Architecture has stabilized; data and post-training now differentiate the leaders.
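
One way to run that comparison is a small script like the sketch below. The Config class, the model labels, and all numbers are hypothetical placeholders chosen only to mirror the three bets described above; substitute the values from each model's published config before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass
class Config:
    name: str
    num_layers: int
    hidden_size: int
    num_heads: int
    num_kv_heads: int
    vocab_size: int

    @property
    def gqa_ratio(self) -> int:
        # Query heads per key/value head; 1 means plain multi-head attention.
        return self.num_heads // self.num_kv_heads

    @property
    def embedding_params(self) -> int:
        # Parameters in the token-embedding matrix alone (vocab x hidden).
        return self.vocab_size * self.hidden_size


# Illustrative placeholders, not official configs.
configs = [
    Config("phi-like", 32, 3072, 32, 32, 32_000),
    Config("qwen-like", 36, 2048, 16, 2, 152_000),
    Config("gemma-like", 26, 2304, 8, 4, 256_000),
]


def summarize(cfg: Config) -> str:
    # One row per model: name, depth, GQA ratio, embedding size in millions.
    return (
        f"{cfg.name:<12}{cfg.num_layers:>6}"
        f"{cfg.gqa_ratio:>5}x{cfg.embedding_params / 1e6:>10.1f}M"
    )


for cfg in configs:
    print(summarize(cfg))
```

Keeping the derived quantities (GQA ratio, embedding parameters) next to the raw hyperparameters makes the differences easier to read than the raw numbers alone.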