▶ Interactive Lab

LayerNorm vs RMSNorm

Both stabilize activations; RMSNorm skips mean centering.

LayerNorm: subtract the mean, divide by the standard deviation. RMSNorm: divide by the root mean square (RMS) only, which makes it roughly 10% faster.
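Written out for a single feature vector x = (x_1, ..., x_d), the two operations look like this. This sketch follows the common textbook definitions. The learned per-feature gain γ and bias β, and the small stability constant ε, are conventions of typical implementations and are not shown in the lab itself. Llama-style RMSNorm usually keeps γ and drops β.

```latex
\mu = \frac{1}{d}\sum_{i=1}^{d} x_i,
\qquad
\sigma^2 = \frac{1}{d}\sum_{i=1}^{d} (x_i - \mu)^2,
\qquad
\mathrm{RMS}(x) = \sqrt{\frac{1}{d}\sum_{i=1}^{d} x_i^2}

\mathrm{LayerNorm}(x)_i = \gamma_i \, \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta_i,
\qquad
\mathrm{RMSNorm}(x)_i = \gamma_i \, \frac{x_i}{\sqrt{\mathrm{RMS}(x)^2 + \epsilon}}
```

Since σ² = RMS(x)² − μ², the two coincide (with β = 0) whenever the input already has zero mean. RMSNorm differs from LayerNorm only when μ ≠ 0.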

What you're seeing

Layer Normalization (LayerNorm) stabilizes training by normalizing each token's activations across the feature dimension to zero mean and unit variance. It then applies a learned per-feature scale and shift.
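A minimal LayerNorm sketch in plain Python, operating on one feature vector stored as a list of floats. The function name `layer_norm`, the optional `gamma`/`beta` lists (defaulting to identity scale and zero shift), and `eps=1e-5` are illustrative assumptions rather than any particular library's API.

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Hypothetical helper: LayerNorm over one feature vector x (list of floats)."""
    d = len(x)
    mean = sum(x) / d                            # statistic 1: the mean
    var = sum((v - mean) ** 2 for v in x) / d    # statistic 2: population variance
    inv_std = 1.0 / math.sqrt(var + eps)         # eps guards against division by zero
    gamma = gamma if gamma is not None else [1.0] * d   # learned scale (identity here)
    beta = beta if beta is not None else [0.0] * d      # learned shift (zero here)
    # Center, scale to unit variance, then apply the per-feature affine transform.
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]

y = layer_norm([1.0, 2.0, 3.0, 4.0])
print(y)
```

With the default gain and shift, the output has a mean of essentially zero and a variance just under 1, where the small gap comes from `eps`.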

RMSNorm drops the mean centering and simply divides by the RMS. It is faster because it computes one fewer statistic, and in practice it is about as effective. It is the standard choice in Llama, Mistral, and most recent open LLMs.
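The same idea as a plain-Python sketch. Only the mean of squares is computed and nothing is subtracted. The name `rms_norm`, the optional `gamma` list, and `eps=1e-6` are assumptions chosen for illustration. The sketch follows the common gain-only variant and has no bias term.

```python
import math

def rms_norm(x, gamma=None, eps=1e-6):
    """Hypothetical helper: RMSNorm over one feature vector x, with no mean centering."""
    d = len(x)
    mean_sq = sum(v * v for v in x) / d         # the only statistic needed
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)    # eps guards against division by zero
    gamma = gamma if gamma is not None else [1.0] * d   # learned scale (identity here)
    # Rescale each feature; the input's mean is scaled along with it, not removed.
    return [g * v * inv_rms for v, g in zip(x, gamma)]

y = rms_norm([1.0, 2.0, 3.0, 4.0])
print(y)
```

The output has an RMS of about 1. Its mean is 2.5 / RMS(x) rather than 0, because the input's mean of 2.5 is scaled down but never subtracted.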

★ KEY TAKEAWAY
LayerNorm centers AND scales; RMSNorm only scales. RMSNorm is roughly 10% faster, with comparable quality in transformers.
▶ WHAT TO TRY
  • Click Resample to see how each affects different inputs.
  • Note that the mean of the RMSNorm output is generally non-zero: the input's mean is divided by the RMS rather than removed.
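To reproduce the last observation outside the lab, the sketch below draws Gaussian inputs with a deliberately shifted mean. This distribution is an assumption that stands in for the lab's Resample button, whose actual sampling is not described here. Both normalizations are written without learned gain or bias.

```python
import math
import random

def layer_norm(x, eps=1e-5):
    # Center, then divide by the standard deviation (no learned gain/bias).
    m = sum(x) / len(x)
    var = sum((v - m) ** 2 for v in x) / len(x)
    return [(v - m) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-6):
    # Divide by the root mean square only; the mean is never subtracted.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def mean(x):
    return sum(x) / len(x)

random.seed(0)
# Assumed input: 512 features with mean 3.0 and std 1.0 (arbitrary choices).
x = [random.gauss(3.0, 1.0) for _ in range(512)]

ln, rn = layer_norm(x), rms_norm(x)
print(f"input mean     {mean(x):.3f}")
print(f"LayerNorm mean {mean(ln):.3f}")  # essentially 0: centering removes the mean
print(f"RMSNorm mean   {mean(rn):.3f}")  # mean(x) / RMS(x), clearly non-zero
```

Changing the seed or the 3.0 offset mimics resampling. LayerNorm's output mean stays at zero, while RMSNorm's tracks the input mean divided by the RMS.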