SmoothQuant — Migrating Outliers

α (smoothing) 0.50

LLM activations have outliers. SmoothQuant scales activations down and weights up by the same per-channel factor.

What you're seeing

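Activation outliers wreck per-tensor INT8 quantization: one large channel sets the quantization scale, so the remaining values collapse onto a handful of integer levels. The math identity: Y = X·W = (X/S)·(SW), where S is any diagonal matrix of positive per-input-channel scales (column j of X is divided by s_j, row j of W is multiplied by s_j). Pick S to balance quantization difficulty across X and W.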
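The identity can be checked numerically. The sketch below is a minimal illustration, not the page's own code: the toy shapes, the random data, and the names `X_smooth` and `W_smooth` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes (assumed): 4 tokens, 6 input channels, 3 output channels.
X = rng.normal(size=(4, 6))
W = rng.normal(size=(6, 3))

# Any strictly positive per-input-channel scale satisfies the identity.
s = rng.uniform(0.5, 4.0, size=6)

X_smooth = X / s            # divide column j of X by s[j]
W_smooth = s[:, None] * W   # multiply row j of W by s[j]

Y_ref = X @ W
Y_smooth = X_smooth @ W_smooth

# The two products agree up to floating-point rounding.
max_abs_diff = float(np.max(np.abs(Y_ref - Y_smooth)))
print(f"max |Y_ref - Y_smooth| = {max_abs_diff:.2e}")
```

Because the scale on X and the scale on W cancel inside the matrix product, the choice of S changes only how hard each operand is to quantize, not the result.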

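α controls the split. In the SmoothQuant formulation the per-channel scale is s_j = max|X_j|^α / max|W_j|^(1−α), so α=1 puts all of the difficulty on the weights and α=0 leaves it on the activations. A typical value is 0.5, which balances the two. SmoothQuant is commonly applied as a preprocessing step before INT8 (W8A8) LLM quantization.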
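To make the role of α concrete, here is a hedged sketch that computes the per-channel scale s_j = max|X_j|^α / max|W_j|^(1−α) from the SmoothQuant paper. The helper name `smoothing_scale`, the `eps` guard, and the synthetic data with one outlier channel are assumptions for illustration; the demo on this page may compute its scale differently.

```python
import numpy as np

def smoothing_scale(X, W, alpha, eps=1e-8):
    """Per-input-channel scale s_j = max|X_j|^alpha / max|W_j|^(1 - alpha).

    Hypothetical helper: X is (tokens, in_channels), W is (in_channels, out_channels).
    eps guards against all-zero channels (an assumption, not part of the formula).
    """
    act_max = np.maximum(np.abs(X).max(axis=0), eps)  # per column of X
    w_max = np.maximum(np.abs(W).max(axis=1), eps)    # per row of W
    return act_max**alpha / w_max**(1.0 - alpha)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
X[:, 3] *= 50.0                       # synthetic outlier channel
W = rng.normal(scale=0.5, size=(8, 4))

s = smoothing_scale(X, W, alpha=0.5)
X_s, W_s = X / s, s[:, None] * W

# At alpha = 0.5 each channel ends up with the same max on both sides.
act_ch_max = np.abs(X_s).max(axis=0)
w_ch_max = np.abs(W_s).max(axis=1)
print("activation channel max:", np.round(act_ch_max, 3))
print("weight channel max:    ", np.round(w_ch_max, 3))
```

At the extremes the same helper gives per-channel activation maxima of 1 for α=1 (everything pushed onto the weights) and per-channel weight maxima of 1 for α=0 (everything left on the activations).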

★ KEY TAKEAWAY

Activation outliers wreck INT8 quantization. SmoothQuant migrates the difficulty from activations to weights via an algebraic identity that leaves the layer output unchanged.

▶ WHAT TO TRY

Slide α from 0 (no smoothing) to 1 (full migration to weights).
Watch how the activation max drops as α rises.
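The same experiment can be approximated offline. The sketch below sweeps α over synthetic data and reports the largest activation and weight magnitudes after smoothing, plus the error of a simulated per-tensor INT8 matmul. Everything here is an assumption for illustration: the data, the `quantize_int8` fake-quantization helper, and the paper-style scale formula. It is not the code behind the slider.

```python
import numpy as np

def quantize_int8(t):
    """Symmetric per-tensor fake quantization: round to INT8 levels, then dequantize."""
    scale = np.abs(t).max() / 127.0
    return np.clip(np.round(t / scale), -127, 127) * scale

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
X[:, 3] *= 50.0                       # synthetic outlier channel
W = rng.normal(scale=0.5, size=(8, 4))
Y_ref = X @ W

act_ch_max = np.abs(X).max(axis=0)    # per input channel of X
w_ch_max = np.abs(W).max(axis=1)      # per input channel (row) of W

results = []
for alpha in np.linspace(0.0, 1.0, 5):
    s = act_ch_max**alpha / w_ch_max**(1.0 - alpha)
    X_s, W_s = X / s, s[:, None] * W
    Y_q = quantize_int8(X_s) @ quantize_int8(W_s)
    err = float(np.abs(Y_q - Y_ref).mean())
    results.append((float(alpha), float(np.abs(X_s).max()),
                    float(np.abs(W_s).max()), err))
    print(f"alpha={alpha:.2f}  act max={results[-1][1]:8.3f}  "
          f"weight max={results[-1][2]:8.3f}  mean abs err={err:.4f}")
```

In this setup the activation max shrinks toward 1 as α approaches 1, while the weight max grows correspondingly; the error column shows how the trade-off plays out for this particular synthetic data.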