Advertisement
LLM activations have outliers. SmoothQuant scales activations down, weights up.
What you're seeing
Activation outliers wreck per-tensor INT8 quantization. The math identity: Y = X·W = (X/S)·(SW) for any S. Pick S to balance difficulty across X and W.
α controls the split: α=1 puts all difficulty on weights; α=0 leaves it on activations. Typical: 0.5. Standard preprocessing for production INT8 LLM quantization.
★ KEY TAKEAWAY
Activation outliers wreck INT8 quant. SmoothQuant migrates difficulty from activations to weights via algebraic identity.
▶ WHAT TO TRY
- Slide α from 0 (no smoothing) to 1 (full migration to weights).
- Watch how the activation max drops as α rises.