▶ Interactive Lab

Softmax Numerical Stability + Temperature

See the max-subtraction trick and temperature in action.

With large logits, naive exp overflows. Subtracting max() before exp is the standard fix.

What you're seeing

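For z ∈ ℝᴷ, softmax(z)[i] = exp(z[i]) / Σⱼ exp(z[j]). Identity: softmax(z) = softmax(z - max(z)), because the common factor exp(-max(z)) cancels between numerator and denominator. The safe form keeps every exp input ≤ 0, so no exp can overflow, and the largest entry contributes exp(0) = 1, so the denominator is at least 1.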
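
A minimal sketch of the identity above, in plain Python using only the standard library. The function names and the example logits are illustrative assumptions, not part of the lab. In pure Python the naive form raises OverflowError on large logits; array libraries such as NumPy typically return inf and then nan instead.

```python
import math

def softmax_naive(z):
    # Direct translation of the definition; exp() overflows for large inputs.
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(z):
    # Subtract the max first so every exp() input is <= 0.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)  # >= 1.0, because the max entry contributes exp(0) = 1
    return [e / total for e in exps]

# Hypothetical large logits chosen to trigger the overflow.
logits = [1000.0, 999.0, 998.0]

try:
    softmax_naive(logits)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

probs = softmax_stable(logits)
print("naive overflowed:", naive_overflowed)
print("stable softmax:", probs)
```

Because only the differences between logits matter, the stable result for [1000, 999, 998] matches what the naive form gives for [2, 1, 0].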

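Temperature T > 0 divides the logits before softmax: softmax(z / T). Low T → peaked (near-deterministic). High T → flat (diverse). T = 1 leaves the logits unchanged. T = 0 itself would divide by zero; the limit T → 0 corresponds to argmax (greedy).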
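
The effect of temperature can be sketched the same way. This is an illustrative example under assumed names and logits (softmax_with_temperature and the three-element list are hypothetical), not the lab's own code. It rejects T ≤ 0 rather than guessing at a convention, since the greedy case is better handled by taking the argmax directly.

```python
import math

def softmax_with_temperature(z, T=1.0):
    # T must be positive; the T -> 0 limit is argmax and is handled separately.
    if T <= 0:
        raise ValueError("T must be positive; use argmax for the T -> 0 limit")
    scaled = [v / T for v in z]
    # Max-subtraction after scaling: small T makes the scaled logits large.
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for illustration.
logits = [2.0, 1.0, 0.1]

sharp = softmax_with_temperature(logits, T=0.3)  # peaked
base = softmax_with_temperature(logits, T=1.0)   # unchanged logits
flat = softmax_with_temperature(logits, T=2.0)   # flatter

for name, p in (("T=0.3", sharp), ("T=1.0", base), ("T=2.0", flat)):
    print(name, [round(q, 4) for q in p])
```

Note that the max is subtracted after dividing by T: a small T inflates the logits, which is exactly when overflow protection matters most.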

★ KEY TAKEAWAY
Softmax + temperature reshapes the distribution: low T → peaked, high T → flat. The subtraction trick prevents exp() overflow.
▶ WHAT TO TRY
  • Slide Temperature to T=0.3 (sharp), T=1 (model's learned distribution), T=2 (diverse).
  • Increase Logit scale to see how max-subtraction keeps the math stable.
  • Watch the entropy readout — high entropy = uncertain model.
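
The entropy readout can be reproduced with a short sketch. It assumes Shannon entropy in nats (natural log); the lab may report bits or another unit, and the function names and logits here are hypothetical. For fixed logits, entropy rises with T and is bounded above by log K, the entropy of the uniform distribution over K classes.

```python
import math

def softmax_t(z, T=1.0):
    # Stable softmax of z / T (assumes T > 0).
    scaled = [v / T for v in z]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(q * math.log(q) for q in p if q > 0.0)

# Hypothetical logits for illustration.
logits = [2.0, 1.0, 0.1]

entropies = {T: entropy(softmax_t(logits, T)) for T in (0.3, 1.0, 2.0)}
max_entropy = math.log(len(logits))  # uniform distribution over K classes

for T, h in entropies.items():
    print(f"T={T}: entropy={h:.4f} nats (max {max_entropy:.4f})")
```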