Temperature divides logits before softmax. Low T → peaked (deterministic). High T → flat (random).
What you're seeing
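Given raw logits z, softmax(z/T) is the next-token probability distribution. T=1 is the model's learned distribution. T<1 sharpens it (more deterministic). T>1 flattens it (more diverse). As T→0 the distribution collapses onto the argmax; T=0 itself would divide by zero, so implementations typically treat it as pure greedy decoding (always the argmax).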
Use low T for code and factual answers, higher T for creative writing. Beyond roughly T=1.5, outputs tend to become incoherent for most models.
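The scaling described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the widget's actual code: the function name `softmax_with_temperature` and the example logits are made up, and handling T=0 as a one-hot on the argmax is an assumed convention, since z/0 is undefined.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(z / T) as a list of probabilities."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Assumed convention: T=0 means greedy decoding, i.e. all
        # probability mass on the argmax (the formula itself is undefined).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical logits for three tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Running it shows the top token's probability shrinking as T grows, while the ranking of the tokens stays the same: temperature changes how peaked the distribution is, not which token is most likely.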
★ KEY TAKEAWAY
Temperature divides logits before softmax. T<1 sharpens; T>1 flattens; T=0 is greedy.
▶ WHAT TO TRY
- Slide Temperature from 0.1 (peaked) to 2.0 (flat).
- Watch the top-prob and entropy readouts change.
- Default chat: T=0.7-0.8. Code: T=0-0.3.
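To reproduce the slider experiment offline, the sketch below sweeps T and prints a top-probability and an entropy value for each setting. The logits are invented, the helper names `softmax_t` and `entropy_bits` are hypothetical, and reporting entropy in bits is an assumption; the page's readout may use a different unit.

```python
import math

def softmax_t(logits, temperature):
    """softmax(z / T) for T > 0, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

logits = [3.0, 1.5, 0.5, 0.0]  # hypothetical logits for four tokens
for t in (0.1, 0.5, 1.0, 2.0):
    probs = softmax_t(logits, t)
    print(f"T={t:<4} top-prob={max(probs):.3f} "
          f"entropy={entropy_bits(probs):.3f} bits")
```

As T rises, the top probability falls and the entropy climbs toward its ceiling of log2(4) = 2 bits for four tokens, which is the uniform distribution.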