Temperature = 0
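Greedy decoding: always pick the highest-probability token. Deterministic in principle, though GPU floating-point non-determinism can still produce occasional run-to-run differences. Use for extraction, classification, and tool-argument generation.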
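To make the effect of temperature concrete, here is a minimal sketch in plain Python. The function names and the three example logits are illustrative assumptions, not taken from any particular library. It treats temperature 0 as argmax and shows that a small positive temperature approaches the same choice.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by temperature (> 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use greedy_pick for T = 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """T = 0 case: return the index of the largest logit (argmax)."""
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
print("greedy index:", greedy_pick(logits))
for t in (1.0, 0.1):
    probs = softmax_with_temperature(logits, t)
    print("T =", t, [round(p, 4) for p in probs])
```

At T = 0.1 nearly all probability mass sits on the top token, which is why very low temperatures behave almost like greedy decoding.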
Temperature = 0.7-1.0
Use for creative writing, brainstorming, and sampling multiple candidates. Standard chat is typically around 0.7.
Top-p vs top-k
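Top-p (0.9-0.95): adaptive. It keeps the smallest set of top tokens whose cumulative probability reaches p, so the number of tokens kept depends on the shape of the distribution. Top-k (40): fixed cutoff. It keeps the k most likely tokens no matter how the probability is spread. Top-p is usually preferred.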
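The difference between the two cutoffs can be sketched in a few lines of plain Python. The helper names and the two toy distributions below are assumptions made for illustration. Real inference libraries run the same idea on tensors.

```python
def _renormalize(probs, keep):
    """Zero out tokens not in `keep` and rescale the rest to sum to 1."""
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

def top_k_filter(probs, k):
    """Keep the k most likely tokens regardless of distribution shape."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, order[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)

def count_kept(probs):
    """Number of tokens that still have non-zero probability."""
    return sum(1 for x in probs if x > 0)

peaked = [0.92, 0.04, 0.02, 0.01, 0.01]  # model is confident
flat = [0.25, 0.22, 0.20, 0.18, 0.15]    # model is uncertain

print(count_kept(top_p_filter(peaked, 0.9)))  # 1
print(count_kept(top_p_filter(flat, 0.9)))    # 5
print(count_kept(top_k_filter(peaked, 2)))    # 2
print(count_kept(top_k_filter(flat, 2)))      # 2
```

Top-p keeps a single token when the model is confident and all five when it is not, while top-k keeps two in both cases. That adaptivity is the usual argument for preferring top-p.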
Interaction
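Most modern APIs let you set temperature and top-p together (some also expose top-k). Temperature is typically applied first, rescaling the logits. Top-p then truncates the resulting distribution, and the surviving tokens are renormalized before sampling. Combining them lets you shape both how concentrated the distribution is (temperature) and how much of the tail stays eligible (top-p).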
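The ordering described here can be sketched as one sampling function. This is a simplified illustration, not any specific API's implementation: `sample_token`, its defaults, and the example logits are hypothetical, and real libraries may order or combine the steps differently.

```python
import math
import random

def sample_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Sample a token index: temperature scaling first, then top-p truncation."""
    if temperature <= 0:
        # Treat T = 0 as greedy decoding (argmax).
        return max(range(len(logits)), key=lambda i: logits[i])

    # Step 1: temperature rescales the logits before the softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Step 2: top-p keeps the smallest high-probability set reaching top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Step 3: sample among survivors; random.choices normalizes the weights.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

rng = random.Random(0)            # seeded so the run is repeatable
logits = [2.0, 1.0, 0.1, -1.0]    # made-up scores for four tokens
draws = [sample_token(logits, 0.7, 0.9, rng) for _ in range(1000)]
print({i: draws.count(i) for i in sorted(set(draws))})
```

With these example logits, temperature 0.7 sharpens the distribution enough that the top two tokens already cover more than 90% of the mass, so top-p removes the other two and they are never sampled.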