Temperature = 0

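Greedy decoding: always pick the highest-probability token. Deterministic in principle, though GPU floating-point non-determinism can still cause occasional differences between runs. Use for extraction, classification, and tool-argument generation.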
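
A minimal sketch of why temperature 0 amounts to greedy decoding: dividing the logits by a small temperature pushes nearly all probability onto the top token, and at exactly 0 the division is undefined, so it is handled here as a plain argmax. The function names and the logit values are made up for illustration and do not come from any particular library.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return probabilities for `logits` after dividing by `temperature` (must be > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Temperature = 0 case: index of the largest logit (first one on ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [2.0, 1.0, 0.1]  # hypothetical logits for a 3-token vocabulary

print(greedy_pick(logits))                    # index of the top token
print(softmax_with_temperature(logits, 1.0))  # unscaled distribution
print(softmax_with_temperature(logits, 0.1))  # mass concentrates on the top token
```

As the temperature shrinks toward 0, the sampled token matches `greedy_pick` with probability approaching 1.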

Temperature = 0.7-1.0

Use for creative writing, brainstorming, and sampling multiple candidates. Standard chat settings are commonly around 0.7.

Top-p vs top-k

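Top-p (typically 0.9-0.95): keeps the smallest set of tokens whose cumulative probability reaches p, so the cutoff adapts to the shape of the distribution. Top-k (e.g. 40): keeps a fixed number of the most likely tokens, regardless of shape. Top-p is usually preferred.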
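
The difference is easiest to see on two toy distributions, one peaked and one flat. The sketch below is an illustration only: `top_k_filter` and `top_p_filter` are hypothetical helper names, and the probabilities are invented. Both helpers zero out the rejected tokens and renormalize the survivors.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero the rest, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(order[:k])
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

def top_p_filter(probs, p):
    """Keep the smallest set of most-probable tokens whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

def survivors(probs):
    """Count tokens that still have non-zero probability."""
    return sum(1 for x in probs if x > 0)

peaked = [0.92, 0.04, 0.03, 0.01]  # model is confident
flat = [0.25, 0.25, 0.25, 0.25]    # model is uncertain

for name, dist in (("peaked", peaked), ("flat", flat)):
    print(name,
          "top-k keeps", survivors(top_k_filter(dist, 2)),
          "| top-p keeps", survivors(top_p_filter(dist, 0.9)))
```

With k fixed at 2, top-k keeps two tokens in both cases, while top-p with p = 0.9 keeps one token for the peaked distribution and all four for the flat one.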

Interaction

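Many modern APIs let you set temperature together with top-p (and often top-k). Typically, temperature rescales the distribution first and top-p then filters the result, though the exact order can vary by implementation. Combining them lets you control both how flat the distribution is and how much of its tail is kept.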
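
A rough sketch of one possible sampling step with the order described above: temperature, then top-k, then top-p, then a random draw. This is an assumption for illustration, not the pipeline of any specific API. `sample_token` and its defaults are hypothetical, and real implementations may order or renormalize the steps differently.

```python
import math
import random

def sample_token(logits, temperature=0.7, top_k=40, top_p=0.95, rng=None):
    """Pick one token index from `logits` (assumed order: temperature, top-k, top-p)."""
    rng = rng or random.Random()

    # Temperature 0 is treated as greedy decoding (plain argmax).
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # 1. Temperature: rescale logits, then softmax (max subtracted for stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # 2. Top-k: keep at most `top_k` of the most probable tokens.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass_k = sum(probs[i] for i in order)

    # 3. Top-p: within the top-k survivors (renormalized), keep the smallest
    #    prefix whose cumulative probability reaches `top_p`.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / mass_k
        if cumulative >= top_p:
            break

    # 4. Draw from the survivors; random.choices normalizes the weights itself.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

logits = [2.0, 1.0, 0.1, -1.0]  # made-up logits
rng = random.Random(0)          # seeded so repeated runs give the same draws
print([sample_token(logits, rng=rng) for _ in range(10)])
print(sample_token(logits, temperature=0))  # greedy: always the top token
```

Lowering `temperature` sharpens the distribution before any filtering happens, while lowering `top_p` or `top_k` trims the tail of whatever distribution the temperature step produced.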