▶ Interactive Lab

Logits to Token (Argmax vs Sample)

See how the final projection produces logits and how decoding picks a token.

logits → softmax(/T) → distribution → pick (argmax or sample).

What you're seeing

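Final hidden state × W_out → logits ∈ ℝ^V, one raw score per vocabulary token (V is the vocabulary size). Divide the logits by the temperature T, apply softmax to get a probability distribution, then pick a token.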
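A minimal NumPy sketch of that pipeline, under stated assumptions: the hidden state and W_out below are random toy values rather than weights from a real model, and the names softmax_with_temperature and pick_token are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by temperature T."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled = scaled - scaled.max()  # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

def pick_token(logits, temperature=1.0, greedy=True, rng=None):
    """Pick a token id: argmax if greedy, otherwise sample from the distribution."""
    probs = softmax_with_temperature(logits, temperature)
    if greedy:
        return int(np.argmax(probs))
    if rng is None:
        rng = np.random.default_rng()
    return int(rng.choice(len(probs), p=probs))

# Toy setup (assumed sizes): hidden size 8, vocabulary size V = 5.
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 5
hidden = rng.normal(size=d_model)               # final hidden state
W_out = rng.normal(size=(d_model, vocab_size))  # output projection
logits = hidden @ W_out                         # shape (V,)

greedy_token = pick_token(logits, greedy=True)
sampled_token = pick_token(logits, temperature=1.0, greedy=False, rng=rng)
print("greedy:", greedy_token, "sampled:", sampled_token)
```

Dividing by T before the softmax is what makes temperature a single scalar control: the same logits yield a sharper or flatter distribution without touching the model.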

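Greedy (argmax): deterministic, always takes the most probable token, and can get stuck repeating itself. Sampling: draws from the distribution, so output is more diverse but can become incoherent at high T.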

★ KEY TAKEAWAY
logits → softmax(/T) → distribution → pick (argmax or sample). Temperature is the main knob for randomness when sampling: low T sharpens the distribution toward the top token, high T flattens it.
▶ WHAT TO TRY
  • Switch between Greedy and Sample.
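  • Drop temperature to 0.1: the distribution collapses onto the top token, so Sample almost always matches Greedy.
  • Raise it to 2.0: the distribution flattens, so Sample becomes more diverse and may pick low-probability tokens.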
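
The experiments above can also be approximated offline. The sketch below is illustrative only: toy_logits is a made-up five-token example, and agreement_with_greedy is a hypothetical helper that measures how often a sampled token equals the greedy choice at a given temperature.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing by temperature T."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled = scaled - scaled.max()  # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

def agreement_with_greedy(logits, temperature, n_samples=1000, seed=0):
    """Fraction of sampled tokens that equal the greedy (argmax) token."""
    rng = np.random.default_rng(seed)
    probs = softmax_with_temperature(logits, temperature)
    greedy_token = int(np.argmax(logits))
    samples = rng.choice(len(probs), size=n_samples, p=probs)
    return float(np.mean(samples == greedy_token))

# Made-up logits for a 5-token vocabulary (an assumption, not lab data).
toy_logits = [2.0, 1.0, 0.5, 0.0, -1.0]

for T in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, T)
    rate = agreement_with_greedy(toy_logits, T)
    print(f"T={T}: top-token prob={probs.max():.3f}, "
          f"sample == greedy in {rate:.1%} of draws")
```

The greedy pick itself does not depend on T, because dividing the logits by a positive number preserves their order. Only the sampled pick changes with temperature.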