▶ Interactive Lab

Autoregressive Generation Loop

Step through prompt → predict → append → repeat.

Each step: run a forward pass on the current sequence, sample the next token from the predicted distribution, and append it to the sequence.
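
The loop described above can be sketched in a few lines. This is a minimal illustration, not a real model: `VOCAB`, `toy_next_token_probs`, and `generate` are hypothetical names, and the "forward pass" is a stand-in that returns a made-up distribution so the example runs on its own.

```python
import random

# Hypothetical toy vocabulary, for illustration only.
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def toy_next_token_probs(tokens):
    """Stand-in for a forward pass over the current sequence.

    A real model would run a transformer over `tokens`. Here the
    distribution depends only on the sequence length.
    """
    n = len(VOCAB)
    weights = [1.0 + ((len(tokens) + i) % n) for i in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def generate(prompt, steps, seed=0):
    """Predict, sample, append, repeat."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(steps):
        probs = toy_next_token_probs(tokens)                     # predict
        next_token = rng.choices(VOCAB, weights=probs, k=1)[0]   # sample
        tokens.append(next_token)                                # append
    return tokens

result = generate(["the", "cat"], steps=4)
print(result)
```

Each pass through the loop body corresponds to one click of Generate next token: the sequence grows by exactly one token, and the next prediction sees everything generated so far.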

What you're seeing

With a KV cache, each step computes keys (K) and values (V) only for the newest token. K and V for all earlier tokens are reused from the cache.
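
As a rough sketch of why the cache is safe to use, the example below runs single-head attention for one layer in two ways: recomputing K and V for every token at each step, and appending only the new token's K and V to a cache. The projection matrices `W_Q`, `W_K`, `W_V`, the `embeddings`, and the `KVCache` class are all made-up values and names for illustration, not taken from any particular model or library.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

# Hypothetical 2x2 projection matrices and token embeddings.
W_Q = [[1.0, 0.5], [0.0, 1.0]]
W_K = [[0.5, 0.0], [1.0, 1.0]]
W_V = [[1.0, -1.0], [0.5, 0.5]]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

def last_output_no_cache(xs):
    """Recompute K and V for every token, then attend from the last token."""
    keys = [matvec(W_K, x) for x in xs]
    values = [matvec(W_V, x) for x in xs]
    return attend(matvec(W_Q, xs[-1]), keys, values)

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, x):
        """Compute K and V only for the new token; reuse the cached ones."""
        self.keys.append(matvec(W_K, x))
        self.values.append(matvec(W_V, x))
        return attend(matvec(W_Q, x), self.keys, self.values)

cache = KVCache()
for t, x in enumerate(embeddings, start=1):
    cached_out = cache.step(x)
    full_out = last_output_no_cache(embeddings[:t])
    # Both paths give the same output for the newest token.
    assert all(abs(a - b) < 1e-12 for a, b in zip(cached_out, full_out))
```

The cached path does one K projection and one V projection per step, while the uncached path redoes them for every token seen so far. In the lab, that difference is what the gray (cached) and green (newly computed) tokens show.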

★ KEY TAKEAWAY
Autoregressive generation: predict the next token, append it, repeat. With a KV cache, attention at each step costs O(seq·d) per layer instead of the O(seq²·d) needed to re-process the whole sequence.
▶ WHAT TO TRY
  • Click Generate next token repeatedly.
  • Notice that 'cached' tokens (gray) aren't re-processed.
  • Only the newest token (green) is computed each step.