▶ Interactive Lab

Speculative Decoding Acceptance

Watch draft tokens get accepted or rejected.

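A small draft model proposes K tokens. The big model verifies all K in one parallel forward pass. Keep the accepted prefix; at the first rejection, the big model supplies the replacement token.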
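One propose-and-verify cycle can be sketched as a toy simulation. This is only an illustration, assuming each draft token is accepted independently with a fixed probability; in a real system the acceptance chance depends on how closely the draft and big-model probabilities agree for each token. The name `run_cycle` and its parameters are hypothetical, not taken from the lab.

```python
import random


def run_cycle(k: int, accept_rate: float, rng: random.Random) -> int:
    """Simulate one cycle and return how many draft tokens were accepted.

    Assumption: every draft token is accepted independently with
    probability accept_rate (a simplification for illustration).
    """
    accepted = 0
    for _ in range(k):
        if rng.random() < accept_rate:
            accepted += 1      # token survives verification
        else:
            break              # first rejection ends the accepted prefix
    return accepted


rng = random.Random(0)
k = 4
accepted = run_cycle(k, accept_rate=0.7, rng=rng)
# The big model always adds one token of its own: a replacement at the
# rejected position, or a bonus token when all k drafts were accepted.
tokens_emitted = accepted + 1
print(f"accepted {accepted} of {k} draft tokens, emitted {tokens_emitted}")
```

Because the loop stops at the first rejection, a rejected token early in the draft discards everything after it, which is why long drafts pay off only when acceptance is high.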

What you're seeing

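Acceptance rate 60-80% typical. K=4 with 70% per-token accept → (1 − 0.7^5)/(1 − 0.7) ≈ 2.8 tokens per big-model step on average, counting the one token the big model always contributes → at most ~2.8× speedup, and somewhat less once the draft model's own cost is included.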
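The tokens-per-step figure can be checked with a short calculation. This is a sketch under two assumptions: each draft token is accepted independently with the same probability, and the big model always contributes one extra token per step. The names `expected_tokens_per_step`, `estimated_speedup`, and `draft_cost` (the cost of one draft token relative to one big-model step) are hypothetical, and the 0.05 value below is only an example.

```python
def expected_tokens_per_step(accept_rate: float, k: int) -> float:
    """Expected tokens emitted per big-model step.

    Sum over the accepted-prefix length of a geometric series:
    1 + a + a^2 + ... + a^k = (1 - a^(k+1)) / (1 - a).
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    if accept_rate == 1.0:
        return float(k + 1)    # all k drafts plus the bonus token
    return (1.0 - accept_rate ** (k + 1)) / (1.0 - accept_rate)


def estimated_speedup(accept_rate: float, k: int, draft_cost: float) -> float:
    """Rough speedup: tokens per step divided by the cost of one step.

    One step costs one big-model pass (1.0) plus k draft passes,
    each costing draft_cost in units of a big-model pass.
    """
    return expected_tokens_per_step(accept_rate, k) / (1.0 + k * draft_cost)


print(round(expected_tokens_per_step(0.7, 4), 2))   # about 2.77
print(round(estimated_speedup(0.7, 4, draft_cost=0.05), 2))
```

With a free draft model the speedup equals the tokens-per-step figure; any nonzero draft cost pulls it below that ceiling.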

★ KEY TAKEAWAY
Speculative decoding: draft proposes K tokens, big model verifies in parallel. 60-80% acceptance → 1.5-3× speedup at zero quality cost.
▶ WHAT TO TRY
  • Slide Accept rate from 20% to 95%.
  • Click Run cycle repeatedly to see typical acceptance patterns.
  • Notice that the accept rate changes only the speed: the accept/reject rule guarantees the output distribution matches the big model exactly.
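The exactness claim can be illustrated with the standard rejection-sampling rule for a single token position. This is a minimal sketch, not the lab's code: `p` is the big model's distribution and `q` the draft's, both given as plain lists over a tiny three-token vocabulary, and all names and numbers are made up for illustration. A draft token x is accepted with probability min(1, p[x]/q[x]); on rejection, a replacement is drawn from the normalized positive part of p − q.

```python
import random
from collections import Counter


def verify_token(token: int, p: list, q: list, rng: random.Random) -> int:
    """Accept the draft token or resample so the output follows p."""
    accept_prob = min(1.0, p[token] / q[token])
    if rng.random() < accept_prob:
        return token
    # Rejected: sample from the residual max(0, p - q); random.choices
    # normalizes the weights internally.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual)[0]


def empirical_output(p: list, q: list, n: int, seed: int = 0) -> list:
    """Draw n draft tokens from q, verify each, and return output frequencies."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        draft = rng.choices(range(len(q)), weights=q)[0]
        counts[verify_token(draft, p, q, rng)] += 1
    return [counts[i] / n for i in range(len(p))]


p = [0.6, 0.3, 0.1]   # hypothetical big-model distribution
q = [0.2, 0.5, 0.3]   # hypothetical draft distribution (a poor match)
print(empirical_output(p, q, n=20000))   # frequencies should sit close to p
```

Even though `q` is a poor match for `p` here, the output frequencies track `p`; a worse draft only lowers how often tokens are accepted, not what gets emitted.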