A shared body feeds N output heads. Head i (counting from 0) predicts the token at position + i + 1, so head 0 is the ordinary next-token head.
What you're seeing
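Multi-token prediction (MTP) of the kind used in DeepSeek V3. At inference, the extra heads supply speculative draft tokens at almost no extra cost, because they reuse the shared body's forward pass. The drafts still have to be verified before they are kept.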
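To make the "shared body plus N heads" picture concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the class name `ToyMTP`, the random untrained weights, and the mean-of-embeddings stand-in for the transformer body. It uses independent parallel heads as drawn in the visualization and is not meant to reproduce DeepSeek V3's actual MTP modules.

```python
import random


def make_matrix(rows, cols, rng):
    """Random rows x cols weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class ToyMTP:
    """Hypothetical toy model: one shared body feeding n_heads output heads."""

    def __init__(self, vocab_size=16, hidden_size=8, n_heads=4, seed=0):
        rng = random.Random(seed)
        self.embed = make_matrix(vocab_size, hidden_size, rng)
        # One vocab_size x hidden_size projection per head.
        self.heads = [make_matrix(vocab_size, hidden_size, rng)
                      for _ in range(n_heads)]

    def body(self, tokens):
        """Stand-in for the shared trunk: the mean of the token embeddings."""
        rows = [self.embed[t] for t in tokens]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def predict(self, tokens):
        """Run the body once, then read one greedy token id per head.

        Entry i is head i's guess for the token i + 1 positions past the
        end of `tokens`, so entry 0 is the ordinary next-token prediction.
        """
        hidden = self.body(tokens)
        guesses = []
        for head in self.heads:
            logits = matvec(head, hidden)
            guesses.append(max(range(len(logits)), key=logits.__getitem__))
        return guesses


model = ToyMTP()
print(model.predict([3, 1, 4]))  # four token ids, one per head
```

The point of the sketch is that `body` runs once per call while every head reads the same hidden state, which is why the extra predictions are cheap relative to another full forward pass.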
★ KEY TAKEAWAY
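Multi-Token Prediction (MTP): the heads predict the tokens at +1, +2, +3, and +4, where +1 is the standard next token. At inference, the predictions beyond +1 serve as a cheap built-in source of drafts for speculative decoding. DeepSeek V3 is trained with an MTP objective.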
▶ WHAT TO TRY
- Slide Heads from 1 to 6.
- Note that head 0 is the standard next-token head; heads 1 and up are auxiliary heads that produce speculative drafts.
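The drafts from heads 1 and up are only guesses, so a decoder has to check them against what the standard next-token head would actually produce. The sketch below shows one simple way to do that, greedy verification. The function name `accept_drafts` and the toy `count_up` model are hypothetical, and a real system would typically score all drafts in one batched forward pass instead of calling the model in a loop.

```python
def accept_drafts(prefix, drafts, next_token):
    """Greedy verification of speculative drafts.

    prefix     : tokens already committed (including head 0's token)
    drafts     : guesses from heads 1 and up, in position order
    next_token : callable returning the main head's token for a context

    Returns the tokens to append: every draft that matches the main
    head, plus the main head's own token at the first mismatch.
    """
    context = list(prefix)
    accepted = []
    for draft in drafts:
        target = next_token(context)
        accepted.append(target)  # the verified token is always kept
        context.append(target)
        if target != draft:
            break  # later drafts were conditioned on a wrong token
    return accepted


def count_up(context):
    """Toy stand-in for the main head: predicts last token + 1, mod 10."""
    return (context[-1] + 1) % 10


print(accept_drafts([1, 2, 3], [4, 5, 9, 7], count_up))  # → [4, 5, 6]
```

In the example, the first two drafts match and are accepted, the third is wrong and is replaced by the verified token, and the fourth is discarded. The more drafts that match, the more tokens are committed per verification step.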