Average compression by tokenizer: GPT-4 (tiktoken) ~4 chars/token, Llama ~3.8, Gemma ~3.6 (huge vocab).
What you're seeing
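Compression rate (characters per token) affects context window utilization and inference cost, because both context limits and pricing are counted in tokens: fewer characters per token means the same text uses more of the window and costs more to process.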
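To make the arithmetic concrete, here is a minimal sketch that turns a chars/token figure into an estimated token count, window capacity, and cost. The chars/token values are the rough averages quoted above; the function names, the 8,192-token window, and the price per 1,000 tokens are hypothetical placeholders, not figures for any real model.

```python
import math

# Rough averages quoted in this section (approximate, not measured here).
CHARS_PER_TOKEN = {"gpt-4": 4.0, "llama": 3.8, "gemma": 3.6}


def estimated_tokens(num_chars: int, chars_per_token: float) -> int:
    """Estimate how many tokens a text of num_chars characters needs."""
    return math.ceil(num_chars / chars_per_token)


def chars_that_fit(context_tokens: int, chars_per_token: float) -> int:
    """Estimate how many characters fit in a window of context_tokens."""
    return int(context_tokens * chars_per_token)


def estimated_cost(num_chars: int, chars_per_token: float,
                   price_per_1k_tokens: float) -> float:
    """Estimate cost, assuming a flat (hypothetical) price per 1,000 tokens."""
    return estimated_tokens(num_chars, chars_per_token) / 1000 * price_per_1k_tokens


if __name__ == "__main__":
    document_chars = 100_000   # hypothetical document length
    window = 8_192             # hypothetical context window, in tokens
    price = 0.01               # hypothetical price per 1,000 tokens
    for name, cpt in CHARS_PER_TOKEN.items():
        print(name,
              estimated_tokens(document_chars, cpt),
              chars_that_fit(window, cpt),
              round(estimated_cost(document_chars, cpt, price), 4))
```

Under these assumptions, the same 100,000-character document needs about 25,000 tokens at 4 chars/token but over 26,000 at 3.8, so a small difference in compression shows up directly in both window usage and cost.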
★ KEY TAKEAWAY
Tokenization compression varies by tokenizer: GPT-4 ~4 chars/token, Llama ~3.8, Gemma ~3.6 (huge vocab). This affects context window utilization and inference cost.
▶ WHAT TO TRY
- Paste your own text and tokenize it.
- Bigger vocabs compress better but inflate the embedding matrix.
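Both suggestions can be tried offline with a short sketch. `chars_per_token` accepts any tokenizer's encode function; `toy_encode` is a hypothetical whitespace stand-in (not a real BPE tokenizer) so the example runs without extra packages. `embedding_params` shows the vocabulary trade-off as vocab size times embedding width; the vocab sizes and width below are illustrative assumptions only.

```python
from typing import Callable, Sequence


def chars_per_token(text: str, encode: Callable[[str], Sequence]) -> float:
    """Measure average characters per token for text under a given encoder."""
    tokens = encode(text)
    if not tokens:
        raise ValueError("text produced no tokens")
    return len(text) / len(tokens)


def embedding_params(vocab_size: int, d_model: int) -> int:
    """Parameter count of an embedding matrix: one d_model-wide row per token."""
    return vocab_size * d_model


def toy_encode(text: str) -> list:
    """Hypothetical stand-in tokenizer: splits on whitespace only."""
    return text.split()


if __name__ == "__main__":
    sample = "Paste your own text here and see how it tokenizes."
    print(chars_per_token(sample, toy_encode))

    # Illustrative sizes only: a small vs. a very large vocabulary, same width.
    for vocab in (32_000, 256_000):
        print(vocab, embedding_params(vocab, 4096))
```

If the `tiktoken` package is installed, passing `tiktoken.get_encoding("cl100k_base").encode` in place of `toy_encode` should give a measurement for the GPT-4 tokenizer; results will vary with the text you paste.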