▶ Interactive Lab

Perplexity vs Quantization Level

Curve of model quality, measured as perplexity (lower is better), as bit width drops.

Larger models tolerate more aggressive quantization. INT4 is the sweet spot for 70B+.

What you're seeing

Smaller models lose more quality under low-bit quantization. A 7B model at INT3 is typically unusable, while a 70B model at INT3 may stay close to FP16 quality.
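To get a feel for why quality falls as bit width drops, here is a minimal sketch that quantizes a block of synthetic weights with plain symmetric round-to-nearest and measures the round-trip error. The Gaussian weights, the single per-block scale, and the function names are assumptions made for illustration. This is not the K-quant scheme behind presets such as Q4_K_M, and it does not model the effect of model size.

```python
import math
import random

def quantize_dequantize(weights, bits):
    """Symmetric round-to-nearest quantization with one scale, then dequantize."""
    qmax = 2 ** (bits - 1) - 1                  # largest signed level, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest level, clamp, and map back to floats.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]

def rms_error(weights, bits):
    """Root-mean-square difference between original and round-tripped weights."""
    restored = quantize_dequantize(weights, bits)
    squared = sum((w - r) ** 2 for w, r in zip(weights, restored))
    return math.sqrt(squared / len(weights))

# Synthetic stand-in for one weight block (assumption: roughly Gaussian weights).
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(4096)]

errors = {bits: rms_error(weights, bits) for bits in (8, 5, 4, 3)}
for bits, err in errors.items():
    print(f"INT{bits}: RMS round-trip error = {err:.6f}")
```

Each bit removed roughly doubles the step between representable values, so the round-trip error grows quickly toward the low-bit end of the curve.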

Practical: 7B/13B → Q4_K_M or Q5_K_M. 30B+ → Q4_K_M comfortably. Always validate on YOUR benchmark; published perplexity numbers often don't transfer to a different dataset or task.
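One way to validate on your own benchmark is to score the same text with the FP16 and the quantized model and compare perplexities. The sketch below assumes you can already obtain per-token natural-log probabilities from each model. The helper names and the sample numbers are hypothetical and not measured results.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def relative_increase(baseline_ppl, quantized_ppl):
    """Fractional perplexity increase of the quantized model over the baseline."""
    return (quantized_ppl - baseline_ppl) / baseline_ppl

# Hypothetical per-token log-probs for the SAME evaluation text (illustrative only).
fp16_logprobs = [-1.20, -0.35, -2.10, -0.80, -0.05]
q4_logprobs = [-1.25, -0.40, -2.30, -0.85, -0.06]

base = perplexity(fp16_logprobs)
quant = perplexity(q4_logprobs)
print(f"FP16 perplexity:      {base:.3f}")
print(f"Quantized perplexity: {quant:.3f}")
print(f"Relative increase:    {relative_increase(base, quant):.1%}")
```

Use the same tokenizer, text, and context length for both runs; otherwise the two perplexities are not comparable.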

★ KEY TAKEAWAY
Larger models tolerate more aggressive quantization. 70B at INT4 ≈ FP16 quality; 7B at INT4 degrades noticeably.
▶ WHAT TO TRY
  • Switch between 7B / 13B / 70B model sizes.
  • Check that the 70B curve stays flat much further left than the 7B curve.