Perplexity vs Quantization Level

Advertisement

Model size

Larger models tolerate more aggressive quantization. INT4 is the sweet spot for 70B+.

Smaller models suffer more from low-bit quantization. A 7B model at INT3 is unusable; a 70B at INT3 may be near-FP16 quality.

Practical: 7B/13B → Q4_K_M or Q5_K_M. 30B+ → Q4_K_M comfortably. Always validate on YOUR benchmark; published numbers don't transfer.

★ KEY TAKEAWAY

Larger models tolerate more aggressive quantization. 70B at INT4 ≈ FP16 quality; 7B at INT4 noticeably degrades.

▶ WHAT TO TRY