Larger models tolerate more aggressive quantization. INT4 is the sweet spot for 70B+.
What you're seeing
Smaller models lose more quality under low-bit quantization. A 7B model at INT3 is typically unusable, while a 70B model at INT3 may stay close to FP16 quality.
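To make "low-bit" concrete, here is a minimal sketch of symmetric round-to-nearest quantization applied to synthetic weights. It only shows how reconstruction error grows as the bit width shrinks. It does not reproduce the model-size effect, and it is not the K-quant scheme used by GGUF files. The function names and the Gaussian stand-in weights are assumptions made for illustration.

```python
import math
import random


def quantize_dequantize(weights, bits):
    """Symmetric per-tensor round-to-nearest quantization, then dequantization."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8-bit, 7 for 4-bit, 3 for 3-bit
    scale = max(abs(w) for w in weights) / qmax     # one scale for the whole tensor
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def relative_error(weights, bits):
    """L2 norm of the quantization error divided by the L2 norm of the weights."""
    restored = quantize_dequantize(weights, bits)
    num = math.sqrt(sum((w - r) ** 2 for w, r in zip(weights, restored)))
    den = math.sqrt(sum(w * w for w in weights))
    return num / den


random.seed(0)
# Synthetic stand-in for a weight tensor (assumption: not taken from any real model).
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

errors = {bits: relative_error(weights, bits) for bits in (8, 4, 3)}
for bits, err in errors.items():
    print(f"INT{bits}: relative error {err:.4f}")
```

Each bit removed roughly doubles the step size, so the error climbs quickly below 4 bits. How much of that error a model can absorb is what varies with model size.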
In practice: for 7B/13B models, use Q4_K_M or Q5_K_M; for 30B and larger, Q4_K_M works comfortably. Always validate on your own benchmark; published numbers may not transfer to your task.
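The rule of thumb above can be sketched as a small helper plus a validation gate. Both functions are hypothetical: the 30B cutoff for models between 13B and 30B and the 2% tolerance are assumptions, not values from the text, and the score is assumed to be a higher-is-better metric measured on your own benchmark.

```python
def suggest_quant_presets(params_billions: float) -> list[str]:
    """Return candidate GGUF presets for a model size, per the rule of thumb above."""
    if params_billions <= 0:
        raise ValueError("params_billions must be positive")
    if params_billions >= 30:
        return ["Q4_K_M"]
    # Assumption: sizes between 13B and 30B are treated like the small models.
    return ["Q5_K_M", "Q4_K_M"]


def passes_validation(fp16_score: float, quant_score: float,
                      max_relative_drop: float = 0.02) -> bool:
    """Accept a quantized model if its score drops by at most max_relative_drop.

    Assumes a higher-is-better metric and fp16_score > 0. The 2% default is a
    placeholder tolerance, not a recommendation.
    """
    drop = (fp16_score - quant_score) / fp16_score
    return drop <= max_relative_drop


print(suggest_quant_presets(7))
print(suggest_quant_presets(70))
print(passes_validation(fp16_score=0.80, quant_score=0.79))
```

The presets are only a starting point. The gate on your own benchmark decides whether to keep a preset or move up to a higher-bit one.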
★ KEY TAKEAWAY
Larger models tolerate more aggressive quantization: 70B at INT4 stays close to FP16 quality, while 7B at INT4 degrades noticeably.
▶ WHAT TO TRY
- Switch between 7B / 13B / 70B model sizes.
- Compare the curves: the 70B curve stays flat much further left (toward lower bit widths) than the 7B curve.