Advertisement
256 weights → 8 sub-groups × 32 weights + 8 fp16 scales = 144 bytes.
What you're seeing
Per-sub-group scales capture local variation. Cheap metadata for big quality gain over per-tensor.
★ KEY TAKEAWAY
Q4_K packs 256 weights into ~144 bytes (4.5 bits/weight) using per-32 sub-group scaling. Block-wise quant captures local variation cheaply.
▶ WHAT TO TRY
- See how 8 sub-groups each get their own FP16 scale.
- Total cost: 4 bits per weight + ~0.5 bits of metadata overhead.