Examples
3-digit multiplication. IPA transliteration. Chain-of-thought benefits. All ~poor below ~50B params, sudden competence at 100B+.
Advertisement
Critique
Schaeffer et al 2023: emergence often artifact of harsh metrics (exact-match). Continuous metrics show gradual improvement.
Advertisement
Design implication
Test with SMALL models first. If task fails at 8B, may work at 70B. But: expensive test. And smooth-metric emergence is real for hard tasks.
Prompt engineering effect
CoT prompting was 'emergent' at ~50B. Now works down to ~8B in modern models. Post-training compresses capability.