Examples

3-digit multiplication. IPA transliteration. Chain-of-thought benefits. All ~poor below ~50B params, sudden competence at 100B+.

Advertisement

Critique

Schaeffer et al 2023: emergence often artifact of harsh metrics (exact-match). Continuous metrics show gradual improvement.

Advertisement

Design implication

Test with SMALL models first. If task fails at 8B, may work at 70B. But: expensive test. And smooth-metric emergence is real for hard tasks.

Prompt engineering effect

CoT prompting was 'emergent' at ~50B. Now works down to ~8B in modern models. Post-training compresses capability.