Loop
1. Generate an initial output. 2. Critique it, naming specific weaknesses. 3. Refine the output using the critique. 4. Optionally repeat steps 2-3.
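The loop can be sketched as follows. This is a minimal sketch, not a specific library's API: `generate`, `critique`, and `refine` are hypothetical callables standing in for model calls. It also assumes that `critique` returns `None` when it finds no specific weakness, and that `max_rounds` bounds the optional repetition.

```python
from typing import Callable, Optional


def self_refine(
    task: str,
    generate: Callable[[str], str],                 # hypothetical: model call producing a first draft
    critique: Callable[[str, str], Optional[str]],  # hypothetical: returns None if no specific weakness
    refine: Callable[[str, str, str], str],         # hypothetical: rewrites the draft given the critique
    max_rounds: int = 3,                            # assumed cap on the optional repeat step
) -> str:
    output = generate(task)                      # 1. Generate
    for _ in range(max_rounds):                  # 4. Optionally repeat steps 2-3
        feedback = critique(task, output)        # 2. Critique (specific weaknesses)
        if feedback is None:                     # stop early: nothing specific to fix
            break
        output = refine(task, output, feedback)  # 3. Refine using the critique
    return output


# Toy usage with plain functions in place of model calls.
result = self_refine(
    "write a phrase",
    generate=lambda t: "teh cat",
    critique=lambda t, o: "typo: 'teh'" if "teh" in o else None,
    refine=lambda t, o, f: o.replace("teh", "the"),
)
print(result)
```

Stopping when the critique comes back empty, plus a hard round limit, keeps the loop from refining indefinitely when the critic keeps producing feedback.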
Where it helps
Writing quality, code review, reasoning validation: any task where 'looking again' catches errors.
Where it fails
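The model can't reliably detect its own weaknesses. Small models get worse with self-critique. Sycophancy on borderline critiques: the model tends to accept a critique rather than weigh it on its merits, and may change an output that was already correct.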
Reflexion
Extension: store critiques as 'memory' across attempts on the same task, so each new attempt can condition on earlier reflections. Improves performance over multiple attempts.
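The memory extension can be sketched as follows. This is a minimal illustration, not the Reflexion authors' implementation. `attempt`, `evaluate`, and `reflect` are hypothetical callables. The sketch assumes a pass/fail `evaluate` signal decides whether another attempt is needed, and that each failed attempt produces one critique that is stored and passed to later attempts.

```python
from typing import Callable, List, Tuple


def reflexion(
    task: str,
    attempt: Callable[[str, List[str]], str],  # hypothetical: model call that sees stored critiques
    evaluate: Callable[[str, str], bool],      # hypothetical: pass/fail check on an output
    reflect: Callable[[str, str], str],        # hypothetical: critique of a failed output
    max_attempts: int = 3,
) -> Tuple[str, List[str]]:
    memory: List[str] = []  # critiques carried across attempts on the same task
    output = ""
    for _ in range(max_attempts):
        output = attempt(task, memory)         # each attempt conditions on earlier critiques
        if evaluate(task, output):             # success ends the loop
            break
        memory.append(reflect(task, output))   # store the critique for the next attempt
    return output, memory


# Toy usage: the attempt only succeeds once a critique is in memory.
out, mem = reflexion(
    "task",
    attempt=lambda t, m: "right" if m else "wrong",
    evaluate=lambda t, o: o == "right",
    reflect=lambda t, o: f"'{o}' failed; try another approach",
)
print(out, mem)
```

This sketch keeps every critique. Capping memory to the most recent few entries keeps prompts short when many attempts are allowed.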