Loop

1. Generate an initial output.
2. Critique it, naming specific weaknesses.
3. Refine the output using the critique.
4. Optionally repeat steps 2 and 3.
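The steps above can be sketched as a small control loop. This is a minimal illustration, not a reference implementation: the names `self_refine`, `generate`, `critique`, `refine`, and `max_rounds` are assumptions made for this example, and the `toy_*` functions are hard-coded stand-ins for what would normally be model calls.

```python
from typing import Callable, List


def self_refine(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], List[str]],
    refine: Callable[[str, str, List[str]], str],
    max_rounds: int = 2,
) -> str:
    """Generate a draft, then alternate critique and refinement."""
    draft = generate(task)  # step 1
    for _ in range(max_rounds):  # step 4: bounded repetition
        weaknesses = critique(task, draft)  # step 2
        if not weaknesses:
            break  # nothing specific to fix, so stop early
        draft = refine(task, draft, weaknesses)  # step 3
    return draft


# Hypothetical stand-ins for model calls, so the sketch runs on its own.
def toy_generate(task: str) -> str:
    return "teh cat sat on teh mat"


def toy_critique(task: str, draft: str) -> List[str]:
    return ["misspelling: 'teh'"] if "teh" in draft.split() else []


def toy_refine(task: str, draft: str, weaknesses: List[str]) -> str:
    return " ".join("the" if word == "teh" else word for word in draft.split())


result = self_refine(
    "Write a sentence about a cat.", toy_generate, toy_critique, toy_refine
)
print(result)
```

Capping the rounds and stopping when the critique comes back empty keeps the loop from rewriting output that no longer has a named weakness.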


Where it helps

Writing quality, code review, and reasoning validation. More generally, any task where 'looking again' catches errors.


Where it fails

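The model can't reliably detect its own weaknesses. Small models can get worse with self-critique. Models can also be sycophantic on borderline critiques, deferring to the critique instead of keeping output that was acceptable.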

Reflexion

Extension: store critiques as 'memory' across attempts on the same task, so that each new attempt can draw on what went wrong before. This improves multi-attempt performance.
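A minimal sketch of the memory idea follows. The function and parameter names (`reflexion_loop`, `attempt`, `evaluate`, `reflect`, `max_attempts`) are assumptions made for illustration, as is the pass/fail `evaluate` check used to decide when to stop. The `toy_*` functions are hard-coded stand-ins for model calls.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    task: str,
    attempt: Callable[[str, List[str]], str],
    evaluate: Callable[[str, str], bool],
    reflect: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Tuple[str, List[str]]:
    """Retry a task, carrying critiques of failed attempts forward."""
    memory: List[str] = []  # critiques accumulated across attempts
    answer = ""
    for _ in range(max_attempts):
        answer = attempt(task, memory)  # each attempt sees all past critiques
        if evaluate(task, answer):
            break
        memory.append(reflect(task, answer))  # store the critique for next time
    return answer, memory


# Hypothetical stand-ins so the sketch runs without a model.
def toy_attempt(task: str, memory: List[str]) -> str:
    if any("descending" in note for note in memory):
        return "3 2 1"
    return "1 2 3"


def toy_evaluate(task: str, answer: str) -> bool:
    return answer == "3 2 1"


def toy_reflect(task: str, answer: str) -> str:
    return "Output was ascending; the task asks for descending order."


answer, memory = reflexion_loop(
    "Sort 3 1 2 in descending order.", toy_attempt, toy_evaluate, toy_reflect
)
print(answer)
print(memory)
```

The difference from a single critique-and-refine pass is that `memory` outlives any one attempt, so a mistake named once does not have to be rediscovered on the next try.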