Setup
Agent A argues position X. Agent B argues the counter-position. Each rebuts the other's arguments. A judge scores both sides and selects the winner. Typically 2-4 rounds.
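A minimal sketch of the loop described above, assuming agents and the judge are plain callables. The names `run_debate`, `DebateResult`, `stub_agent`, and `stub_judge` are hypothetical and only for illustration; in practice each callable would wrap a model call, which is deliberately left out here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# An agent maps (assigned position, transcript so far) -> its next argument.
Agent = Callable[[str, List[str]], str]
# A judge maps the full transcript -> (score for A, score for B).
Judge = Callable[[List[str]], Tuple[float, float]]


@dataclass
class DebateResult:
    winner: str
    scores: Tuple[float, float]
    transcript: List[str] = field(default_factory=list)


def run_debate(agent_a: Agent, agent_b: Agent, judge: Judge,
               position: str, counter: str, rounds: int = 3) -> DebateResult:
    """Run a fixed number of rounds, then let the judge score and select."""
    transcript: List[str] = []
    for r in range(1, rounds + 1):
        # Round 1 acts as opening statements. Later turns are rebuttals,
        # because each agent sees everything said before its turn.
        transcript.append(f"[round {r}] A: {agent_a(position, transcript)}")
        transcript.append(f"[round {r}] B: {agent_b(counter, transcript)}")
    score_a, score_b = judge(transcript)
    # Assumption: ties go to A. A real setup may prefer an explicit "draw".
    winner = "A" if score_a >= score_b else "B"
    return DebateResult(winner, (score_a, score_b), transcript)


def stub_agent(position: str, transcript: List[str]) -> str:
    """Placeholder for a model call; just restates its assigned position."""
    return f"I maintain that {position} (turns seen: {len(transcript)})"


def stub_judge(transcript: List[str]) -> Tuple[float, float]:
    """Placeholder judge: counts turns per side instead of judging quality."""
    a = sum(1 for turn in transcript if "] A:" in turn)
    b = sum(1 for turn in transcript if "] B:" in turn)
    return float(a), float(b)


result = run_debate(stub_agent, stub_agent, stub_judge, "X", "not X", rounds=2)
```

The judge only sees the transcript, not the agents themselves, so swapping in a different scoring rule or a human judge does not change the loop.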
When it helps
Contentious questions with multiple legitimate views. Complex reasoning where a single chain of thought (CoT) would miss counterarguments.
Sycophancy risk
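Agents tend to converge on a consensus (agreement bias) instead of testing each other's arguments. Mitigate by prompting each agent to maintain its position firmly, or by using different base models for the two agents.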
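Both mitigations can be sketched as configuration, under stated assumptions: the prompt wording in `STANCE_TEMPLATE` is one illustrative phrasing rather than a tested prompt, and `model-a` / `model-b` are placeholder identifiers, not real model names. `DebaterConfig` and `make_debaters` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Tuple

# Illustrative wording only; the exact phrasing is an assumption.
STANCE_TEMPLATE = (
    "You are debater {name}. Argue that: {position}. "
    "Maintain this position throughout the debate. "
    "Do not concede or agree with your opponent merely to reach consensus; "
    "rebut their strongest point instead."
)


@dataclass(frozen=True)
class DebaterConfig:
    name: str
    position: str
    model: str  # placeholder identifier for the underlying base model

    def system_prompt(self) -> str:
        """Build the stance-holding system prompt for this debater."""
        return STANCE_TEMPLATE.format(name=self.name, position=self.position)


def make_debaters(position: str, counter: str,
                  models: Tuple[str, str] = ("model-a", "model-b"),
                  ) -> Tuple[DebaterConfig, DebaterConfig]:
    """Give each side its own stance prompt and its own base model."""
    model_a, model_b = models
    return (DebaterConfig("A", position, model_a),
            DebaterConfig("B", counter, model_b))


debater_a, debater_b = make_debaters("X", "not X")
```

Keeping the model identifier next to the stance makes it easy to check that the two sides are not accidentally running on the same base model.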
Debate for alignment
Irving et al. (2018), "AI safety via debate": human judges plus debate between models could supervise superhuman models. This is a theoretical proposal and an area of active research.