Setup

Agent A argues position X. Agent B argues the counter-position. Each rebuts the other's latest argument. A judge scores both sides and selects a winner. Typical length: 2-4 rounds.
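
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `run_debate`, `stub_agent`, and `stub_judge`, and the callable signatures, are all assumptions made for the example. The stubs stand in for real model calls so the sketch runs without any API.

```python
from typing import Callable, Dict, List, Tuple

Transcript = List[Tuple[str, str]]  # (speaker, argument) pairs in turn order

# Hypothetical signatures: an agent sees the question, its stance, and the
# transcript so far; a judge sees the question and the full transcript.
Agent = Callable[[str, str, Transcript], str]
Judge = Callable[[str, Transcript], Dict[str, float]]


def run_debate(question: str, agent_a: Agent, agent_b: Agent,
               judge: Judge, rounds: int = 3):
    """Alternate A and B for `rounds` rounds, then let the judge pick."""
    if not 2 <= rounds <= 4:
        raise ValueError("rounds should be between 2 and 4")
    transcript: Transcript = []
    for _ in range(rounds):
        # Each agent sees everything said so far, so it can rebut.
        transcript.append(("A", agent_a(question, "for", transcript)))
        transcript.append(("B", agent_b(question, "against", transcript)))
    scores = judge(question, transcript)
    winner = max(scores, key=scores.get)
    return winner, scores, transcript


def stub_agent(question: str, stance: str, transcript: Transcript) -> str:
    # Placeholder for a model call: emits a canned argument.
    turn = len(transcript) + 1
    return f"[{stance}] turn {turn}: argument about {question!r}"


def stub_judge(question: str, transcript: Transcript) -> Dict[str, float]:
    # Toy scoring by total argument length; a real judge would be a
    # model or a human reading the transcript.
    scores = {"A": 0.0, "B": 0.0}
    for speaker, text in transcript:
        scores[speaker] += len(text)
    return scores


winner, scores, transcript = run_debate(
    "Is X true?", stub_agent, stub_agent, stub_judge, rounds=2
)
```

Passing the agents and judge in as plain callables keeps the protocol separate from whichever model client is used.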

When it helps

Contentious questions with multiple legitimate views. Complex reasoning where a single chain of thought (CoT) would miss counterarguments.

Sycophancy risk

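Agents tend to converge on consensus (agreement bias), which removes the adversarial pressure the debate depends on. Mitigations: prompt each agent to hold its position firmly, or run the agents on different base models.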
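
One way to apply both mitigations is sketched below. Everything here is an assumption made for illustration: the prompt wording, the concession phrases, the helper names `stance_prompt` and `looks_conceded`, and the placeholder model names. The phrase check is a crude heuristic for flagging turns where an agent may have given up its position, not a reliable detector.

```python
# Hypothetical phrases that suggest an agent has dropped its position.
CONCESSION_MARKERS = ("i agree", "you are right", "you're right", "i concede")


def stance_prompt(side: str, position: str) -> str:
    """Build a system prompt that pins a debater to its assigned position."""
    return (
        f"You are debater {side}. Defend this position: {position}\n"
        "Do not concede unless the opponent gives evidence you cannot answer. "
        "Rebut the opponent's strongest point directly."
    )


def looks_conceded(argument: str) -> bool:
    """Naive substring check for concession phrases (case-insensitive)."""
    text = argument.lower()
    return any(marker in text for marker in CONCESSION_MARKERS)


# Placeholder model names: the point is only that A and B differ.
DEBATERS = {
    "A": {"model": "model-family-1", "system": stance_prompt("A", "X is true")},
    "B": {"model": "model-family-2", "system": stance_prompt("B", "X is false")},
}
```

A flagged turn could be regenerated or simply logged, depending on how strictly positions should be held.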

Debate for alignment

Irving et al. (2018): human judges plus debate between models could supervise superhuman models. Still a theoretical proposal and an area of active research.