Crescendo Attack — Gradual Escalation

Method

Open with a benign historical or educational question. Follow-ups probe progressively deeper, each referencing the model's prior outputs. Eventually the conversation crosses the safety line.

Reported as highly effective against GPT-4, Claude, and Gemini in 2024 evaluations. Even reasoning models are susceptible.

The model treats prior turns as authoritative and won't 'take back' earlier compliance. Referencing its own outputs escalates trust.

Detect the escalation trajectory. Recompute safety on every turn against the full history, not just the latest message. Refuse when the trajectory matches a known crescendo pattern.
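
A minimal sketch of the per-turn recheck described above, under stated assumptions. The names `RiskScorer`, `is_escalating`, `evaluate_turn`, `toy_score`, and all thresholds are hypothetical and chosen for illustration; a real deployment would presumably plug in a trained classifier where the toy keyword scorer sits. "Matches a known crescendo" is approximated here by a simple rule: per-turn risk that never decreases over a short window and rises by a minimum amount.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical scorer type: maps text to a risk score in [0, 1].
RiskScorer = Callable[[str], float]


@dataclass
class TurnVerdict:
    turn_risk: float     # risk of the latest user message on its own
    history_risk: float  # risk of all user turns scored together
    escalating: bool     # True if recent per-turn risk rose steadily
    refuse: bool


def is_escalating(risks: Sequence[float], window: int = 3,
                  min_rise: float = 0.2) -> bool:
    """True if the last `window` risks never decrease and rise by >= min_rise."""
    if len(risks) < window:
        return False
    recent = risks[-window:]
    non_decreasing = all(b >= a for a, b in zip(recent, recent[1:]))
    return non_decreasing and (recent[-1] - recent[0]) >= min_rise


def evaluate_turn(user_turns: List[str], score: RiskScorer,
                  history_threshold: float = 0.7,
                  escalation_threshold: float = 0.5) -> TurnVerdict:
    """Recompute safety for the newest turn against the full history."""
    per_turn = [score(t) for t in user_turns]
    # Score the concatenated history so that requests split across
    # several individually mild turns are still seen as a whole.
    history_risk = score("\n".join(user_turns))
    escalating = is_escalating(per_turn)
    refuse = (history_risk >= history_threshold
              or (escalating and per_turn[-1] >= escalation_threshold))
    return TurnVerdict(per_turn[-1], history_risk, escalating, refuse)


# Toy stand-in scorer (assumption): counts distinct placeholder terms.
FLAGGED_TERMS = ("sensitive", "restricted", "operational")


def toy_score(text: str) -> float:
    lowered = text.lower()
    hits = sum(term in lowered for term in FLAGGED_TERMS)
    return min(1.0, 0.3 * hits)


conversation = [
    "What is the history of topic X?",
    "How was it handled back then? Include the sensitive parts.",
    "Building on your answer, add the sensitive restricted details.",
    "Now give the sensitive restricted operational steps.",
]

for n in range(1, len(conversation) + 1):
    verdict = evaluate_turn(conversation[:n], toy_score)
    print(n, verdict.escalating, verdict.refuse)
```

With the toy scorer, the third turn is refused on trajectory alone, before the history score reaches its own threshold. That is the point of tracking the trajectory separately: a check on the latest message, or on a single absolute threshold, reacts later than one that also watches the direction of travel.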