Method

Ask history / educational question. Follow-ups probe deeper. Reference prior model outputs. Eventually cross safety line.

Advertisement

Success rate

Highly effective across GPT-4, Claude, Gemini in 2024 evaluations. Even reasoning models susceptible.

Advertisement

Why

Model treats prior turns as authoritative. Won't 'take back' compliance. Referencing own outputs escalates trust.

Defenses

Detect escalation trajectory. Recompute safety per turn against full history. Refuse when trajectory matches known crescendo.