Attack vector

Summarizer LLM might preserve 'ignore instructions and…' from source. Downstream LLM treats as instruction.

Advertisement

Amplification

Attacker doesn't need direct access to downstream LLM. Attack propagates via data pipeline.

Advertisement

Defenses

Delimit LLM-generated content as untrusted. Filter output of first LLM for instruction-like patterns. Never chain LLMs on untrusted data without intermediate sanitization.

Design

Trust boundaries between LLM stages. Explicit contracts. Downstream never gets raw upstream output — always via filter.