Instruction Hierarchy — System &amp;amp;gt; User &amp;amp;gt; Tool

Hierarchy

1. System (developer). 2. User (human). 3. Tool outputs (potentially attacker-controlled). Lower levels can't override higher.

Advertisement

Tool output saying 'ignore all previous instructions' → model treats as data, not instruction. System policies preserved.

Advertisement

Not perfect. Sophisticated injection still works. But baseline defense that lifts the bar significantly.

Push guardrails to system. User prompt for task. Tool outputs never trusted as instructions.