Approaches

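PAIR and TAP: an attacker LLM iteratively refines prompts against the target (TAP adds tree search with pruning). GCG: gradient-guided search for adversarial suffixes, which needs white-box access and so runs on open-weight models. Genetic algorithms over prompts. Multiple LLMs debating the optimal attack.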
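
The iterative-refinement idea can be sketched as a simple loop. This is a minimal illustration and not the PAIR or TAP reference code: the names `refine`, `toy_attacker`, `toy_target`, and `toy_judge` are hypothetical, the 1 to 10 judge scale is an assumption, and the toy functions stand in for real model calls so the loop runs offline.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases; each callable stands in for a model call.
Record = Tuple[str, str, int]                   # (prompt, response, score)
Attacker = Callable[[str, List[Record]], str]   # (goal, history) -> next prompt
Target = Callable[[str], str]                   # prompt -> response
Judge = Callable[[str, str], int]               # (goal, response) -> score, assumed 1..10


def refine(goal: str, attacker: Attacker, target: Target, judge: Judge,
           max_iters: int = 5, threshold: int = 10) -> Tuple[Record, List[Record]]:
    """Run the attacker/target/judge loop until the score reaches the
    threshold or the iteration budget runs out. Returns the best record
    seen and the full history."""
    history: List[Record] = []
    best: Record = ("", "", 0)
    for _ in range(max_iters):
        prompt = attacker(goal, history)   # attacker sees earlier attempts
        response = target(prompt)
        score = judge(goal, response)
        record = (prompt, response, score)
        history.append(record)
        if score > best[2]:
            best = record
        if score >= threshold:
            break                          # stop early on success
    return best, history


# Toy stand-ins (assumptions for illustration only).
def toy_attacker(goal: str, history: List[Record]) -> str:
    return f"{goal} (variant {len(history) + 1})"


def toy_target(prompt: str) -> str:
    return f"response to: {prompt}"


def toy_judge(goal: str, response: str) -> int:
    # Pretend later variants score higher; a real judge would be an
    # LLM or a trained classifier.
    variant = int(response.rsplit("variant ", 1)[1].rstrip(")"))
    return min(10, 3 * variant)


if __name__ == "__main__":
    best, history = refine("placeholder goal", toy_attacker, toy_target, toy_judge)
    print(best, len(history))
```

Passing the history back to the attacker is what distinguishes refinement from blind sampling: each new candidate can condition on what failed before.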

Scale

Automated generation yields millions of attack candidates per day, a volume impossible to produce or review manually. Automated evaluators score attack success.
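
A rough sketch of what automated scoring can look like, assuming attack success is summarized as a simple success rate over attempts. The names `Attempt`, `attack_success_rate`, and `refusal_heuristic` are hypothetical, and the refusal-prefix check is a deliberately crude placeholder; production evaluators are usually LLM judges or trained classifiers.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Attempt:
    prompt: str
    response: str


def attack_success_rate(attempts: Iterable[Attempt],
                        is_success: Callable[[Attempt], bool]) -> float:
    """Fraction of attempts the evaluator marks as successful."""
    items = list(attempts)
    if not items:
        return 0.0  # avoid division by zero on an empty batch
    hits = sum(1 for a in items if is_success(a))
    return hits / len(items)


def refusal_heuristic(attempt: Attempt) -> bool:
    """Toy evaluator (assumption): treat any response that does not open
    with a refusal phrase as a successful attack."""
    refusals = ("i can't", "i cannot", "i'm sorry")
    return not attempt.response.lower().startswith(refusals)


if __name__ == "__main__":
    batch = [
        Attempt("a", "I can't help with that."),
        Attempt("b", "Sure, here is an outline."),
    ]
    print(attack_success_rate(batch, refusal_heuristic))
```

Keeping the evaluator as a pluggable function lets the same harness swap the heuristic for a stronger judge without changing the aggregation.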

Coverage vs depth

Automated red teaming gives broad coverage; human red teamers give creative depth. A combined pipeline is standard.
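
One possible shape for a combined pipeline is confidence-based triage: automated scores settle the clear cases and route the ambiguous middle to human reviewers. The sketch below is an assumption about how such routing might be wired, not a description of any particular team's process; `triage`, the bucket names, and the thresholds are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def triage(scored: Iterable[Tuple[str, float]],
           low: float = 0.2, high: float = 0.8) -> Dict[str, List[str]]:
    """Split (attack_id, score) pairs into three buckets.

    Scores are assumed to be evaluator confidences in [0, 1] that the
    attack succeeded. Thresholds are illustrative defaults.
    """
    buckets: Dict[str, List[str]] = {
        "auto_confirm": [],   # evaluator is confident the attack worked
        "human_review": [],   # ambiguous: send to a human red teamer
        "auto_reject": [],    # evaluator is confident the attack failed
    }
    for attack_id, score in scored:
        if score >= high:
            buckets["auto_confirm"].append(attack_id)
        elif score <= low:
            buckets["auto_reject"].append(attack_id)
        else:
            buckets["human_review"].append(attack_id)
    return buckets


if __name__ == "__main__":
    print(triage([("a", 0.9), ("b", 0.5), ("c", 0.1)]))
```

Widening the gap between `low` and `high` sends more cases to humans, trading throughput for review depth.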

Frameworks

Meta's Purple Llama. Microsoft's PyRIT. UK AI Safety Institute evals. Anthropic and OpenAI internal tooling.