Approaches
PAIR, TAP (iterative refinement). GCG (gradient search on open models). Genetic algorithms over prompts. LLM debating optimal attack.
Advertisement
Scale
Millions of attack candidates per day. Impossible manually. Automated evaluators score attack success.
Advertisement
Coverage vs depth
Automated: broad coverage. Human: creative depth. Combined pipeline standard.
Frameworks
Meta's Purple Llama. Microsoft's PyRIT. AI Safety Institute (UK) evals. Anthropic + OpenAI internal tooling.