Components
Attack strategies (single-turn, multi-turn, Crescendo), evaluators (harm classifiers), targets (the LLM under test), and a datastore.
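To make the division of labor concrete, here is a minimal sketch of how the four components might fit together in a single-turn run. Every name in it (`AttackResult`, `RedTeamRun`, `direct_strategy`, `stub_target`, `refusal_evaluator`) is hypothetical and not taken from any particular framework. The target is a stub that always refuses, and the evaluator is a simple string check standing in for a harm classifier.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AttackResult:
    """One record in the datastore (hypothetical schema)."""
    prompt: str
    response: str
    harmful: bool


@dataclass
class RedTeamRun:
    """Wires the components together; all names here are illustrative."""
    strategy: Callable[[str], str]    # attack strategy: objective -> prompt
    target: Callable[[str], str]      # target: prompt -> response
    evaluator: Callable[[str], bool]  # evaluator: response -> harmful?
    datastore: List[AttackResult] = field(default_factory=list)

    def run(self, objective: str) -> AttackResult:
        prompt = self.strategy(objective)
        response = self.target(prompt)
        result = AttackResult(prompt, response, self.evaluator(response))
        self.datastore.append(result)  # keep every attempt for later analysis
        return result


def direct_strategy(objective: str) -> str:
    """Single-turn strategy: ask for the objective outright."""
    return f"Please explain how to {objective}."


def stub_target(prompt: str) -> str:
    """Stand-in for the LLM under test; this stub always refuses."""
    return "I can't help with that."


def refusal_evaluator(response: str) -> bool:
    """Toy evaluator: treats any non-refusal as harmful."""
    return "can't help" not in response.lower()
```

In a real setup the target would call a model endpoint and the evaluator would be a trained classifier or an LLM judge; only the wiring is the point here.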
Multi-turn attacks
Automated Crescendo, PAIR, and custom multi-turn attacks that simulate a persistent adversary.
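The control flow shared by multi-turn attacks can be sketched as a loop that keeps the conversation history and stops when the evaluator flags a response. This is an illustrative skeleton only, not an implementation of Crescendo or PAIR: the function names, the fixed `LADDER` of placeholder prompts, and the toy target are all assumptions. In practice the next prompt would typically be generated by an attacker model that conditions on the history.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (prompt, response) pairs, oldest first


def multi_turn_attack(
    next_prompt: Callable[[History], str],
    target: Callable[[History, str], str],
    is_harmful: Callable[[str], bool],
    max_turns: int = 5,
) -> Tuple[bool, History]:
    """Run up to max_turns turns; stop early once a response is flagged."""
    history: History = []
    for _ in range(max_turns):
        prompt = next_prompt(history)
        response = target(history, prompt)
        history.append((prompt, response))
        if is_harmful(response):
            return True, history
    return False, history


# Toy stand-ins (hypothetical) so the loop can be exercised without a model.
LADDER = ["step 0", "step 1", "step 2"]


def escalating_prompt(history: History) -> str:
    """Pick the next rung of a fixed escalation ladder."""
    return LADDER[min(len(history), len(LADDER) - 1)]


def toy_target(history: History, prompt: str) -> str:
    """Refuses until the conversation already has two prior turns."""
    return "UNSAFE" if len(history) >= 2 else "refused"


def toy_is_harmful(response: str) -> bool:
    return response == "UNSAFE"
```

Calling `multi_turn_attack(escalating_prompt, toy_target, toy_is_harmful)` returns a success flag together with the full transcript, which is what an evaluator or datastore would consume.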
Memory + iteration
Stores attack results and iterates on the strategies that succeed, for adaptive red teaming.
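One way to picture the store-then-iterate idea is an in-memory log of outcomes per strategy, with the next strategy chosen from what has worked so far. The sketch below is an assumption about how such a loop could look, not a description of any tool's actual memory layer. `AttackMemory` and its methods are hypothetical names, and the selection rule (try untested strategies first, then pick the highest success rate) is a deliberately simple heuristic.

```python
from collections import defaultdict
from typing import Dict, List


class AttackMemory:
    """Hypothetical in-memory store of attack outcomes, keyed by strategy name."""

    def __init__(self) -> None:
        self._outcomes: Dict[str, List[bool]] = defaultdict(list)

    def record(self, strategy: str, success: bool) -> None:
        """Append one attempt's outcome for the given strategy."""
        self._outcomes[strategy].append(success)

    def success_rate(self, strategy: str) -> float:
        """Fraction of successful attempts; 0.0 if the strategy is untested."""
        outcomes = self._outcomes.get(strategy, [])
        return sum(outcomes) / len(outcomes) if outcomes else 0.0

    def next_strategy(self, candidates: List[str]) -> str:
        """Explore untested strategies first, then exploit the best performer."""
        for name in candidates:
            if name not in self._outcomes:
                return name
        return max(candidates, key=self.success_rate)
```

A persistent version would swap the dictionary for a database table, and a more careful selector might use a bandit-style rule instead of a plain maximum.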
Custom evaluators
Plug in your own harm definitions and domain-specific safety criteria.
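As a rough illustration of a pluggable evaluator, the sketch below lets a team declare its own harm criteria as named rules and returns the names of the rules that a response violates. The class names, the two finance-flavored rules, and the use of regular expressions are all assumptions made for the example. A production evaluator would more likely wrap a classifier or an LLM judge behind the same interface.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class HarmCriterion:
    """One domain-specific rule (hypothetical schema)."""
    name: str
    pattern: re.Pattern


class CustomEvaluator:
    """Flags a response against user-supplied criteria."""

    def __init__(self, criteria: List[HarmCriterion]) -> None:
        self.criteria = criteria

    def evaluate(self, response: str) -> List[str]:
        """Return the names of all criteria that the response matches."""
        return [c.name for c in self.criteria if c.pattern.search(response)]


# Example criteria for an imagined financial-advice deployment.
financial_criteria = [
    HarmCriterion("guaranteed_returns",
                  re.compile(r"guaranteed\s+returns?", re.IGNORECASE)),
    HarmCriterion("account_number_leak",
                  re.compile(r"\b\d{10,12}\b")),
]

evaluator = CustomEvaluator(financial_criteria)
```

Calling `evaluator.evaluate(text)` on a model response yields an empty list when no rule fires, so the result can be used directly as a pass/fail signal or stored alongside the attack record.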