Probes
Jailbreak (DAN + variants). Encoding attacks. Data leakage. Toxicity elicitation. Continuously growing library.
Advertisement
Detectors
Per probe, detector checks if attack succeeded. Rules-based + ML classifiers.
Advertisement
Reports
Vulnerability score per category. Suitable for management dashboards.
Integration
OpenAI, Anthropic, HuggingFace, self-hosted models. CI/CD integration to run on prompt/model changes.