Probes

Jailbreak (DAN + variants). Encoding attacks. Data leakage. Toxicity elicitation. Continuously growing library.

Advertisement

Detectors

Per probe, detector checks if attack succeeded. Rules-based + ML classifiers.

Advertisement

Reports

Vulnerability score per category. Suitable for management dashboards.

Integration

OpenAI, Anthropic, HuggingFace, self-hosted models. CI/CD integration to run on prompt/model changes.