Categories
Four harm categories, plus jailbreak detection and protected-material detection. Severity levels range from 0 to 6.
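To make the severity scale concrete, here is a minimal sketch of turning per-category severities into a list of flagged categories. The category names (`hate`, `sexual`, `violence`, `self_harm`), the threshold values, and the `flagged_categories` helper are all illustrative assumptions; the text above does not name the categories or describe a response schema.

```python
from typing import Dict, List

# Hypothetical per-category thresholds on the 0-6 severity scale.
# Names and values are assumptions for illustration only.
THRESHOLDS: Dict[str, int] = {
    "hate": 4,
    "sexual": 4,
    "violence": 4,
    "self_harm": 2,
}


def flagged_categories(
    severities: Dict[str, int],
    thresholds: Dict[str, int] = THRESHOLDS,
) -> List[str]:
    """Return the categories whose severity meets or exceeds its threshold.

    Categories missing from `severities` are treated as severity 0.
    """
    flagged: List[str] = []
    for category, limit in thresholds.items():
        severity = severities.get(category, 0)
        if not 0 <= severity <= 6:
            raise ValueError(f"severity for {category!r} must be in 0..6")
        if severity >= limit:
            flagged.append(category)
    return flagged
```

A caller would typically block or escalate the content when the returned list is non-empty; stricter thresholds for some categories are a policy choice, not something the scale itself dictates.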
Prompt shields
Detects both direct and indirect prompt injection, with a separate classifier optimized for each.
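The split between direct and indirect injection can be sketched as routing: the user's prompt goes to one classifier, and each attached document goes to another. Everything named here (`shield`, `ShieldResult`, and the two keyword-matching toy classifiers) is a hypothetical stand-in; real classifiers would be trained models, not string checks.

```python
from dataclasses import dataclass
from typing import Callable, List

Classifier = Callable[[str], bool]


def toy_direct_classifier(text: str) -> bool:
    """Toy stand-in (assumption): flags one well-known jailbreak phrase."""
    return "ignore previous instructions" in text.lower()


def toy_indirect_classifier(text: str) -> bool:
    """Toy stand-in (assumption): flags text that addresses the assistant."""
    return "assistant, you must" in text.lower()


@dataclass
class ShieldResult:
    direct_attack: bool              # attack found in the user prompt
    indirect_attack_docs: List[int]  # indexes of documents carrying an attack


def shield(
    user_prompt: str,
    documents: List[str],
    direct: Classifier = toy_direct_classifier,
    indirect: Classifier = toy_indirect_classifier,
) -> ShieldResult:
    """Route the prompt and the documents to their own classifiers."""
    return ShieldResult(
        direct_attack=direct(user_prompt),
        indirect_attack_docs=[i for i, doc in enumerate(documents) if indirect(doc)],
    )
```

Keeping the two classifiers as separate callables mirrors the point above: text typed by the user and text pulled in from documents can be screened by differently tuned detectors.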
Groundedness detection
NLI-based check for RAG outputs. Detects claims that the retrieved sources do not support and recommends regeneration when it finds them.
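The shape of such a check can be sketched as follows: split the response into claims, test each claim for entailment against the sources, and recommend regeneration if any claim is unsupported. The function names are hypothetical, and `toy_entails` is a word-overlap stand-in used only so the example runs; it is not natural language inference, and a real check would plug an NLI model into the `entails` parameter.

```python
import re
from typing import Callable, List

Entails = Callable[[str, str], bool]  # (premise, hypothesis) -> supported?


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Toy stand-in for an NLI model (assumption): the hypothesis counts as
    supported only if every one of its words appears in the premise."""
    def words(text: str) -> set:
        return set(re.findall(r"[a-z0-9]+", text.lower()))
    return words(hypothesis) <= words(premise)


def ungrounded_claims(
    response: str,
    sources: List[str],
    entails: Entails = toy_entails,
) -> List[str]:
    """Return the sentences of `response` that no source entails."""
    claims = [s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()]
    return [c for c in claims if not any(entails(src, c) for src in sources)]


def should_regenerate(response: str, sources: List[str]) -> bool:
    """Recommend regeneration when at least one claim is ungrounded."""
    return bool(ungrounded_claims(response, sources))
```

For example, with the single source "The tower is 330 metres tall and stands in Paris.", the response "The tower stands in Paris. The tower was built in 1889." yields one ungrounded claim (the second sentence), so regeneration would be recommended.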
Copilot integration
Powers safety for Microsoft Copilot. Battle-tested across billions of queries.