Taxonomy
13 hazard categories: violence, hate, sexual, criminal, weapons, defamation, etc. Configurable per app.
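A minimal sketch of per-app configuration, assuming each app enforces a subset of the categories. Only the six category names listed above come from these notes; `AppPolicy`, `make_policy`, and the `disabled` parameter are hypothetical names used for illustration, not part of any real API.

```python
from dataclasses import dataclass

# Only the six categories named in the notes; the remaining ones are not listed there.
NAMED_CATEGORIES = ("violence", "hate", "sexual", "criminal", "weapons", "defamation")


@dataclass(frozen=True)
class AppPolicy:
    """Hypothetical per-app selection of hazard categories to enforce."""
    app_name: str
    enabled: frozenset

    def is_violation(self, predicted):
        # A hit counts only if the classifier flagged an enabled category.
        return bool(self.enabled & set(predicted))


def make_policy(app_name, disabled=()):
    """Build a policy that enforces every named category except `disabled`."""
    unknown = set(disabled) - set(NAMED_CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return AppPolicy(app_name, frozenset(NAMED_CATEGORIES) - frozenset(disabled))


# Usage: a news app that tolerates violent content but nothing else.
news_policy = make_policy("news", disabled=["violence"])
```

Rejecting unknown category names at construction time keeps a typo in an app's config from silently disabling a check.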
Deployment
Run alongside primary LLM. Classify user input + LLM output. Block/redact on hit.
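The flow above can be sketched as a wrapper around the primary LLM call. This is an illustration under assumptions: `guarded_reply`, `toy_classify`, and `toy_llm` are hypothetical stand-ins, the classifier is assumed to return a set of flagged categories, and only blocking is shown (redaction is omitted).

```python
from typing import Callable, Set

Classifier = Callable[[str], Set[str]]  # assumed shape: text -> flagged categories
BLOCK_MESSAGE = "[blocked by safety filter]"


def guarded_reply(user_input: str, llm: Callable[[str], str], classify: Classifier) -> str:
    # Check the user input before it reaches the primary LLM.
    if classify(user_input):
        return BLOCK_MESSAGE
    reply = llm(user_input)
    # Check the LLM output before it reaches the user.
    if classify(reply):
        return BLOCK_MESSAGE
    return reply


# Toy stand-ins so the sketch runs without a real model.
def toy_classify(text: str) -> Set[str]:
    return {"weapons"} if "bomb" in text.lower() else set()


def toy_llm(prompt: str) -> str:
    return f"echo: {prompt}"
```

Checking the input first avoids spending a primary-LLM call on a request that would be blocked anyway.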
Latency
Small model (7B). ~50ms on GPU. Streaming supported via chunked classification.
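One way chunked classification of a stream could look, as a sketch only: tokens are held back in fixed-size chunks and released after the text so far passes the classifier. `guarded_stream`, `chunk_size`, and the toy classifier are hypothetical; the notes do not say how chunks are sized or whether earlier text is re-checked.

```python
import itertools
from typing import Callable, Iterable, Iterator, Set

STOP_MARKER = "[stream stopped by safety filter]"


def guarded_stream(
    tokens: Iterable[str],
    classify: Callable[[str], Set[str]],
    chunk_size: int = 4,
) -> Iterator[str]:
    text_so_far = ""
    it = iter(tokens)
    while True:
        # Hold back up to chunk_size tokens until they have been checked.
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            return
        text_so_far += "".join(chunk)
        # Classify the accumulated text so hits spanning chunk boundaries are caught.
        if classify(text_so_far):
            yield STOP_MARKER
            return
        yield from chunk


# Toy classifier so the sketch runs without a real model.
def toy_classify(text: str) -> Set[str]:
    return {"weapons"} if "bomb" in text.lower() else set()
```

Re-classifying the accumulated text costs more per chunk as the stream grows; a sliding window over recent chunks is a cheaper alternative at the price of missing long-range context. Text released before a hit cannot be recalled, so smaller chunks reduce exposure while adding more classifier calls.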
Custom fine-tuning
Fine-tune on domain-specific violations. E.g., financial advice, medical claims, PII schemas.
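A sketch of how labeled examples for such domain categories might be prepared before fine-tuning. Everything here is an assumption for illustration: the category identifiers, the record fields (`text`, `labels`, `safe`), and the JSONL layout are hypothetical and not a format required by any particular model or training tool.

```python
import json

# Hypothetical identifiers for the domain categories mentioned above.
CUSTOM_CATEGORIES = ("financial_advice", "medical_claims", "pii")


def make_example(text, labels):
    """Build one training record; an empty label list marks the text as safe."""
    unknown = set(labels) - set(CUSTOM_CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return {"text": text, "labels": sorted(labels), "safe": not labels}


def to_jsonl(examples):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(e, sort_keys=True) for e in examples)


# Usage: one violating and one safe example.
dataset = [
    make_example("Buy this stock now, returns are guaranteed.", ["financial_advice"]),
    make_example("What time does the branch open?", []),
]
jsonl_text = to_jsonl(dataset)
```

Including safe examples alongside violations matters here: a set built only from violations gives the classifier nothing to learn the boundary from.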