Taxonomy

13 hazard categories, including violence, hate, sexual content, criminal activity, weapons, and defamation. Which categories are enforced is configurable per application.
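Per-application configuration can be pictured as a policy object that filters whatever the classifier flags down to the categories an app actually enforces. This is a minimal sketch under stated assumptions. The category names are plain strings taken from the list above, and the remaining categories hidden behind "etc." are omitted. `SafetyPolicy` and `violations` are hypothetical names, not part of any published API.

```python
from dataclasses import dataclass, field

# Hypothetical category labels: only the six named in the text are listed;
# the real taxonomy's identifiers and remaining categories are not given here.
ALL_CATEGORIES: frozenset[str] = frozenset({
    "violence", "hate", "sexual", "criminal", "weapons", "defamation",
})


@dataclass(frozen=True)
class SafetyPolicy:
    """Hypothetical per-application policy: which hazard categories are enforced."""
    enabled: frozenset[str] = field(default_factory=lambda: ALL_CATEGORIES)

    def __post_init__(self) -> None:
        # Reject typos such as "weapon" instead of "weapons".
        unknown = set(self.enabled) - ALL_CATEGORIES
        if unknown:
            raise ValueError(f"unknown categories: {sorted(unknown)}")

    def violations(self, flagged: list[str]) -> list[str]:
        """Keep only the flagged categories this application enforces."""
        return sorted(set(flagged) & set(self.enabled))


# A fiction-writing app might tolerate violent themes but nothing else.
fiction_app = SafetyPolicy(enabled=ALL_CATEGORIES - {"violence"})
print(fiction_app.violations(["violence", "hate"]))  # ['hate']
```

A children's app, by contrast, would keep the default policy with every category enabled.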


Deployment

Runs alongside the primary LLM and classifies both the user input and the LLM output. On a hit, the application blocks or redacts the flagged text.
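The deployment pattern amounts to two classifier calls wrapped around one LLM call. The sketch below assumes hypothetical names throughout: `toy_classifier` is a keyword-matching stand-in for the real safety model, `llm` is any function from prompt to text, and the naive sentence splitter used for redaction is an illustrative choice, not a documented behavior.

```python
import re
from typing import Callable

REFUSAL = "Sorry, I can't help with that."


def toy_classifier(text: str) -> set[str]:
    """Hypothetical stand-in for the safety model: returns flagged categories."""
    if re.search(r"\bbuild a bomb\b", text, re.IGNORECASE):
        return {"weapons"}
    return set()


def redact(text: str, classify: Callable[[str], set[str]]) -> str:
    """Replace each flagged sentence with a placeholder (naive sentence split)."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join("[redacted]" if classify(s) else s for s in sentences)


def guarded_generate(
    prompt: str,
    llm: Callable[[str], str],
    classify: Callable[[str], set[str]] = toy_classifier,
    mode: str = "block",
) -> str:
    # 1. Classify the user input; a flagged prompt never reaches the primary LLM.
    if classify(prompt):
        return REFUSAL
    # 2. Generate with the primary LLM.
    response = llm(prompt)
    # 3. Classify the LLM output; block it, or redact flagged sentences if mode="redact".
    if classify(response):
        return redact(response, classify) if mode == "redact" else REFUSAL
    return response


leaky_llm = lambda p: "Sure. Here is how to build a bomb. Stay safe!"
print(guarded_generate("hi", leaky_llm, mode="redact"))  # Sure. [redacted] Stay safe!
```

Blocking is simpler and safer. Redaction keeps the benign parts of a response, but it relies on the classifier flagging individual sentences, which a violation spread across several sentences can evade.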


Latency

Small model (7B parameters), at roughly 50 ms on a GPU. Streaming output is supported by classifying the response in chunks as it is generated.
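Chunked classification for streaming can be sketched as a generator that buffers tokens and releases a chunk only after the classifier has cleared everything produced so far. Assumptions: `moderated_stream` and `chunk_chars` are hypothetical names, the default chunk size is arbitrary, and `classify` is any function returning the set of flagged categories.

```python
from typing import Callable, Iterable, Iterator


def moderated_stream(
    tokens: Iterable[str],
    classify: Callable[[str], set[str]],
    chunk_chars: int = 200,  # hypothetical chunk size, tune against latency budget
) -> Iterator[str]:
    """Yield streamed text in chunks, stopping as soon as the classifier flags it.

    Each check classifies the full output so far (released + pending), so a
    violation that straddles a chunk boundary is still caught.
    """
    released = ""
    pending = ""
    for tok in tokens:
        pending += tok
        if len(pending) >= chunk_chars:
            if classify(released + pending):
                return  # hit: withhold this chunk and end the stream
            released += pending
            yield pending
            pending = ""
    # Flush the final partial chunk if the complete output is clean.
    if pending and not classify(released + pending):
        yield pending


clf = lambda t: {"weapons"} if "bomb" in t else set()
print(list(moderated_stream(["Here is a bo", "mb recipe"], clf, chunk_chars=5)))
# ['Here is a bo']
```

Classifying the accumulated text catches boundary-straddling violations at the cost of re-reading earlier output on every check. Text released before a hit cannot be recalled, so smaller chunks reduce exposure but mean more classifier calls per response.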

Custom fine-tuning

Fine-tune the classifier on domain-specific violations, e.g., financial advice, medical claims, or PII schemas.
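Preparing data for domain-specific fine-tuning mostly means turning labeled examples into prompt/target pairs. This sketch assumes a simple prompt-completion JSONL format with a "safe" or "unsafe" target followed by category codes. The format, the `C1` to `C3` codes, their descriptions, and the helper names are all hypothetical; match them to whatever prompt template the classifier was originally trained with.

```python
import json

# Hypothetical custom categories for a financial-services deployment.
CUSTOM_CATEGORIES = {
    "C1": "Financial advice: personalized buy/sell recommendations.",
    "C2": "Medical claims: unsupported statements about treatments or cures.",
    "C3": "PII: content matching internal customer-record schemas.",
}


def build_prompt(text: str) -> str:
    """Hypothetical classification prompt listing the custom categories."""
    cats = "\n".join(f"{code}: {desc}" for code, desc in CUSTOM_CATEGORIES.items())
    return f"Classify the message against these categories:\n{cats}\n\nMessage: {text}\nAnswer:"


def to_record(text: str, violated: list[str]) -> dict:
    """Build one training record; an empty `violated` list means the text is safe."""
    unknown = set(violated) - set(CUSTOM_CATEGORIES)
    if unknown:
        raise ValueError(f"unknown category codes: {sorted(unknown)}")
    target = "safe" if not violated else "unsafe\n" + ",".join(sorted(violated))
    return {"prompt": build_prompt(text), "completion": target}


def write_jsonl(examples: list[tuple[str, list[str]]], path: str) -> None:
    """Write (text, violated_codes) pairs as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for text, violated in examples:
            f.write(json.dumps(to_record(text, violated)) + "\n")
```

Including clearly safe examples from the same domain (e.g., general questions about opening hours) helps keep the fine-tuned classifier from over-flagging ordinary traffic.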