Providers
Perspective API (Google). OpenAI Moderation (free). Azure Content Safety. Amazon Bedrock Guardrails. Each has own categories.
Advertisement
Open-source
Detoxify. Llama Guard. ToxDECT. Local deployment for latency + privacy.
Advertisement
Streaming filter
Classify each chunk as streams. Cut stream on toxic. Some latency: can't classify partial sentence perfectly. Buffer + classify per sentence.
False positives
Filters over-block medical/legal discussion, marginalized language. Tune thresholds per domain. Human review for edge cases.