Windowing
Classify last-N tokens or last sentence. Wait for sentence boundary → classify → decide continue/cut. Adds latency but precise.
Advertisement
Chunked classification
Fixed chunk size (100 tokens). Faster but may misclassify partial thoughts.
Advertisement
Cutoff message
On cut, replace stream with 'Response terminated for policy reasons.' Don't reveal internal classifier.
False positive UX
Legitimate content cut looks broken. Rework: buffer briefly + regenerate on ambiguous, don't cut mid-stream in low-risk categories.