Windowing

Classify last-N tokens or last sentence. Wait for sentence boundary → classify → decide continue/cut. Adds latency but precise.

Advertisement

Chunked classification

Fixed chunk size (100 tokens). Faster but may misclassify partial thoughts.

Advertisement

Cutoff message

On cut, replace stream with 'Response terminated for policy reasons.' Don't reveal internal classifier.

False positive UX

Legitimate content cut looks broken. Rework: buffer briefly + regenerate on ambiguous, don't cut mid-stream in low-risk categories.