Detection

User reports (thumbs down + text). Anomaly detection on cost/behavior. Automated policy classifier alerts.

Advertisement

Containment

Disable specific agent tool. Revert to safe prompt. Rate-limit affected user. Rollback model.

Advertisement

Eradication

Fix prompt/guardrail. Retrain safety classifier on new attack. Purge poisoned RAG docs.

Recovery

Enable feature gradually. Monitor for recurrence. Restore trust with users.