Detection
User reports (thumbs down + text). Anomaly detection on cost/behavior. Automated policy classifier alerts.
Advertisement
Containment
Disable specific agent tool. Revert to safe prompt. Rate-limit affected user. Rollback model.
Advertisement
Eradication
Fix prompt/guardrail. Retrain safety classifier on new attack. Purge poisoned RAG docs.
Recovery
Enable feature gradually. Monitor for recurrence. Restore trust with users.