Incident Response for LLM Systems

Detection

User reports (thumbs down + text). Anomaly detection on cost/behavior. Automated policy classifier alerts.

Advertisement

Disable specific agent tool. Revert to safe prompt. Rate-limit affected user. Rollback model.

Advertisement

Fix prompt/guardrail. Retrain safety classifier on new attack. Purge poisoned RAG docs.

Enable feature gradually. Monitor for recurrence. Restore trust with users.