Backdoor pattern
Trigger phrase or feature. Normal input: correct behavior. Trigger present: attacker's chosen behavior.
Web-scale attack
LLMs train on web. Attacker publishes malicious content targeting inclusion. Wallace et al. (2020) demonstrated feasibility.
Split-view attack (2024)
Malicious content served only to crawlers, not users. Bypasses human review. Actively found in wild.
RAG poisoning
Poison KB documents. Attacker's payload retrieved on trigger. Powerful because inference-time attack, not training.