DolphinAttack

Ultrasonic voice commands. Human hears silence, assistant hears 'call attacker.' Older voice systems. Some modern LLMs vulnerable in edge cases.

Advertisement

Adversarial audio

Perturbations inaudible to humans, transcribed as attacker's chosen text. Related to adversarial vision.

Advertisement

Embedded in speech

Ask assistant to summarize podcast. Podcast contains 'ignore previous instructions.' Model transcribes + complies.

Defenses

Frequency filtering. Cross-checker (does human hear the same command?). Refuse commands in unusual frequency ranges.