Character-level
Homoglyphs: replace Latin 'a' with Cyrillic 'а'. The two are visually identical, but the tokenizer sees different characters, so the text bypasses filters.
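The defensive counterpart, as an added example that is not from the original page: flag words that mix scripts and fold known look-alikes before a keyword filter runs. The `CONFUSABLES` table is a small constructed subset, the function names are hypothetical, and taking a letter's script from the first word of its Unicode name is a heuristic.

```python
import unicodedata

# Constructed subset of Cyrillic look-alikes; a real deployment would load
# the full Unicode confusables data (UTS #39) instead of this table.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def script_of(ch):
    """Approximate a letter's script from the first word of its Unicode name."""
    if not ch.isalpha():
        return None
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def mixed_script_words(text):
    """Return words whose letters come from more than one script."""
    flagged = []
    for word in text.split():
        scripts = {script_of(c) for c in word} - {None}
        if len(scripts) > 1:
            flagged.append((word, sorted(scripts)))
    return flagged


def skeleton(text):
    """Fold known look-alikes to ASCII so a keyword filter sees one spelling."""
    # NFKC handles compatibility forms (e.g. fullwidth letters) but leaves
    # cross-script homoglyphs untouched, hence the explicit table lookup.
    folded = unicodedata.normalize("NFKC", text)
    return "".join(CONFUSABLES.get(c, c) for c in folded).casefold()


def filter_hits(text, blocklist):
    """Match blocklist terms against the folded text, not the raw input."""
    folded = skeleton(text)
    return [term for term in blocklist if term in folded]


# Constructed sample: "free" and "cash" with Cyrillic letters swapped in.
SAMPLE = "claim your fr\u0435\u0435 \u0441\u0430sh now"
BLOCKLIST = ["free cash"]

if __name__ == "__main__":
    print("raw match:   ", [t for t in BLOCKLIST if t in SAMPLE])
    print("folded match:", filter_hits(SAMPLE, BLOCKLIST))
```

Mixed-script flagging will also fire on legitimate multilingual text, so it works better as a signal feeding review or scoring than as a hard block.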
Word-level
Synonym substitution changes the classification. The TextAttack framework enumerates such substitutions. Common defense: adversarial training.
Sentence-level
Paraphrase the entire input while preserving its meaning. The model's prediction changes, which shows non-robustness to paraphrase.
GCG for LLMs
Discrete token gradient search. State of the art. Discussed in a dedicated article.