Divergence attack
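Nasr, Carlini et al. 2023: prompt 'Repeat the word "poem" forever.' After many repetitions the model diverges from the repetition and emits memorized training text. ChatGPT (gpt-3.5-turbo) leaked personal data, including email addresses, this way.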
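A minimal sketch of how a transcript from this attack might be post-processed: find where the output stops repeating the word, then scan the tail for email-like strings. The function names and the email regex are illustrative assumptions, not taken from the paper, and the regex is deliberately loose.

```python
import re

# Loose, illustrative pattern for email-like strings (an assumption, not a full RFC 5322 matcher).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def divergence_index(output: str, word: str) -> int:
    """Return the index of the first whitespace token that is not `word`, or -1 if none."""
    target = word.lower()
    for i, token in enumerate(output.split()):
        # Ignore surrounding punctuation so "poem," still counts as a repetition.
        if token.strip(".,;:!?\"'").lower() != target:
            return i
    return -1


def diverged_tail(output: str, word: str) -> str:
    """Return everything the model produced after it stopped repeating `word`."""
    idx = divergence_index(output, word)
    if idx < 0:
        return ""
    return " ".join(output.split()[idx:])


def emails_in_tail(output: str, word: str) -> list[str]:
    """Return email-like strings found in the diverged part of the output."""
    return EMAIL_RE.findall(diverged_tail(output, word))
```

Finding an email-like string in the tail does not by itself show it came from training data; that still has to be checked against a reference corpus.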
Prefix-continuation
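Provide a prefix of text from a known corpus and let the model continue. A verbatim continuation of the true suffix is strong evidence that the text was in the training data and was memorized. A non-verbatim continuation does not prove the text was absent.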
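The check can be sketched as follows, assuming a hypothetical `generate(prefix)` callable that wraps whatever model is under test and returns its continuation as a string. Whitespace tokens and the default lengths are illustrative choices, not values from any particular study.

```python
from typing import Callable


def verbatim_overlap(continuation: str, true_suffix: str) -> int:
    """Count how many leading whitespace tokens of the continuation match the true suffix."""
    matched = 0
    for got, expected in zip(continuation.split(), true_suffix.split()):
        if got != expected:
            break
        matched += 1
    return matched


def prefix_continuation_test(
    generate: Callable[[str], str],  # hypothetical wrapper around the model under test
    document: str,
    prefix_tokens: int = 50,
    min_match: int = 50,
) -> bool:
    """Return True if the model reproduces at least `min_match` tokens of the true suffix."""
    tokens = document.split()
    prefix = " ".join(tokens[:prefix_tokens])
    true_suffix = " ".join(tokens[prefix_tokens:])
    continuation = generate(prefix)
    return verbatim_overlap(continuation, true_suffix) >= min_match
```

Greedy decoding is the usual choice for `generate` here, since sampling can hide a memorized continuation behind random variation.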
Membership inference
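Question: was candidate text X in the training set? Estimate by computing the model's loss on X; unusually low loss (relative to a threshold or a reference model) suggests membership. Practical against some models, not all.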
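A minimal sketch of a loss-based membership test, assuming per-token log-probabilities (natural log) for the candidate text have already been obtained from the target model and from a separate reference model. Calibrating against a reference model and the `margin` value are assumptions added for illustration; a fixed loss threshold is the simpler variant.

```python
import math
from typing import Sequence


def mean_nll(token_logprobs: Sequence[float]) -> float:
    """Average negative log-likelihood per token, in nats."""
    return -sum(token_logprobs) / len(token_logprobs)


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity is the exponential of the mean negative log-likelihood."""
    return math.exp(mean_nll(token_logprobs))


def likely_member(
    target_logprobs: Sequence[float],
    reference_logprobs: Sequence[float],
    margin: float = 0.5,  # illustrative value; tune on known members and non-members
) -> bool:
    """Flag the text if the target model's loss beats the reference model's by `margin` nats/token."""
    gap = mean_nll(reference_logprobs) - mean_nll(target_logprobs)
    return gap >= margin
```

The reference model corrects for text that is simply easy to predict: a common phrase has low loss under any model, so only the gap between the two losses is treated as a signal.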
Defenses
Deduplicate training data (duplicated sequences are memorized far more often, so memorization drops). Train with differential privacy. Filter outputs (block any output that matches an n-gram from the training corpus).
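Two of these defenses can be sketched in a few lines: exact deduplication by content hash, and an output filter that blocks on any shared n-gram. This assumes whitespace tokens and an in-memory set; the function names are illustrative. A real corpus would need near-duplicate detection and a compact index such as a Bloom filter or suffix array.

```python
import hashlib
from typing import Iterable


def deduplicate(corpus: Iterable[str]) -> list[str]:
    """Drop exact duplicates, ignoring case and whitespace differences."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in corpus:
        normalized = " ".join(doc.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of whitespace-token n-grams in `text` (empty if text is shorter than n)."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(corpus: Iterable[str], n: int) -> set[tuple[str, ...]]:
    """Collect every n-gram that appears anywhere in the training corpus."""
    index: set[tuple[str, ...]] = set()
    for doc in corpus:
        index |= ngrams(doc, n)
    return index


def should_block(output: str, index: set[tuple[str, ...]], n: int) -> bool:
    """Block the output if it shares at least one n-gram with the training corpus."""
    return not ngrams(output, n).isdisjoint(index)
```

The choice of `n` is the main trade-off: a small `n` blocks ordinary phrases, while a large `n` lets short memorized strings such as email addresses through.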