Discovery
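Jessica Rumbelow and Matthew Watkins, 2023: 'SolidGoldMagikarp' and similar tokens make GPT-3 produce evasive or unrelated words, e.g. failing to repeat the token back when asked. Cause: tokenizer/training mismatch. These strings were in the tokenizer's vocabulary but rare or absent in the model's training data, so their embeddings were barely trained.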
Fingerprinting
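Probe the model with known glitch tokens → confirm model identity. Which tokens misbehave depends on the specific tokenizer and training run, so the pattern acts as a fingerprint. Hard for an API operator to spoof, because the behavior comes from the model's own embeddings rather than from a prompt or wrapper.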
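A minimal sketch of the fingerprinting idea, assuming the model is reachable through some text-in/text-out function. `query`, `fake_model`, the repeat-back prompt wording, and the signature dictionaries are all hypothetical; only 'SolidGoldMagikarp' comes from these notes. The sketch treats "fails to repeat the token back" as the glitch symptom and compares the observed pattern against a stored signature.

```python
from typing import Callable, Dict, List

# Hypothetical interface: send a prompt, get the model's text reply.
QueryFn = Callable[[str], str]


def repeat_probe(query: QueryFn, token: str) -> bool:
    """Return True if the model echoes `token` back in its reply."""
    prompt = f'Please repeat the string "{token}" back to me exactly.'
    return token in query(prompt)


def glitch_profile(query: QueryFn, probes: List[str]) -> Dict[str, bool]:
    """Map each probe string to True if the model FAILS to repeat it (glitch-like)."""
    return {token: not repeat_probe(query, token) for token in probes}


def matches_signature(profile: Dict[str, bool], signature: Dict[str, bool]) -> bool:
    """True if the profile agrees with a known model's signature on every shared probe."""
    shared = profile.keys() & signature.keys()
    return bool(shared) and all(profile[t] == signature[t] for t in shared)


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an API call; treats 'SolidGoldMagikarp' as a glitch token."""
    if "SolidGoldMagikarp" in prompt:
        return "unrelated output"
    # Otherwise echo the quoted string, as a well-behaved model would.
    return prompt.split('"')[1]


profile = glitch_profile(fake_model, ["SolidGoldMagikarp", "hello world"])
print(profile)
print(matches_signature(profile, {"SolidGoldMagikarp": True}))
```

The substring check is deliberately crude; a real probe would use many tokens, several prompt phrasings, and repeated sampling before concluding anything about model identity.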
Attack use
Combined with prompt injection: a glitch token may push the model into an unusual state in which safety behavior is bypassed. Rare and exotic in practice.
Detection
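Audit the vocabulary before model release: find tokens that are rare or absent in the training data, test how the model handles them, and retrain (or fix the tokenizer) if any are problematic.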
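A minimal sketch of the "find rare tokens" step, assuming the training corpus is already tokenized into vocabulary ids. `undertrained_tokens`, `min_count`, and the toy vocabulary and corpus are hypothetical illustrations. Flagged tokens would then go through behavioral tests such as a repeat-back probe.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def undertrained_tokens(
    vocab: Sequence[str], corpus_ids: Iterable[int], min_count: int = 1
) -> List[str]:
    """Return vocab entries seen fewer than `min_count` times in the training token stream.

    Such tokens get little or no gradient signal, so their embeddings stay
    barely trained: the tokenizer/training mismatch behind glitch tokens.
    """
    counts = Counter(corpus_ids)  # token id -> occurrences in the corpus
    return [tok for idx, tok in enumerate(vocab) if counts[idx] < min_count]


# Toy data: id 2 ('SolidGoldMagikarp') is in the vocabulary but never in the corpus.
vocab = ["the", "cat", "SolidGoldMagikarp", "sat"]
corpus_ids = [0, 1, 3, 0, 1, 3]
print(undertrained_tokens(vocab, corpus_ids))
```

Raising `min_count` widens the net to merely rare tokens, trading more false positives for fewer missed glitch candidates.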