Discovery

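Jessica Rumbelow and Matthew Watkins, 2023: 'SolidGoldMagikarp' and similar tokens make GPT-3 output unrelated words instead of repeating the string. Result of a tokenizer/training mismatch: the tokens exist in the tokenizer vocabulary but were rare or absent in the model's training data, so their embeddings were barely trained.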

Fingerprinting

Probe a model with known glitch tokens → confirm model identity, since which tokens misbehave depends on the tokenizer and training run. Hard for an API operator to spoof.
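
A minimal sketch of the probe-and-compare idea, assuming a repeat-the-string test as the glitch check. The names `query`, `glitch_signature`, and `match_score` are hypothetical, and `fake_model` is a stand-in for a real API call; the expected signature for an actual model would have to be measured, not guessed.

```python
from typing import Callable, Dict, List


def glitch_signature(query: Callable[[str], str], tokens: List[str]) -> Dict[str, bool]:
    """Return {token: True if the model failed to repeat it back}."""
    signature = {}
    for tok in tokens:
        reply = query(f'Repeat this string exactly: "{tok}"')
        signature[tok] = tok not in reply
    return signature


def match_score(observed: Dict[str, bool], expected: Dict[str, bool]) -> float:
    """Fraction of expected probes whose observed outcome agrees."""
    if not expected:
        return 0.0
    hits = sum(observed.get(tok) == glitched for tok, glitched in expected.items())
    return hits / len(expected)


# Hypothetical stand-in for a real API call: glitches on one token only.
def fake_model(prompt: str) -> str:
    if "SolidGoldMagikarp" in prompt:
        return "distribute"  # illustrative unrelated word
    return prompt.split('"')[1]  # echo the quoted string correctly


# Assumed signature for the model we expect to be talking to.
expected = {"SolidGoldMagikarp": True, "hello": False}
observed = glitch_signature(fake_model, list(expected))
print(match_score(observed, expected))
```

A score near 1.0 suggests the endpoint behaves like the expected model on these probes; a control token such as "hello" guards against a model that simply fails every repeat request.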

Attack use

Combine with prompt injection: a glitch token may push the model into an unusual state and bypass safety behavior. Rare and exotic in practice.

Detection

Audit the vocabulary as part of model release. Test rare tokens for anomalous behavior. Retrain if any prove problematic.
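
One way to find the rare tokens to test, sketched under the assumption that a sample of the training corpus is available: count how often each vocabulary entry actually appears once the sample is tokenized, and flag entries below a threshold. `rare_tokens`, `encode`, and the toy whitespace tokenizer are hypothetical; a real audit would use the model's own tokenizer.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def rare_tokens(
    vocab: Dict[int, str],
    encode: Callable[[str], List[int]],
    corpus: Iterable[str],
    min_count: int = 1,
) -> List[str]:
    """Return vocabulary strings seen fewer than min_count times in corpus."""
    counts: Counter = Counter()
    for doc in corpus:
        counts.update(encode(doc))
    return sorted(tok for tid, tok in vocab.items() if counts[tid] < min_count)


# Toy vocabulary and whitespace tokenizer, for illustration only.
vocab = {0: "the", 1: "cat", 2: "sat", 3: "SolidGoldMagikarp"}
inverse = {tok: tid for tid, tok in vocab.items()}


def encode(text: str) -> List[int]:
    return [inverse[w] for w in text.split() if w in inverse]


corpus = ["the cat sat", "the cat"]
print(rare_tokens(vocab, encode, corpus))
```

Tokens flagged this way are only candidates: each still needs a behavioral test before deciding whether retraining is warranted.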