Machine Unlearning in LLMs

Exact unlearning

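Retrain from scratch without the target data → correct by construction, but expensive. Impractical for large LLMs, where every deletion request would mean a full training run.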

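Approximate alternatives: fine-tune to forget (e.g. gradient ascent on the forget set). Task arithmetic (subtract the task vector obtained by fine-tuning on the forget data). Fine-tune on 'blank slate' responses about specific topics, i.e. text written as if the model had never seen the target. Note: SISA is not a fine-tune-to-forget method. It is an exact approach that trains on isolated shards, so only the shard that held the deleted data needs retraining.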
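
Task arithmetic can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `Weights` type (a dict of flat weight lists standing in for a real state dict), the function names, and the scaling factor `alpha` are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat weights.
Weights = Dict[str, List[float]]


def task_vector(base: Weights, finetuned: Weights) -> Weights:
    """Task vector: fine-tuned weights minus base weights, per parameter."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def negate_task(base: Weights, finetuned: Weights, alpha: float = 1.0) -> Weights:
    """Subtract alpha times the task vector from the base weights."""
    tau = task_vector(base, finetuned)
    return {
        name: [b - alpha * t for b, t in zip(base[name], tau[name])]
        for name in base
    }


# Toy values: 'tuned_on_forget' plays the role of the base model after
# fine-tuning on the data to be forgotten.
base = {"w": [1.0, 2.0]}
tuned_on_forget = {"w": [1.5, 1.0]}
unlearned = negate_task(base, tuned_on_forget, alpha=0.5)
```

A smaller `alpha` removes less of the task direction, which is one knob for limiting damage to unrelated behaviour.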

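Metric for success: after unlearning, a membership inference attack should return 'not in training' for the forgotten examples, i.e. it should no longer separate them from data the model never saw.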
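
One way to turn this into a number is the AUC of a simple loss-threshold attack, which guesses "member" when the model's loss on an example is low. The sketch below assumes per-example losses have already been computed; the function name `mia_auc` and the toy loss values are illustrative only.

```python
from typing import Sequence


def mia_auc(forget_losses: Sequence[float], unseen_losses: Sequence[float]) -> float:
    """AUC of a loss-threshold membership attack.

    Counts the fraction of (forget, unseen) pairs in which the forget
    example has the lower loss, with ties counted as half. A value near
    0.5 means the attack cannot tell forget examples from unseen ones.
    """
    wins = 0.0
    for f in forget_losses:
        for u in unseen_losses:
            if f < u:
                wins += 1.0
            elif f == u:
                wins += 0.5
    return wins / (len(forget_losses) * len(unseen_losses))


# Toy per-example losses (made up for illustration).
unseen = [0.9, 1.1, 1.2]
auc_before = mia_auc([0.1, 0.2, 0.3], unseen)  # forget set clearly memorised
auc_after = mia_auc([1.0, 0.8, 1.3], unseen)   # forget set looks like unseen data
```

An AUC well below 0.5 is also a warning sign: the forgotten examples then stand out as unusually high-loss, which can itself reveal that they were removed.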

Unlearning a target may damage related capabilities that the model should keep. Balance retention vs forgetting: report performance on retained data alongside forgetting quality.
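
One common way to write this balance down is a combined objective that raises the loss on the forget set while holding it down on a retain set. The symbols here are assumptions for illustration: $D_f$ is the forget set, $D_r$ the retain set, $\ell$ the per-example loss, and $\lambda \ge 0$ the retention weight.

```latex
\min_{\theta} \;
  -\,\mathbb{E}_{x \in D_f}\big[\ell(x;\theta)\big]
  \;+\; \lambda \, \mathbb{E}_{x \in D_r}\big[\ell(x;\theta)\big]
```

A larger $\lambda$ favours retention, a smaller one favours forgetting. Because the first term is unbounded below, such an objective is normally run for a limited number of steps rather than to convergence.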