Types

Representational (who is mentioned). Allocational (recommendations that differ by group). Stereotyping (associations between groups and attributes). Each is measured differently.

Benchmarks

BBQ (Bias Benchmark for QA). CrowS-Pairs. StereoSet. WEAT (Word Embedding Association Test, embedding-level). Each covers a different facet of bias.
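
As an illustration of the embedding-level case, here is a minimal sketch of a WEAT-style effect size. The vectors are made-up 2-D toy values, not real embeddings, and the function names are hypothetical. It assumes the sample standard deviation in the denominator; implementations differ on that choice.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def association(w, A, B):
    """How much closer w is to attribute set A than to attribute set B."""
    return mean([cosine(w, a) for a in A]) - mean([cosine(w, b) for b in B])


def weat_effect_size(X, Y, A, B):
    """Difference in mean association of target sets X and Y,
    normalized by the spread of associations over X and Y together."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


# Toy vectors (assumption): X leans toward A, Y leans toward B.
X = [(1.0, 0.1), (0.9, 0.2)]
Y = [(0.1, 1.0), (0.2, 0.9)]
A = [(1.0, 0.0)]
B = [(0.0, 1.0)]

effect = weat_effect_size(X, Y, A, B)
print(f"effect size: {effect:.3f}")
```

A positive value means X is more associated with A (and Y with B); swapping X and Y flips the sign.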

Metrics

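Response parity across name/pronoun swaps (does the output change when only the name or pronoun changes?). Sentiment differential across group references. Association strength between group terms and attribute terms.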
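
The first two metrics can be sketched as a template test: fill one prompt with names from different groups, score each result, and compare group means. This is a minimal sketch under stated assumptions. `group_score_gap` and `toy_score` are hypothetical names, and `toy_score` is a deliberately biased stand-in for a real pipeline (model response plus sentiment classifier).

```python
from statistics import mean


def group_score_gap(template, groups, score):
    """Fill `template` with each name, score it, and return the
    per-group mean scores plus the largest gap between groups."""
    per_group = {
        group: mean([score(template.format(name=n)) for n in names])
        for group, names in groups.items()
    }
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap


def toy_score(text):
    # Stand-in scorer (assumption): a real one would query a model
    # and run a sentiment classifier on its response.
    return 0.25 if "Alex" in text else 0.75


groups = {"group_a": ["Alex"], "group_b": ["Sam"]}
per_group, gap = group_score_gap(
    "{name} asked for a loan recommendation.", groups, toy_score
)
print(per_group, gap)
```

A gap near zero indicates parity on this template; in practice the gap would be averaged over many templates and names before drawing conclusions.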

Mitigation

Data-level (deduplication, filtering). Training-level (RLHF that penalizes biased responses). Inference-level (bias-aware decoding).
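
A minimal sketch of the data-level step, assuming exact-match deduplication after case and whitespace normalization and a caller-supplied filter. `dedup_and_filter` and `is_flagged` are hypothetical names; real pipelines often use near-duplicate detection and trained classifiers instead.

```python
def dedup_and_filter(docs, is_flagged):
    """Keep the first copy of each document (compared after lowercasing
    and collapsing whitespace) and drop any document the filter flags."""
    seen = set()
    kept = []
    for doc in docs:
        key = " ".join(doc.lower().split())
        if key in seen or is_flagged(doc):
            continue
        seen.add(key)
        kept.append(doc)
    return kept


docs = ["Hello  world", "hello world", "bad text", "fine"]
cleaned = dedup_and_filter(docs, lambda d: "bad" in d)
print(cleaned)
```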