Types
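Representational (which groups are mentioned, and how they are portrayed). Allocational (recommendations or resources that differ by group). Stereotyping (learned associations between groups and attributes). Each needs its own measurement.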
Benchmarks
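BBQ (Bias Benchmark for QA; questions over ambiguous and disambiguated contexts). CrowS-Pairs (sentence pairs, one more stereotypical than the other). StereoSet (stereotype vs. anti-stereotype preference). WEAT (Word Embedding Association Test; embedding-level). Each covers a different facet of bias, so no single score is enough.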
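As an illustration of the embedding-level case, here is a minimal sketch of the WEAT effect size in plain Python. The function names are illustrative, not from any library. X and Y are target word vectors, A and B are attribute word vectors. The sample standard deviation is an assumption here; some implementations use the population form.

```python
import math
from statistics import mean, stdev

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(w, A, B):
    """How much closer w sits to attribute set A than to attribute set B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)

def weat_effect_size(X, Y, A, B):
    """Difference in mean association of the two target sets,
    normalised by the spread over all target words."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)

# Toy 2-d vectors (made up for illustration, not real embeddings).
X = [[1.0, 0.0], [0.9, 0.1]]   # target set 1
Y = [[0.0, 1.0], [0.1, 0.9]]   # target set 2
A = [[1.0, 0.0]]               # attribute set 1
B = [[0.0, 1.0]]               # attribute set 2
effect = weat_effect_size(X, Y, A, B)  # positive: X leans toward A, Y toward B
```

A value near zero means the two target sets relate to the attributes in about the same way. The sign says which way any association leans.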
Metrics
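Response parity (output should not change when only a name or pronoun is swapped). Sentiment differential (gap in sentiment between otherwise identical texts that reference different groups). Association strength (how strongly group terms are tied to attribute terms, as in WEAT).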
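The first two metrics can be sketched as a counterfactual swap test: fill one template with names from each group, score every response, and compare group means. Everything below is hypothetical scaffolding. `toy_sentiment` stands in for a real sentiment scorer, `constant_model` stands in for the system under test, and the names and template are placeholders.

```python
from statistics import mean

def toy_sentiment(text):
    """Hypothetical scorer: +1 per positive word, -1 per negative word."""
    positive = {"great", "reliable", "brilliant"}
    negative = {"lazy", "unreliable"}
    words = text.lower().replace(".", "").split()
    return sum((w in positive) - (w in negative) for w in words)

def group_scores(template, groups, respond, score):
    """Mean response score per group when only the {name} slot changes."""
    return {
        group: mean(score(respond(template.format(name=name))) for name in names)
        for group, names in groups.items()
    }

def max_gap(scores):
    """Largest differential between any two groups; 0 means parity."""
    return max(scores.values()) - min(scores.values())

def constant_model(prompt):
    """Hypothetical stub whose answer ignores the name entirely."""
    return "A reliable colleague."

groups = {"group_a": ["Alice", "Anna"], "group_b": ["Bob", "Ben"]}
template = "Describe {name} as a coworker."
scores = group_scores(template, groups, constant_model, toy_sentiment)
gap = max_gap(scores)  # 0 here, because the stub never looks at the name
```

With a real model, a gap well above the run-to-run noise from repeated sampling suggests the name alone is shifting the response.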
Mitigation
Data-level (deduplicate and filter the training corpus). Training-level (RLHF that penalizes biased responses). Inference-level (bias-aware decoding).
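A minimal sketch of the data-level step, assuming exact-match deduplication on normalized text plus a term-list filter. `dedup_and_filter` and `blocked_terms` are hypothetical names. Real pipelines often add near-duplicate detection, and a plain term list is a crude filter that can also drop benign text.

```python
import hashlib
import re

def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash the same."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def dedup_and_filter(docs, blocked_terms):
    """Keep the first copy of each document, then drop any document
    containing a blocked term as a whole whitespace-separated token."""
    seen = set()
    kept = []
    for doc in docs:
        norm = normalize(doc)
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after normalization
        seen.add(digest)
        if any(term in norm.split() for term in blocked_terms):
            continue  # filtered by the (hypothetical) term list
        kept.append(doc)
    return kept
```

Order is preserved, so the surviving copy of a duplicated document is always the first one seen.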