This ongoing project aims to consolidate interesting efforts in the field of fairness in Large Language Models (LLMs), drawing on the proposed taxonomy and surveys dedicated to various aspects of fairness in LLMs.
Disclaimer: We may have missed some relevant papers in the list. If you have suggestions or want to add papers, please submit a pull request or email us—your contributions are greatly appreciated!
Tutorial: Fairness in Large Language Models in Three Hours
Thang Viet Doan, Zichong Wang, Nhat Hoang and Wenbin Zhang
Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), Boise, USA, 2024
Fairness in LLMs: Fairness in Large Language Models: A Taxonomic Survey
Zhibo Chu, Zichong Wang and Wenbin Zhang
ACM SIGKDD Explorations Newsletter, 2024
Introduction to LLMs: History, Development, and Principles of Large Language Models: An Introductory Survey
Zichong Wang, Zhibo Chu, Thang Viet Doan, Shiwen Ni, Min Yang and Wenbin Zhang
AI and Ethics, 2024
Fairness Definitions in LLMs: Fairness Definitions in Language Models Explained
Thang Viet Doan, Zhibo Chu, Zichong Wang and Wenbin Zhang
arXiv preprint arXiv:2407.18454, 2024
Email: [email protected] (Zichong Wang); [email protected] (Wenbin Zhang)
Mitigating Bias in LMs (Link to the paper)
- Social Bias Probing: Fairness Benchmarking for Language Models [EMNLP]
- Strategic Demonstration Selection for Improved Fairness in LLM In-Context Learning [EMNLP]
- Systematic Biases in LLM Simulations of Debates [EMNLP]
- Mitigating Language Bias of LMMs in Social Intelligence Understanding with Virtual Counterfactual Calibration [EMNLP]
- A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners [EMNLP]
- Mitigating Frequency Bias and Anisotropy in Language Model Pre-Training with Syntactic Smoothing [EMNLP]
- “You Gotta be a Doctor, Lin”: An Investigation of Name-Based Bias of Large Language Models in Employment Recommendations [EMNLP]
- Humans or LLMs as the Judge? A Study on Judgement Bias [EMNLP]
- Walking in Others’ Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias [EMNLP]
- Decoding Matters: Addressing Amplification Bias and Homogeneity Issue in Recommendations for Large Language Models [EMNLP]
- Mitigate Extrinsic Social Bias in Pre-trained Language Models via Continuous Prompts Adjustment [EMNLP]
- Split and Merge: Aligning Position Biases in LLM-based Evaluators [EMNLP]
- Hidden Persuaders: How LLM Political Bias Could Sway Our Elections [EMNLP]
- Debiasing Text Safety Classifiers through a Fairness-Aware Ensemble [EMNLP]
- MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization [NeurIPS]
- Bias Amplification in Language Model Evolution: An Iterated Learning Perspective [NeurIPS]
- Probing Social Bias in Labor Market Text Generation by ChatGPT: A Masked Language Model Approach [NeurIPS]
- UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation [NeurIPS]
- Allegator: Alleviating Attention Bias for Visual-Informed Text Generation [NeurIPS]
- A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks [NeurIPS]
- Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias [NeurIPS]
- Unveiling the Bias Impact on Symmetric Moral Consistency of Large Language Models [NeurIPS]
- Measuring and reducing gendered correlations in pre-trained models [arXiv]
- Counterfactual Data Augmentation for Mitigating Gender Stereotypes in Languages with Rich Morphology [ACL]
- Deep Learning on a Healthy Data Diet: Finding Important Examples for Fairness [AAAI]
- Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting [ACL]
- Data-Centric Explainable Debiasing for Improving Fairness in Pre-trained Language Models [ACL]
- Enhancing Model Robustness and Fairness with Causality: A Regularization Approach [ACL]
- Never Too Late to Learn: Regularizing Gender Bias in Coreference Resolution [WSDM]
- Sustainable Modular Debiasing of Language Models [ACL]
- Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection [ACL]
- Does Gender Matter? Towards Fairness in Dialogue Systems [COLING]
- Debiasing pretrained text encoders by paying attention to paying attention [EMNLP]
- Reducing Gender Bias in Word-Level Language Models with a Gender-Equalizing Loss Function [ACL]
- FineDeb: A Debiasing Framework for Language Models [arXiv]
- Debiasing algorithm through model adaptation [ICLR]
- DUnE: Dataset for Unified Editing [EMNLP]
- Reducing Sentiment Bias in Language Models via Counterfactual Evaluation [EMNLP]
- Using In-Context Learning to Improve Dialogue Safety [EMNLP]
- DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts [ACL]
- Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting [arXiv]
- Queer People are People First: Deconstructing Sexual Identity Stereotypes in Large Language Models [ACL]
- Text Style Transfer for Bias Mitigation using Masked Language Modeling [NAACL]
- They, them, theirs: Rewriting with gender-neutral English [arXiv]
Quantifying Bias in LMs (Link to the paper)
Intrinsic bias: also known as upstream bias or representational bias, this refers to the inherent biases present in the output representations generated by a medium-sized LM (a minimal measurement sketch follows the paper list below).
- Similarity-based bias
- Probability-based bias
  - Measuring and reducing gendered correlations in pre-trained models [arXiv]
  - Measuring bias in contextualized word representations [arXiv]
  - Mitigating language-dependent ethnic bias in BERT [arXiv]
  - Masked language model scoring [arXiv]
  - StereoSet: Measuring stereotypical bias in pretrained language models [arXiv]
  - CrowS-Pairs: A challenge dataset for measuring social biases in masked language models [arXiv]
  - Unmasking the Mask – Evaluating Social Biases in Masked Language Models [AAAI]
  - Pro-Woman, Anti-Man? Identifying Gender Bias in Stance Detection [ACL]
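As a concrete illustration of the probability-based metrics above (e.g., the masked language model scoring used by CrowS-Pairs and StereoSet), the sketch below compares the pseudo-log-likelihood a masked LM assigns to a stereotypical sentence versus its anti-stereotypical counterpart. The model choice, the sentence pair, and the aggregation are illustrative assumptions, not the exact protocol of any single paper.

```python
# Minimal sketch of a probability-based intrinsic bias check: compare
# pseudo-log-likelihoods (cf. "Masked language model scoring") of a
# stereotypical vs. an anti-stereotypical sentence, CrowS-Pairs style.
# Model choice and the example pair are illustrative assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log-probabilities of each token when it is masked in turn."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip [CLS] (position 0) and [SEP] (last position).
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        total += log_probs[input_ids[i]].item()
    return total

stereo = "Women are bad at math."
anti_stereo = "Men are bad at math."
gap = pseudo_log_likelihood(stereo) - pseudo_log_likelihood(anti_stereo)
print(f"PLL gap (stereotypical - anti-stereotypical): {gap:.3f}")
# A consistently positive gap across many such pairs suggests the model
# assigns higher likelihood to the stereotypical phrasing.
```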
Extrinsic bias: also known as downstream bias or prediction bias, this refers to the disparity in an LM's performance across different downstream tasks (a minimal measurement sketch follows the task list below).
- Classification
  - Bias in Bios: A case study of semantic representation bias in a high-stakes setting [arXiv]
- Natural Language Inference
- Question Answering
  - BBQ: A hand-built bias benchmark for question answering [arXiv]
- Recommender Systems
  - UP5: Unbiased foundation model for fairness-aware recommendation [arXiv]
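For extrinsic bias, evaluation typically compares a downstream metric across demographic groups, for example occupation-classification accuracy per gender in the Bias in Bios setting. The sketch below computes a per-group accuracy gap and a true-positive-rate (equal-opportunity) gap; the labels, predictions, and group tags are hypothetical placeholders.

```python
# Minimal sketch of extrinsic (downstream) bias measurement on a classifier:
# compare accuracy and true-positive rate across demographic groups and report
# the gap. The toy predictions, labels, and group tags below are placeholders.
from collections import defaultdict

def group_metrics(y_true, y_pred, groups):
    """Per-group accuracy and true-positive rate for binary labels (1 = positive)."""
    stats = defaultdict(lambda: {"correct": 0, "n": 0, "tp": 0, "pos": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        s = stats[g]
        s["n"] += 1
        s["correct"] += int(t == p)
        if t == 1:
            s["pos"] += 1
            s["tp"] += int(p == 1)
    return {
        g: {
            "accuracy": s["correct"] / s["n"],
            "tpr": s["tp"] / s["pos"] if s["pos"] else float("nan"),
        }
        for g, s in stats.items()
    }

# Placeholder data: 1 = "predicted suitable for the role"; groups are self-reported gender.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]

per_group = group_metrics(y_true, y_pred, groups)
acc_gap = abs(per_group["F"]["accuracy"] - per_group["M"]["accuracy"])
tpr_gap = abs(per_group["F"]["tpr"] - per_group["M"]["tpr"])  # equal-opportunity gap
print(per_group)
print(f"accuracy gap: {acc_gap:.2f}, TPR gap: {tpr_gap:.2f}")
```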
Fairness in these models is evaluated using the following strategies designed to quantify it:
- Demographic Representation: the systematic discrepancy in the frequency of mentions of different demographic groups within the generated text (a minimal sketch appears right after this list).
- Stereotypical Association: the systematic discrepancy in the model's associations between demographic groups and specific stereotypes, which reflects societal prejudice.
- Counterfactual Fairness: the model's sensitivity to demographic-specific terms, measuring how changes to these terms affect its output (a second sketch appears after the dataset list below).
- Performance Disparities: the systematic variation in accuracy or other performance metrics when the model is applied to tasks involving different demographic groups.
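A minimal sketch of the Demographic Representation strategy: sample continuations from neutral prompts and count how often terms associated with each demographic group appear. The model, prompts, and word lists are illustrative assumptions.

```python
# Minimal sketch of the Demographic Representation strategy: sample continuations
# from neutral prompts and count mentions of terms for each demographic group.
# The model, prompts, and word lists are illustrative assumptions.
import re
from collections import Counter
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompts = ["The doctor said that", "The nurse said that", "The engineer said that"]
group_terms = {
    "male": {"he", "him", "his", "man", "men"},
    "female": {"she", "her", "hers", "woman", "women"},
}

counts = Counter()
for prompt in prompts:
    outs = generator(prompt, max_new_tokens=30, num_return_sequences=10,
                     do_sample=True, pad_token_id=generator.tokenizer.eos_token_id)
    for out in outs:
        tokens = re.findall(r"[a-z']+", out["generated_text"].lower())
        for group, terms in group_terms.items():
            counts[group] += sum(tok in terms for tok in tokens)

total = sum(counts.values()) or 1
for group, c in counts.items():
    print(f"{group}: {c} mentions ({c / total:.1%} of group-term mentions)")
# A strong skew toward one group across many neutral prompts signals a
# demographic-representation imbalance in the generated text.
```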
Commonly used datasets and benchmarks for evaluating bias include:
- WinoBias
- WinoBias+
- WinoGender
- WinoQueer
- BEC-Pro
- BUG
- GAP
- StereoSet
- HONEST
- Bias-NLI
- CrowS-Pairs
- EEC
- PANDA
- RedditBias
- TrustGPT
- FairPrism
- BOLD
- RealToxicityPrompts
- HolisticBias
- BBQ
- UnQover
- CEB
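Building on the Counterfactual Fairness strategy described above (which several of the listed datasets, such as HolisticBias, operationalize at scale with templated prompts), the sketch below swaps a demographic term in a prompt template, samples continuations, and compares a simple downstream signal, here the sentiment of the continuation. The models, template, term pairs, and sentiment proxy are assumptions for illustration, not a benchmark protocol.

```python
# Minimal sketch of counterfactual-fairness probing: swap a demographic term in a
# prompt template, generate continuations, and compare a downstream score
# (here, sentiment of the continuation as a rough proxy). Models, template,
# and term pairs are illustrative assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
sentiment = pipeline("sentiment-analysis")  # default English sentiment model

template = "The {group} engineer walked into the meeting and"
term_pairs = [("male", "female"), ("young", "elderly")]

def avg_sentiment(prompt: str, n: int = 5) -> float:
    """Mean signed sentiment score over n sampled continuations."""
    outs = generator(prompt, max_new_tokens=30, num_return_sequences=n,
                     do_sample=True, pad_token_id=generator.tokenizer.eos_token_id)
    texts = [o["generated_text"][len(prompt):] for o in outs]
    scores = sentiment(texts)
    return sum(s["score"] if s["label"] == "POSITIVE" else -s["score"] for s in scores) / n

for a, b in term_pairs:
    gap = avg_sentiment(template.format(group=a)) - avg_sentiment(template.format(group=b))
    print(f"{a} vs. {b}: sentiment gap = {gap:+.3f}")
# Large, consistent gaps across many templates indicate that the model's output
# changes systematically with the demographic term alone.
```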
If you find that our taxonomic survey helps your research, we would appreciate citations to the following paper:
@article{chu2024fairness,
title={Fairness in Large Language Models: A Taxonomic Survey},
author={Chu, Zhibo and Wang, Zichong and Zhang, Wenbin},
journal={ACM SIGKDD Explorations Newsletter},
volume={26},
number={1},
pages={34--48},
year={2024},
publisher={ACM New York, NY, USA}
}
If you find that our introduction survey helps your research, we would appreciate citations to the following paper:
@article{wang2024history,
title={History, Development, and Principles of Large Language Models: An Introductory Survey},
author={Wang, Zichong and Chu, Zhibo and Doan, Thang Viet and Ni, Shiwen and Yang, Min and Zhang, Wenbin},
journal={AI and Ethics},
year={2024},
publisher={Springer}
}
If you find that our definition survey helps your research, we would appreciate citations to the following paper:
@misc{doan2024fairnessdefinitionslanguagemodels,
title={Fairness Definitions in Language Models Explained},
author={Thang Viet Doan and Zhibo Chu and Zichong Wang and Wenbin Zhang},
year={2024},
eprint={2407.18454},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2407.18454},
}