In this paper, we investigate the impact of objects on gender bias in image captioning systems. Our results show that only gender-specific objects have a strong gender bias (e.g. woman-lipstick). In addition, we propose a visual semantic-based gender score that measures the degree of bias and can be used as a plug-in for any image captioning system. Our experiments demonstrate the utility of the gender score, since we observe that our score can measure the bias relation between a caption and its related gender; therefore, our score can be used as an additional metric to the existing Object Gender Co-Occ approach.
This repository contains the implementation of the paper Women Wearing Lipstick: Measuring the Bias Between an Object and Its Related Gender (Findings of EMNLP 2023).
For a quick start, please have a look at the project page, the paper demo, and the recent demo with LLaMA-3.2.
- Python 3.7
- sentence_transformers 2.2.2
conda create -n gender_score python=3.7 anaconda
conda activate gender_score
pip install -U sentence-transformers
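After installation, an optional sanity check (not part of the repo) can confirm that the required libraries import correctly in the new environment:

```python
# Optional environment check: verify the core dependencies are importable.
import torch
import transformers
import sentence_transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("sentence-transformers:", sentence_transformers.__version__)
```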
In this work, we propose two object-to-gender bias scores: (1) a direct Gender Score (GS) and (2) a [MASK]-based Gender Score Estimation (GE). For the direct score, the model uses the visual context to predict the degree of gender-object bias.
To run the Gender Score
python model_GS.py
Any of the following pre-trained models can be used via the command-line arguments (see the input-file sketch after this list):
parser.add_argument('--vis', default='visual-context_label.txt',help='class-label from the classifier (CLIP)', type=str, required=True)
parser.add_argument('--vis_prob', default='visual-context.txt', help='prob from the classifier (ResNet152/CLIP)', type=str, required=True)
parser.add_argument('--c', default='caption.txt', help='caption from the baseline (any)', type=str, required=True)
parser.add_argument('--GPT2model', default="gpt2", help='gpt2, gpt2-medium, gpt2-large, gpt2-xl, distilgpt2', type=str, required=False)
parser.add_argument('--BERTmodel', default='roberta-large-nli-stsb-mean-tokens', help='all-mpnet-base-v2, multi-qa-mpnet-base-dot-v1, all-distilroberta-v1', type=str, required=False)
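Each input is a plain text file whose names match the argument defaults above; a minimal sketch for writing them (the one-entry-per-line layout is assumed from the example below):

```python
# Minimal sketch (assumed layout: one entry per line) for preparing
# the three inputs expected by model_GS.py.
caption = "a man sitting on a blue motorcycle in a parking lot"
visual_context = "motor scooter"      # class label from the visual classifier (e.g. CLIP)
visual_context_prob = 0.222983188     # classifier confidence for that label

with open("caption.txt", "w") as f:
    f.write(caption + "\n")
with open("visual-context_label.txt", "w") as f:
    f.write(visual_context + "\n")
with open("visual-context.txt", "w") as f:
    f.write(str(visual_context_prob) + "\n")
```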
To run the Gender Score (e.g. man-motorcycle), we need three inputs: (1) the caption, (2) the visual context (object class label from the classifier), and (3) the visual context probability (the classifier's confidence for that label).
input
Caption: a man sitting on a blue motorcycle in a parking lot
visual context: motor scooter
visual context prob: 0.222983188
python model_GS.py --GPT2model gpt2 --BERTmodel roberta-large-nli-stsb-mean-tokens --vis man_motorcycle_GS/man_motorcycle_visual_context.txt --vis_prob man_motorcycle_GS/man_motorcycle_visual_context_prob.txt --c man_motorcycle_GS/man_motorcycle_caption.txt
output gender_score_output.txt
a man sitting on a blue motorcycle in a parking lot, object-gender_score: 0.3145708898422527
By also computing the object-gender_score for the woman caption (0.27773833243385865), we can estimate the object-to-gender bias ratio toward men at 53%.
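The 53% figure is simply the man score normalized over the sum of the two scores; a short check using the numbers printed above:

```python
# Reproduce the bias ratio from the two printed object-gender scores.
gs_man, gs_woman = 0.3145708898422527, 0.27773833243385865
ratio_to_man = gs_man / (gs_man + gs_woman)
print(f"bias toward men: {ratio_to_man:.0%}")  # -> 53%
```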
Additionally, inspired by masked language modeling, the model can estimate the [MASK] gender using the bias relation between the gender and the object information from the image.
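Concretely, the [MASK] token in the caption is replaced by each candidate gender word, and every filled caption is then scored against the visual context, as in the example output below. A minimal sketch of the substitution step (the two-word candidate list is an assumption for illustration):

```python
# Minimal sketch of the [MASK] substitution step (candidate list assumed).
masked_caption = "a [MASK] riding a motorcycle on a road"
candidates = {"m": "man", "w": "woman"}

filled = {k: masked_caption.replace("[MASK]", g) for k, g in candidates.items()}
print(filled["m"])  # a man riding a motorcycle on a road   -> caption_m
print(filled["w"])  # a woman riding a motorcycle on a road -> caption_w
```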
Example
input
Caption: a [MASK] riding a motorcycle on a road
visual context: motor scooter
visual context prob: 0.2183
python model_GE.py --GPT2model gpt2 --BERTmodel roberta-large-nli-stsb-mean-tokens --vis man_motorcycle_GE/visual_context_demo_motorcycle.txt --vis_prob man_motorcycle_GE/visual_context_prob_demo_motorcycle.txt --c man_motorcycle_GE/caption_demo_motorcycle_MASK.txt
output
# object-to-m bias
caption_m a man riding a motorcycle on a road
LM: 0.12759140133857727 # initial bias without visual
cosine distance score (sim): 0.5452305674552917 # gender object distance
gender score_m: 0.45320714150193153
# object-to-w bias
caption_w a woman riding a motorcycle on a road
LM: 0.11249390989542007 # initial bias without visual
cosine distance score (sim): 0.5037289261817932 # gender object distance
gender score_w: 0.39912252800731546
# most object-to-gender bias
object_gender_caption: a man riding a motorcycle on a road
ratio_to_m: 53.17275201306536
ratio_to_w: 46.82724798693463
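The "cosine distance score (sim)" lines above come from an SBERT similarity between each filled caption and the visual context label. A hedged sketch of that component only (exact values depend on the model version; combining it with the LM score and the visual probability into the final gender score is left to model_GE.py):

```python
# Illustrative sketch of the caption-object similarity component (SBERT).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("roberta-large-nli-stsb-mean-tokens")
visual_context = "motor scooter"
captions = {
    "m": "a man riding a motorcycle on a road",
    "w": "a woman riding a motorcycle on a road",
}

vis_emb = model.encode(visual_context, convert_to_tensor=True)
for gender, caption in captions.items():
    cap_emb = model.encode(caption, convert_to_tensor=True)
    sim = util.cos_sim(cap_emb, vis_emb).item()
    print(f"sim_{gender}: {sim:.4f}")
```

The final ratio_to_m / ratio_to_w values are again the two gender scores normalized to sum to 100%, as in the GS example above.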
The details of this repo are described in the following paper. If you find this repo useful, please cite it:
@article{sabir2023women,
title={Women Wearing Lipstick: Measuring the Bias Between an Object and Its Related Gender},
author={Sabir, Ahmed and Padr{\'o}, Llu{\'\i}s},
journal={arXiv preprint arXiv:2310.19130},
year={2023}
}
The implementation of the Gender Score relies on resources from lm-score, Hugging Face Transformers, and SBERT. We thank the original authors for their well-organized codebases.