Women Wearing Lipstick: Measuring the Bias Between an Object and Its Related Gender

Overview

In this paper, we investigate the impact of objects on gender bias in image captioning systems. Our results show that only gender-specific objects carry a strong gender bias (e.g. woman-lipstick). In addition, we propose a visual semantic-based gender score that measures the degree of this bias and can be used as a plug-in for any image captioning system. Our experiments demonstrate the utility of the gender score: it measures the bias relation between a caption and its related gender, and can therefore be used as an additional metric alongside the existing Object Gender Co-Occ approach.

This repository contains the implementation of the paper Women Wearing Lipstick: Measuring the Bias Between an Object and Its Related Gender (Findings of EMNLP 2023).

Quick Start

For a quick start, please have a look at this project page, the paper demo, and a recent demo with LLaMA-3.2.

Requirements

  • Python 3.7
  • sentence_transformers 2.2.2

conda create -n gender_score python=3.7 anaconda
conda activate gender_score
pip install -U sentence-transformers
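
As an optional sanity check of the environment (not part of the repository scripts), the snippet below loads the default SBERT model used throughout the examples and computes a caption-object similarity with sentence-transformers:

from sentence_transformers import SentenceTransformer, util

# Load the default SBERT model used by the scripts below.
sbert = SentenceTransformer('roberta-large-nli-stsb-mean-tokens')

# Embed a caption and a visual-context label and compare them.
emb = sbert.encode(['a man sitting on a blue motorcycle in a parking lot',
                    'motor scooter'], convert_to_tensor=True)
print(util.cos_sim(emb[0], emb[1]).item())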

Gender Score

In this work, we propose two object-to-gender bias scores: (1) a direct Gender Score (GS) and (2) a [MASK]-based Gender Score Estimation (GE). For the direct score, the model uses the visual context to predict the degree of the related gender-object bias.

To run the Gender Score

python model_GS.py

using any of the following pre-trained models:

parser.add_argument('--vis', default='visual-context_label.txt',help='class-label from the classifier (CLIP)', type=str, required=True)  
parser.add_argument('--vis_prob', default='visual-context.txt', help='prob from the classifier (ResNet152/CLIP)', type=str, required=True) 
parser.add_argument('--c',  default='caption.txt', help='caption from the baseline (any)', type=str, required=True) 
parser.add_argument('--GPT2model', default="gpt2", help='gpt2, gpt2-medium, gpt2-large, gpt2-xl, distilgpt2', type=str, required=False)  
parser.add_argument('--BERTmodel', default='roberta-large-nli-stsb-mean-tokens', help='all-mpnet-base-v2, multi-qa-mpnet-base-dot-v1, all-distilroberta-v1', type=str, required=False) 

To run the Gender Score (e.g. man-motorcycle), we need three inputs: (1) the caption $y$ with its associated gender $a$, written $y_{a}$; (2) the object information $o$ (i.e. the visual bias) extracted from the image $I$, written $o(I)$; and (3) $\text{P}(c_{o})$, the probability confidence of the object, i.e. the bias in the image. Please refer to the paper for more details (to extract the object visual information, please refer to this page).
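
Each input is a one-line plain-text file (one line per image-caption pair). As an illustration only, the snippet below writes the example values used in the command that follows, using the default file names expected by the --c, --vis, and --vis_prob arguments:

# Illustration only: write the three one-line input files for the example below.
with open('caption.txt', 'w') as f:               # --c: caption from the baseline
    f.write('a man sitting on a blue motorcycle in a parking lot\n')
with open('visual-context_label.txt', 'w') as f:  # --vis: object label from the classifier
    f.write('motor scooter\n')
with open('visual-context.txt', 'w') as f:        # --vis_prob: classifier confidence
    f.write('0.222983188\n')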

input

Caption: a man sitting on a blue motorcycle in a parking lot
visual context: motor scooter
visual context prob: 0.222983188
python model_GS.py --GPT2model gpt2  --BERTmodel roberta-large-nli-stsb-mean-tokens --vis  man_motorcycle_GS/man_motorcycle_visual_context.txt --vis_prob  man_motorcycle_GS/man_motorcycle_visual_context_prob.txt --c man_motorcycle_GS/man_motorcycle_caption.txt

output gender_score_output.txt

a man sitting on a blue motorcycle in a parking lot,  object-gender_score: 0.3145708898422527

By also computing the object-gender_score for the woman caption (0.27773833243385865), we can estimate the object-to-gender bias ratio toward men at 53%.
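
The 53% figure is simply the normalized ratio of the two direct scores; a minimal sketch of the arithmetic:

# Normalized object-to-gender bias ratio from the two direct Gender Scores.
score_man = 0.3145708898422527
score_woman = 0.27773833243385865
ratio_to_man = 100 * score_man / (score_man + score_woman)
print(round(ratio_to_man))  # -> 53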

Gender Score Estimation

Additionally, inspired by masked language modeling, the model can estimate the [MASK] gender using the bias relation between the gender and the object information from the image.
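
Conceptually, the estimation fills the [MASK] slot with each gender candidate and scores the resulting captions against the same visual context, as in the example below; a hypothetical sketch of the substitution step:

# Hypothetical sketch: expand the [MASK] template into gendered candidate captions,
# which are then scored the same way as the direct Gender Score.
template = 'a [MASK] riding a motorcycle on a road'
candidates = ['man', 'woman']
gendered_captions = [template.replace('[MASK]', g) for g in candidates]
print(gendered_captions)
# ['a man riding a motorcycle on a road', 'a woman riding a motorcycle on a road']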

Example

input

Caption: a [MASK] riding a motorcycle on a road
visual context: motor scooter
visual context prob: 0.2183
python model_GE.py --GPT2model gpt2  --BERTmodel roberta-large-nli-stsb-mean-tokens --vis  man_motorcycle_GE/visual_context_demo_motorcycle.txt --vis_prob  man_motorcycle_GE/visual_context_prob_demo_motorcycle.txt --c man_motorcycle_GE/caption_demo_motorcycle_MASK.txt

output

# object-to-m bias 
caption_m a man riding a motorcycle on a road
LM: 0.12759140133857727 # initial bias without visual
cosine distance score (sim): 0.5452305674552917 # gender object distance 
gender score_m: 0.45320714150193153

# object-to-w bias 
caption_w a woman riding a motorcycle on a road
LM: 0.11249390989542007 # initial bias without visual
cosine distance score (sim): 0.5037289261817932 # gender object distance 
gender score_w: 0.39912252800731546

# most object-to-gender bias 
object_gender_caption: a man riding a motorcycle on a road
ratio_to_m: 53.17275201306536
ratio_to_w: 46.82724798693463
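
For intuition, the numbers above are consistent with the belief-revision style combination used in the acknowledged lm-score re-ranker: the GPT-2 caption probability (LM) is revised by the caption-object SBERT similarity (sim), weighted by the visual confidence. The sketch below is an inferred approximation, not the repository code, but it reproduces the example output:

def gender_score(lm_prob, sim, vis_prob):
    # Assumption (inferred from the lm-score visual re-ranker): raise the caption
    # LM probability to a power that shrinks as the caption-object similarity
    # and the visual confidence grow, so strongly related objects boost the score.
    alpha = ((1.0 - sim) / (1.0 + sim)) ** (1.0 - vis_prob)
    return lm_prob ** alpha

s_m = gender_score(0.12759140133857727, 0.5452305674552917, 0.2183)
s_w = gender_score(0.11249390989542007, 0.5037289261817932, 0.2183)
print(s_m, s_w)                 # ~0.4532 and ~0.3991, matching the output above
print(100 * s_m / (s_m + s_w))  # ~53.17 (ratio_to_m)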

Citation

The details of this repository are described in the following paper. If you find this repository useful, please cite it:

@article{sabir2023women,
  title={Women Wearing Lipstick: Measuring the Bias Between an Object and Its Related Gender},
  author={Sabir, Ahmed and Padr{\'o}, Llu{\'\i}s},
  journal={arXiv preprint arXiv:2310.19130},
  year={2023}
}

Acknowledgement

The implementation of the Gender Score relies on resources from lm-score, Huggingface Transformers, and SBERT. We thank the original authors for their well-organized codebases.
