This repository contains the PyTorch code for computing the metric and for training, as well as the datasets and meta-evaluation.
Authors: David Wan and Mohit Bansal (UNC Chapel Hill)
Paper coming soon!
Requirements:
- Python 3.8
- PyTorch 1.10.2
- datasets 2.0.0
- transformers 4.17.0
- deepspeed 0.6.4
- nltk 3.7
- rouge_score 0.0.4
- clip 1.0 (OpenAI's CLIP, installed from https://github.com/openai/CLIP)
- bert_score 0.3.11
Please see `code/clipbertscore.py` for an example of running the metric. It essentially combines the original implementations of the two underlying metrics.
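For intuition, a minimal sketch of the combined score is shown below. The weighted-sum form, the default `w=0.5`, and the function name `clipbertscore` are illustrative assumptions, not the repository's exact API; refer to `code/clipbertscore.py` for the actual implementation.

```python
# Illustrative sketch only: a weighted combination of CLIPScore
# (image-summary similarity) and BERTScore (document-summary similarity).
# The weight w and the plain cosine (no rescaling) are assumptions.
import clip
import torch
from bert_score import BERTScorer
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)
scorer = BERTScorer(model_type="roberta-large-mnli", num_layers=10, device=device)

def clipbertscore(image_path, document, summary, w=0.5):
    # CLIPScore component: image-summary cosine similarity.
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    tokens = clip.tokenize([summary], truncate=True).to(device)
    with torch.no_grad():
        img_emb = clip_model.encode_image(image).float()
        txt_emb = clip_model.encode_text(tokens).float()
    clip_score = torch.cosine_similarity(img_emb, txt_emb).item()
    # BERTScore component: F1 between the source document and the summary.
    _, _, f1 = scorer.score([summary], [document])
    return w * clip_score + (1 - w) * f1.item()
```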
If you would like to run the submodules separately:
- To use CLIPScore, please set the weight `w=1`, or run the CLIP model directly.
- For BERTScore, please follow the BERTScore repository to install and run the metric. We use `BERTScorer(model_type="roberta-large-mnli", num_layers=10, device=device)`.
An alternative implementation of CLIPBERTScore as a reward for reinforcement learning (RL) can be found in `code/src/self_critical.py`.
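At its core, self-critical training is REINFORCE with a greedy-decoding baseline. The helper below is an illustrative sketch, not the exact code in `code/src/self_critical.py`:

```python
import torch

def self_critical_loss(sample_logprobs, sample_reward, greedy_reward):
    # sample_logprobs: (batch,) summed token log-probs of sampled summaries.
    # sample_reward / greedy_reward: (batch,) metric scores (e.g. CLIPBERTScore)
    # for the sampled and greedily decoded summaries, respectively.
    advantage = sample_reward - greedy_reward  # > 0 when the sample beats greedy
    return -(advantage.detach() * sample_logprobs).mean()
```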
The code under `code/src` is adapted from Transformers' summarization example.
The directory contains the code for CLIP-BART and self-critical training.
The code assumes extracted CLIP features for the images. For an example of how to extract them, please see `src/extract_features.py`.
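A minimal sketch of what extraction might look like (the CLIP variant, paths, and per-image output format here are illustrative assumptions; `src/extract_features.py` is the reference):

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
# RN50x64 yields 1024-dim image embeddings, matching --visual_feat_dim 1024 below.
model, preprocess = clip.load("RN50x64", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    feats = model.encode_image(image)  # shape: (1, 1024)
torch.save(feats.cpu(), "data/hidden_states/mmss/rn50x64/example.pt")
```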
An example to run CLIP-BART on MMSS is:
```
python src/run_summarization.py --fp16 \
  --data_dir data/mmss --do_train --image_dir data/hidden_states/mmss/rn50x64 \
  --summary_column summary --text_column document \
  --model_name facebook/bart-base \
  --tokenizer_name facebook/bart-base \
  --do_eval --evaluation_strategy epoch --predict_with_generate \
  --per_device_train_batch_size 128 --per_device_eval_batch_size 16 \
  --gradient_accumulation_steps 2 \
  --learning_rate 3e-05 --weight_decay 0.01 --label_smoothing 0.1 \
  --max_source_length 128 --max_target_length 32 \
  --logging_step 100 --max_steps 5000 \
  --warmup_steps 0 --save_steps 1000 --remove_unused_columns false \
  --output_dir mmss_bart_base_rn50x64 --visual_feat_dim 1024
```
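With `--per_device_train_batch_size 128` and `--gradient_accumulation_steps 2`, the effective batch size is 256 per device, the same as in the RL run below. `--visual_feat_dim 1024` matches the dimensionality of the RN50x64 CLIP features in `data/hidden_states/mmss/rn50x64`.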
An example to run with RL (self-critical training):
```
python src/run_summarization.py --fp16 \
  --data_dir data/mmss --do_train --image_dir data/hidden_states/mmss/rn50x64 \
  --summary_column summary --text_column document \
  --model_name mmss_bart_base_rn50x64 \
  --do_eval --evaluation_strategy epoch --predict_with_generate \
  --per_device_train_batch_size 256 --per_device_eval_batch_size 16 \
  --gradient_accumulation_steps 1 \
  --learning_rate 3e-05 --weight_decay 0.01 \
  --max_source_length 128 --max_target_length 32 \
  --logging_step 100 --max_steps 5000 \
  --warmup_steps 0 --save_steps 1000 --remove_unused_columns false \
  --output_dir mmss_bart_base_rn50x64_rl_0.998_rouge2_cbs2 --visual_feat_dim 1024 \
  --reward_model reward_model --train_self_critical --rl_weight 0.998 --rouge_key rouge2 --cbs_weight 2.0 --rouge_weight 1.0
```
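Here `--model_name` points at the checkpoint produced by the first-stage run, and `--reward_model` at the model used to score generations. Judging from the flags and the output directory name, `--rl_weight 0.998` presumably interpolates between the RL and maximum-likelihood losses, while `--cbs_weight 2.0` and `--rouge_weight 1.0` weight the CLIPBERTScore and ROUGE-2 components of the reward.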
Please see the `data` directory for all the relevant data files, including:
- MuFaME Meta-Evaluation
- WikiHowFact
- Multimodal WikiHow Summarization Datasets
Please refer to the corresponding directory for more details. Preprocessed data coming soon!
Citation:
```
@inproceedings{wan2022evaluating,
  title={Evaluating and Improving Factuality in Multimodal Abstractive Summarization},
  author={Wan, David and Bansal, Mohit},
  booktitle={EMNLP 2022},
  year={2022}
}
```