A conversational benchmark for factual consistency
This is the repository for the paper: What Was Your Name Again? Interrogating Generative Conversational Models For Factual Consistency Evaluation
With this code you can evaluate the factual consistency of a generative model using the dataset and the hybrid pipeline described in the paper:
- Finetune the model on the PersonaChat train set (not required, but recommended for a fair comparison).
- Run:

  `python evaluate.py --model_checkpoint={path to your model} --model_sep_tokens={your model's special tokens}`
  The second parameter is the list of special tokens (if any) used during finetuning to separate agent and user utterances; it defaults to `["<user>", "<agent>"]`.
- The code first curates the Consistest dataset and then runs inference and evaluation. The results are saved in the `results` folder. Note that the code assumes your model and tokenizer can be loaded with the `.from_pretrained()` method of the `transformers` library (see the sketch after this list).
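As a minimal sketch of what that assumption means (the checkpoint path is a placeholder, and a causal LM is assumed here; your model class may differ):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint path; any local directory or Hub name readable by
# .from_pretrained() should work.
checkpoint = "path/to/finetuned-model"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# The tokens passed via --model_sep_tokens are assumed to already be in the
# tokenizer's vocabulary from finetuning on PersonaChat.
print(tokenizer.convert_tokens_to_ids(["<user>", "<agent>"]))
```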
You can also run the evaluation on already generated responses (as a `.csv` file) by running `python evaluate.py --eval_only=True --responses_to_eval={path to response file}`.
Note that the `.csv` file should follow the expected format, i.e. with `Question`, `Response`, `Reference`, `NLI_Ref`, `Source`, `Type`, `Distance` as columns.
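As a rough illustration (the row values below are made up; only the column names come from this README, and their exact semantics follow the paper), such a file could be assembled with pandas:

```python
import pandas as pd

# Hypothetical example row; replace with your model's generated responses.
rows = [
    {
        "Question": "What was your name again?",
        "Response": "I'm Alex, nice to meet you.",
        "Reference": "My name is Alex.",
        "NLI_Ref": "My name is Alex.",
        "Source": "persona",
        "Type": "wh",
        "Distance": 3,
    },
]

pd.DataFrame(rows).to_csv("my_responses.csv", index=False)
```

The resulting file can then be scored with `python evaluate.py --eval_only=True --responses_to_eval=my_responses.csv`.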
In both cases, since the code uses the `Trainer` class from `transformers` for NLI inference, all available GPUs will be used. This can be changed by prepending `CUDA_VISIBLE_DEVICES={list of GPU ids}` to the `python evaluate.py` command.
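For example, to restrict inference to the first two GPUs (the GPU ids here are only an example):

`CUDA_VISIBLE_DEVICES=0,1 python evaluate.py --model_checkpoint={path to your model}`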