A minor potential problem #15

Open
Vicent0205 opened this issue Nov 14, 2023 · 1 comment

Comments

@Vicent0205
I find that in the QA data, the right answers come from HotpotQA and are short, whereas the constructed hallucinated answers are longer, usually a full sentence.
I suspect this may induce a length bias when the data is used for hallucination detection.
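
As a rough way to quantify this gap, here is a minimal sketch. It assumes the QA file is JSON Lines with `right_answer` and `hallucinated_answer` string fields; the field names and the `qa_data.json` path are assumptions about the repo's data layout:

```python
import json

# Minimal sketch: compare answer lengths in HaluEval's QA data.
# Assumptions: qa_data.json is JSON Lines, one sample per line, with
# "right_answer" and "hallucinated_answer" string fields.
right_lens, hallu_lens = [], []
with open("qa_data.json", encoding="utf-8") as f:
    for line in f:
        sample = json.loads(line)
        right_lens.append(len(sample["right_answer"].split()))
        hallu_lens.append(len(sample["hallucinated_answer"].split()))

print(f"mean right-answer length: {sum(right_lens) / len(right_lens):.1f} words")
print(f"mean hallucinated-answer length: {sum(hallu_lens) / len(hallu_lens):.1f} words")
```

If the two means differ substantially, a detector could in principle score well by keying on length alone rather than on factual content.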

@Xiaoxue-xx

Thank you for raising this issue. We have also noticed this potential problem in HaluEval. In our hallucination detection experiments, we randomly select either the hallucinated or the normal output (e.g., an answer) of each sample for classification. We ask the model to focus on whether the content of the output contains hallucinations, so the impact of response length should be relatively minor. You can follow our latest work, HaluEval 2.0, where we have constructed a brand-new dataset for evaluating hallucinations: "The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models."
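
For concreteness, a minimal sketch of the randomized selection step described above; the `pick_output` helper and the field names are hypothetical, not HaluEval's actual evaluation code:

```python
import random

def pick_output(sample: dict, rng: random.Random) -> tuple[str, bool]:
    """Return either the hallucinated or the normal output, with its label.

    Field names ("right_answer", "hallucinated_answer") are assumptions.
    The returned flag is the ground-truth label (True = hallucinated).
    """
    if rng.random() < 0.5:
        return sample["hallucinated_answer"], True
    return sample["right_answer"], False

rng = random.Random(0)  # fixed seed so the evaluation split is reproducible
sample = {"right_answer": "yes",
          "hallucinated_answer": "No, the two films were made by different directors."}
answer, is_hallucinated = pick_output(sample, rng)
print(answer, is_hallucinated)
```

Because each sample contributes exactly one output, chosen at random, the classifier sees both short normal answers and longer hallucinated ones under the same prompt, which is the mitigation the reply describes.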
