I find that in the QA data, the right answers come from HotpotQA and are short, whereas the constructed hallucinated answers are longer, usually a full sentence.
I suspect this may introduce a length bias when the data is used for hallucination detection.
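To make the concern concrete, a rough sketch like the following could measure the length gap. The path `data/qa_data.json`, the field names `right_answer` / `hallucinated_answer`, and the one-JSON-object-per-line layout are assumptions about the released file format, so adjust them to match the actual data.

```python
import json

def average_answer_lengths(path="data/qa_data.json"):
    """Compare average word counts of right vs. hallucinated answers (assumed field names)."""
    right_lens, hallu_lens = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:  # assumes one JSON object per line
            sample = json.loads(line)
            right_lens.append(len(sample["right_answer"].split()))
            hallu_lens.append(len(sample["hallucinated_answer"].split()))
    return sum(right_lens) / len(right_lens), sum(hallu_lens) / len(hallu_lens)

if __name__ == "__main__":
    right_avg, hallu_avg = average_answer_lengths()
    print(f"avg right answer length:        {right_avg:.1f} words")
    print(f"avg hallucinated answer length: {hallu_avg:.1f} words")
```

A large gap between the two averages would support the suspicion that answer length alone is a usable signal for the detector.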
Thank you for raising the issue. We have also noticed this potential problem in HaluEval. In our hallucination detection experiments, we randomly select either the hallucinated or the normal output (e.g., an answer) of each sample for classification. We ask the model to focus on whether the content of the output contains hallucinations, so the impact of response length should be relatively minor. You can also follow our latest work, HaluEval 2.0, where we have constructed a brand-new dataset for evaluating hallucinations: "The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models."
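For readers unfamiliar with that setup, a minimal sketch of the per-sample random selection might look like the following. The field names, labels, and 50/50 split are assumptions for illustration, not the repository's exact evaluation code.

```python
import random

def build_detection_examples(samples, seed=0):
    """For each sample, randomly pick the hallucinated or the right answer
    and record the ground-truth label the detector should predict."""
    rng = random.Random(seed)
    examples = []
    for sample in samples:
        if rng.random() < 0.5:
            answer, label = sample["hallucinated_answer"], "Yes"  # contains hallucination
        else:
            answer, label = sample["right_answer"], "No"          # no hallucination
        examples.append({"question": sample["question"], "answer": answer, "label": label})
    return examples
```

Because each classification decision sees only one answer at a time, the detector cannot directly compare a short right answer against a long hallucinated one, which is why the length effect should be limited.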