We are working on a project involving the evaluation of hallucination detection methods in retrieval-augmented generation models. Your work, "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models," has been instrumental in guiding our research. We deeply appreciate the comprehensive dataset and insightful analyses you have provided.
We are particularly interested in the detailed model-level results presented in Table 5 of your paper, which summarizes the response-level hallucination detection performance for each baseline method across different tasks and models. The overall results are extremely helpful, but for our work, access to the detailed results for each model (i.e., Llama-2-7B-chat, Llama-2-13B-chat, Llama-2-70B-chat, Mistral-7B-Instruct) would significantly strengthen our analysis and help us avoid unnecessary duplication of effort.
Request:
Could you kindly provide the detailed experimental results for each model included in the RAGTruth dataset? Specifically, we are looking for the hallucination detection performance metrics (precision, recall, F1 score) broken down by each model used in your experiments (see the sketch after this list):
Llama-2-7B-chat
Llama-2-13B-chat
Llama-2-70B-chat
Mistral-7B-Instruct
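For clarity, this is the kind of per-model breakdown we are hoping to reproduce. Below is a minimal sketch, assuming response-level binary hallucination predictions and gold labels are available per model; the record fields and the `per_model_metrics` helper are hypothetical illustrations, not the actual RAGTruth schema or evaluation code:

```python
# Hypothetical sketch: compute response-level precision/recall/F1
# separately for each response-generating model.
from sklearn.metrics import precision_recall_fscore_support

def per_model_metrics(records):
    """records: iterable of dicts such as
    {"model": "llama-2-7b-chat", "pred": 1, "label": 0},
    where pred/label are 1 for hallucinated, 0 otherwise."""
    by_model = {}
    for rec in records:
        preds, labels = by_model.setdefault(rec["model"], ([], []))
        preds.append(rec["pred"])
        labels.append(rec["label"])

    results = {}
    for model, (preds, labels) in by_model.items():
        p, r, f1, _ = precision_recall_fscore_support(
            labels, preds, average="binary", zero_division=0
        )
        results[model] = {"precision": p, "recall": r, "f1": f1}
    return results
```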
Having this detailed information will greatly aid in advancing our research and help us build upon your findings more effectively. We understand the effort that goes into compiling and sharing such data, and we are immensely grateful for any assistance you can provide.