Share on Hugging Face ? #1

lhoestq · 2023-10-02T15:02:43Z

Hi! I'm Quentin from HF :)

Thanks for sharing the dataset! I believe it will be used a lot to evaluate LLMs, especially since factual correctness and attribution are, imo, at the heart of many challenges nowadays.

I was wondering if you plan to share the dataset on Hugging Face? That way researchers can load it in one line of Python, and there's also a nice dataset viewer on the website to visualize the data.

chaitanyamalaviya · 2023-10-02T15:54:32Z

Hi Quentin, that's a good idea! We are on it, and will let you know once we've done this.

lhoestq · 2023-10-02T16:29:41Z

Cool! Let me know if you have any questions or if I can help.

chaitanyamalaviya · 2023-10-03T05:10:02Z

Hi Quentin, I uploaded our dataset here and modified the YAML to display the different configs as described here. I was trying to show three configs: one for the main data, one for the lfqa_random data, and one for the lfqa_domain data. But the dataset viewer doesn't seem to show these configs and their corresponding splits. Any idea what I might be missing? Thanks a lot!

lhoestq · 2023-10-03T09:30:20Z

I just opened a PR to fix a small issue with the YAML :)
https://huggingface.co/datasets/cmalaviya/expertqa/discussions/1
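
For anyone setting up something similar: the Hub reads subsets from a configs list in the dataset card's YAML header, where each entry has a config_name and data_files mapping splits to paths. The contents of the PR above are not reproduced here. This is only a sketch that renders that structure from Python for the three subsets named in this thread, and the split names and file paths are hypothetical placeholders, not the actual files in the repo.

```python
from typing import Dict

# Hypothetical layout: split names and file paths are placeholders,
# not the real files in cmalaviya/expertqa.
EXAMPLE_CONFIGS: Dict[str, Dict[str, str]] = {
    "main": {"train": "data/main/train.jsonl"},
    "lfqa_random": {
        "train": "data/lfqa_random/train.jsonl",
        "validation": "data/lfqa_random/validation.jsonl",
        "test": "data/lfqa_random/test.jsonl",
    },
    "lfqa_domain": {
        "train": "data/lfqa_domain/train.jsonl",
        "validation": "data/lfqa_domain/validation.jsonl",
        "test": "data/lfqa_domain/test.jsonl",
    },
}


def render_configs_yaml(configs: Dict[str, Dict[str, str]]) -> str:
    """Render a dataset-card YAML header with one config entry per subset."""
    lines = ["---", "configs:"]
    for name, splits in configs.items():
        lines.append(f"- config_name: {name}")
        lines.append("  data_files:")
        for split, path in splits.items():
            # Each split points at the file that holds its rows.
            lines.append(f"  - split: {split}")
            lines.append(f"    path: {path}")
    lines.append("---")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_configs_yaml(EXAMPLE_CONFIGS))
```

The printed header would go at the top of README.md, merged with any metadata already there; with real paths filled in, the viewer should then list each subset with its splits.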

chaitanyamalaviya · 2023-10-03T14:54:22Z

Thanks, looks good now!! It would be nice if the main subset could also be previewed; I currently see Error code: UnexpectedError. Let me know if I need to fix something.

lhoestq · 2023-10-03T15:32:31Z

I'm getting this error somehow:

pyarrow.lib.ArrowInvalid: JSON parse error: Column(/answers/post_hoc_gs_gpt4/claims/[]/revised_evidence) changed from string to array in row 0

It looks like a field is sometimes a string and sometimes an array in the JSON data, but the dataset viewer only supports a fixed type per field. Is this an error in the data file, or is it expected?
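
One way to catch every field with this problem before uploading is to scan the JSON Lines file and record which JSON types appear at each path. A stdlib-only sketch, assuming one JSON object per line; the function names are hypothetical. It reports paths in the same /field/[]/field form as the error above and ignores nulls, since Arrow columns of any type can hold null. It is a pre-check, not the viewer's own Parquet converter.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, Set


def json_type(value: Any) -> str:
    """Name the JSON type of a decoded value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def _collect(value: Any, path: str, seen: Dict[str, Set[str]]) -> None:
    # Record this value's type, then recurse; all list items share a "[]" segment.
    seen[path].add(json_type(value))
    if isinstance(value, dict):
        for key, child in value.items():
            _collect(child, f"{path}/{key}", seen)
    elif isinstance(value, list):
        for item in value:
            _collect(item, f"{path}/[]", seen)


def find_mixed_types(records: Iterable[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Return each path whose non-null values have more than one JSON type."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        _collect(record, "", seen)
    return {
        path: types - {"null"}
        for path, types in seen.items()
        if len(types - {"null"}) > 1
    }


def read_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one decoded object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

Calling find_mixed_types(read_jsonl(path)) on a file like the one described here should flag the revised_evidence path with both "string" and "array".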

chaitanyamalaviya · 2023-10-07T05:13:39Z

Ah, that's because when the revised_evidence field was empty, it was stored as an empty list, whereas otherwise it is always a string.
I fixed this in an updated file, but there is still an UnexpectedError. Let me know if the error is something different. Also, is there a way I can test with the Parquet converter myself? Thanks in any case!
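
A minimal sketch of that kind of fix, assuming (from the error path above) that each record has an answers object keyed by system name, each holding a claims list whose items carry revised_evidence. It turns empty lists into empty strings so the field is always a string. The function names are hypothetical, and this is not necessarily how the updated file was produced.

```python
import json
from typing import Any, Dict


def normalize_revised_evidence(record: Dict[str, Any]) -> Dict[str, Any]:
    """Make every claims[].revised_evidence a string by turning [] into ""."""
    # Assumed layout: record["answers"][system_name]["claims"] is a list of dicts.
    for answer in (record.get("answers") or {}).values():
        if not isinstance(answer, dict):
            continue
        for claim in answer.get("claims") or []:
            if isinstance(claim, dict) and claim.get("revised_evidence") == []:
                claim["revised_evidence"] = ""
    return record


def rewrite_jsonl(src: str, dst: str) -> None:
    """Copy a JSON Lines file, normalizing each record on the way."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if line.strip():
                record = normalize_revised_evidence(json.loads(line))
                fout.write(json.dumps(record, ensure_ascii=False) + "\n")
```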

lhoestq · 2023-10-09T11:13:59Z

It seems that some examples have the gpt4 field but others don't.
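
To list which records lack a field that others have (the gpt4 field being one case), a stdlib-only sketch is to count how many records contain each key path and report the paths that fall short of the total. The function names are hypothetical. Paths under a list are only seen when that list is non-empty, so hits below a [] segment may reflect empty lists rather than missing keys.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Set


def key_paths(value: Any, path: str = "") -> Set[str]:
    """Collect every object key path in a decoded JSON value, e.g. /answers/gpt4."""
    paths: Set[str] = set()
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}/{key}"
            paths.add(child_path)
            paths |= key_paths(child, child_path)
    elif isinstance(value, list):
        for item in value:
            paths |= key_paths(item, f"{path}/[]")
    return paths


def find_partial_fields(records: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Map each key path present in some but not all records to its record count."""
    counts: Counter = Counter()
    total = 0
    for record in records:
        total += 1
        counts.update(key_paths(record))  # a set, so each path counts once per record
    return {path: n for path, n in counts.items() if n < total}
```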
