About the size of the DART dataset and its performance #7

Open
JinliangLu96 opened this issue Sep 16, 2021 · 0 comments

JinliangLu96 commented Sep 16, 2021

Recently, I used GPT to do generation on the DART dataset. However, I found that the test set may differ from the one used in other works. I can only get 5,097 samples for testing, while the GEM website says the test set has 12,552. The data provided by Li et al. (2021) (https://github.com/XiangLi1999/PrefixTuning) also has 12,552 samples, but it does not include gold references.
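For context, here is roughly how I count the samples, assuming the standard DART JSON layout (a list of entries, each holding a tripleset plus a list of gold annotations); the file name is just a placeholder for the local copy of the test split:

```python
import json

# Placeholder path: point it at your local copy of the DART test split.
with open("dart-test.json") as f:
    data = json.load(f)

# Each top-level entry is one tripleset; an entry may carry several gold
# annotations (reference sentences), so counting entries vs. counting
# annotations could plausibly explain a 5,097 vs. 12,552 gap.
n_entries = len(data)
n_annotations = sum(len(entry.get("annotations", [])) for entry in data)
print(f"entries: {n_entries}, annotations: {n_annotations}")
```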

Using the official evaluation scripts and test set, I obtain about 37-38 BLEU, which is much lower than the results (46-47 BLEU) reported by Li et al. (2021) and other works (e.g., the leaderboard at https://github.com/Yale-LILY/dart). So I am confused about which one is right.
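For what it's worth, here is a minimal multi-reference BLEU sketch using sacrebleu (not the official DART scripts; the hypotheses and references below are placeholders), just to show the scoring setup I am comparing against:

```python
import sacrebleu

# Placeholder data: one hypothesis per test instance, and reference streams
# aligned with the hypotheses (stream i holds the i-th reference of every
# instance).
hypotheses = ["a generated sentence", "another generated sentence"]
references = [
    ["first gold reference", "first gold reference for instance two"],
    ["second gold reference", "second gold reference for instance two"],
]

# sacrebleu scores all reference streams jointly; scoring against a single
# stream instead can noticeably change the corpus BLEU.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```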

Could you please answer these questions if possible? I would appreciate it.

Reference

  1. Li, X. L., & Liang, P. (2021). Prefix-Tuning: Optimizing Continuous Prompts for Generation. arXiv preprint arXiv:2101.00190.