About the synthetic data script create_synthetic_data.py #3
Comments
incorrect_input_ids_list: encoder input
Thanks.
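A follow-up question: is the data in keywork.txt used in training, and what is the ground-truth in it for? Can the ground-truth for this task be understood as extracting a subset of words from the original sentence and then restoring the original sentence?
The data is constructed with random sampling, so a single example can yield multiple pseudo-examples.
Thanks for the answer. So the training objective for this task is to extract part of the original sentence and then restore the original sentence?
Yes.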
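For readers following along, the idea in this exchange can be sketched roughly as below. This is only an illustration, not the code in create_synthetic_data.py: the helper name `make_pseudo_examples`, the `keep_ratio` parameter, and the `source`/`target` keys are all hypothetical, and it assumes the sampled words keep their original order. It does not reproduce incorrect_input_ids_list, label_ids_list, or target_ids_list.

```python
import random


def make_pseudo_examples(tokens, num_examples=3, keep_ratio=0.5, seed=0):
    """Build several (kept words -> full sentence) pairs from one sentence.

    Hypothetical helper for illustration only; names and parameters are
    assumptions, not taken from the repository.
    """
    rng = random.Random(seed)
    examples = []
    for _ in range(num_examples):
        # Keep at least one word so the source side is never empty.
        k = max(1, int(len(tokens) * keep_ratio))
        # Sample positions at random, then sort them so the kept words
        # appear in the same order as in the original sentence (assumption).
        kept_positions = sorted(rng.sample(range(len(tokens)), k))
        source = [tokens[i] for i in kept_positions]
        # The target is always the full original sentence.
        examples.append({"source": source, "target": list(tokens)})
    return examples


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    for ex in make_pseudo_examples(sentence):
        print(ex["source"], "->", ex["target"])
```

Because the kept positions are drawn at random on every pass, the same sentence produces several different source sides that all map to the same target. That matches the point above that one example can yield multiple pseudo-examples.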
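Thanks. I now have two more questions:
With Chinese, each keyword spans more than one token. At inference time, indicate_labels has many 0s in the middle, and the newly inserted words all end up at the end of the sentence. What causes this?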
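Hello, I am studying this code. In the code that creates the synthetic data, what do the three lists it produces, incorrect_input_ids_list, label_ids_list, and target_ids_list, each represent? Which parts of the paper do they correspond to? Thanks.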