-
Notifications
You must be signed in to change notification settings - Fork 356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove language restrictions in tydiqa + add arabic prompts #763
remove language restrictions in tydiqa + add arabic prompts #763
Conversation
KhalidAlt
commented
May 5, 2022
•
edited
Loading
edited
- Remove the if statement to allow English prompts to work across the Dataset.
- Add Arabic prompts for primary task subset with if statement to include only Arabic set.
- Add Arabic prompts for secondary task subset.
Can you make this PR target the eval-hackathon branch? Thanks! |
…orkshop#778) * accelerate `get_infos` by caching the `DataseInfoDict`s * quality * consistency
fixed the merged conflicts @KhalidAlt what is |
Hi @VictorSanh, I think you can remove them. Also, I will ping @zaidalyafeai to see if he can help reviewing the Arabic prompts. |
@KhalidAlt there are still errors in tests for some reason, if you can get to it before please do, I have a few things to sort out this afternoon first |
…dataset metadatas
* fix empty documents - multi_news * fix test - unrecognized variable
* Added languages widget to UI. * Style fixes. * Added English tag to existing datasets. * Add languages to viewer mode. * Update language codes. * Update CONTRIBUTING.md. * Update screenshot. * Add "Prompt" to UI to clarify languages tag usage.
Sorry, I took so long to look at this that it's fallen out of date. Can you resolve the conflicts and I can take another look? |
If u need some help I can review the prompts. |
Hi @zaidalyafeai, It will be so helpful if you can review the Arabic prompts. Thanks a lot. |
Hi @zaidalyafeai, I think you can review the prompts much easier now. I fixed the Unicode characters problem and it shows the Arabic characters now. Please review it if you can and let me know if anything is missing. Thanks a lot. |
Hi @stephenbach, I fixed all the conflicts and a small bug that prevented Unicode characters from showing properly in .yaml files. Please let me know if there is anything need to change. Also, show_new_templates check is always failing due to large size of data. |
@KhalidAlt the branch seems to include many other commits for other languages? |
@zaidalyafeai, previously, Unicode characters were saved on byte format. Therefore, I fixed the issue, and most prompts have changed from byte to Unicode. This is the reason that there are many other commits for other languages. Please let me know if anything to be changed. Thanks |
Sure will review prompts today. |
@KhalidAlt @stephenbach Any chance we can get this merged soon? We need the filter to be removed before we can run the evaluations for the paper. Thanks! |
@KhalidAlt it the tests pass locally, and @zaidalyafeai has validated, i can merge this in |
Whoops, I DMed @KhalidAlt. Please don’t merge yet. There’s some code duplication in templates.py that needs to be removed. |
Also do we like/need the allow_unicode=True argument? I’m not sure exactly what it does. Should we use it to regenerate all the .yaml at once to avoid unexpected diffs down the road? |
Hi @stephenbach , I removed the code duplication. I used the argument (allow_unicode=True) because non-ASCII language prompts were saved in byte formatting which makes it difficult to review them manually using a .ymal file. |
There's a .app.py.swp in the changes. |
Hi @stephenbach, I have deleted app.py.swp in the cleaning commit. In addition, I have reverted the changes to templates.py and regenerated the tydiqa prompts. Also, I have tested it locally and everything seems right. Please let me know if anything still missing. |
Oh my bad, I see now that you were removing it, not adding it. Merging now. Thanks for your patience and hard work on this! |
For the record, show_new_templates fails because the dataset is too big to run on GitHub and check_templates fails because of the issue that #832 addresses. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!