fix empty documents - multi_news #793

Merged
merged 2 commits into from
Jul 3, 2022

Conversation

VictorSanh
Member

Some examples have an empty document field (examples 453, 16290, 16489, 18812, 19279, 21620, 30735, 41993 in the train split).
This PR adds an if condition so that the prompts return a blank result on these examples.
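
For readers who want to see the shape of such a guard, here is a minimal plain-Python sketch. It is not the code from this PR, which is not shown on this page. The function name `render_summary_prompt`, the `summary` field, and the prompt wording are hypothetical; only the `document` field name comes from the description above.

```python
from typing import Dict, List


def render_summary_prompt(example: Dict[str, str]) -> List[str]:
    """Return [input, target] for one example, or [] when the document is empty.

    Hypothetical helper: the field name "summary" and the prompt wording
    are assumptions made for illustration.
    """
    document = example.get("document", "")
    # Guard: an empty (or whitespace-only) document yields a blank result,
    # so downstream code can skip the example instead of emitting a broken prompt.
    if not document.strip():
        return []
    prompt = f"Summarize the following articles:\n\n{document}"
    return [prompt, example.get("summary", "")]


examples = [
    {"document": "Article text ...", "summary": "Short summary."},
    {"document": "", "summary": "Summary with no source document."},
]

rendered = [render_summary_prompt(ex) for ex in examples]
# Indices of examples that produced a blank result.
skipped = [i for i, r in enumerate(rendered) if not r]
```

Returning an empty result rather than raising keeps iteration over the whole split simple: callers can filter out blank results in one place.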

VictorSanh merged commit ab6ad7e into main on Jul 3, 2022
stephenbach added a commit that referenced this pull request Jul 12, 2022
* Accelerate `get_infos` by caching the `DataseInfoDict`s (#778)

* accelerate `get_infos` by caching the `DataseInfoDict`s

* quality

* consistency

* fix `filter_english_datasets` since `languages` became `language` in dataset metadatas

* fix empty documents - multi_news (#793)

* fix empty documents - multi_news

* fix test - unrecognized variable

* Language tags (#771)

* Added languages widget to UI.

* Style fixes.

* Added English tag to existing datasets.

* Add languages to viewer mode.

* Update language codes.

* Update CONTRIBUTING.md.

* Update screenshot.

* Add "Prompt" to UI to clarify languages tag usage.

* Add blank languages list.

Co-authored-by: Victor SANH <[email protected]>
stephenbach added a commit that referenced this pull request Oct 26, 2022
* remove language restrictions

* add arabic dataset to primary_task

* Accelerate `get_infos` by caching the `DataseInfoDict`s (#778)

* accelerate `get_infos` by caching the `DataseInfoDict`s

* quality

* consistency

* add arabic prompts

* cleaning

* Consistency in prompt naming.

* cleaning

* fix `filter_english_datasets` since `languages` became `language` in dataset metadatas

* fix empty documents - multi_news (#793)

* fix empty documents - multi_news

* fix test - unrecognized variable

* Language tags (#771)

* Added languages widget to UI.

* Style fixes.

* Added English tag to existing datasets.

* Add languages to viewer mode.

* Update language codes.

* Update CONTRIBUTING.md.

* Update screenshot.

* Add "Prompt" to UI to clarify languages tag usage.

* update

* update prompts

* Remove duplicates lines

* update

* regenerate prompts

* cleaning

* lang tag missing

Co-authored-by: Victor SANH <[email protected]>
Co-authored-by: Stephen Bach <[email protected]>