[feat] Add TextVQA dataset #3967
Conversation
The documentation is not available anymore as the PR was closed or merged.
Force-pushed from b9e6f0b to 518ae6d
Awesome, thanks for adding this dataset! This is indeed the first one for visual QA :)
I left a few comments on the dataset card, and one comment about unnecessary configurations in the dataset script.
datasets/textvqa/README.md
Outdated
```
{'question': 'who is this copyrighted by?',
 'image_id': '00685bc495504d61',
 'image':
```
I guess there should be a PIL Image here, no? Could you add it instead of leaving it blank?
For example, it could be something like
`'image': <PIL.PngImagePlugin.PngImageFile image mode=L size=28x28 at 0x276021F6DD8>`
(this way it shows that it is a Python object from PIL)
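For illustration, a minimal sketch of what printing the `image` field would look like once the dataset is loadable; the dataset name, split, and the exact PIL class/size shown in the comment are assumptions, not taken from the PR:

```python
from datasets import load_dataset

# dataset name and split assumed for illustration
ds = load_dataset("textvqa", split="validation")
print(ds[0]["image"])
# prints something like:
# <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1024x768 at 0x7F...>
```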
datasets/textvqa/README.md
Outdated
"...", | ||
"answer_10" | ||
] | ||
} |
Could you list all the fields using bullet points and add a brief description for each field, please? For example:
- **question_id**: string, id of the question
...
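For instance, the field list could look something like this; the field names follow the example instance above, but the types and descriptions are guesses and should be checked against the script:

```
- **image_id**: string, id of the image the question is about
- **question_id**: int, id of the question
- **question**: string, the question text
- **answers**: list of 10 strings, the human-provided answers (empty strings for the test split)
```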
datasets/textvqa/README.md
Outdated
```
#### Annotation process

See the [paper](https://arxiv.org/abs/1904.08920).
```
Maybe we can copy-paste the relevant passage from the paper here in a quote block?
datasets/textvqa/textvqa.py
Outdated
```
BUILDER_CONFIGS = [TextvqaConfig(split) for split in _SPLITS]

DEFAULT_CONFIG_NAME = "train"
```
You don't need to define one configuration per split. The splits are already defined in the `_split_generators` method.
Suggested change (remove these lines):

```diff
- BUILDER_CONFIGS = [TextvqaConfig(split) for split in _SPLITS]
- DEFAULT_CONFIG_NAME = "train"
```
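A rough sketch of the single-config version, under the assumption that the class names match the PR's script (`TextvqaConfig`, etc.); the features, download logic, and example generation are elided:

```python
import datasets


class TextvqaConfig(datasets.BuilderConfig):
    """BuilderConfig for TextVQA."""


class Textvqa(datasets.GeneratorBasedBuilder):
    # a single default config; train/validation/test are declared in _split_generators
    BUILDER_CONFIGS = [TextvqaConfig(name="textvqa", version=datasets.Version("0.5.1"))]

    def _info(self):
        return datasets.DatasetInfo(description="TextVQA")  # features elided in this sketch

    def _split_generators(self, dl_manager):
        # the splits live here, not in BUILDER_CONFIGS
        return [
            datasets.SplitGenerator(name=datasets.Split.TRAIN, gen_kwargs={"split": "train"}),
            datasets.SplitGenerator(name=datasets.Split.VALIDATION, gen_kwargs={"split": "val"}),
            datasets.SplitGenerator(name=datasets.Split.TEST, gen_kwargs={"split": "test"}),
        ]

    def _generate_examples(self, split):
        # download/parsing logic elided
        yield from ()
```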
After this change you'll need to:
- update the `dataset_infos.json` file (delete it + recreate it with the `datasets-cli` command; see the sketch below)
- update the dummy data (i.e. only keep one `dummy_data.zip` file and have it at `datasets/textvqa/dummy/0.5.1/dummy_data.zip`)
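For reference, a sketch of the regeneration step; the flags assume the `datasets-cli test` command as documented in the repo's contribution guide at the time, and the path assumes the PR's layout:

```
datasets-cli test datasets/textvqa --save_infos --all_configs
```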
Hey :) Have you had a chance to continue this PR? Let me know if you have questions or if I can help.
Hey @lhoestq, let me wrap this up soon. I will resolve your comments in the next push.
This would be the first classification-based vision-and-language dataset in the `datasets` library. Currently, the dataset downloads everything you need beforehand. See the [paper](https://arxiv.org/abs/1904.08920) for more details.

Test Plan:
- Ran the full and the dummy data tests locally
Co-authored-by: Quentin Lhoest <[email protected]>
Force-pushed from 549bcec to ffcbbd6
Force-pushed from ffcbbd6 to eef7add
```diff
@@ -77,7 +77,8 @@
   "subtasks": [
     "extractive-qa",
     "open-domain-qa",
-    "closed-domain-qa"
+    "closed-domain-qa",
+    "visual-question-answering",
```
Your JSON is wrong here, I think:

```diff
- "visual-question-answering",
+ "visual-question-answering"
```
Cool, thanks! More comments below, especially one about the task category tag.
datasets/textvqa/README.md
Outdated
```
### Data Splits

There are three splits: `train`, `validation` and `test`. The `train` and `validation` sets share images with the OpenImages `train` set and have their answers available. To get inference results and numbers on the `test` set, you need to go to the [EvalAI leaderboard](https://eval.ai/web/challenges/challenge-page/874/overview) and upload your predictions there. Please see the instructions at [https://textvqa.org/challenge/](https://textvqa.org/challenge/).
```
Does the test set include answers as well? This line in the code makes me think it may not:

```
item["answers"] = item.get("answers", ["" for _ in range(_NUM_ANSWERS_PER_QUESTION)])
```
Yes, the test set doesn't have answers, so we return empty strings. Maybe we should mention this.
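A minimal sketch of that fallback, assuming `_NUM_ANSWERS_PER_QUESTION` is 10 as in TextVQA; the helper name is made up for illustration:

```python
_NUM_ANSWERS_PER_QUESTION = 10


def fill_missing_answers(item: dict) -> dict:
    # the test split ships without ground-truth answers, so pad with empty strings
    item["answers"] = item.get("answers", ["" for _ in range(_NUM_ANSWERS_PER_QUESTION)])
    return item


print(fill_missing_answers({"question": "who is this copyrighted by?"})["answers"])
# ['', '', '', '', '', '', '', '', '', '']
```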
datasets/textvqa/README.md
Outdated
```
source_datasets:
- original
task_categories:
- question-answering
```
`question-answering` means extractive-qa for text in `transformers` and on the Hub. In particular, each task category refers to a pipeline type in `transformers`.
Therefore I would create a new task category for `visual-question-answering` instead of having it under `question-answering`.
Feel free to open a PR to add it to https://github.com/huggingface/hub-docs/blob/main/js/src/lib/interfaces/Types.ts :) cc @osanseviero would it make sense in your opinion?
I just noticed that `visual-question-answering` is already there: https://github.com/huggingface/hub-docs/blob/f9e8ec8f882ddb2b0517e97afc998f9cf398953b/js/src/lib/interfaces/Types.ts#L549
Yes, it's already there and already valid for datasets!
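So the dataset card front matter could simply use the existing tag; a sketch:

```
task_categories:
- visual-question-answering
```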
Thanks for fixing the task tag and for the comments about the missing answers!
LGTM :)