
[feat] Add TextVQA dataset #3967

Merged: 5 commits into huggingface:master from add_textvqa_dataset on May 5, 2022

Conversation

apsdehal (Contributor)

This would be the first classification-based vision-and-language dataset in the datasets library.

Currently, the dataset downloads everything you need beforehand. See the paper for more details.

Test Plan:

  • Ran the full and the dummy data test locally
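
As a quick usage sketch (hedged: the dataset name follows the `datasets/textvqa` script path, and the field names follow the example snippets later in this thread):

```python
from datasets import load_dataset

# Once the script is merged, the first call downloads all annotations
# and images up front, as described above.
ds = load_dataset("textvqa", split="train")
print(ds[0]["question"])  # e.g. 'who is this copyrighted by?'
```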

@HuggingFaceDocBuilderDev commented Mar 18, 2022

The documentation is not available anymore as the PR was closed or merged.

@lhoestq (Member) left a comment

Awesome, thanks for adding this dataset! This is indeed the first one for visual QA :)

I left a few comments on the dataset card, and one comment about unnecessary configurations in the dataset script.

(Three resolved review comments on datasets/textvqa/README.md)
```
{'question': 'who is this copyrighted by?',
'image_id': '00685bc495504d61',
'image':
```
@lhoestq (Member)

I guess there should be a PIL Image here, no? Could you add it instead of leaving it blank?

For example, it could be something like:

`'image': <PIL.PngImagePlugin.PngImageFile image mode=L size=28x28 at 0x276021F6DD8>`

(this way it shows that it is a Python object from PIL)
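
For reference, a minimal sketch (not part of this PR) of how the `datasets.Image` feature decodes a sample into a PIL object, which is what the `image` field in the README example would show:

```python
from datasets import Dataset, Features, Image, Value
from PIL import Image as PILImage

# Tiny in-memory dataset with an Image feature: accessing a row decodes
# the stored image bytes back into a PIL.Image object.
features = Features({"question": Value("string"), "image": Image()})
ds = Dataset.from_dict(
    {
        "question": ["who is this copyrighted by?"],
        "image": [PILImage.new("RGB", (28, 28))],
    },
    features=features,
)
print(ds[0]["image"])  # e.g. <PIL.PngImagePlugin.PngImageFile image mode=RGB size=28x28 ...>
```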

"...",
"answer_10"
]
}
@lhoestq (Member)

Could you list all the fields using bullet points and have a brief description for each field, please?

- **question_id**: string, id of the question
...

(Resolved review comment on datasets/textvqa/README.md)

#### Annotation process

See the [paper](https://arxiv.org/abs/1904.08920).
@lhoestq (Member)

Maybe we can copy-paste the relevant passage from the paper here in a quote block?

Comment on lines 70 to 73

BUILDER_CONFIGS = [TextvqaConfig(split) for split in _SPLITS]

DEFAULT_CONFIG_NAME = "train"
@lhoestq (Member)

You don't need to define one configuration per split. The splits are already defined in the `_split_generators` method (a sketch follows after the checklist below).

Suggested change (remove these lines):
-BUILDER_CONFIGS = [TextvqaConfig(split) for split in _SPLITS]
-DEFAULT_CONFIG_NAME = "train"

After this change you'll need to:

  • update the dataset_infos.json file (delete it + recreate it with the datasets-cli command)
  • update the dummy data (i.e. only keep one dummy_data.zip file and have it at datasets/textvqa/dummy/0.5.1/dummy_data.zip)
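
For illustration, a minimal sketch of the single-configuration script this suggestion points to, with the splits declared only in `_split_generators` (the download URLs, JSON layout, and feature set here are simplified assumptions, not the actual TextVQA files):

```python
import json

import datasets


class Textvqa(datasets.GeneratorBasedBuilder):
    """Sketch: one default config; splits come from _split_generators."""

    VERSION = datasets.Version("0.5.1")

    def _info(self):
        return datasets.DatasetInfo(
            features=datasets.Features(
                {
                    "question": datasets.Value("string"),
                    "image_id": datasets.Value("string"),
                    "image": datasets.Image(),
                    "answers": datasets.Sequence(datasets.Value("string")),
                }
            )
        )

    def _split_generators(self, dl_manager):
        # Hypothetical URLs standing in for the official TextVQA downloads.
        urls = {
            split: f"https://example.com/textvqa_{split}.json"
            for split in ("train", "validation", "test")
        }
        paths = dl_manager.download_and_extract(urls)
        return [
            datasets.SplitGenerator(name=datasets.Split.TRAIN, gen_kwargs={"path": paths["train"]}),
            datasets.SplitGenerator(name=datasets.Split.VALIDATION, gen_kwargs={"path": paths["validation"]}),
            datasets.SplitGenerator(name=datasets.Split.TEST, gen_kwargs={"path": paths["test"]}),
        ]

    def _generate_examples(self, path):
        # Assumed JSON structure: a top-level "data" list of question records.
        with open(path, encoding="utf-8") as f:
            for idx, item in enumerate(json.load(f)["data"]):
                yield idx, item
```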

@lhoestq (Member) commented Apr 27, 2022

Hey :) Have you had a chance to continue this PR? Let me know if you have questions or if I can help.

@apsdehal (Contributor, Author) commented Apr 28, 2022

Hey @lhoestq, let me wrap this up soon. I will resolve your comments in the next push.

apsdehal and others added 2 commits May 2, 2022 19:12
@@ -77,7 +77,8 @@
 "subtasks": [
 "extractive-qa",
 "open-domain-qa",
-"closed-domain-qa"
+"closed-domain-qa",
+"visual-question-answering",
@lhoestq (Member) commented May 3, 2022

Your JSON is wrong here, I think:

Suggested change:
-"visual-question-answering",
+"visual-question-answering"

@lhoestq (Member) left a comment

Cool, thanks! Some more comments, especially one about the task category tag.


### Data Splits

There are three splits: `train`, `validation` and `test`. The `train` and `validation` sets share images with the OpenImages `train` set and have their answers available. To get inference results and numbers on the `test` set, you need to go to the [EvalAI leaderboard](https://eval.ai/web/challenges/challenge-page/874/overview) and upload your predictions there. Please see the instructions at [https://textvqa.org/challenge/](https://textvqa.org/challenge/).
@lhoestq (Member)

Does the test set include answers as well? This line in the code makes me think it may not have the answers:

item["answers"] = item.get("answers", ["" for _ in range(_NUM_ANSWERS_PER_QUESTION)])

@apsdehal (Contributor, Author)

Yes, the test set doesn't have answers, so we are returning empty strings. Maybe we should mention this.
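
A minimal sketch of the padding behavior described above (the constant is assumed to be 10, matching the `answer_10` field in the README example):

```python
_NUM_ANSWERS_PER_QUESTION = 10  # TextVQA collects 10 human answers per question

def pad_answers(item: dict) -> dict:
    # Test-set items ship without an "answers" field, so fill it with empty
    # strings to keep the feature schema identical across the three splits.
    item["answers"] = item.get("answers", ["" for _ in range(_NUM_ANSWERS_PER_QUESTION)])
    return item

print(pad_answers({"question": "who is this copyrighted by?"})["answers"])
# ['', '', '', '', '', '', '', '', '', '']
```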

source_datasets:
- original
task_categories:
- question-answering
@lhoestq (Member)

`question-answering` means `extractive-qa` for text in transformers and on the Hub. In particular, each task category refers to a pipeline type in transformers.

Therefore I would create a new task category for `visual-question-answering` instead of having it under `question-answering`.

Feel free to open a PR to add it to https://github.com/huggingface/hub-docs/blob/main/js/src/lib/interfaces/Types.ts :) cc @osanseviero, would it make sense in your opinion?

Member

Yes, it's already there and already valid for datasets!

@lhoestq (Member) left a comment

Thanks for fixing the task tag and for the comments about the missing answers!

LGTM :)

@apsdehal apsdehal merged commit f37c876 into huggingface:master May 5, 2022
@apsdehal apsdehal deleted the add_textvqa_dataset branch May 5, 2022 06:44