[feat] Add TextVQA dataset #3967
Conversation
The documentation is not available anymore as the PR was closed or merged.
Force-pushed from b9e6f0b to 518ae6d
Awesome, thanks for adding this dataset! This is indeed the first one for visual QA :)
I left a few comments on the dataset card, and one comment about unnecessary configurations in the dataset script.
datasets/textvqa/README.md
Outdated
```
{'question': 'who is this copyrighted by?',
 'image_id': '00685bc495504d61',
 'image':
```
I guess there should be a PIL Image here, no? Could you add it instead of leaving it blank?
For example, it could be something like
`'image': <PIL.PngImagePlugin.PngImageFile image mode=L size=28x28 at 0x276021F6DD8>`
(this way it shows that it is a Python object from PIL)
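For illustration, a minimal sketch of what printing the `image` field would look like once the dataset is loadable; the dataset name, split, and the exact PIL class/size shown in the comment are assumptions, not taken from the PR:

```python
from datasets import load_dataset

# dataset name and split assumed for illustration
ds = load_dataset("textvqa", split="validation")
print(ds[0]["image"])
# prints something like:
# <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1024x768 at 0x7F...>
```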
datasets/textvqa/README.md
Outdated
"...", | ||
"answer_10" | ||
] | ||
} |
Could you list all the fields using bullet points and add a brief description for each field, please? For example:
- **question_id**: string, id of the question
...
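For instance, the field list could look something like this; the field names follow the example instance above, but the types and descriptions are guesses and should be checked against the script:

```
- **image_id**: string, id of the image the question is about
- **question_id**: int, id of the question
- **question**: string, the question text
- **answers**: list of 10 strings, the human-provided answers (empty strings for the test split)
```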
datasets/textvqa/README.md
Outdated
```
#### Annotation process

See the [paper](https://arxiv.org/abs/1904.08920).
```
Maybe we can copy-paste the relevant passage from the paper here in a quote block?
datasets/textvqa/textvqa.py
Outdated
```
BUILDER_CONFIGS = [TextvqaConfig(split) for split in _SPLITS]

DEFAULT_CONFIG_NAME = "train"
```
You don't need to define one configuration per split. The splits are already defined in the `_split_generators` method.
Suggested change (remove these lines):

```diff
- BUILDER_CONFIGS = [TextvqaConfig(split) for split in _SPLITS]
- DEFAULT_CONFIG_NAME = "train"
```
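A rough sketch of the single-config version, under the assumption that the class names match the PR's script (`TextvqaConfig`, etc.); the features, download logic, and example generation are elided:

```python
import datasets


class TextvqaConfig(datasets.BuilderConfig):
    """BuilderConfig for TextVQA."""


class Textvqa(datasets.GeneratorBasedBuilder):
    # a single default config; train/validation/test are declared in _split_generators
    BUILDER_CONFIGS = [TextvqaConfig(name="textvqa", version=datasets.Version("0.5.1"))]

    def _info(self):
        return datasets.DatasetInfo(description="TextVQA")  # features elided in this sketch

    def _split_generators(self, dl_manager):
        # the splits live here, not in BUILDER_CONFIGS
        return [
            datasets.SplitGenerator(name=datasets.Split.TRAIN, gen_kwargs={"split": "train"}),
            datasets.SplitGenerator(name=datasets.Split.VALIDATION, gen_kwargs={"split": "val"}),
            datasets.SplitGenerator(name=datasets.Split.TEST, gen_kwargs={"split": "test"}),
        ]

    def _generate_examples(self, split):
        # download/parsing logic elided
        yield from ()
```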
After this change you'll need to:
- update the `dataset_infos.json` file (delete it + recreate it with the `datasets-cli` command; see the sketch below)
- update the dummy data (i.e. only keep one `dummy_data.zip` file and have it at `datasets/textvqa/dummy/0.5.1/dummy_data.zip`)
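For reference, a sketch of the regeneration step; the flags assume the `datasets-cli test` command as documented in the repo's contribution guide at the time, and the path assumes the PR's layout:

```
datasets-cli test datasets/textvqa --save_infos --all_configs
```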
Hey :) Have you had a chance to continue this PR? Let me know if you have questions or if I can help.
Hey @lhoestq, let me wrap this up soon. I will resolve your comments in the next push.
This would be the first classification-based vision-and-language dataset in the `datasets` library. Currently, the dataset downloads everything you need beforehand. See the [paper](https://arxiv.org/abs/1904.08920) for more details.

Test Plan:
- Ran the full and the dummy data tests locally
Co-authored-by: Quentin Lhoest <[email protected]>
Force-pushed from 549bcec to ffcbbd6
Force-pushed from ffcbbd6 to eef7add
```diff
@@ -77,7 +77,8 @@
   "subtasks": [
     "extractive-qa",
     "open-domain-qa",
-    "closed-domain-qa"
+    "closed-domain-qa",
+    "visual-question-answering",
```
Your JSON is wrong here, I think:

```diff
- "visual-question-answering",
+ "visual-question-answering"
```
Cool, thanks! More comments below, especially one about the task category tag.
datasets/textvqa/README.md
Outdated
```
### Data Splits

There are three splits: `train`, `validation` and `test`. The `train` and `validation` sets share images with the OpenImages `train` set and have their answers available. To get inference results and numbers on the `test` set, you need to go to the [EvalAI leaderboard](https://eval.ai/web/challenges/challenge-page/874/overview) and upload your predictions there. Please see the instructions at [https://textvqa.org/challenge/](https://textvqa.org/challenge/).
```
Does the test set include answers as well? This line in the code makes me think it may not:

```
item["answers"] = item.get("answers", ["" for _ in range(_NUM_ANSWERS_PER_QUESTION)])
```
Yes, the test set doesn't have answers, so we return empty strings. Maybe we should mention this.
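A minimal sketch of that fallback, assuming `_NUM_ANSWERS_PER_QUESTION` is 10 as in TextVQA; the helper name is made up for illustration:

```python
_NUM_ANSWERS_PER_QUESTION = 10


def fill_missing_answers(item: dict) -> dict:
    # the test split ships without ground-truth answers, so pad with empty strings
    item["answers"] = item.get("answers", ["" for _ in range(_NUM_ANSWERS_PER_QUESTION)])
    return item


print(fill_missing_answers({"question": "who is this copyrighted by?"})["answers"])
# ['', '', '', '', '', '', '', '', '', '']
```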
datasets/textvqa/README.md
Outdated
```
source_datasets:
- original
task_categories:
- question-answering
```
`question-answering` means extractive-qa for text in `transformers` and on the Hub. In particular, each task category refers to a pipeline type in `transformers`.
Therefore I would create a new task category for `visual-question-answering` instead of having it under `question-answering`.
Feel free to open a PR to add it to https://github.com/huggingface/hub-docs/blob/main/js/src/lib/interfaces/Types.ts :) cc @osanseviero would it make sense in your opinion?
I just noticed that `visual-question-answering` is already there: https://github.com/huggingface/hub-docs/blob/f9e8ec8f882ddb2b0517e97afc998f9cf398953b/js/src/lib/interfaces/Types.ts#L549
Yes, it's already there and already valid for datasets!
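So the dataset card front matter could simply use the existing tag; a sketch:

```
task_categories:
- visual-question-answering
```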
Thanks for fixing the task tag and for the comments about the missing answers!
LGTM :)