Problems with VQA finetuning #59

Closed
markovivl opened this issue Mar 27, 2022 · 9 comments

@markovivl

Hello! I am trying to finetune OFA-large on VQA using the Visual Genome dataset, following the finetuning instructions in the repo. Unfortunately, I have encountered a bug that I have some difficulty identifying. I preprocessed the data exactly as in the example, but during training my gradients overflow and the model does not train.

slice_id 0 seek offset 0
2022-03-28 02:29:07 - trainer.py[line:703] - INFO: begin training epoch 1
2022-03-28 02:29:07 - train.py[line:296] - INFO: Start iterating over samples
2022-03-28 02:29:09 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0
2022-03-28 02:29:11 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0
2022-03-28 02:29:14 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 16.0
2022-03-28 02:29:15 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 8.0
2022-03-28 02:29:17 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 4.0
2022-03-28 02:29:19 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 2.0
2022-03-28 02:29:22 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 1.0
2022-03-28 02:29:23 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.5
2022-03-28 02:29:26 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.25
2022-03-28 02:29:28 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.125
2022-03-28 02:29:28 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 0.0625

I narrowed the issue down to the answers column. If I replace this column in my dataset with the column from the dataset provided in the repo, everything works fine. However, if I change the answers in the column, or even modify them in any way, I get the same issue. I suspected that my procedure for changing the column could be the problem, but if I "modify" the column with an empty string, it still works. Any other symbol added to the column again results in an overflow. I also tried modifying single elements rather than the whole column, and found that changing certain answers does not lead to an overflow, while changing others does. I was unable to narrow the issue down further or find any pattern in it.

I train on a single server with 1 GPU.

@yangapku yangapku self-assigned this Mar 28, 2022
@yangapku
Member

Hi Markov, for your custom answer candidate set, please also prepare a custom trainval_ans2label.pkl file (a pickled python dict mapping answer text to label id) to replace the provided one. This file is used in training & inference to constrain the output space from the full vocabulary to only the answer candidate set. It must be consistent with your dataset, otherwise the overflow problem will arise during training whenever an answer unseen in trainval_ans2label.pkl is encountered in your dataset.
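As a concrete sketch of preparing such a file (the candidate answers below are hypothetical placeholders; substitute the actual answer set of your dataset):

```python
import pickle

# Hypothetical candidate answers collected from a custom dataset's
# answers column; replace with the real answer set used for fine-tuning.
candidate_answers = ["yes", "no", "2", "red", "skateboarding"]

# Map each answer text to a unique label id, contiguous from 0.
ans2label = {ans: idx for idx, ans in enumerate(candidate_answers)}

# Pickle the dict so it can stand in for the provided trainval_ans2label.pkl.
with open("trainval_ans2label.pkl", "wb") as f:
    pickle.dump(ans2label, f)
```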

@markovivl
Author

markovivl commented Mar 28, 2022

Doesn't it cripple the zero shot capability?

@yangapku
Member

yangapku commented Mar 28, 2022

Hi, since we utilized various sources of VQA samples during pretraining, for zero-shot (open-domain) VQA we turn directly to the pretrained OFA-Large, which does not set this constraint. For more details on zero-shot VQA inference, please refer to the open-domain VQA Colab (url). The VQA fine-tuning process specifically targets the VQAv2 challenge, whose answers are restricted to the 3,129-answer candidate set. To achieve higher accuracy on this specific challenge, we use trie-based constrained training & inference driven by the trainval_ans2label.pkl file.

@phanxuanphucnd

Hi @yangapku, could you please provide an example of trainval_ans2label.pkl? I don't understand what a pair of answer-text to label-id is.

Thanks

@yangapku
Member

@phanxuanphucnd You can refer to the trainval_ans2label.pkl file we provided for VQAv2.

@phanxuanphucnd

phanxuanphucnd commented Oct 21, 2022

Hi @yangapku

An illustrative example of the trainval_ans2label.pkl file is:

{ "": 0, "boats": 835, "not at all": 2421, "name": 1, "harley davidson": 78, "plain": 2, "20 ft": 2379, "museum": 3, "parking": 1710, "behind": 1590, "steeple": 4, "turning": 2380, "tent": 836, "no parking": 1995, "tulip": 1568, "low": 452, "muffin": 1172, "9:55": 846, "hair": 453 }
I don't understand what it means. By what mechanism are the indexes assigned?
Can you help me understand?

Thanks

@yangapku
Member

Hi, it's a python dict that maps each candidate answer text to its index (starting from 0). The indexes can be assigned arbitrarily, with no specific rules. Just make sure that each candidate answer is assigned a unique index and that the indexes run contiguously from 0. All ground-truth answers of the training and validation samples must be included in this candidate answer set.
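The two constraints above (unique indexes, contiguous from 0) can be checked with a small sanity-check sketch (this helper is hypothetical, not part of OFA):

```python
def check_ans2label(ans2label: dict) -> None:
    """Verify the constraints an ans2label dict must satisfy."""
    labels = sorted(ans2label.values())
    # Each candidate answer must carry a unique index...
    assert len(set(labels)) == len(labels), "duplicate label ids"
    # ...and the indexes must run contiguously starting from 0.
    assert labels == list(range(len(labels))), "label ids not contiguous from 0"

# A valid mapping passes silently; the order of assignment does not matter.
check_ans2label({"": 0, "name": 1, "plain": 2, "museum": 3})
```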

@phanxuanphucnd

phanxuanphucnd commented Oct 21, 2022

Yes, I understand it as follows:
All answers in the question-answer pairs of the train and valid datasets must be included in this file, right?
And the values must be unique?

Thanks @yangapku

@yangapku
Member

@phanxuanphucnd Yes. In our practice on the VQAv2 dataset, which has a long-tailed distribution over the ground-truth answers, we follow the common practice of building this dict from the 3,129 most frequent answers. We then filtered the original training and validation splits, keeping only the question-answer pairs whose answer is in this candidate set for finetuning OFA.
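The frequency-based selection and filtering described above can be sketched as follows (the samples and TOP_K value are toy placeholders; VQAv2 uses 3,129):

```python
from collections import Counter

# Hypothetical training samples as (question, answer) pairs;
# replace with your actual train/valid data.
samples = [
    ("what color is it?", "red"),
    ("what color is it?", "red"),
    ("is it daytime?", "yes"),
    ("is it daytime?", "yes"),
    ("how many dogs?", "2"),
]
TOP_K = 2  # VQAv2 practice uses the 3,129 most frequent answers

# Keep the TOP_K most frequent ground-truth answers as the candidate set...
freq = Counter(ans for _, ans in samples)
candidates = {ans for ans, _ in freq.most_common(TOP_K)}

# ...build the ans2label dict over that set (index assignment is arbitrary)...
ans2label = {ans: i for i, ans in enumerate(sorted(candidates))}

# ...and drop any sample whose answer falls outside the candidate set.
filtered = [(q, a) for q, a in samples if a in candidates]
```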
