Problems with VQA finetuning #59
Hi Markov, for your custom answer candidate set, please also prepare a custom
Doesn't it cripple the zero-shot capability?
Hi, since we have utilized various sources of VQA samples during pretraining, for zero-shot (open-domain) VQA we directly turn to the pretrained OFA-Large, which does not set this constraint. For more details on zero-shot VQA inference, please refer to the open-domain VQA Colab (url). The VQA fine-tuning process is specifically targeted at the VQAv2 challenge, whose answers are restricted to the 3,129-candidate answer set. To achieve higher accuracy on this specific challenge, we use trie-based constrained training & inference using the
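To make the trie-based constraint concrete, here is a minimal, illustrative sketch (not OFA's actual implementation): each candidate answer is tokenized into an id sequence, the sequences are stored in a trie, and at each decoding step only tokens that extend some candidate are allowed. The token ids below are hypothetical.

```python
def build_trie(sequences):
    """Build a nested-dict trie over token-id sequences."""
    root = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
    return root

def allowed_next_tokens(trie, prefix):
    """Return the token ids that may follow the given decoded prefix."""
    node = trie
    for tok in prefix:
        node = node.get(tok)
        if node is None:
            return []  # prefix matches no candidate answer
    return list(node.keys())

# Toy token sequences for three candidate answers (hypothetical tokenization).
trie = build_trie([[4, 7], [4, 9], [5]])
allowed_next_tokens(trie, [])   # tokens that can start an answer
allowed_next_tokens(trie, [4])  # continuations of the prefix [4]
```

During constrained inference, the decoder's logits outside `allowed_next_tokens(trie, prefix)` would be masked to negative infinity, so generation can only produce strings from the candidate set.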
Hi @yangapku, could you please provide an example? Thanks
@phanxuanphucnd You can refer to the trainval_ans2label.pkl file we provided for VQAv2. |
Hi @yangapku, could you give an illustrative example of the trainval_ans2label.pkl file?
Thanks
Hi, it's a Python dict that maps each candidate answer text to its index (starting from 0). The indices can be assigned randomly with no specific rules; just make sure that each candidate answer is assigned a unique index and that the indices are assigned contiguously from 0. All the ground-truth answers of the training and validation samples should be included in this candidate answer set.
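A minimal sketch of building and saving such a dict, using a tiny hypothetical answer list (the real VQAv2 set has 3,129 entries):

```python
import pickle

# Hypothetical candidate answers collected from training/validation ground truths.
candidate_answers = ["yes", "no", "2", "white", "red"]

# Map each unique answer to a contiguous index starting from 0.
ans2label = {ans: idx for idx, ans in enumerate(candidate_answers)}

# Serialize in the same pickle format as trainval_ans2label.pkl.
with open("trainval_ans2label.pkl", "wb") as f:
    pickle.dump(ans2label, f)
```

The only invariants that matter are the ones stated above: unique indices, contiguous from 0, and full coverage of the training/validation ground-truth answers.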
Yes, I understand it as follows. Thanks @yangapku
@phanxuanphucnd Yes. In our practice on the VQAv2 dataset, which has a long-tailed distribution over all the ground-truth answers that appear, we follow the common practice of using the 3,129 most frequent answers as the candidate set to build this dict. We then filtered the original training and validation splits, keeping only the question-answer pairs whose answer is in this candidate set for finetuning OFA.
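The frequency-based selection and filtering described above can be sketched as follows; the pairs and the cutoff are toy placeholders (the real cutoff is the 3,129 most frequent answers):

```python
from collections import Counter

# Hypothetical (question, answer) training pairs; real VQAv2 data is
# long-tailed over thousands of distinct answers.
pairs = [
    ("q1", "yes"), ("q2", "no"), ("q3", "yes"),
    ("q4", "blue"), ("q5", "yes"), ("q6", "no"),
]

TOP_K = 2  # stand-in for the 3,129 most frequent answers used on VQAv2

# Keep only the TOP_K most frequent ground-truth answers as candidates.
counts = Counter(ans for _, ans in pairs)
candidates = {ans for ans, _ in counts.most_common(TOP_K)}

# Filter the splits: drop pairs whose answer falls outside the candidate set.
filtered = [(q, a) for q, a in pairs if a in candidates]
```

With this toy data, "blue" appears only once and is dropped, so the filtered split contains only pairs answered "yes" or "no".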
Hello! I am trying to finetune OFA-Large on VQA using the Visual Genome dataset, following the finetuning instructions in the repo. Unfortunately, I have encountered a bug that I am having difficulty identifying. I preprocessed the data exactly as in the example, but during training my gradients overflow and the model does not train.
I narrowed the issue down to the answers column. If I replace this column in my dataset with the column from the dataset provided in the repo, everything works fine. However, if I change the answers in the column, or even modify them in any way, I get the same issue. I suspected that my procedure for changing the column could be the problem, but if I "modify" the column with an empty string, it still works. Any other symbol added to the column again results in an overflow. I also tried modifying not the whole column but single elements, and found that changing certain answers does not lead to an overflow, while changing others does. I was unable to narrow the issue down further or find any pattern in it.
I train on a single server with 1 GPU.