Continual Learning in Visual Question Answering

This repository contains information about the following three settings for studying continual learning in Visual Question Answering:

Diverse Domains: Each task is defined based on the objects that appear in the images. Different objects are grouped randomly together in each of the five tasks.
Taxonomy Domains: Each task is defined based on the objects that appear in the images. Objects from the same supercategory are grouped in the same task, leading to the following five tasks: Animals, Food, Interior, Sports, Transport. The task definitions follow the work of Del Chiaro et al., 2020.
Question Types: Each task is defined based on the question type: Action Recognition (e.g. "What are the kids doing?"), Color Recognition (e.g. "What color hat is the man wearing?"), Counting (e.g. "How many of the people are wearing hats?"), Subcategory Recognition (e.g. "What type of hat is he wearing?"), Scene-level Recognition (e.g. "Was this photo taken indoors?"). The task definitions follow the work of Whitehead et al., 2021.

Task Statistics per Setting

Diverse Domains

Task	Train	Validation	Test	Number of Classes
Group_1	44254	11148	28315	2259
Group_2	39867	10202	22713	1929
Group_3	37477	9386	23095	1897
Group_4	35264	8871	21157	2165
Group_5	24454	6028	14490	1837

Taxonomy Domains

Task	Train	Validation	Test	Number of Classes
Animals	37270	9237	22588	1378
Food	26191	6612	15967	1419
Interior	43576	11038	26594	2143
Sports	32885	8468	19205	1510
Transport	41394	10280	25416	2009

Question Types

Task	Train	Validation	Test	Number of Classes
Action	18730	4700	11008	233
Color	34588	8578	21559	92
Count	38857	9649	23261	42
Scene	25850	6417	14847	170
Subcategory	22324	5419	13564	659

Data

Download the VQA data from visualqa.org. Because the test set annotations are not publicly available, the VQA-v2 validation data are used as the test set in ContVQA, and the VQA-v2 training data are split into a train and a validation set.
Get the question ids for each task under the corresponding folder in data/. Each file contains the ids for the train/validation/test splits in the following format:

{
    task_name: 
    [
        question_id
    ]
}
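As a rough illustration of how the id files might be consumed, the sketch below loads one file and keeps only the matching VQA questions. It assumes each file is plain JSON in the format shown above and that the ids are integers matching the `question_id` field of the VQA-v2 question entries. The function names `load_task_question_ids` and `select_questions` are hypothetical and are not part of this repository.

```python
import json
from pathlib import Path


def load_task_question_ids(path):
    """Read a {task_name: [question_id, ...]} JSON file.

    Returns a dict mapping each task name to a set of question ids.
    Assumption: the file is plain JSON in the format documented above.
    """
    with Path(path).open(encoding="utf-8") as f:
        raw = json.load(f)
    return {task: set(ids) for task, ids in raw.items()}


def select_questions(questions, question_ids):
    """Keep the question entries whose `question_id` is in `question_ids`.

    `questions` is a list of dicts such as the "questions" list in the
    VQA-v2 question files; the original order is preserved.
    """
    return [q for q in questions if q["question_id"] in question_ids]


if __name__ == "__main__":
    # Tiny in-memory demo with made-up ids and task names (not real data).
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        demo_file = Path(tmp) / "demo_ids.json"
        demo_file.write_text(json.dumps({"action": [1, 3], "color": [2]}))
        ids_per_task = load_task_question_ids(demo_file)

    demo_questions = [
        {"question_id": 1, "question": "What are the kids doing?"},
        {"question_id": 2, "question": "What color hat is the man wearing?"},
        {"question_id": 3, "question": "What is the dog doing?"},
    ]
    for task, ids in ids_per_task.items():
        print(task, len(select_questions(demo_questions, ids)))
```

Storing the ids as sets keeps the membership check cheap when filtering the full VQA-v2 question list.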

For more details about the settings, please refer to our preprint. Note that the main results are averaged over five random task orders, which can be found under task_orders/.

Code

If you want to run any of the provided code (to get dataset statistics or plots), first run:

# Tested with Python 3.10 
pip install -r requirements.txt
pip install -e .
./scripts/download_extra_data.sh

Citation

@article{nikandrou2022task,
  title={Task formulation matters when learning continually: A case study in visual question answering},
  author={Nikandrou, Mavina and Yu, Lu and Suglia, Alessandro and Konstas, Ioannis and Rieser, Verena},
  journal={arXiv preprint arXiv:2210.00044},
  year={2022}
}
