A list of recent papers on visual (image) question answering, mainly from arxiv.org

DerekDLP/VQA-papers

Visual (Image) Question Answering - Papers

A reading list of resources dedicated to visual (image) question answering, mainly from arxiv.org

Bookmarks

2015 Papers

ID Title Original Date Latest Date Notes Published (Incomplete Statistics)
1 VQA: Visual Question Answering 2015.05.03 2016.10.26 [Data] [code] ICCV 2015
2 Ask Your Neurons: A Neural-based Approach to Answering Questions about Images 2015.05.05 2015.10.01 ICCV 2015
3 Exploring Models and Data for Image Question Answering 2015.05.08 2015.11.29 NIPS 2015
4 Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering 2015.05.21 2015.11.02 [Data] NIPS 2015
5 Visual Madlibs: Fill in the blank Image Generation and Question Answering 2015.05.31
6 What value do explicit high level concepts have in vision to language problems? 2015.06.03 2016.04.29 CVPR 2016
7 Semantic Amodal Segmentation 2015.09.03 2016.12.14
8 VISALOGY: Answering Visual Analogy Questions 2015.10.30 NIPS 2015
9 Stacked Attention Networks for Image Question Answering 2015.11.06 2016.01.26 [code1] [code2] CVPR 2016
10 Explicit Knowledge-based Reasoning for Visual Question Answering 2015.11.09 2015.11.11
11 Neural Module Networks 2015.11.09 2017.07.24
12 Visual7W: Grounded Question Answering in Images 2015.11.11 2016.04.09 CVPR 2016
13 Yin and Yang: Balancing and Answering Binary Visual Questions 2015.11.16 2016.04.19
14 Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering 2015.11.16 2016.03.18
15 Compositional Memory for Visual Question Answering 2015.11.18
16 ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering 2015.11.18 2016.04.03
17 Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources 2015.11.22 2016.04.14 CVPR
18 Where To Look: Focus Regions for Visual Question Answering 2015.11.23 2016.01.10 Submitted to CVPR 2016
19 Simple Baseline for Visual Question Answering 2015.12.07 2015.12.15

2016 Papers

ID Title Original Date Latest Date Notes Published (Incomplete Statistics)
1 Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations 2016.02.23
2 Dynamic Memory Networks for Visual and Textual Question Answering 2016.03.04
3 Image Captioning and Visual Question Answering Based on Attributes and External Knowledge 2016.03.09 2016.12.16 [Overlap(2015`14)]
4 Generating Natural Questions About an Image 2016.03.19 2016.06.08 Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics
5 A Focused Dynamic Attention Model for Visual Question Answering 2016.04.06 Submitted to ECCV 2016
6 Counting Everyday Objects in Everyday Scenes 2016.04.12 2017.05.08
7 Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering 2016.04.16 2016.07.28
8 Leveraging Visual Question Answering for Image-Caption Ranking 2016.05.04 2016.08.31
9 Ask Your Neurons: A Deep Learning Approach to Visual Question Answering 2016.05.09 2016.11.24
10 Hierarchical Question-Image Co-Attention for Visual Question Answering 2016.05.31 2017.01.19 [code] NIPS 2016
11 Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding 2016.06.06 2016.09.23 [code] EMNLP 2016
12 Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions? 2016.06.11 2016.06.17 EMNLP 2016
13 Training Recurrent Answering Units with Joint Loss Minimization for VQA 2016.06.11 2016.09.29
14 FVQA: Fact-based Visual Question Answering 2016.06.17 2016.08.08
15 Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions? 2016.06.17 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY.
16 DualNet: Domain-Invariant Network for Visual Question Answering 2016.06.20 2017.05.04 ICME 2017
17 Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions 2016.06.21 2016.09.26 EMNLP 2016
18 Analyzing the Behavior of Visual Question Answering Models 2016.06.23 2016.09.27 EMNLP 2016
19 Revisiting Visual Question Answering Baselines 2016.06.27 2016.11.22 European Conference on Computer Vision
20 Visual Question Answering: A Survey of Methods and Datasets 2016.07.20 [Survey]
21 Solving Visual Madlibs with Multiple Cues 2016.08.11 BMVC 2016
22 Visual Question: Predicting If a Crowd Will Agree on the Answer 2016.08.29
23 Measuring Machine Intelligence Through Visual Question Answering 2016.08.30 AI Magazine, 2016
24 Towards Transparent AI Systems: Interpreting Visual Question Answering Models 2016.08.31 2016.09.09
25 Graph-Structured Representations for Visual Question Answering 2016.09.19 2017.03.30
26 The Color of the Cat is Gray: 1 Million Full-Sentences Visual Question Answering (FSVQA) 2016.09.21
27 Tutorial on Answering Questions about Images with Deep Learning 2016.10.04 [tutorial] '2nd Summer School on Integrating Vision and Language: Deep Learning' in Malta, 2016
28 Visual Question Answering: Datasets, Algorithms, and Future Challenges 2016.10.05 2017.06.14
29 Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization 2016.10.07 2017.03.21 [code] [demo1] [demo2]
30 Open-Ended Visual Question-Answering 2016.06. 2016.10.09 [web] [code] Bachelor thesis report graded with A with honours at ETSETB Telecom BCN school, Universitat Politècnica de Catalunya (UPC). June 2016.
31 Hadamard Product for Low-rank Bilinear Pooling 2016.10.14 2017.03.26 ICLR 2017
32 Proposing Plausible Answers for Open-ended Visual Question Answering 2016.10.20 2016.10.23
33 Combining Multiple Cues for Visual Madlibs Question Answering 2016.11.01 2018.02.07 submitted to IJCV
34 Dual Attention Networks for Multimodal Reasoning and Matching 2016.11.02 2017.03.21
35 Zero-Shot Visual Question Answering 2016.11.16 2016.11.20
36 Answering Image Riddles using Vision and Reasoning through Probabilistic Soft Logic 2016.11.17
37 Grad-CAM: Why did you say that? 2016.11.22 2017.01.25 NIPS 2016
38 Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering 2016.12.02 2017.05.15
39 Contextual Visual Similarity 2016.12.08 Submitted to CVPR 2017
40 VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering 2016.12.12 submitted to IbPRIA 2017
41 Attentive Explanations: Justifying Decisions and Pointing to the Evidence 2016.12.14 2017.07.25
42 The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions 2016.12.16
43 Automatic Generation of Grounded Visual Questions 2016.12.20 2017.05.29 IJCAI 2017
44 CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning 2016.12.20

2017 Papers

ID Title Original Date Latest Date Notes Published (Incomplete Statistics)
1 Task-driven Visual Saliency and Attention-based Visual Question Answering 2017.02.22
2 Tree Memory Networks for Modelling Long-term Temporal Dependencies 2017.03.12 2018.05.20 Neurocomputing, Volume 304, 23 August 2018, Pages 64-81
3 VQABQ: Visual Question Answering by Basic Questions 2017.03.19 2017.08.28 CVPR 2017 VQA Challenge Workshop
4 Recurrent and Contextual Models for Visual Question Answering 2017.03.23
5 An Analysis of Visual Question Answering Algorithms 2017.03.28 2017.09.13 [data] ICCV 2017
6 Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks 2017.04.02 2017.10.16 ICCV 2017
7 It Takes Two to Tango: Towards Theory of AI's Mind 2017.04.03 2017.10.02
8 An Empirical Evaluation of Visual Question Answering for Novel Objects 2017.04.08 CVPR 2017
9 Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering 2017.04.11 2017.04.12 [code]
10 What's in a Question: Using Visual Questions as a Form of Supervision 2017.04.12 CVPR 2017
11 TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering 2017.04.14 2017.12.02 CVPR 2017
12 Learning to Reason: End-to-End Module Networks for Visual Question Answering 2017.04.18 2017.09.11
13 Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets 2017.04.24 2018.06.10 NAACL-HLT 2018
14 C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset 2017.04.26
15 Speech-Based Visual Question Answering 2017.05.01 2017.09.15
16 The Promise of Premise: Harnessing Question Premises in Visual Question Answering 2017.05.01 2017.08.17 EMNLP 2017
17 Survey of Visual Question Answering: Datasets and Techniques 2017.05.10 2017.05.11 [Survey]
18 ParlAI: A Dialog Research Software Platform 2017.05.18 2018.03.08
19 MUTAN: Multimodal Tucker Fusion for Visual Question Answering 2017.05.18 [code]
20 Learning Convolutional Text Representations for Visual Question Answering 2017.05.18 2018.04.18 [code] SDM 2018; in Proceedings of the 2018 SIAM International Conference on Data Mining (pp. 594-602), 2018
21 Deep learning evaluation using deep linguistic processing 2017.06.05 2018.05.12
22 A simple neural network module for relational reasoning 2017.06.05
23 Compact Tensor Pooling for Visual Question Answering 2017.06.20
24 Sampling Matters in Deep Embedding Learning 2017.06.23 2018.01.16 ICCV 2017
25 Modulating early visual processing by language 2017.07.02 2017.12.18 NIPS 2017
26 Effective Approaches to Batch Parallelization for Dynamic Neural Network Architectures 2017.07.08 [code]
27 Visual Question Answering with Memory-Augmented Networks 2017.07.16 CVPR 2018
28 Improved Bilinear Pooling with CNNs 2017.07.21
29 Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering 2017.07.25 2018.03.14 [code] CVPR 2018; winner of 2017 VQA challenge
30 A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models 2017.08.01 CVPR 2017
31 MemexQA: Visual Memex Question Answering 2017.08.03 [Web]
32 Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering 2017.08.04 ICCV 2017
33 Structured Attentions for Visual Question Answering 2017.08.07 ICCV 2017
34 Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge 2017.08.09 Winner of the 2017 Visual Question Answering (VQA) Challenge at CVPR 2017
35 Learning to Disambiguate by Asking Discriminative Questions 2017.08.09 ICCV 2017
36 Beyond Bilinear: Generalized Multi-modal Factorized High-order Pooling for Visual Question Answering 2017.08.10 [Overlap(2017`32)]
37 VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation 2017.08.17 ICCV 2017
38 Robustness Analysis of Visual QA Models by Basic Questions 2017.09.14 2018.05.26 CVPR 2018
39 Exploring Human-like Attention Supervision in Visual Question Answering 2017.09.19
40 Visual Question Generation as Dual Task of Visual Question Answering 2017.09.21
41 Survey of Recent Advances in Visual Question Answering 2017.09.24 [Survey]
42 Fooling Vision and Language Models Despite Localization and Attention Mechanism 2017.09.25 2018.04.05 CVPR 2018
43 iVQA: Inverse Visual Question Answering 2017.10.09 2018.03.16 CVPR 2018
44 Active Learning for Visual Question Answering: An Empirical Study 2017.11.06
45 High-Order Attention Models for Visual Question Answering 2017.11.12 NIPS 2017
46 A Novel Framework for Robustness Analysis of Visual QA Models 2017.11.16 2018.12.24 AAAI 2019
47 Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering 2017.11.17 2017.12.12 AAAI 2018
48 Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract) 2017.11.17 [Overlap(2016`41)]
49 Visual Question Answering as a Meta Learning Task 2017.11.21
50 Locally Smoothed Neural Networks 2017.11.22 ACML 2017
51 Hyper-dimensional computing for a visual question-answering system that is trainable end-to-end 2017.11.28
52 Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering 2017.12.01 2018.06.03 CVPR 2018
53 Incorporating External Knowledge to Answer Open-Domain Visual Questions with Dynamic Memory Networks 2017.12.03
54 Learning by Asking Questions 2017.12.04
55 IQA: Visual Question Answering in Interactive Environments 2017.12.08 2018.09.06 CVPR 2018
56 Visual Explanations from Hadamard Product in Multimodal Deep Networks 2017.12.17 NIPS 2017
57 Interpretable Counting for Visual Question Answering 2017.12.22 2018.03.01 ICLR 2018

2018 Papers

ID Title Original Date Latest Date Notes Published (Incomplete Statistics)
1 Benchmark Visual Question Answer Models by using Focus Map 2018.01.13 [Overlap(2017)] course CS348
2 Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering 2018.01.23
3 DVQA: Understanding Data Visualizations via Question Answering 2018.01.24 2018.03.29 CVPR 2018
4 Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions 2018.01.27
5 Object-based reasoning in VQA 2018.01.29 WACV 2018
6 Dual Recurrent Attention Units for Visual Question Answering 2018.02.01 2018.11.07
7 Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog 2018.02.11 2018.11.28 NIPS 2018
8 Learning to Count Objects in Natural Images for Visual Question Answering 2018.02.15 [code] ICLR 2018
9 Multimodal Explanations: Justifying Decisions and Pointing to the Evidence 2018.02.15 [Overlap(2016`41)]
10 VizWiz Grand Challenge: Answering Visual Questions from Blind People 2018.02.22 2018.05.09
11 Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool 2018.03.16 [Overlap(2017`43)]
12 VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions 2018.03.20 2018.08.25 ECCV 2018
13 Attention on Attention: Architectures for Visual Question Answering (VQA) 2018.03.20 [code]
14 Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering 2018.03.23 AAAI 2018
15 Generalized Hadamard-Product Fusion Operators for Visual Question Answering 2018.03.25 2018.04.06 CRV, 2018, 15th Canadian Conference on Computer and Robot Vision
16 DDRprog: A CLEVR Differentiable Dynamic Reasoning Programmer 2018.03.30
17 Differential Attention for Visual Question Answering 2018.03.30 [Web] CVPR 2018
18 Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering 2018.04.02 2018.12.01 CVPR 2018
19 Question Type Guided Attention in Visual Question Answering 2018.04.05 2018.07.18
20 Reciprocal Attention Fusion for Visual Question Answering 2018.05.11 2018.07.22 the British Machine Vision Conference (BMVC), September 2018
21 Did the Model Understand the Question? 2018.05.14 ACL 2018
22 Bilinear Attention Networks 2018.05.21 2018.10.19 NIPS 2018
23 Reproducibility Report for 2018.05.21 Reproducibility in ML Workshop, ICML 2018
24 Joint Image Captioning and Question Answering 2018.05.22
25 R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering 2018.05.24 2018.07.19 [data] SIGKDD 2018
26 On the Flip Side: Identifying Counterexamples in Visual Question Answering 2018.06.03 2018.07.24 [framework] KDD 2018
27 CS-VQA: Visual Question Answering with Compressively Sensed Images 2018.06.08 ICIP 2018
28 Learning Answer Embeddings for Visual Question Answering 2018.06.10 CVPR 2018
29 Cross-Dataset Adaptation for Visual Question Answering 2018.06.10 CVPR 2018
30 Learning Visual Knowledge Memory Networks for Visual Question Answering 2018.06.13 CVPR 2018
31 Learning Conditioned Graph Structures for Interpretable Visual Question Answering 2018.06.19 2018.11.01 [code] NIPS 2018
32 Question Relevance in Visual Question Answering 2018.07.23 [code]
33 Pythia v0.1: the Winning Entry to the VQA Challenge 2018 2018.07.26 2018.07.27 [code] winner of 2018 VQA challenge
34 Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining 2018.08.01
35 Learning Visual Question Answering by Bootstrapping Hard Attention 2018.08.01 ECCV 2018
36 Question-Guided Hybrid Convolution for Visual Question Answering 2018.08.08 ECCV 2018
37 Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering 2018.09.04 ECCV 2018
38 Interpretable Visual Question Answering by Reasoning on Dependency Trees 2018.09.08
39 Faithful Multimodal Explanation for Visual Question Answering 2018.09.08 AAAI 2019
40 The Wisdom of MaSSeS: Majority, Subjectivity, and Semantic Similarity in the Evaluation of VQA 2018.09.12
41 The Visual QA Devil in the Details: The Impact of Early Fusion and Batch Norm on CLEVR 2018.09.11 ECCV 2018
42 Textually Enriched Neural Module Networks for Visual Question Answering 2018.09.23 [Overlap(2018\CVPR 2018)] IEEE Transactions on Pattern Analysis and Machine Intelligence
43 Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding 2018.10.04 2019.01.14 [Web] [code] NIPS 2018
44 Transfer Learning via Unsupervised Task Discovery for Visual Question Answering 2018.10.03
45 Overcoming Language Priors in Visual Question Answering with Adversarial Regularization 2018.10.08 2018.11.08 NIPS 2018
46 Knowing Where to Look? Analysis on Attention of Visual Question Answering System 2018.10.09 ECCV SiVL Workshop paper
47 Understand, Compose and Respond - Answering Visual Questions by a Composition of Abstract Procedures 2018.10.24
48 Do Explanations make VQA Models more Predictable to a Human? 2018.10.29 EMNLP 2018
49 TallyQA: Answering Complex Counting Questions 2018.10.31 [data] AAAI 2019
50 Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering 2018.11.01 NIPS 2018
51 Zero-Shot Transfer VQA Dataset 2018.11.01
52 Explicit Bias Discovery in Visual Question Answering Models 2018.11.19
53 VQA with no questions-answers training 2018.11.20
54 Visual Entailment Task for Visually-Grounded Language Learning 2018.11.20 NeurIPS 2018
55 Visual Question Answering as Reading Comprehension 2018.11.28
56 From Known to the Unknown: Transferring Knowledge to Answer Questions about Novel Visual and Semantic Concepts 2018.11.29
57 Systematic Generalization: What Is Required and Can It Be Learned? 2018.11.30 Work in progress
58 Learning Representations of Sets through Optimized Permutations 2018.12.10 2019.01.14 ICLR 2019
59 Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering 2018.12.12 report
60 Multi-modal Learning with Prior Visual Relation Reasoning 2018.12.23
61 The meaning of "most" for visual question answering models 2018.12.31

2019 Papers

ID Title Original Date Latest Date Notes Published (Incomplete Statistics)
1 Visual Entailment: A Novel Task for Fine-Grained Image Understanding 2019.01.20
2 BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection 2019.01.31
3 Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded 2019.02.11 Technical Report
4 Cycle-Consistency for Robust Visual Question Answering 2019.02.14
5 Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention 2019.02.15
6 Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering 2019.02.20
7 MUREL: Multimodal Relational Reasoning for Visual Question Answering 2019.02.25 CVPR 2019
8 GQA: a new dataset for compositional question answering over real-world images 2019.02.25
9 Answer Them All! Toward Universal Visual Question Answering Models 2019.03.01
