This repository contains a carefully and comprehensively organized list of papers, datasets, and leaderboards for the text-and-table HybridQA task. If you find any errors, please open an issue or a pull request.
Question answering over a single type of evidence, either text or tables, has been studied systematically; we call this the classic QA task. Each type of evidence has its own advantages: textual evidence is prevalent in daily communication, while tabular evidence presents numerical information in a well-organized form. However, heterogeneous data combining the two is increasingly common in real applications, particularly in fields that demand numerical reasoning, such as the financial and scientific domains. This task is known as Table-and-Text Hybrid Question Answering (HybridQA). Since HybridQA is still under-researched, this project summarizes its current development, including benchmarks and their published state-of-the-art results.
HybridQA is the first HybridQA benchmark and remains the largest cross-domain benchmark to date. Each question relies on a single table and multiple text passages. Each passage usually describes the information in a table cell, for example, the Wikipedia page hyperlinked from that cell. For each example, the benchmark provides the gold passages and table rows. All answers are spans in the evidence (called span-based answers) and require one or more hops across the heterogeneous data.
Model | Organization | Reference | Dev-EM | Dev-F1 | Test-EM | Test-F1 |
---|---|---|---|---|---|---|
UL-20B | - | Tay et al. (2022) | - | - | 61.0 | - |
MITQA | IBM & IIT | Kumar et al. (2021) | 65.5 | 72.7 | 64.3 | 71.9 |
RHGN | SEU | Yang et al. (2022) | 62.8 | 70.4 | 60.6 | 68.1 |
POINTR + MATE | - | Eisenschlos et al. (2021) | 63.3 | 70.8 | 62.7 | 70.0 |
POINTR + TAPAS | - | Eisenschlos et al. (2021) | 63.4 | 71.0 | 62.8 | 70.2 |
MuGER2 | JD AI | Wang et al. (2022) | 57.1 | 67.3 | 56.3 | 66.2 |
DocHopper | CMU | Sun et al. (2021) | 47.7 | 55.0 | 46.3 | 53.3 |
HYBRIDER | UCSB | Chen et al. (2020) | 43.5 | 50.6 | 42.2 | 49.9 |
HYBRIDER-Large | UCSB | Chen et al. (2020) | 44.0 | 50.7 | 43.8 | 50.6 |
Unsupervised-QG | NUS&UCSB | Pan et al. (2020) | 25.7 | 30.5 | - | - |
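The EM and F1 columns in these leaderboards are standard span-level metrics: exact match after answer normalization, and token-overlap F1. Below is a minimal Python sketch of how such metrics are commonly computed; the normalization is a simplified assumption, not the official HybridQA evaluation script.

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    return float(normalize(prediction) == normalize(gold))

def f1(prediction: str, gold: str) -> float:
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the 2008 Olympics", "2008 Olympics"))  # 1.0
print(round(f1("Beijing 2008", "2008 Olympics"), 2))      # 0.5
```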
To lower the difficulty of answering, HybridQA annotates the relevant evidence for each example as well as the links between texts and tables, which widens the gap with real-world applications. To be closer to practical applications, OTT-QA blends the textual and tabular evidence of all examples into a single corpus of more than five million items and removes the relation information between them; this setting is called the open-QA benchmark. The most challenging part of this benchmark is therefore retrieving the evidence for a question from millions of heterogeneous items, as in open-domain question answering. The questions and evidence of OTT-QA are built on top of HybridQA, and all its answers are likewise spans in the evidence.
Model | Organization | Reference | Dev-EM | Dev-F1 | Test-EM | Test-F1 |
---|---|---|---|---|---|---|
CORE | CMU + Microsoft Research | Ma et al. (2022) | 49.0 | 55.7 | 47.3 | 54.1 |
OTTeR | MSRA + Beihang | Huang et al. (2022) | 37.1 | 42.8 | 37.3 | 43.1 |
CARP | MSRA + Sun Yat-sen University | Zhong et al. (2021) | 33.2 | 38.6 | 32.5 | 38.5 |
Fusion+Cross-Reader | - | Chen et al. (2021) | 28.1 | 32.5 | 27.2 | 31.5 |
Dual Reader-Parser | Amazon | Li et al. (2021) | 15.8 | - | - | - |
BM25-HYBRIDER | UCSB | Chen et al. (2021) | 10.3 | 13.0 | 9.7 | 12.8 |
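Because OTT-QA provides no links between questions and evidence, systems must first retrieve candidate tables and passages from the mixed corpus; the BM25-HYBRIDER baseline above does this with sparse retrieval. The sketch below illustrates BM25 retrieval over a toy mixed corpus using the rank_bm25 package; the documents and their linearized format are invented for illustration only.

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

# Toy mixed corpus of linearized tables and text passages (contents invented;
# the real OTT-QA corpus contains over five million items).
corpus = [
    "Table: Olympic Games | Year: 2008 | Host city: Beijing",
    "Passage: Beijing is the capital of the People's Republic of China.",
    "Table: FIFA World Cup | Year: 2014 | Host country: Brazil",
]
tokenized_corpus = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

query = "Which city hosted the 2008 Olympic Games?"
top_docs = bm25.get_top_n(query.lower().split(), corpus, n=2)
print(top_docs)  # highest-scoring table/passage strings for the query
```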
Generating some HybridQA answers requires numerical reasoning capability, which benchmarks with only span-based questions cannot evaluate. FinQA is a financial HybridQA benchmark whose questions involve many standard financial-analysis calculations. FinQA annotates each arithmetic answer with a program in a domain-specific language (DSL) consisting of mathematical and table operations, which reduces the difficulty of formula generation and makes the answers more interpretable.
Model | Organization | Reference | Dev-Execution Accuracy | Dev-Program Accuracy | Test-Execution Accuracy | Test-Program Accuracy |
---|---|---|---|---|---|---|
PoT-SC (code-davinci-002) | University of Waterloo | Chen et al. | - | - | 68.1 | - |
APOLLO | MSRA + Xiamen University | Sun et al. | 69.79 | 65.91 | 67.99 | 65.60 |
ELASTIC | Strath | Zhang et al. (2022) | - | - | 68.96 | 65.21 |
DyRRen | Nanjing University | Li et al. (2022) | 66.82 | 63.87 | 63.30 | 61.29 |
ReasonFuse | CAS | Xia et al. (2022) | 61.84 | 59.80 | 60.68 | 58.94 |
FinQANet | UCSB | Chen et al. (2021) | 61.22 | 58.05 | 61.24 | 58.86 |
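FinQA programs are written as a sequence of operations in which later steps can refer back to earlier results. The toy executor below follows that spirit; the operation set is a small assumed subset of the DSL and the `#k` back-reference parsing is simplified, so treat it as an illustration rather than the official executor.

```python
# Toy executor for FinQA-style programs: a comma-separated sequence of
# operations where "#k" refers to the result of step k. The operation set
# is an assumed subset of the DSL, for illustration only.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

def run_program(program: str) -> float:
    results = []
    for step in program.split("),"):
        step = step.strip().rstrip(")")
        op, args = step.split("(", 1)
        values = []
        for arg in args.split(","):
            arg = arg.strip()
            if arg.startswith("#"):
                values.append(results[int(arg[1:])])  # back-reference to earlier step
            else:
                values.append(float(arg))
        results.append(OPS[op](*values))
    return results[-1]

# E.g., relative change: (5829 - 5735) / 5735
print(round(run_program("subtract(5829, 5735), divide(#0, 5735)"), 4))  # 0.0164
```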
Although FinQA presents well-annotated numerical reasoning questions, it ignores questions with span-based answers.
Similar to the classic QA benchmark DROP, TAT-QA is a collection of financial HybridQA examples that includes questions with both span-based and arithmetic answers.
Additionally, unlike the benchmarks mentioned above, each TAT-QA question is typically associated with only five texts, which lowers the difficulty of retrieval.
Like FinQA, TAT-QA also provides the formulas for its arithmetic questions.
Model | Organization | Reference | Dev-EM | Dev-F1 | Test-EM | Test-F1 |
---|---|---|---|---|---|---|
AeNER | HSE | Yarullin et al. | - | - | 75.0 | 83.2 |
RegHNT | CAS | Lei et al. | 73.6 | 81.3 | 70.3 | 78.0 |
UniRPG | Harbin Institute of Technology + JD AI Research | Zhou et al. (2022) | 70.2 | 77.9 | 67.1 | 76.0 |
PoT-SC (code-davinci-002) | University of Waterloo | Chen et al. | 70.2 | - | - | - |
UniPCQA | CUHK | Deng et al. (2022) | 68.2 | 75.5 | 63.9 | 72.2 |
MHST | NUS | Zhu et al. (2022) | 68.2 | 76.8 | 63.6 | 72.7 |
GANO | National Institute of Advanced Industrial Science and Technology | Nararatwong et al. (2022) | 68.4 | 77.8 | 62.1 | 71.6 |
FinMath | Northeastern University | Li et al. (2022) | 60.5 | 66.3 | 58.6 | 64.1 |
TagOp | NUS | Zhu et al. (2021) | 55.2 | 62.7 | 50.1 | 58.0 |
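Because TAT-QA mixes span-based and arithmetic answers, a system typically branches on the answer type and, for arithmetic questions, evaluates a derivation built from numbers in the table and text. The sketch below shows this routing with an invented example record; the field names are assumptions, not the official TAT-QA schema.

```python
import ast
import operator

# Hypothetical, simplified TAT-QA-style record: the field names and values
# are invented for illustration.
example = {
    "question": "What was the percentage change in revenue?",
    "answer_type": "arithmetic",      # vs. "span"
    "derivation": "(2500 - 2000) / 2000 * 100",
    "scale": "percent",
}

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def eval_derivation(expr: str) -> float:
    """Safely evaluate a simple arithmetic derivation string (+, -, *, /)."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"unsupported expression: {ast.dump(node)}")
    return walk(ast.parse(expr, mode="eval"))

if example["answer_type"] == "arithmetic":
    print(eval_derivation(example["derivation"]))  # 25.0
```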
Hierarchical tables, which contain multi-level headers, are common in the real world but are hard for models to represent and understand because of their complex structure. However, almost all tables in the previous benchmarks have flat structures without multi-level headers. To address this challenge, MultiHiertt collects and annotates many hierarchical tables paired with questions.
Model | Organization | Reference | Dev-EM | Dev-F1 | Test-EM | Test-F1 |
---|---|---|---|---|---|---|
NAPG | Zhengzhou University + Peng Cheng Lab | Zhang et al. | - | - | 44.19 | 44.81 |
MT2Net | Yale | Zhao et al. | 37.05 | 39.96 | 36.22 | 38.43 |
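A hierarchical table can be represented with multi-level column headers, and one common way to feed it to a text model is to linearize each cell together with its full header path. The pandas sketch below illustrates this with invented values; it is not data from MultiHiertt itself.

```python
import pandas as pd

# A toy hierarchical table with two header levels (values are invented).
columns = pd.MultiIndex.from_tuples(
    [("2021", "Revenue"), ("2021", "Net income"),
     ("2022", "Revenue"), ("2022", "Net income")],
    names=["Year", "Metric"],
)
table = pd.DataFrame(
    [[120.0, 15.0, 140.0, 18.0]],
    index=["Segment A"],
    columns=columns,
)

# Linearize each cell into a "row header | column header path | value" fact,
# one common way to serialize hierarchical tables for a text model.
facts = [
    f"{row} | {' / '.join(col)} | {table.loc[row, col]}"
    for row in table.index
    for col in table.columns
]
print("\n".join(facts))
```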
GeoTSQA is the first scenario-based question-answering benchmark with hybrid evidence; it requires retrieving and integrating knowledge from multiple sources and applying general knowledge to the specific case described by the scenario. The benchmark is constructed from multiple-choice questions in the geography domain taken from Chinese high-school exams. Besides tables and texts, each question is also provided with four options, from which the model should select one as the answer.
Model | Organization | Reference | Accuracy |
---|---|---|---|
TTGen | Nanjing University | Li et al. | 39.7 |
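Since each GeoTSQA question comes with four options, a system only needs to score every option against the scenario, question, and retrieved evidence and pick the highest-scoring one. The sketch below shows this selection loop with a simple word-overlap scorer standing in for a real model; the inputs are placeholders, not GeoTSQA data.

```python
# Multiple-choice selection for a GeoTSQA-style question. The word-overlap
# scorer is only a stand-in for a real model, and the inputs are placeholders.
def overlap_score(context: str, option: str) -> float:
    context_tokens = set(context.lower().split())
    option_tokens = set(option.lower().split())
    return len(context_tokens & option_tokens) / max(len(option_tokens), 1)

def select_option(scenario: str, question: str, options: list) -> int:
    """Return the index of the highest-scoring option."""
    context = scenario + " " + question
    scores = [overlap_score(context, option) for option in options]
    return max(range(len(options)), key=lambda i: scores[i])

options = [
    "Option A (placeholder)",
    "Option B (placeholder)",
    "Option C (placeholder)",
    "Option D (placeholder)",
]
print(select_option("scenario description (placeholder)",
                    "question text (placeholder)", options))
```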