The data file is in JSON Lines (jsonl) format: each line is one JSON object. Each object has the following fields:
- annotations
- document_html
- document_title
- document_tokens
- document_url
- example_id
- long_answer_candidates
- question_text
- question_tokens
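As a minimal sketch of how a file in this format could be read, the snippet below parses one JSON object per line and reports which of the fields listed above are absent from a record. The helper names `iter_examples` and `missing_fields` are hypothetical, chosen for illustration, and the file is assumed to be UTF-8 encoded.

```python
import json
from typing import Any, Dict, Iterator, Set

# Field names copied from the list above.
EXPECTED_FIELDS = {
    "annotations",
    "document_html",
    "document_title",
    "document_tokens",
    "document_url",
    "example_id",
    "long_answer_candidates",
    "question_text",
    "question_tokens",
}


def iter_examples(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed example per non-empty line of a JSON Lines file."""
    # Assumption: the file is UTF-8 encoded.
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, such as a trailing empty line
                yield json.loads(line)


def missing_fields(example: Dict[str, Any]) -> Set[str]:
    """Return the expected field names that are absent from one example."""
    return EXPECTED_FIELDS - set(example)
```

Reading lazily, one line at a time, avoids loading every `document_html` value into memory at once. A call such as `next(iter_examples("data.jsonl"))` would return the first record, where `data.jsonl` is a placeholder path.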
annotations:
- annotation_id
- long_answer:
  - candidate_index
  - end_byte
  - end_token
  - start_byte
  - start_token
- short_answers:
  - end_byte
  - end_token
  - start_byte
  - start_token
- yes_no_answer
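The start_token and end_token components of long_answer and short_answers can be used to pull an answer span out of a token sequence. The sketch below rests on several assumptions that the description does not state: the tokens are available as a plain list of strings, end_token is exclusive, short_answers is a list of span dictionaries, and a negative start_token means that no answer was annotated. The helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Sequence


def span_tokens(tokens: Sequence[str], span: Dict[str, int]) -> Optional[List[str]]:
    """Return the tokens covered by a span, or None when the span is empty.

    Assumptions: end_token is exclusive, and a negative start_token
    marks a missing answer.
    """
    start, end = span["start_token"], span["end_token"]
    if start < 0 or end <= start:
        return None
    return list(tokens[start:end])


def annotation_spans(annotation: Dict[str, Any], tokens: Sequence[str]) -> Dict[str, Any]:
    """Collect the long-answer and short-answer token spans of one annotation."""
    return {
        "long_answer": span_tokens(tokens, annotation["long_answer"]),
        # Assumption: short_answers is a list of span dictionaries.
        "short_answers": [span_tokens(tokens, s) for s in annotation["short_answers"]],
    }
```

The byte offsets (start_byte, end_byte) could be handled the same way by slicing the raw document bytes instead of the token list.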
document_html: HTML file of the Wikipedia article.
document_title: Title of the document.
document_tokens: Individual words in the article, including HTML tokens such as <H1>, <Table>, ...
document_url: The link to the Wikipedia page that contains the article.
example_id: ID of the article.
long_answer_candidates: A list of dictionaries. Each dictionary contains four components (end_byte, end_token, start_byte, start_token) and a boolean component, top_level.
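Because each candidate carries a boolean top_level component, the candidate list can be filtered on it. The following is a minimal sketch with a hypothetical helper name. It keeps each candidate's position in the list, on the assumption (not confirmed by the description) that candidate_index in long_answer refers to a position in this list.

```python
from typing import Any, Dict, List, Tuple


def top_level_candidates(
    candidates: List[Dict[str, Any]],
) -> List[Tuple[int, Dict[str, Any]]]:
    """Return (index, candidate) pairs for candidates whose top_level is true."""
    return [
        (index, candidate)
        for index, candidate in enumerate(candidates)
        if candidate["top_level"]
    ]
```

Keeping the original index alongside each candidate means the filtered result can still be matched against an index into the full list.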
question_text: Full text of the question. For example:
when is the last episode of season 8 of the walking dead
question_tokens: List of the tokens that appear in the question. For example:
['when', 'is', 'the', 'last', 'episode', 'of', 'season', '8', 'of', 'the', 'walking', 'dead']
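For the example question above, the token list is what a plain whitespace split of the question text produces. The sketch below checks only that one example. Whether every question in the data is tokenized this way is an assumption, and the helper name is hypothetical.

```python
from typing import List


def whitespace_tokens(question_text: str) -> List[str]:
    """Split a question into tokens on runs of whitespace."""
    return question_text.split()


# The example question and token list given in the description.
question_text = "when is the last episode of season 8 of the walking dead"
question_tokens = ['when', 'is', 'the', 'last', 'episode', 'of', 'season',
                   '8', 'of', 'the', 'walking', 'dead']

# Holds for this example; not verified for the rest of the data.
assert whitespace_tokens(question_text) == question_tokens
```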