A Chinese question answering dataset for legal advice.
- The corpus contains legal question-answer pairs collected from Chinese online forums. The questions are raised by netizens and the answers are provided by licensed lawyers.
- Each record contains four data fields: question subject, question body, answer, and label.
- The positive question-answer pairs are the ground-truth pairs provided online. For each question, we select answers to other questions of the same category as negative answers.
- We manually annotated part of the dataset to ensure correctness. The manually annotated subsets are named `LegalQA-manual-train.csv`, `LegalQA-manual-dev.csv`, and `LegalQA-manual-test.csv`.
- For more QA pairs, please refer to the full dataset in `LegalQA-all.zip`, which contains `LegalQA-all-train.csv`, `LegalQA-all-dev.csv`, and `LegalQA-all-test.csv` (see the loading sketch below).
- For any further questions, please contact us by raising an issue.
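
A minimal loading sketch with pandas. The exact column names are an assumption based on the field list above, not a confirmed part of the release; adjust them to match the files.

```python
# Minimal loading sketch. Assumes each CSV carries a header row with the
# four fields listed above; the column names used here are illustrative.
import pandas as pd

train = pd.read_csv("LegalQA-manual-train.csv")
dev = pd.read_csv("LegalQA-manual-dev.csv")
test = pd.read_csv("LegalQA-manual-test.csv")

print(train.columns.tolist())          # expected: question subject, question body, answer, label
print(train["label"].value_counts())   # positive vs. negative QA pairs
```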
- LegalQA-manual
| Sets | Train | Dev | Test |
| --- | --- | --- | --- |
| Number of questions | 783 | 93 | 136 |
| Number of answers | 3,121 | 602 | 865 |
| Average length of questions | 160 | 180 | 159 |
| Average length of answers | 41 | 45 | 43 |
- LegalQA-all
| Sets | Train | Dev | Test |
| --- | --- | --- | --- |
| Number of questions | 10,526 | 1,593 | 3,035 |
| Number of answers | 21,237 | 2,866 | 6,091 |
| Average length of questions | 160 | 173 | 168 |
| Average length of answers | 41 | 40 | 42 |
- Details
| Dataset (train/dev/test) | #Questions | #QA Pairs | %Correct |
| --- | --- | --- | --- |
| LegalQA (manual) | 783 / 93 / 136 | 7,258 / 816 / 1,169 | 21.8 / 23.3 / 23.9 |
| LegalQA (all) | 10,526 / 1,593 / 3,035 | 100,590 / 11,965 / 26,913 | 21.8 / 24.4 / 22.9 |
- Answer Selection
| Subset | MAP | MRR | P@1 |
| --- | --- | --- | --- |
| LegalQA (manual) | 0.8230 | 0.8749 | 0.7868 |
| LegalQA (all) | 0.8287 | 0.8867 | 0.8171 |
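
The results above use mean average precision (MAP), mean reciprocal rank (MRR), and precision at 1 (P@1). The sketch below shows how these metrics can be computed from per-question candidate lists ranked by model score; the function name and input format are our own assumptions for illustration, not part of the release.

```python
# Illustrative sketch of MAP / MRR / P@1 for answer selection.
# Input: one list per question of 0/1 relevance labels, already sorted
# by descending model score (1 = correct answer, 0 = negative).
def evaluate(rankings):
    ap_sum = rr_sum = p1_sum = 0.0
    for labels in rankings:
        # P@1: is the top-ranked candidate a correct answer?
        p1_sum += labels[0]
        # MRR: reciprocal rank of the first correct answer (0 if none).
        first_hit = next((i for i, y in enumerate(labels, 1) if y), None)
        rr_sum += 1.0 / first_hit if first_hit else 0.0
        # AP: mean precision at each position holding a correct answer.
        hits, precisions = 0, []
        for i, y in enumerate(labels, 1):
            if y:
                hits += 1
                precisions.append(hits / i)
        ap_sum += sum(precisions) / max(hits, 1)
    n = len(rankings)
    return {"MAP": ap_sum / n, "MRR": rr_sum / n, "P@1": p1_sum / n}

# Example: two questions with ranked candidates.
print(evaluate([[0, 1, 0, 0], [1, 0, 0]]))
# {'MAP': 0.75, 'MRR': 0.75, 'P@1': 0.5}
```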