
Downstream task datasets

Zhaoxin edited this page Jan 8, 2021 · 26 revisions


CLUE datasets

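CLUE is a Chinese language understanding evaluation benchmark that covers classification and machine reading comprehension tasks. The datasets in CLUE are distributed in JSON format. For the classification datasets, we converted the JSON format to TSV so that UER can load them directly. For the machine reading comprehension datasets, the original format is kept, and the dataset preprocessing is included in the project.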
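The JSON-to-TSV conversion for the classification datasets can be sketched roughly as follows. This is only an illustration, not the script that produced the files linked on this page. It assumes that each input line is one JSON object, that the text and label are stored under the keys `sentence` and `label`, and that the target TSV uses a `label` / `text_a` header row; the function name `jsonl_to_tsv` and its parameters are hypothetical, and the key names need to be adjusted per dataset.

```python
import json


def jsonl_to_tsv(jsonl_text, text_field="sentence", label_field="label"):
    """Convert JSON Lines text into TSV text with a `label` / `text_a` header.

    The field names and the TSV header are assumptions for illustration.
    """
    rows = ["label\ttext_a"]
    for line in jsonl_text.split("\n"):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Collapse tabs, newlines, and repeated spaces so that each
        # example stays on exactly one TSV row with two columns.
        text = " ".join(str(record[text_field]).split())
        rows.append(f"{record[label_field]}\t{text}")
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    sample = '{"label": "102", "sentence": "first example"}\n'
    print(jsonl_to_tsv(sample), end="")
```

Sentence-pair datasets would need a second text column; the same loop applies with one more field appended to each row.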

Classification:

Dataset Link
TNEWS https://share.weiyun.com/maExfIeO
CSL https://share.weiyun.com/LftIGlIT
CMNLI https://share.weiyun.com/hn3kTeKm
OCNLI https://share.weiyun.com/3DlKxB3q
AFQMC https://share.weiyun.com/CdlEKMON
IFLYTEK https://share.weiyun.com/ldiLjnZJ
CLUEWSC2020 https://share.weiyun.com/RLL1ShBi

Machine reading comprehension:

Dataset Link
CMRC2018 https://share.weiyun.com/p3Y9INyC
C3 in the project
ChID https://share.weiyun.com/Mix4q2ns

Named entity recognition:

Dataset Link
CLUENER2020 https://share.weiyun.com/smSMtLkn

Baidu ERNIE

The first version of ERNIE provides 5 Chinese datasets, which are used to evaluate ERNIE's performance.

Dataset Link
ChnSentiCorp in the project
LCQMC https://share.weiyun.com/5Fmf2SZ
XNLI https://share.weiyun.com/mcd8EApl
MSRA-NER in the project
NLPCC-DBQA https://share.weiyun.com/5HJMbih