[DOC] Add ernie-1.0-base-zh-cw benchmark results. (#3248)
ZHUI authored Sep 15, 2022
1 parent 8fc38d6 commit 37a6860
Showing 5 changed files with 96 additions and 16 deletions.
43 changes: 41 additions & 2 deletions examples/benchmark/clue/README.md
@@ -70,7 +70,7 @@
</tr> <tr>
<td rowspan=3 align=center> 24L1024H </td>
<td style="text-align:center">
<span style="font-size:18px">ERNIE 1.0-Large-zh-CW</span>
<span style="font-size:18px">ERNIE 1.0-Large-zh-cw</span>
</td>
<td style="text-align:center">
<span style="font-size:18px"><b>79.03</b></span>
@@ -222,7 +222,7 @@
</td>
</tr>
<tr>
<td rowspan=8 align=center> 12L768H </td>
<td rowspan=9 align=center> 12L768H </td>
<td style="text-align:center">
<span style="font-size:18px">
<a href="https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_base_zh.pdparams">
@@ -264,6 +264,44 @@
<span style="font-size:18px"><b>77.88</b></span>
</td>
</tr>
<tr>
<td style="text-align:center">
<span style="font-size:18px">ERNIE 1.0-Base-zh-cw</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">76.47</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">76.07</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">57.86</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">59.91</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">83.41</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">79.58</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">89.91</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">83.42</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">72.88/90.78</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">84.68</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">76.98</span>
</td>
</tr>
<tr>
<td style="text-align:center">
<span style="font-size:18px">ERNIE-Gram-zh</span>
@@ -1196,6 +1234,7 @@ AFQMC (semantic similarity), TNEWS (text classification), IFLYTEK (long-text classification
| ERNIE 2.0-Large-zh | 1e-5,32 | 3e-5,64 | 3e-5,32 | 2e-5,32 | 1e-5,16 | 3e-5,32 | 1e-5,64 | 2e-5,24 | 2e-5,24 | 3e-5,32 |
| HFL/RoBERTa-wwm-ext-large | 1e-5,32 | 3e-5,32 | 2e-5,32 | 1e-5,16 | 1e-5,16 | 2e-5,16 | 2e-5,16 | 3e-5,32 | 1e-5,24 | 2e-5,24 |
| ERNIE 3.0-Base-zh | 3e-5,16 | 3e-5,32 | 5e-5,32 | 3e-5,32 | 2e-5,64 | 2e-5,16 | 2e-5,32 | 2e-5,24 | 3e-5,24 | 3e-5,32 |
| ERNIE 1.0-Base-zh-cw | 2e-5,16 | 3e-5,32 | 5e-5,16 | 2e-5,16 | 3e-5,32 | 2e-5,16 | 2e-5,32 | 3e-5,24 | 2e-5,32 | 3e-5,24 |
| ERNIE-Gram-zh | 1e-5,16 | 5e-5,16 | 5e-5,16 | 2e-5,32 | 2e-5,64 | 3e-5,16 | 3e-5,64 | 3e-5,32 | 2e-5,24 | 2e-5,24 |
| ERNIE 2.0-Base-zh | 3e-5,64 | 3e-5,64 | 5e-5,16 | 5e-5,64 | 5e-5,32 | 5e-5,16 | 2e-5,16 | 2e-5,32 | 3e-5,24 | 3e-5,32 |
| Langboat/Mengzi-Bert-Base | 3e-5,32 | 5e-5,32 | 5e-5,16 | 2e-5,16 | 2e-5,16 | 3e-5,8 | 1e-5,16 | 3e-5,24 | 3e-5,24 | 2e-5,32 |
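
The rows above pair a learning rate with a batch size per task, which is what a grid search over those two hyperparameters would produce. Below is a minimal sketch of that idea, not the script used for the benchmark. It assumes each cell has the form `learning_rate,batch_size`; `parse_cell`, `grid_search`, and `toy_evaluate` are hypothetical names, and the candidate values are simply ones that appear in the rows shown, not a confirmed search space.

```python
from itertools import product


def parse_cell(cell):
    """Split a table cell such as '2e-5,16' into (learning_rate, batch_size)."""
    lr, bs = cell.split(",")
    return float(lr), int(bs)


def grid_search(learning_rates, batch_sizes, evaluate):
    """Try every (lr, batch_size) pair and return the best pair with its score."""
    best_pair, best_score = None, float("-inf")
    for lr, bs in product(learning_rates, batch_sizes):
        score = evaluate(lr, bs)
        if score > best_score:
            best_pair, best_score = (lr, bs), score
    return best_pair, best_score


def toy_evaluate(lr, bs):
    """Hypothetical stand-in for 'fine-tune on one task and return dev accuracy'.

    It peaks at lr=2e-5, batch_size=16 purely for illustration.
    """
    return -abs(lr - 2e-5) * 1e5 - abs(bs - 16) / 100


best_pair, _ = grid_search([1e-5, 2e-5, 3e-5, 5e-5], [16, 32, 64], toy_evaluate)
print(best_pair)              # (2e-05, 16)
print(parse_cell("2e-5,16"))  # (2e-05, 16)
```

In a real run, `toy_evaluate` would be replaced by a function that fine-tunes the model on one CLUE task and returns its dev-set score.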
8 changes: 4 additions & 4 deletions model_zoo/ernie-1.0/README.md
@@ -484,24 +484,24 @@ python3 -u -m paddle.distributed.launch \

We have released two models, base and large. Both achieve good pretraining results.

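- **ERNIE 1.0-Base-zh-CW** model:
- **ERNIE 1.0-Base-zh-cw** model:
- Trained on 400 GB of CLUE and WuDao corpora with batch_size 1024 for 4 million steps, the model reaches results comparable to `ernie-3.0-base-zh`. The model parameters are open-sourced as `ernie-1.0-base-zh-cw` and can be loaded and used directly. The best hyperparameters were found by grid search on the CLUE benchmark: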

Model&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Arch | CLUE AVG | AFQMC | TNEWS | IFLYTEK | CMNLI | OCNLI | CLUE WSC2020 | CSL | CMRC | CHID | C3
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
Metrics |   |   | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Exact/F1| Acc| Acc
ERNIE 1.0-Base-zh-CW | 12L768H | <b>76.44</b> | 76.04 | 58.02 | 60.87 | 83.56 | 78.61 | 89.14 | 84.00 | 72.26/90.40 | 84.73 | 77.15 |
ERNIE 1.0-Base-zh-cw | 12L768H | <b>76.47</b> | 76.07 | 57.86 | 59.91 | 83.41 | 79.58 | 89.91 | <b>83.42</b> | 72.88/90.78 | <b>84.68</b> | 76.98 |
ERNIE 2.0-Base-zh | 12L768H | 74.95 | 76.25 | 58.53 | 61.72 | 83.07 | 78.81 | 84.21 | 82.77 | 68.22/88.71 | 82.78 | 73.19
ERNIE 1.0-Base-zh | 12L768H | 74.17 | 74.84 | 58.91 | 62.25 | 81.68 | 76.58 | 85.20 | 82.77 | 67.32/87.83 | 82.47 | 69.68
-
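- **ERNIE 1.0-Large-zh-CW** model:
- **ERNIE 1.0-Large-zh-cw** model:

- Besides the base model, we also trained and released a large model. It uses the same vocabulary as ernie-1.0, hence the name `ernie-1.0-large-zh-cw`. It was trained on open-source corpora with batch_size 512 for 4 million steps; the SOP task was removed from training and only the MLM loss was kept: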

Model&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Arch | CLUE AVG | AFQMC | TNEWS | IFLYTEK | CMNLI | OCNLI | CLUE WSC2020 | CSL | CMRC | CHID | C3
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
Metrics |   |   | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Exact/F1 | Acc| Acc
ERNIE 1.0-Large-zh-CW| 24L1024H | <b>79.03</b> | 75.97 | 59.65 | 62.91 | 85.09 | 81.73| 93.09 | 84.53 | 74.22/91.88 | 88.57 | 84.54
ERNIE 1.0-Large-zh-cw | 24L1024H | <b>79.03</b> | 75.97 | 59.65 | 62.91 | 85.09 | 81.73| 93.09 | 84.53 | 74.22/91.88 | 88.57 | 84.54
ERNIE 3.0-Xbase-zh| 20L1024H | 78.71 | 76.85 | 59.89 | 62.41 | 84.76 | 82.51 | 89.80 | 84.47 | 75.49/92.67 | 86.36 | 84.59
RoBERTa-wwm-ext-large | 24L1024H | 76.61 | 76.00 | 59.33 | 62.02 | 83.88 | 78.81 | 90.79 | 83.67 | 70.58/89.82 | 85.72 | 75.26

12 changes: 6 additions & 6 deletions model_zoo/ernie-1.0/pretraining_introduction.md
@@ -24,8 +24,8 @@ PaddleNLP is committed to open-source pretraining work, using the open-source Chinese corpora CLUE, WuDao
- [3.4 Training data pipeline configuration](#data_pipe)
- [3.5 Monitoring and evaluation](#观察评估)
- [4. Training results](#release_models)
- [4.1 ERNIE 1.0-Base-zh-CW model](#ernie-1.0-base-zh-cw)
- [4.2 ERNIE 1.0-Large-zh-CW model](#ernie-1.0-large-zh-cw)
- [4.1 ERNIE 1.0-Base-zh-cw model](#ernie-1.0-base-zh-cw)
- [4.2 ERNIE 1.0-Large-zh-cw model](#ernie-1.0-large-zh-cw)
* [5. References](#references)

The complete workflow is shown in the diagram below:
@@ -577,28 +577,28 @@ python3 -u -m paddle.distributed.launch \

<a name="ernie-1.0-base-zh-cw"></a>

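### 4.1 ERNIE 1.0-Base-zh-CW model
### 4.1 ERNIE 1.0-Base-zh-cw model

Trained on 400 GB of CLUE and WuDao corpora with batch_size 1024 for 4 million steps, the model reaches results comparable to `ernie-3.0-base-zh`. The model parameters are open-sourced as `ernie-1.0-base-zh-cw` and can be loaded and used directly. The best hyperparameters were found by grid search on the CLUE benchmark: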

Model&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Arch | CLUE AVG | AFQMC | TNEWS | IFLYTEK | CMNLI | OCNLI | CLUE WSC2020 | CSL | CMRC | CHID | C3
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
Metrics |   |   | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Exact/F1| Acc| Acc
ERNIE 1.0-Base-zh-CW | 12L768H | <b>76.44</b> | 76.04 | 58.02 | 60.87 | 83.56 | 78.61 | 89.14 | 84.00 | 72.26/90.40 | 84.73 | 77.15 |
ERNIE 1.0-Base-zh-cw | 12L768H | <b>76.47</b> | 76.07 | 57.86 | 59.91 | <b>83.41</b> | 79.58 | 89.91 | 83.42 | 72.88/90.78 | <b>84.68</b> | 76.98 |
ERNIE 2.0-Base-zh | 12L768H | 74.32 | 75.65 | 58.25 | 61.64 | 82.62 | 78.71 | 81.91 | 82.33 | 66.08/87.46 | 82.78 | 73.19
ERNIE 1.0-Base-zh | 12L768H | 74.17 | 74.84 | 58.91 | 62.25 | 81.68 | 76.58 | 85.20 | 82.77 | 67.32/87.83 | 82.47 | 69.68


<a name="ernie-1.0-large-zh-cw"> </a>

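### 4.2 ERNIE 1.0-Large-zh-CW model
### 4.2 ERNIE 1.0-Large-zh-cw model

Besides the base model, we also trained a large model, named `ernie-1.0-large-zh-cw`. It was trained on open-source corpora with batch_size 512 for 4 million steps, with the SOP task removed and only the MLM loss kept. The best hyperparameters were found by grid search on the CLUE benchmark: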

Model&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Arch | CLUE AVG | AFQMC | TNEWS | IFLYTEK | CMNLI | OCNLI | CLUE WSC2020 | CSL | CMRC | CHID | C3
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
Metrics |   |   | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Exact/F1 | Acc| Acc
ERNIE 1.0-Large-zh-CW| 24L1024H | <b>79.03</b> | 75.97 | 59.65 | 62.91 | 85.09 | 81.73| 93.09 | 84.53 | 74.22/91.88 | 88.57 | 84.54
ERNIE 1.0-Large-zh-cw| 24L1024H | <b>79.03</b> | 75.97 | 59.65 | 62.91 | 85.09 | 81.73| 93.09 | 84.53 | 74.22/91.88 | 88.57 | 84.54
ERNIE 3.0-Xbase-zh| 20L1024H | 78.39 | 76.16 | 59.55 | 61.87 | 84.40 | 81.73 | 88.82 | 83.60 | 75.99/93.00 | 86.78 | 84.98
RoBERTa-wwm-ext-large | 24L1024H | 76.61 | 76.00 | 59.33 | 62.02 | 83.88 | 78.81 | 90.79 | 83.67 | 70.58/89.82 | 85.72 | 75.26
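
The CLUE AVG column appears to be the plain mean of the ten task scores, counting only the Exact value for CMRC. That rule is inferred from the numbers in this commit rather than stated in it, so the sketch below is a sanity check under that assumption. `clue_avg` is a hypothetical helper; the Large scores come from the row above, and the Base scores from the ERNIE 1.0-Base-zh-cw row added to the CLUE benchmark README in this commit.

```python
def clue_avg(scores):
    """Mean of the task scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


# Column order: AFQMC, TNEWS, IFLYTEK, CMNLI, OCNLI, CLUE WSC2020, CSL,
# CMRC (Exact only, an assumption), CHID, C3
large_cw = [75.97, 59.65, 62.91, 85.09, 81.73, 93.09, 84.53, 74.22, 88.57, 84.54]
base_cw = [76.07, 57.86, 59.91, 83.41, 79.58, 89.91, 83.42, 72.88, 84.68, 76.98]

print(clue_avg(large_cw))  # 79.03
print(clue_avg(base_cw))   # 76.47
```

Both results match the reported CLUE AVG values, which makes the check useful for spotting a mistyped cell in any single row.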

7 changes: 5 additions & 2 deletions model_zoo/ernie-1.0/run_pretrain.py
@@ -541,8 +541,11 @@ def do_train(args):
                 ctx_manager = contextlib.nullcontext() if sys.version_info >= (
                     3, 7) else contextlib.suppress()

-            if worker_num > 1 and (args.use_recompute
-                                   or args.accumulate_steps > 1):
+            if worker_num > 1 and (args.use_recompute or
+                                   ((step + 1) % args.accumulate_steps != 0)):
+                # grad acc, no_sync when (step + 1) % args.accumulate_steps != 0:
+                # recompute, no_sync every where
+                # recompute + grad_acc, no_sync every where
                 ctx_manager = model.no_sync()
             else:
                 ctx_manager = contextlib.nullcontext() if sys.version_info >= (
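
The hunk above changes when gradient synchronization is skipped: previously `no_sync()` was used on every step whenever `accumulate_steps > 1`, whereas the new condition skips it only on steps that do not close an accumulation window. The following is a minimal sketch of that decision in isolation. `should_skip_grad_sync` is a hypothetical helper, not a function in `run_pretrain.py`, and how gradients are eventually synchronized in the recompute case lies outside this hunk and is not modeled here.

```python
def should_skip_grad_sync(step, accumulate_steps, use_recompute, worker_num):
    """Return True when the training step should run under model.no_sync().

    Mirrors the condition added in the hunk:
      - a single worker never needs no_sync;
      - with recompute enabled, no_sync is used on every step;
      - with gradient accumulation only, no_sync is used on every step
        except the last one of each accumulation window.
    """
    if worker_num <= 1:
        return False
    return bool(use_recompute or (step + 1) % accumulate_steps != 0)


# Two workers, accumulation over 4 micro-steps, no recompute:
schedule = [should_skip_grad_sync(s, 4, False, 2) for s in range(8)]
print(schedule)
# [True, True, True, False, True, True, True, False]
```

With `accumulate_steps=1` the modulo is always zero, so the helper returns `False` on every step unless recompute is enabled.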
42 changes: 40 additions & 2 deletions model_zoo/ernie-3.0/README.md
@@ -139,7 +139,7 @@ Accuracy vs. latency on GPU with batch_size=32 and 1 at FP16 inference precision:
<tr>
<td rowspan=3 align=center> 24L1024H </td>
<td style="text-align:center">
<span style="font-size:18px">ERNIE 1.0-Large-CW</span>
<span style="font-size:18px">ERNIE 1.0-Large-cw</span>
</td>
<td style="text-align:center">
<span style="font-size:18px"><b>79.03</b></span>
@@ -291,7 +291,7 @@ Accuracy vs. latency on GPU with batch_size=32 and 1 at FP16 inference precision:
</td>
</tr>
<tr>
<td rowspan=8 align=center> 12L768H </td>
<td rowspan=9 align=center> 12L768H </td>
<td style="text-align:center">
<span style="font-size:18px">
<a href="https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_base_zh.pdparams">
@@ -333,6 +333,44 @@ Accuracy vs. latency on GPU with batch_size=32 and 1 at FP16 inference precision:
<span style="font-size:18px"><b>77.88</b></span>
</td>
</tr>
<tr>
<td style="text-align:center">
<span style="font-size:18px">ERNIE 1.0-Base-zh-cw</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">76.47</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">76.07</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">57.86</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">59.91</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">83.41</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">79.58</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">89.91</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">83.42</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">72.88/90.78</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">84.68</span>
</td>
<td style="text-align:center">
<span style="font-size:18px">76.98</span>
</td>
</tr>
<tr>
<td style="text-align:center">
<span style="font-size:18px">ERNIE-Gram-zh</span>