This repository contains the code for the Findings of EMNLP 2021 paper: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".
Install NVIDIA Apex with its C++ and CUDA extensions:

```bash
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
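As a quick sanity check that Apex built correctly (a minimal sketch, assuming a CUDA-capable environment):

```bash
# Should import without errors if the apex build succeeded.
python -c "from apex import amp; print('apex is available')"
```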
Then install the remaining dependencies:

```bash
pip install -r requirements.txt
```
- Download the vocabulary file of BERT-base (uncased) from HERE and put it into `./pretrained_ckpt/`.
- Download the pre-trained checkpoint of BERT-base (uncased) from HERE and put it into `./pretrained_ckpt/`.
- Download the 2nd general distillation checkpoint of TinyBERT from HERE and extract it into `./pretrained_ckpt/` (see the sketch below).
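A minimal sketch of the placement step; the archive and file names below are placeholders for whatever the HERE links actually provide:

```bash
# Placeholder file names; substitute the names of the files you downloaded.
mkdir -p ./pretrained_ckpt/
mv bert-base-uncased-vocab.txt ./pretrained_ckpt/           # BERT-base vocabulary file
mv bert-base-uncased-pytorch_model.bin ./pretrained_ckpt/   # BERT-base pre-trained checkpoint
unzip 2nd_General_TinyBERT.zip -d ./pretrained_ckpt/        # TinyBERT general distillation checkpoint
```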
- Download the latest dump of Wikipedia from HERE and extract it into `./dataset/pretrain_data/download_wikipedia/`.
- Download a mirror of BooksCorpus from HERE and extract it into `./dataset/pretrain_data/download_bookcorpus/` (see the sketch below).
Then generate the pre-training data and features:

```bash
bash create_pretrain_data.sh
bash create_pretrain_feature.sh
```

The features of Wikipedia, BooksCorpus, and their concatenation will be saved into `./dataset/pretrain_data/wikipedia_nomask/`, `./dataset/pretrain_data/bookcorpus_nomask/`, and `./dataset/pretrain_data/wiki_book_nomask/`, respectively.
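A quick way to confirm that the feature directories were populated (illustrative check only):

```bash
ls ./dataset/pretrain_data/wikipedia_nomask/ \
   ./dataset/pretrain_data/bookcorpus_nomask/ \
   ./dataset/pretrain_data/wiki_book_nomask/ | head
```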
- Download the GLUE dataset using the script from HERE and put the files into `./dataset/glue/`.
- Download the SQuAD v1.1 and v2.0 datasets and put them into `./dataset/squad/` (see the sketch below).
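The SQuAD files are commonly obtained from the official SQuAD-explorer site; a sketch of the download step (the URLs are assumed to be the standard releases):

```bash
mkdir -p ./dataset/squad/
cd ./dataset/squad/
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json
cd -
```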
```bash
bash pretrain_supernet.sh
```

The checkpoints will be saved into `./exp/pretrain/supernet/`, and the sub-directories should be renamed to `stage1_2` and `stage3`, respectively (see the sketch below). We also provide the checkpoint of the supernet at stage 3 (pre-trained with both Wikipedia and BooksCorpus) at HERE.
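A minimal sketch of the renaming step; the two run-directory names are placeholders for whatever `pretrain_supernet.sh` actually creates:

```bash
# Placeholder run-directory names; replace with the sub-directories actually created.
RUN_STAGE12=run_stage1_2_placeholder
RUN_STAGE3=run_stage3_placeholder
mv ./exp/pretrain/supernet/${RUN_STAGE12} ./exp/pretrain/supernet/stage1_2
mv ./exp/pretrain/supernet/${RUN_STAGE3}  ./exp/pretrain/supernet/stage3
```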
```bash
bash train.sh
```

The checkpoints will be saved into `./exp/train/bert_base/`, and the sub-directories should be renamed to the corresponding task names (i.e., `mnli`, `qqp`, `qnli`, `sst-2`, `cola`, `sts-b`, `mrpc`, `rte`, `wnli`, `squad1.1`, and `squad2.0`). Each sub-directory contains a checkpoint named `best_model.bin`.
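After renaming, each task directory should hold its teacher checkpoint; a quick check (task names and file name taken from the step above):

```bash
for task in mnli qqp qnli sst-2 cola sts-b mrpc rte wnli squad1.1 squad2.0; do
  ls ./exp/train/bert_base/${task}/best_model.bin
done
```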
```bash
bash ffn_search.sh
```

The checkpoints will be saved into `./exp/ffn_search/`.
```bash
bash finetune.sh
```

The checkpoints will be saved into `./exp/downstream/tiny_bert/`.
```bash
bash nas_finetune.sh
```

This script first pre-trains the student models from the stage-3 supernet checkpoint and saves the pre-trained checkpoints into `./exp/pretrain/auto_bert/`; it then fine-tunes them on the downstream datasets and saves the fine-tuned checkpoints into `./exp/downstream/auto_bert/`. We also provide the pre-trained checkpoints of the student models (including EfficientBERT-TINY, EfficientBERT, and EfficientBERT++) at HERE.
```bash
bash nas_finetune_transfer.sh
```

The pre-trained and fine-tuned checkpoints will be saved into `./exp/pretrain/auto_tiny_bert/` and `./exp/downstream/auto_tiny_bert/`, respectively.
```bash
bash test.sh
```

The test results will be saved into `./test_results/`.
If you find this code helpful for your research, please cite the following paper.
```
@inproceedings{dong2021efficient-bert,
  title = {{E}fficient{BERT}: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation},
  author = {Chenhe Dong and Guangrun Wang and Hang Xu and Jiefeng Peng and Xiaozhe Ren and Xiaodan Liang},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2021},
  year = {2021}
}
```