EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation

This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

Requirements

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

pip install -r requirements.txt
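
After installation, you can optionally run a quick sanity check to confirm that PyTorch and apex import correctly (a minimal sketch, assuming a CUDA-enabled PyTorch build):

python -c "import torch, apex; print(torch.__version__, 'CUDA available:', torch.cuda.is_available())"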

Download checkpoints

Download the vocabulary file of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the pre-trained checkpoint of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the 2nd general distillation checkpoint of TinyBERT from HERE, and extract it into ./pretrained_ckpt/.
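
For reference, ./pretrained_ckpt/ might then look roughly like the sketch below; the exact file and folder names depend on the downloaded archives and are only placeholders here:

pretrained_ckpt/
├── bert-base-uncased-vocab.txt             # vocabulary file of BERT-base (uncased)
├── bert-base-uncased-pytorch_model.bin     # pre-trained checkpoint of BERT-base (uncased)
└── 2nd_General_TinyBERT_4L_312D/           # extracted TinyBERT general distillation checkpoint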

Prepare dataset

Download the latest dump of Wikipedia from HERE, and extract it into ./dataset/pretrain_data/download_wikipedia/.
Download a mirror of BooksCorpus from HERE, and extract it into ./dataset/pretrain_data/download_bookcorpus/.
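
If you extract the archives manually, the following is a hedged sketch of one way to place them (the archive names below are placeholders for whatever you actually downloaded):

mkdir -p ./dataset/pretrain_data/download_wikipedia ./dataset/pretrain_data/download_bookcorpus
# Wikipedia article dump (placeholder name)
bzip2 -dk enwiki-latest-pages-articles.xml.bz2
mv enwiki-latest-pages-articles.xml ./dataset/pretrain_data/download_wikipedia/
# BooksCorpus mirror (placeholder name)
tar -xf bookcorpus.tar.bz2 -C ./dataset/pretrain_data/download_bookcorpus/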

- Pre-training data

bash create_pretrain_data.sh
bash create_pretrain_feature.sh

The features of Wikipedia, BooksCorpus, and their concatenation will be saved into ./dataset/pretrain_data/wikipedia_nomask/, ./dataset/pretrain_data/bookcorpus_nomask/, and ./dataset/pretrain_data/wiki_book_nomask/, respectively.
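
A quick way to verify that the features were generated is to list the output directories:

ls ./dataset/pretrain_data/wikipedia_nomask/ ./dataset/pretrain_data/bookcorpus_nomask/ ./dataset/pretrain_data/wiki_book_nomask/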

- Fine-tuning data

Download the GLUE dataset using the script from HERE, and put the files into ./dataset/glue/.
Download the SQuAD v1.1 and v2.0 datasets (the training and development sets of each) and put them into ./dataset/squad/.
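
If the original SQuAD links are unavailable, the official files can usually be fetched directly (URLs correct at the time of writing, but they may change):

mkdir -p ./dataset/squad/
wget -P ./dataset/squad/ https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
wget -P ./dataset/squad/ https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
wget -P ./dataset/squad/ https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json
wget -P ./dataset/squad/ https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json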

Pre-train the supernet

bash pretrain_supernet.sh

The checkpoints will be saved into ./exp/pretrain/supernet/, and the sub-directories should be renamed to stage1_2 and stage3 accordingly, as sketched below.
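
For example, the renaming could look like this; the run-directory names in angle brackets are placeholders for whatever pretrain_supernet.sh actually creates:

mv ./exp/pretrain/supernet/<stage1_2_run_dir> ./exp/pretrain/supernet/stage1_2
mv ./exp/pretrain/supernet/<stage3_run_dir> ./exp/pretrain/supernet/stage3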

We also provide the checkpoint of the supernet in stage 3 (pre-trained with both Wikipedia and BooksCorpus) at HERE.

Train the teacher model (BERT-base)

bash train.sh

The checkpoints will be saved into ./exp/train/bert_base/, and the sub-directories should be renamed to the corresponding task names (i.e., mnli, qqp, qnli, sst-2, cola, sts-b, mrpc, rte, wnli, squad1.1, and squad2.0), as sketched below. Each sub-directory contains a checkpoint named best_model.bin.
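
After fine-tuning the teacher on every task, the directory layout should then look roughly like this sketch:

exp/train/bert_base/
├── mnli/best_model.bin
├── qqp/best_model.bin
├── ...
└── squad2.0/best_model.bin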

Conduct NAS (including search stages 1, 2, and 3)

bash ffn_search.sh

The checkpoints will be saved into ./exp/ffn_search/.

Distill the student model

- TinyBERT-4, TinyBERT-6

bash finetune.sh

The checkpoints will be saved into ./exp/downstream/tiny_bert/.

- EfficientBERT-tiny, EfficientBERT, EfficientBERT+, EfficientBERT++

bash nas_finetune.sh

The above script first pre-trains the student models from the stage-3 pre-trained checkpoint of the supernet and saves the pre-trained checkpoints into ./exp/pretrain/auto_bert/; it then fine-tunes them on the downstream datasets and saves the fine-tuned checkpoints into ./exp/downstream/auto_bert/.

We also provide the pre-trained checkpoints of the student models (including EfficientBERT-tiny, EfficientBERT, and EfficientBERT++) at HERE.

- EfficientBERT (TinyBERT-6)

bash nas_finetune_transfer.sh

The pre-trained and fine-tuned checkpoints will be saved into ./exp/pretrain/auto_tiny_bert/ and ./exp/downstream/auto_tiny_bert/, respectively.

Test on the GLUE dataset

bash test.sh

The test results will be saved into ./test_results/.

Reference

If you find this code helpful for your research, please cite the following paper.

@inproceedings{dong2021efficient-bert,
  title     = {{E}fficient{BERT}: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation},
  author    = {Chenhe Dong and Guangrun Wang and Hang Xu and Jiefeng Peng and Xiaozhe Ren and Xiaodan Liang},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2021},
  year      = {2021}
}
