
bert: add conversion script for BERT Token Dropping TF2 checkpoints #17142

Merged: 5 commits merged into huggingface:main on Jun 27, 2022

Conversation

@stefan-it (Collaborator) commented May 9, 2022

Hi,

this PR adds a conversion script for BERT models that were trained with the approach introduced in the recent paper "Token Dropping for Efficient BERT Pretraining".

Models are trained with the TensorFlow 2 implementation from the TensorFlow Models repository, which can be found here. Note: the model architecture only needs changes during pre-training; the final pre-trained model is compatible with the original BERT architecture!

Unfortunately, the authors do not plan to release pre-trained checkpoints.

However, I have pre-trained several models with their official implementation, and I've also released the checkpoints and the converted PyTorch model weights on the Hugging Face Model Hub:

https://huggingface.co/dbmdz/bert-base-historic-multilingual-64k-td-cased

This is a multilingual model that was trained on ~130GB of historic and noisy OCR'ed text with a 64k vocabulary.
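
The converted weights can also be loaded directly from the Hub with the Auto classes (a minimal sketch, assuming the tokenizer files are available in that repository as well):

from transformers import AutoModelForMaskedLM, AutoTokenizer

# Model id of the released 64k Token Dropping hmBERT checkpoint on the Hub.
model_id = "dbmdz/bert-base-historic-multilingual-64k-td-cased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)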

Conversion Script Usage

The following commands can be used to test the conversion script:

wget "https://huggingface.co/dbmdz/bert-base-historic-multilingual-64k-td-cased/resolve/main/ckpt-1000000.data-00000-of-00001"
wget "https://huggingface.co/dbmdz/bert-base-historic-multilingual-64k-td-cased/resolve/main/ckpt-1000000.index"
wget "https://huggingface.co/dbmdz/bert-base-historic-multilingual-64k-td-cased/resolve/main/config.json"
python3 convert_bert_token_dropping_original_tf2_checkpoint_to_pytorch.py --tf_checkpoint_path ckpt-1000000 --bert_config_file config.json --pytorch_dump_path ./exported

This outputs:

All model checkpoint weights were used when initializing BertForMaskedLM.

All the weights of BertForMaskedLM were initialized from the model checkpoint at ./exported.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForMaskedLM for predictions without further training.
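
As a quick sanity check, the exported model can then be loaded like any other local BERT checkpoint (a minimal sketch, using the ./exported dump path from the command above):

from transformers import BertForMaskedLM

# Load the converted weights from the local dump path used above.
model = BertForMaskedLM.from_pretrained("./exported")
print(f"Loaded BertForMaskedLM with {model.num_parameters():,} parameters")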

Masked LM Predictions

The masked LM predictions are pretty good and comparable with those of the multilingual model that was trained with the official BERT implementation. Just use the inference widget on the Hugging Face Model Hub.

In this example, the sentence "and I cannot conceive the reafon why [MASK] hath" is used to test the model. For a fair comparison, the 32k hmBERT model, which was trained with the official BERT implementation on the same corpus, is used; its output is:

[
  {
    "score": 0.3564337193965912,
    "token": 1349,
    "token_str": "she",
    "sequence": "and I cannot conceive the reafon why she hath"
  },
  {
    "score": 0.21097686886787415,
    "token": 903,
    "token_str": "it",
    "sequence": "and I cannot conceive the reafon why it hath"
  },
  {
    "score": 0.10645408183336258,
    "token": 796,
    "token_str": "he",
    "sequence": "and I cannot conceive the reafon why he hath"
  },
  {
    "score": 0.0170532688498497,
    "token": 1049,
    "token_str": "we",
    "sequence": "and I cannot conceive the reafon why we hath"
  },
  {
    "score": 0.01265314407646656,
    "token": 45,
    "token_str": "I",
    "sequence": "and I cannot conceive the reafon why I hath"
  }
]

With the 64k hmBERT model that was trained with the Token Dropping approach, the output is:

[
  {
    "score": 0.5147836804389954,
    "token": 796,
    "token_str": "he",
    "sequence": "and I cannot conceive the reafon why he hath"
  },
  {
    "score": 0.1566970944404602,
    "token": 1349,
    "token_str": "she",
    "sequence": "and I cannot conceive the reafon why she hath"
  },
  {
    "score": 0.08448878675699234,
    "token": 903,
    "token_str": "it",
    "sequence": "and I cannot conceive the reafon why it hath"
  },
  {
    "score": 0.020168323069810867,
    "token": 45,
    "token_str": "I",
    "sequence": "and I cannot conceive the reafon why I hath"
  },
  {
    "score": 0.01774059422314167,
    "token": 3560,
    "token_str": "God",
    "sequence": "and I cannot conceive the reafon why God hath"
  }
]
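
For reference, these predictions can also be reproduced locally with the fill-mask pipeline instead of the inference widget (a minimal sketch using the 64k Token Dropping model; the output entries have the same fields as the JSON above):

from transformers import pipeline

# Fill-mask pipeline with the converted 64k Token Dropping model from the Hub.
fill_mask = pipeline(
    "fill-mask",
    model="dbmdz/bert-base-historic-multilingual-64k-td-cased",
)

for prediction in fill_mask("and I cannot conceive the reafon why [MASK] hath"):
    print(prediction["score"], prediction["token_str"], prediction["sequence"])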

Downstream Task Performance

We have also used this model when participating in the HIPE-2022 Shared Task, and the BERT model pre-trained with the Token Dropping approach achieved very good results on the NER downstream task; see the results below:

| Backbone LM | Configuration | F1-Score (All, Development) | F1-Score (German, Development) | F1-Score (English, Development) | F1-Score (French, Development) | Model Hub Link |
| --- | --- | --- | --- | --- | --- | --- |
| hmBERT (32k) | bs4-e10-lr5e-05#4 | 87.64 | 89.26 | 88.78 | 84.80 | here |
| hmBERT (64k, token dropping) | bs8-e10-lr3e-05#3 | 87.02 | 88.89 | 86.63 | 85.50 | here |

@HuggingFaceDocBuilderDev commented May 9, 2022

The documentation is not available anymore as the PR was closed or merged.

@LysandreJik (Member) left a comment

Very clean, thanks for your contribution @stefan-it! Don't forget to fill the model card so that members of the community are aware of how your model was trained 😃

@stefan-it (Collaborator, Author)

Hi @LysandreJik

thanks for the approval! I have also added a model card, and the model is also mentioned in our new paper "hmBERT: Historical Multilingual Language Models for Named Entity Recognition", where it is used as the backbone language model for our winning NER models (English and French) :)

@stefan-it (Collaborator, Author)

/cc @sgugger 🤗

@sgugger merged commit 71b2839 into huggingface:main on Jun 27, 2022
@sgugger (Collaborator) commented Jun 27, 2022

Sorry this slipped through the cracks! Thanks a lot for your contribution!

@stefan-it deleted the add-bert-token-dropping-conversion-script branch on June 27, 2022 at 19:09
younesbelkada pushed a commit to younesbelkada/transformers that referenced this pull request Jun 29, 2022
bert: add conversion script for BERT Token Dropping TF2 checkpoints (huggingface#17142)

* bert: add conversion script for BERT Token Dropping TF2 checkpoints

* bert: rename conversion script for BERT Token Dropping checkpoints

* bert: fix flake errors in BERT Token Dropping conversion script

* bert: make doc-builder happy!!1!11

* bert: fix pytorch_dump_path of BERT Token Dropping conversion script