Graformer

This repository accompanies the paper: Multilingual Translation via Grafting Pre-trained Language Models

Graformer (also named BridgeTransformer in the code) is a sequence-to-sequence model for neural machine translation. It improves multilingual translation by grafting pre-trained (masked) language models: a pre-trained encoder (BERT) and a pre-trained decoder (GPT). The code is based on Fairseq.
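
For intuition, below is a minimal PyTorch sketch of the grafting idea: a pre-trained bidirectional encoder and a pre-trained causal decoder are connected through newly initialised cross-attention ("graft") layers. This is not the repo's BridgeTransformer implementation (which is built on Fairseq); every module name and hyper-parameter here is illustrative.

    import torch
    import torch.nn as nn

    class GraftedSeq2Seq(nn.Module):
        """Toy grafted encoder-decoder: a BERT-like encoder and a GPT-like
        decoder joined by newly initialised cross-attention layers."""

        def __init__(self, vocab=32000, d_model=512, nhead=8,
                     n_enc=6, n_dec=6, n_graft=2):
            super().__init__()
            self.src_embed = nn.Embedding(vocab, d_model)
            self.tgt_embed = nn.Embedding(vocab, d_model)
            # Stand-in for the pre-trained masked LM encoder (mBERT).
            self.encoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
                num_layers=n_enc)
            # Stand-in for the pre-trained causal LM decoder (mGPT):
            # self-attention only, made autoregressive by a triangular mask.
            self.lm_decoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
                num_layers=n_dec)
            # Newly initialised grafting layers: they add the cross-attention
            # that lets the pre-trained decoder attend to the encoder states.
            self.graft = nn.TransformerDecoder(
                nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
                num_layers=n_graft)
            self.proj = nn.Linear(d_model, vocab)

        def forward(self, src, tgt):
            memory = self.encoder(self.src_embed(src))
            t = tgt.size(1)
            causal = torch.triu(
                torch.full((t, t), float("-inf"), device=tgt.device),
                diagonal=1)
            h = self.lm_decoder(self.tgt_embed(tgt), mask=causal)
            h = self.graft(h, memory, tgt_mask=causal)
            return self.proj(h)  # (batch, tgt_len, vocab) logits

    model = GraftedSeq2Seq()
    logits = model(torch.randint(0, 32000, (2, 7)),   # source tokens
                   torch.randint(0, 32000, (2, 5)))   # target prefix
    print(logits.shape)  # torch.Size([2, 5, 32000])

In the actual model, the two pre-trained stacks are loaded from the released mBERT and mGPT checkpoints rather than randomly initialised.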

Examples

You can start with run/run.sh, with some minor modifications. The scripts correspond to the following steps (a combined invocation is sketched after the list):

train a pre-trained BERT:
    run_arnold_multilingual_masked_lm_6e6d.sh

train a pre-trained GPT:
    run_arnold_multilingual_lm_6e6d.sh

train a Graformer:
    run_arnold_multilingual_graft_transformer_12e12d_ted.sh

inference from Graformer:
    run_arnold_multilingual_graft_inference_ted.sh
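
Taken together, an end-to-end run might look like the sketch below. The bash invocations and working directory are assumptions; check each script's header for its expected data paths and arguments before launching.

    # Hypothetical end-to-end pipeline; adapt data paths and settings first.
    bash run/run_arnold_multilingual_masked_lm_6e6d.sh                # pre-train mBERT
    bash run/run_arnold_multilingual_lm_6e6d.sh                       # pre-train mGPT
    bash run/run_arnold_multilingual_graft_transformer_12e12d_ted.sh  # train Graformer
    bash run/run_arnold_multilingual_graft_inference_ted.sh           # decode with Graformer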
    

Released Models

We release our pre-trained mBERT and mGPT, along with the trained Graformer model, here.

TensorFlow Version

We will provide a TensorFlow version in NeurST, a popular toolkit for sequence processing.

Citation

Please cite as:

@inproceedings{sun2021multilingual,
    title = "Multilingual Translation via Grafting Pre-trained Language Models",
    author = "Sun, Zewei and Wang, Mingxuan and Li, Lei",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: Findings",
    year = "2021"
}

Contact

If you have any questions, please feel free to contact me: [email protected]
