Neural Language Modeling

Pre-trained models

Model	Description	Dataset	Download
`transformer_lm.gbw.adaptive_huge`	Adaptive Inputs (Baevski and Auli, 2018) 1026M params	Google Billion Words	download (.tar.bz2)
`transformer_lm.wiki103.adaptive`	Adaptive Inputs (Baevski and Auli, 2018) 247M params	WikiText-103	download (.tar.bz2)
`transformer_lm.wmt19.en`	English LM (Ng et al., 2019)	WMT News Crawl	download (.tar.gz)
`transformer_lm.wmt19.de`	German LM (Ng et al., 2019)	WMT News Crawl	download (.tar.gz)
`transformer_lm.wmt19.ru`	Russian LM (Ng et al., 2019)	WMT News Crawl	download (.tar.gz)

Example usage

Sampling from a language model using PyTorch Hub:

import torch

# List available models
torch.hub.list('pytorch/fairseq')  # [..., 'transformer_lm.wmt19.en', ...]

# Load an English LM trained on WMT'19 News Crawl data
# (tokenizer='moses' and bpe='fastbpe' need the sacremoses and fastBPE packages installed)
en_lm = torch.hub.load('pytorch/fairseq', 'transformer_lm.wmt19.en', tokenizer='moses', bpe='fastbpe')

# Sample from the language model
en_lm.sample('Barack Obama', beam=1, sampling=True, sampling_topk=10, temperature=0.8)
# "Barack Obama is coming to Sydney and New Zealand (...)"
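
To make the `sampling_topk` and `temperature` arguments above more concrete, here is a minimal pure-Python sketch of top-k sampling with temperature over a toy list of logits. The helper `sample_top_k` and the example logits are hypothetical and written for illustration only; this is not fairseq's implementation.

```python
import math
import random

def sample_top_k(logits, k, temperature, rng):
    """Draw one index from the k highest-scoring entries of `logits`."""
    # Keep only the indices of the k largest logits
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Dividing by a temperature below 1 sharpens the distribution
    scaled = [logits[i] / temperature for i in top]
    # Softmax weights, shifted by the max for numerical stability
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # made-up scores for a 5-word vocabulary
draws = [sample_top_k(toy_logits, k=2, temperature=0.8, rng=rng) for _ in range(100)]
print(sorted(set(draws)))  # only indices among the top 2 can appear
```

With `k=2`, only the two highest-scoring entries can ever be drawn, no matter how many samples are taken.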

Training a transformer language model with the CLI tools

1) Preprocess the data

First download and prepare the WikiText-103 dataset:

cd examples/language_model/
bash prepare-wikitext-103.sh
cd ../..

Next preprocess/binarize the data:

TEXT=examples/language_model/wikitext-103
fairseq-preprocess \
    --only-source \
    --trainpref $TEXT/wiki.train.tokens \
    --validpref $TEXT/wiki.valid.tokens \
    --testpref $TEXT/wiki.test.tokens \
    --destdir data-bin/wikitext-103 \
    --workers 20
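
As a rough picture of what "binarize" means here, the toy sketch below builds a vocabulary from whitespace-tokenized text and maps each line to a list of integer ids. The helpers `build_vocab` and `binarize`, the special symbols, and their ordering are assumptions made for illustration; the actual dictionary and on-disk format produced by `fairseq-preprocess` differ.

```python
from collections import Counter

def build_vocab(lines, specials=("<pad>", "</s>", "<unk>")):
    """Assign ids to special symbols first, then to tokens by descending frequency."""
    counts = Counter(tok for line in lines for tok in line.split())
    vocab = {sym: i for i, sym in enumerate(specials)}
    # Ties are broken alphabetically so the mapping is deterministic
    for tok, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        vocab[tok] = len(vocab)
    return vocab

def binarize(line, vocab):
    """Map a whitespace-tokenized line to ids, appending end-of-sentence."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in line.split()] + [vocab["</s>"]]

train_lines = ["the cat sat", "the dog sat"]  # toy corpus
vocab = build_vocab(train_lines)
ids = binarize("the bird sat", vocab)  # "bird" is unseen, so it maps to <unk>
print(ids)
```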

2) Train a language model

Next we'll train a transformer language model using adaptive inputs:

fairseq-train --task language_modeling \
    data-bin/wikitext-103 \
    --save-dir checkpoints/transformer_wikitext-103 \
    --arch transformer_lm_wiki103 \
    --max-update 286000 --max-lr 1.0 --t-mult 2 --lr-period-updates 270000 --lr-scheduler cosine --lr-shrink 0.75 \
    --warmup-updates 16000 --warmup-init-lr 1e-07 --min-lr 1e-09 --optimizer nag --lr 0.0001 --clip-norm 0.1 \
    --criterion adaptive_loss --max-tokens 3072 --update-freq 3 --tokens-per-sample 3072 --seed 1 \
    --sample-break-mode none --skip-invalid-size-inputs-valid-test --ddp-backend=no_c10d

If the above command runs out of memory, try reducing `--max-tokens` (the maximum number of tokens per batch) or `--tokens-per-sample` (the maximum sequence length). You can also increase `--update-freq` to accumulate gradients over several batches before each optimizer step, which simulates training on more GPUs.
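
The trade-off between these flags can be sketched as simple arithmetic: the number of tokens that contribute to one optimizer step is roughly the per-GPU batch size times the accumulation factor times the number of GPUs. The helper `tokens_per_update` and the single-GPU setting below are hypothetical, and the result is an upper bound since batches are not always full.

```python
def tokens_per_update(max_tokens: int, update_freq: int, num_gpus: int) -> int:
    """Upper bound on the tokens that contribute to one optimizer step."""
    return max_tokens * update_freq * num_gpus

# Flags from the training command, on an assumed single-GPU machine
baseline = tokens_per_update(max_tokens=3072, update_freq=3, num_gpus=1)

# Halving --max-tokens and doubling --update-freq keeps the same budget.
# Assumption: --tokens-per-sample is lowered too, so a sample still fits in a batch.
reduced = tokens_per_update(max_tokens=1536, update_freq=6, num_gpus=1)

print(baseline, reduced)
```

Both settings process the same number of tokens per update, but the second needs less memory per forward and backward pass at the cost of more passes per step.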

3) Evaluate

fairseq-eval-lm data-bin/wikitext-103 \
    --path checkpoints/transformer_wikitext-103/checkpoint_best.pt \
    --sample-break-mode complete --max-tokens 3072 \
    --context-window 2560 --softmax-batch 1024
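
`fairseq-eval-lm` reports perplexity on the evaluation set. As a reminder of what that number means, the sketch below computes perplexity from per-token log-probabilities as the exponential of the average negative log-likelihood. The function `perplexity` and the toy inputs are illustrative assumptions, not fairseq code.

```python
import math

def perplexity(token_log_probs):
    """Perplexity from the natural-log probability assigned to each target token."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Toy case: the model assigns probability 0.25 to every target token,
# which is as uncertain as a uniform choice among 4 options
toy_log_probs = [math.log(0.25)] * 4
ppl = perplexity(toy_log_probs)
print(round(ppl, 6))
```

Lower is better: a model that assigned probability 1 to every target token would reach the minimum perplexity of 1.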

Convolutional language models

Please see the convolutional LM README for instructions to train convolutional language models.
