TLDR; The authors train a word-level NMT model in which UNK tokens in both the source and target sentence are handled by character-level RNNs that produce word representations. The authors can thus train a fast word-based system that still generalizes well and doesn't produce unknown words. The best system achieves a new state-of-the-art BLEU score of 19.9 on WMT'15 English to Czech translation.
- Source Sentence: The final hidden state of the character-RNN is used as the word representation
- Source Sentence: Character-RNNs are always initialized with a zero state, so representations for rare words can be computed efficiently ahead of time
- Target: First produce a word-level sentence that may include UNK tokens, then run the char-RNNs to spell out each UNK
- Target: Two ways to initialize the char-RNN: with the same hidden state as the word-RNN (same-path), or with its own representation (separate-path); see the sketch after this list
- The authors find that the attention mechanism is critical for pure character-based NMT models
- Given that the authors demonstrate the potential of character-based models, is the hybrid approach the right direction? If we had more compute power, would pure character-based models win?
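
Below is a minimal PyTorch sketch of the two pieces summarized above: a char-RNN encoder that turns a rare source word into a word representation (zero-state initialization), and a char-RNN decoder that spells out a target UNK with either same-path or separate-path initialization. All class names, dimensions, and the `init_mode` flag are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn


class CharWordEncoder(nn.Module):
    """Builds a representation for a rare source word from its characters.

    The char-LSTM always starts from a zero state (PyTorch's default), so the
    representation of a given word can be computed once and cached.
    """

    def __init__(self, n_chars, char_dim, word_dim):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.rnn = nn.LSTM(char_dim, word_dim, batch_first=True)

    def forward(self, char_ids):             # char_ids: (batch, max_word_len)
        x = self.char_emb(char_ids)
        _, (h_n, _) = self.rnn(x)             # zero initial state by default
        return h_n[-1]                        # final hidden state = word representation


class CharWordDecoder(nn.Module):
    """Spells out a target word whenever the word-level decoder emits UNK.

    init_mode='same_path'     -> start from the word decoder's hidden state at that step
    init_mode='separate_path' -> start from a separately learned projection of that state
    (word_dec_state is assumed to have the same dimensionality as hidden_dim)
    """

    def __init__(self, n_chars, char_dim, hidden_dim, init_mode="separate_path"):
        super().__init__()
        self.init_mode = init_mode
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.rnn = nn.LSTMCell(char_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, n_chars)
        self.sep_proj = nn.Linear(hidden_dim, hidden_dim)  # separate-path only

    def forward(self, word_dec_state, bos_id, max_len=30):
        if self.init_mode == "same_path":
            h = word_dec_state
        else:
            h = torch.tanh(self.sep_proj(word_dec_state))
        c = torch.zeros_like(h)
        char_id = torch.full((h.size(0),), bos_id, dtype=torch.long)
        generated = []
        # Greedy character generation; a real decoder would stop at an
        # end-of-word symbol and could use beam search instead.
        for _ in range(max_len):
            h, c = self.rnn(self.char_emb(char_id), (h, c))
            char_id = self.out(h).argmax(dim=-1)
            generated.append(char_id)
        return torch.stack(generated, dim=1)  # (batch, max_len) character ids
```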