cd tools
make KALDI=/path/to/kaldi TOOL=/path/to/save/tools
- ASR
  - AISHELL-1
  - AISHELL-2
  - AMI
  - CSJ
  - LaboroTVSpeech
  - Librispeech
  - Switchboard (+Fisher)
  - TEDLIUM2/TEDLIUM3
  - TIMIT
  - WSJ
- LM
  - Penn Tree Bank
  - WikiText2
- RNN encoder
- Transformer encoder [link]
- Conformer encoder [link]
- Time-depth separable (TDS) convolution encoder [link] [link]
- Gated CNN encoder (GLU) [link]
- Beam search
- Shallow fusion
- Forced alignment
RNN-Transducer (RNN-T) decoder [link]
- Beam search
- Shallow fusion
- RNN decoder
- Attention type
- location-based
- content-based
- dot-product
- GMM attention
- Streaming RNN decoder specific
- Transformer decoder [link]
- Streaming Transformer decoder specific
- RNNLM (recurrent neural network language model)
- Gated convolutional LM [link]
- Transformer LM
- Transformer-XL LM [link]
- Adaptive softmax [link]
- Phoneme
- Grapheme
- Wordpiece (BPE, sentencepiece)
- Word
- Word-char mix
Multi-task learning (MTL) with different units is supported to alleviate data sparseness.
- Hybrid CTC/attention [link]
- Hierarchical Attention (e.g., word attention + character attention) [link]
- Hierarchical CTC (e.g., word CTC + character CTC) [link]
- Hierarchical CTC+Attention (e.g., word attention + character CTC) [link]
- Forward-backward attention [link]
- LM objective
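
As a rough illustration of how objectives like these are commonly combined, the sketch below interpolates a CTC loss and an attention loss with a fixed weight, in the spirit of hybrid CTC/attention training, and generalizes that to a weighted sum over several named objectives. The names `hybrid_loss`, `multitask_loss`, and `ctc_weight` are hypothetical and are not taken from this toolkit; plain floats stand in for per-batch loss values.

```python
def hybrid_loss(ctc_loss: float, att_loss: float, ctc_weight: float = 0.3) -> float:
    """Interpolate a CTC loss and an attention loss.

    Hypothetical helper for illustration only, not this toolkit's API.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


def multitask_loss(losses: dict, weights: dict) -> float:
    """Weighted sum over several named objectives.

    Example keys (assumed): "att_word" for a word-level attention loss,
    "ctc_char" for a character-level CTC loss.
    """
    missing = set(losses) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in losses.items())


if __name__ == "__main__":
    print(hybrid_loss(ctc_loss=2.0, att_loss=1.0, ctc_weight=0.5))
    print(multitask_loss({"att_word": 1.0, "ctc_char": 2.0},
                         {"att_word": 0.7, "ctc_char": 0.3}))
```

In a real training loop the loss values would be tensors and the weights would typically come from the experiment configuration; the arithmetic stays the same.
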
Model | dev | test |
---|---|---|
Conformer LAS | 4.1 | 4.5 |
Transformer | 5.0 | 5.4 |
Streaming MMA | 5.5 | 6.1 |

Model | test_android | test_ios | test_mic |
---|---|---|---|
Conformer LAS | 6.1 | 5.5 | 5.9 |

Model | eval1 | eval2 | eval3 |
---|---|---|---|
Conformer LAS | 5.7 | 4.4 | 4.9 |
BLSTM LAS | 6.5 | 5.1 | 5.6 |
LC-BLSTM MoChA | 7.4 | 5.6 | 6.4 |

Model | SWB | CH |
---|---|---|
BLSTM LAS | 9.1 | 18.8 |

Model | SWB | CH |
---|---|---|
BLSTM LAS | 7.8 | 13.8 |

Model | dev_4k | dev | tedx-jp-10k |
---|---|---|---|
Conformer LAS | 7.8 | 10.1 | 12.4 |

Model | dev-clean | dev-other | test-clean | test-other |
---|---|---|---|---|
Conformer LAS | 1.9 | 4.6 | 2.1 | 4.9 |
Transformer | 2.1 | 5.3 | 2.4 | 5.7 |
BLSTM LAS | 2.5 | 7.2 | 2.6 | 7.5 |
BLSTM RNN-T | 2.9 | 8.5 | 3.2 | 9.0 |
UniLSTM RNN-T | 3.7 | 11.7 | 4.0 | 11.6 |
UniLSTM MoChA | 4.1 | 11.0 | 4.2 | 11.2 |
LC-BLSTM RNN-T | 3.3 | 9.8 | 3.5 | 10.2 |
LC-BLSTM MoChA | 3.3 | 8.8 | 3.5 | 9.1 |
Streaming MMA | 2.5 | 6.9 | 2.7 | 7.1 |

Model | dev | test |
---|---|---|
Conformer LAS | 7.0 | 6.8 |
BLSTM LAS | 8.1 | 7.5 |
LC-BLSTM RNN-T | 8.0 | 7.7 |
LC-BLSTM MoChA | 10.3 | 8.6 |
UniLSTM RNN-T | 10.7 | 10.7 |
UniLSTM MoChA | 13.5 | 11.6 |

Model | test_dev93 | test_eval92 |
---|---|---|
BLSTM LAS | 8.8 | 6.2 |

Model | valid | test |
---|---|---|
RNNLM | 87.99 | 86.06 |
+ cache=100 | 79.58 | 79.12 |
+ cache=500 | 77.36 | 76.94 |

Model | valid | test |
---|---|---|
RNNLM | 104.53 | 98.73 |
+ cache=100 | 90.86 | 85.87 |
+ cache=2000 | 76.10 | 72.77 |
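
Assuming the language-model figures above are perplexities (the tables do not label the metric, so this is an assumption), the sketch below shows how perplexity is usually derived from per-token log-probabilities. The function name `perplexity` and its input format are illustrative and not part of this toolkit.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log token probabilities: exp(-mean(log p)).

    `token_log_probs` is an assumed input format: one log-probability
    per evaluated token, as a language model would assign.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token.
    print(perplexity([math.log(0.25)] * 4))
```

Lower is better: a model that spreads its probability uniformly over k choices at every step has perplexity k.
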