Introduction

icefall contains ASR recipes for various datasets using https://github.com/k2-fsa/k2.

You can use https://github.com/k2-fsa/sherpa to deploy models trained with icefall.

You can try pre-trained models in your browser, without downloading or installing anything, at https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition. See https://k2-fsa.github.io/icefall/huggingface/spaces.html for more details.

Installation

Please refer to https://icefall.readthedocs.io/en/latest/installation/index.html for installation instructions.

Recipes

Please refer to https://icefall.readthedocs.io/en/latest/recipes/index.html for more information.

We provide the following recipes:

yesno

This is the simplest ASR recipe in icefall and can be run on CPU. Training takes less than 30 seconds and gives you the following WER:

[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
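The bracketed counts in the log line above are enough to reproduce the reported figure. Here is a minimal sketch, assuming the usual definition of WER as insertions plus deletions plus substitutions, divided by the number of reference words. The function name `wer_percent` is hypothetical and is not part of icefall.

```python
def wer_percent(num_ref_words: int, ins: int, dels: int, subs: int) -> float:
    """Return the word error rate as a percentage.

    Assumes the standard definition: (ins + dels + subs) / num_ref_words.
    This is an illustrative helper, not an icefall API.
    """
    if num_ref_words <= 0:
        raise ValueError("num_ref_words must be positive")
    return 100.0 * (ins + dels + subs) / num_ref_words


# Counts taken from the yesno log line: [1 / 240, 0 ins, 1 del, 0 sub]
print(f"%WER {wer_percent(240, ins=0, dels=1, subs=0):.2f}%")  # %WER 0.42%
```

The same formula applies to the CER and PER figures later in this document, with characters or phones taking the place of words.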

We provide a Colab notebook for this recipe: Open In Colab

LibriSpeech

Please see https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md for the latest results.

We provide the following models for this recipe:

Conformer CTC Model

The best WER we currently have is:

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.42       | 5.73       |

We provide a Colab notebook to run a pre-trained conformer CTC model: Open In Colab

TDNN LSTM CTC Model

The WER for this model is:

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 6.59       | 17.69      |

We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: Open In Colab

Transducer: Conformer encoder + LSTM decoder

This model uses a Conformer encoder and an LSTM decoder.

The best WER with greedy search is:

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 3.07       | 7.51       |

We provide a Colab notebook to run a pre-trained RNN-T conformer model: Open In Colab

Transducer: Conformer encoder + Embedding decoder

This model uses a Conformer encoder. The decoder consists of 1 embedding layer and 1 convolutional layer.

The best WER using modified beam search with beam size 4 is:

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.56       | 6.27       |

Note: No auxiliary losses are used in the training and no LMs are used in the decoding.

We provide a Colab notebook to run a pre-trained transducer conformer + stateless decoder model: Open In Colab

k2 pruned RNN-T

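| Encoder         | Params | test-clean | test-other | epochs | devices    |
|-----------------|--------|------------|------------|--------|------------|
| zipformer       | 65.5M  | 2.21       | 4.79       | 50     | 4 32G-V100 |
| zipformer-small | 23.2M  | 2.42       | 5.73       | 50     | 2 32G-V100 |
| zipformer-large | 148.4M | 2.06       | 4.63       | 50     | 4 32G-V100 |
| zipformer-large | 148.4M | 2.00       | 4.38       | 174    | 8 80G-A100 |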

Note: No auxiliary losses are used in the training and no LMs are used in the decoding.

k2 pruned RNN-T + GigaSpeech

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 1.78       | 4.08       |

Note: No auxiliary losses are used in the training and no LMs are used in the decoding.

k2 pruned RNN-T + GigaSpeech + CommonVoice

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 1.90       | 3.98       |

Note: No auxiliary losses are used in the training and no LMs are used in the decoding.

GigaSpeech

We provide three models for this recipe:

Conformer CTC

|     | Dev   | Test  |
|-----|-------|-------|
| WER | 10.47 | 10.58 |

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss

|                      | Dev   | Test  |
|----------------------|-------|-------|
| greedy search        | 10.51 | 10.73 |
| fast beam search     | 10.50 | 10.69 |
| modified beam search | 10.40 | 10.51 |

Transducer: Zipformer encoder + Embedding decoder

|                      | Dev   | Test  |
|----------------------|-------|-------|
| greedy search        | 10.31 | 10.50 |
| fast beam search     | 10.26 | 10.48 |
| modified beam search | 10.25 | 10.38 |

Aishell

We provide three models for this recipe: Conformer CTC model, TDNN LSTM CTC model, and Transducer Stateless model.

Conformer CTC Model

The best CER we currently have is:

|     | test |
|-----|------|
| CER | 4.26 |

TDNN LSTM CTC Model

The CER for this model is:

|     | test  |
|-----|-------|
| CER | 10.16 |

We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: Open In Colab

Transducer Stateless Model

The best CER we currently have is:

|     | test |
|-----|------|
| CER | 4.38 |

We provide a Colab notebook to run a pre-trained TransducerStateless model: Open In Colab

Aishell2

We provide one model for this recipe: Transducer Stateless Model.

Transducer Stateless Model

The best WER we currently have is:

|     | dev-ios | test-ios |
|-----|---------|----------|
| WER | 5.32    | 5.56     |

Aishell4

We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with all subsets)

The best CER we currently have is:

|     | test  |
|-----|-------|
| CER | 29.08 |

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: Open In Colab

TIMIT

We provide two models for this recipe: TDNN LSTM CTC model and TDNN LiGRU CTC model.

TDNN LSTM CTC Model

The best PER we currently have is:

|     | TEST   |
|-----|--------|
| PER | 19.71% |

We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: Open In Colab

TDNN LiGRU CTC Model

The PER for this model is:

|     | TEST   |
|-----|--------|
| PER | 17.66% |

We provide a Colab notebook to run a pre-trained TDNN LiGRU CTC model: Open In Colab

TED-LIUM3

We provide two models for this recipe: Transducer Stateless: Conformer encoder + Embedding decoder and Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Transducer Stateless: Conformer encoder + Embedding decoder

The best WER using modified beam search with beam size 4 is:

|     | dev  | test |
|-----|------|------|
| WER | 6.91 | 6.33 |

Note: No auxiliary losses are used in the training and no LMs are used in the decoding.

We provide a Colab notebook to run a pre-trained Transducer Stateless model: Open In Colab

Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss

The best WER using modified beam search with beam size 4 is:

|     | dev  | test |
|-----|------|------|
| WER | 6.77 | 6.14 |

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: Open In Colab

Aidatatang_200zh

We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss

|                      | Dev  | Test |
|----------------------|------|------|
| greedy search        | 5.53 | 6.59 |
| fast beam search     | 5.30 | 6.34 |
| modified beam search | 5.27 | 6.33 |

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: Open In Colab

WenetSpeech

We provide two models for this recipe: Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss and Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset, offline ASR)

|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy search        | 7.80 | 8.75     | 13.49        |
| modified beam search | 7.76 | 8.71     | 13.41        |
| fast beam search     | 7.94 | 8.74     | 13.80        |

Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset)

Streaming:

|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy_search        | 8.78 | 10.12    | 16.16        |
| modified_beam_search | 8.53 | 9.95     | 15.81        |
| fast_beam_search     | 9.01 | 10.47    | 16.28        |

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless2 model: Open In Colab

Alimeeting

We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with far subset)

|                      | Eval  | Test-Net |
|----------------------|-------|----------|
| greedy search        | 31.77 | 34.66    |
| fast beam search     | 31.39 | 33.02    |
| modified beam search | 30.38 | 34.25    |

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: Open In Colab

TAL_CSASR

We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss

The best results are reported as CER (%) for Chinese and WER (%) for English (zh: Chinese, en: English):

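| decoding-method      | dev  | dev_zh | dev_en | test | test_zh | test_en |
|----------------------|------|--------|--------|------|---------|---------|
| greedy_search        | 7.30 | 6.48   | 19.19  | 7.39 | 6.66    | 19.13   |
| modified_beam_search | 7.15 | 6.35   | 18.95  | 7.22 | 6.50    | 18.70   |
| fast_beam_search     | 7.18 | 6.39   | 18.90  | 7.27 | 6.55    | 18.77   |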

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: Open In Colab

Deployment with C++

Once you have trained a model in icefall, you may want to deploy it with C++, without Python dependencies.

Please refer to the documentation https://icefall.readthedocs.io/en/latest/recipes/Non-streaming-ASR/librispeech/conformer_ctc.html#deployment-with-c for how to do this.

We also provide a Colab notebook showing how to run a torch scripted model in k2 with C++: Open In Colab