Add streaming ASR with Emformer RNN-T #6

csukuangfj · 2022-05-30T16:37:54Z

I have tested it and it works. Will upload a pretrained Emformer model later.

Note that the framework is quite general: it is easy to adapt to other kinds of stateless RNN-T models and is not limited to Emformer RNN-T models.

pkufool · 2022-05-30T23:06:25Z

Wow, cool! Once you merge this PR, I will try to add a streaming Conformer model.

pkufool

LGTM

sherpa/csrc/rnnt_emformer_model.cc

pkufool · 2022-05-30T23:42:56Z

sherpa/csrc/rnnt_beam_search.cc

+      decoder_out = model.ForwardDecoder(decoder_input.to(device)).squeeze(1);
+    }
+  }
+  return decoder_out;


Why do we need to return decoder_out here? The only reason I can think of is to avoid an extra ForwardDecoder call. Are there any other reasons?

Returning decoder_out saves an extra decoder.forward() call when the next chunk is decoded.
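To illustrate the point about saving a decoder call, here is a minimal pure-Python sketch of chunk-wise greedy search with a stateless decoder. It is not the code from `rnnt_beam_search.cc`: the names `greedy_search_chunk` and `run_stream`, the toy `decoder` and `joiner`, and the use of plain integers instead of tensors are all assumptions made for illustration. The idea it shows is that the decoder output only changes when a non-blank token is emitted, so the value computed at the end of one chunk is still valid at the start of the next.

```python
from typing import Callable, List, Optional, Sequence, Tuple

BLANK = 0  # hypothetical blank token id


def greedy_search_chunk(
    encoder_out: Sequence[int],
    hyp: List[int],
    decoder: Callable[[Sequence[int]], int],
    joiner: Callable[[int, int], int],
    context_size: int = 2,
    decoder_out: Optional[int] = None,
) -> Tuple[List[int], int]:
    """Decode one chunk; return the updated hypothesis and the last decoder_out."""
    if decoder_out is None:
        # No cached value: run the decoder on the current context.
        decoder_out = decoder(hyp[-context_size:])
    for frame in encoder_out:
        token = joiner(frame, decoder_out)
        if token != BLANK:
            hyp = hyp + [token]
            # The decoder input changed, so decoder_out must be refreshed.
            decoder_out = decoder(hyp[-context_size:])
    return hyp, decoder_out


def run_stream(chunks: Sequence[Sequence[int]], reuse: bool) -> Tuple[List[int], int]:
    """Decode all chunks with toy models and count the decoder calls."""
    calls = 0

    def decoder(context: Sequence[int]) -> int:
        nonlocal calls
        calls += 1
        return sum(context)  # toy stand-in for the real decoder network

    def joiner(frame: int, dec: int) -> int:
        # Toy stand-in: frame 0 means "emit blank", anything else emits a token.
        return BLANK if frame == 0 else (frame + dec) % 5 + 1

    hyp = [BLANK, BLANK]  # context_size blanks as the initial context
    decoder_out = None
    for chunk in chunks:
        hyp, out = greedy_search_chunk(chunk, hyp, decoder, joiner, decoder_out=decoder_out)
        decoder_out = out if reuse else None  # drop the cache when not reusing
    return hyp, calls
```

In this toy setup, `run_stream(chunks, reuse=True)` and `run_stream(chunks, reuse=False)` produce the same hypothesis, but the second one spends one additional decoder call per chunk after the first.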

For streaming decoding, the input chunk size is fixed and there is no padding, so we can figure out encoder_out_len from encoder_out.

We can add it for fast_beam_search later if it turns out to be necessary.
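As a small sketch of the reasoning about lengths: when every sequence in a batch has the same fixed chunk size and no padded frames, the per-sequence length is just the time dimension of `encoder_out`. The helper name `infer_encoder_out_lens` and the nested-list representation of a `(N, T, C)` tensor are assumptions for illustration, not part of this PR.

```python
from typing import List, Sequence


def infer_encoder_out_lens(
    encoder_out: Sequence[Sequence[Sequence[float]]],
) -> List[int]:
    """Return one length per sequence for an unpadded (N, T, C) nested list."""
    if not encoder_out:
        return []
    num_frames = len(encoder_out[0])
    for seq in encoder_out:
        # With fixed-size chunks, every sequence must have the same T.
        if len(seq) != num_frames:
            raise ValueError("all sequences must have the same number of frames")
    return [num_frames] * len(encoder_out)
```

With padded batches, as in non-streaming decoding, this shortcut no longer holds and the lengths have to be passed explicitly.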

sherpa/bin/decode.py

sherpa/bin/streaming_server.py

sherpa/csrc/rnnt_emformer_model.h

pkufool · 2022-05-31T00:26:39Z

sherpa/csrc/rnnt_model.h

@@ -53,7 +53,7 @@ class RnntModel {
   * @param features  A 3-D tensor of shape (N, T, C).
   * @param features_length A 1-D tensor of shape (N,) containing the number of
   *                       valid frames in `features`.
-   * @return Return a tuple containing two tensors:
+   * @return Return a pair containing two tensors:


I think we will need feature_lens in fast_beam_search, but we can certainly add it when needed.

csukuangfj · 2022-06-01T02:12:04Z

Here is a demo for this PR: https://www.youtube.com/watch?v=z7HgaZv5W0U

csukuangfj added 5 commits May 30, 2022 13:17

First working version.

7762d73

First C++ working version.

77cb912

Refactoring.

447fec1

Add streaming ASR with stateless Emformer RNN-T.

a046e16

typo fixes

5710005

pkufool approved these changes May 31, 2022

csukuangfj mentioned this pull request May 31, 2022

Add streaming Emformer stateless RNN-T. k2-fsa/icefall#390

Merged

Fix comments.

97b2e3c

csukuangfj added 4 commits June 1, 2022 10:18

Add web interface.

e1350f5

Add CI for streaming ASR.

5818c84

Minor fixes to README.

22db79e

Minor fixes.

94d26b6

csukuangfj merged commit ba865c7 into k2-fsa:master Jun 1, 2022

csukuangfj deleted the streaming-asr branch June 1, 2022 03:03

uni-manjunath-ke mentioned this pull request Mar 13, 2023

Sherpa support for Nemo ctc models via torchscript #303

Open
