Add streaming ASR with Emformer RNN-T #6

Merged
merged 10 commits into from
Jun 1, 2022

csukuangfj
Collaborator

I have tested it and it works. Will upload a pretrained Emformer model later.

Note that the framework is quite general and it is easy to adapt to other kinds of stateless RNN-T models, not limited to Emformer RNN-T models.

@pkufool
Collaborator

pkufool commented May 30, 2022

Wow, cool! Once you merge this PR, I will try to add a streaming Conformer model.

Collaborator

@pkufool pkufool left a comment

LGTM

sherpa/csrc/rnnt_emformer_model.cc (outdated)
decoder_out = model.ForwardDecoder(decoder_input.to(device)).squeeze(1);
}
}
return decoder_out;
Collaborator

Why do we need to return decoder_out here? The only reason I can think of is to avoid an extra ForwardDecoder call. Are there any other reasons?

Collaborator Author

decoder_out is returned to save an extra decoder.forward() call.
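As a rough illustration of the point above, here is a minimal sketch of chunk-by-chunk greedy search in which the decoder output is carried over from one chunk to the next. It is not the code from this PR: `CountingDecoder`, `toy_joiner`, `greedy_search_chunk`, `run`, `BLANK`, and `CONTEXT_SIZE` are all hypothetical names, and the "model" is a toy that only counts decoder calls.

```python
from typing import List, Optional, Tuple

BLANK = 0          # assumed blank token ID (hypothetical)
CONTEXT_SIZE = 2   # assumed context size of the stateless decoder (hypothetical)


class CountingDecoder:
    """Toy stand-in for a stateless RNN-T decoder; it only counts its calls."""

    def __init__(self) -> None:
        self.num_calls = 0

    def __call__(self, context: List[int]) -> int:
        self.num_calls += 1
        return sum(context)  # toy "decoder output"


def toy_joiner(encoder_frame: int, decoder_out: int) -> int:
    """Toy joiner: emits the frame value itself as the token (0 means blank)."""
    return encoder_frame


def greedy_search_chunk(
    decoder: CountingDecoder,
    encoder_out: List[int],
    hyp: List[int],
    decoder_out: Optional[int] = None,
) -> int:
    """Decode one chunk and return the latest decoder output for reuse."""
    if decoder_out is None:
        # Nothing cached: recompute from the current context.
        decoder_out = decoder(hyp[-CONTEXT_SIZE:])
    for frame in encoder_out:
        token = toy_joiner(frame, decoder_out)
        if token != BLANK:
            hyp.append(token)
            # The context changed, so the decoder must run again.
            decoder_out = decoder(hyp[-CONTEXT_SIZE:])
    return decoder_out


def run(chunks: List[List[int]], reuse: bool) -> Tuple[List[int], int]:
    """Decode all chunks; return the emitted tokens and the decoder call count."""
    decoder = CountingDecoder()
    hyp = [BLANK] * CONTEXT_SIZE
    cached: Optional[int] = None
    for chunk in chunks:
        out = greedy_search_chunk(decoder, chunk, hyp, cached if reuse else None)
        cached = out
    return hyp[CONTEXT_SIZE:], decoder.num_calls


chunks = [[0, 3, 0], [0, 0], [5, 0]]
tokens_reuse, calls_reuse = run(chunks, reuse=True)
tokens_fresh, calls_fresh = run(chunks, reuse=False)
```

In this toy setup both runs emit the same tokens, but the run that discards the returned value pays one extra decoder call at the start of every chunk after the first.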

For streaming decoding, the input chunk size is fixed and there is no padding, so we can work out encoder_out_len from the shape of encoder_out.

We can add it for fast_beam_search later if it turns out to be necessary.
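The remark about recovering the lengths from the encoder output can be sketched as follows. This is only an illustration under the stated assumption (fixed chunk size, no padding); `encoder_out_lens_from_shape` is a hypothetical helper, and nested Python lists stand in for an (N, T, C) tensor.

```python
from typing import List, Sequence


def encoder_out_lens_from_shape(
    encoder_out: Sequence[Sequence[Sequence[float]]],
) -> List[int]:
    """Return the number of valid frames per stream for an (N, T, C) output.

    Only valid when every stream in the batch has exactly T frames,
    i.e. the chunk size is fixed and nothing is padded.
    """
    num_streams = len(encoder_out)
    num_frames = len(encoder_out[0]) if num_streams else 0
    # Guard the assumption: all streams must have the same number of frames.
    if any(len(stream) != num_frames for stream in encoder_out):
        raise ValueError("streams differ in length; explicit lengths are needed")
    return [num_frames] * num_streams


# Two streams, three frames each, four channels per frame.
encoder_out = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
lens = encoder_out_lens_from_shape(encoder_out)
```

Once batches can contain padded streams, the shape no longer carries this information and the lengths would have to be passed explicitly.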

sherpa/bin/decode.py (outdated)
sherpa/bin/streaming_server.py (outdated)
sherpa/csrc/rnnt_emformer_model.h (outdated)
@@ -53,7 +53,7 @@ class RnntModel {
* @param features A 3-D tensor of shape (N, T, C).
* @param features_length A 1-D tensor of shape (N,) containing the number of
* valid frames in `features`.
- * @return Return a tuple containing two tensors:
+ * @return Return a pair containing two tensors:
Collaborator

I think we will need feature_lens in fast_beam_search, but we can certainly add it when needed.

@csukuangfj
Collaborator Author

Here is a demo for this PR: https://www.youtube.com/watch?v=z7HgaZv5W0U

@csukuangfj csukuangfj merged commit ba865c7 into k2-fsa:master Jun 1, 2022
@csukuangfj csukuangfj deleted the streaming-asr branch June 1, 2022 03:03