Add C++ support for streaming NeMo CTC models. #857

Merged
merged 5 commits into k2-fsa:master from nemo-streaming-ctc on May 10, 2024

Conversation

csukuangfj
Collaborator

Following #843

@csukuangfj csukuangfj merged commit 46e4e5b into k2-fsa:master May 10, 2024
179 of 199 checks passed
@csukuangfj csukuangfj deleted the nemo-streaming-ctc branch May 10, 2024 08:26
@tempops

tempops commented May 15, 2024

Hello, thank you for the speedy response and the model export support! I tried online-nemo-ctc-decode-files.py with the 480ms model, but the output isn't generated in real time. Since it is an online model, I assumed the text would be printed as it is decoded.

I also noticed a few errors. Are they due to the model itself? And is the 1040ms model or the 80ms model better?

I also wanted to know whether a streaming transducer can be used with streaming_server.py, as it has a separate decoder and joiner. I tried using it but got an error:

/Users/runner/work/sherpa-onnx/sherpa-onnx/sherpa-onnx/csrc/online-transducer-model.cc:GetModelType:75 Unsupported model_type: EncDecHybridRNNTCTCBPEModel
/Users/runner/work/sherpa-onnx/sherpa-onnx/sherpa-onnx/csrc/online-transducer-model.cc:Create:116 Unknown model type in online transducer!
zsh: segmentation fault python3 speech-recognition-from-microphone.py

@csukuangfj
Collaborator Author

> I tried the online-nemo-ctc-decode-files.py with the 480ms model but the response isn't generated real-time

It is decoding files. What do you mean by real-time?

> I also noticed a few errors, is it due to the model itself?

Could you tell us what errors you have noticed?

> and is the 1040ms model better or 80ms model better

What do you mean by better?

> I also wanted to know if streaming transducer can be used with the streaming_server.py file,

It has not been implemented yet. Will support it this week.

@tempops

tempops commented May 15, 2024

Hello,
Thank you for the quick response.

  1. It is decoding files. What do you mean by real-time?
    By real-time I mean streaming text output. When I ran online-nemo-ctc-decode-files.py, it only printed the output at the end of transcription.

  2. Could you tell us what errors you have noticed?
    By errors I mean incorrectly transcribed words, misspellings, and repeated letters, which can lead to a high WER.

  3. What do you mean by better?
    In the NeMo documentation on Fast Conformer (here: link), I saw that 80ms, 480ms, and 1040ms are the cache-aware windows the model uses to decode (here: link). I think a larger cache window might lead to better transcriptions, but I'm not sure, so I wanted to ask you.

> It has not been implemented yet. Will support it this week.
Thank you! Looking forward to using it!
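
For reference, WER (word error rate) is usually computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. Below is a minimal sketch of that calculation. `word_error_rate` is a hypothetical helper name, and no text normalization (casing, punctuation) is applied, which a real evaluation would normally add.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (previous row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                cur[j - 1] + 1,      # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

Comparing this number across the 80ms, 480ms, and 1040ms models on the same recordings would be one way to put a figure on "better" in terms of accuracy.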

@csukuangfj
Collaborator Author

> When I ran the online-nemo-ctc-decode-files.py it only printed the output at end of transcription

That is expected. We are decoding a file and it gives you the result once the file is decoded.

Please refer to our microphone examples. You can change them to support NeMo streaming CTC models, and then you will see real-time output as you speak.


> I also noticed a few errors, is it due to the model itself?

Yes, I think so.


> and is the 1040ms model better or 80ms model better

In terms of accuracy, I think 1040ms is better.

In terms of latency, I think 80ms is better.
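
To make the trade-off concrete, here is a back-of-the-envelope sketch. It assumes, as a simplification, that the figure in each model name is the amount of audio the model must wait for before it can emit an update; the actual chunk size and lookahead are defined by the NeMo export and may differ. `updates_for_utterance` is a hypothetical helper.

```python
def updates_for_utterance(duration_ms: int, window_ms: int) -> int:
    """Number of windows needed to cover the utterance (ceiling division)."""
    if duration_ms <= 0 or window_ms <= 0:
        raise ValueError("durations must be positive")
    return -(-duration_ms // window_ms)


if __name__ == "__main__":
    utterance_ms = 10_000  # a 10-second utterance, chosen for illustration
    for window_ms in (80, 480, 1040):
        n = updates_for_utterance(utterance_ms, window_ms)
        print(f"{window_ms} ms window: first text after at least "
              f"{window_ms} ms, about {n} updates over 10 s")
```

Under this assumption, a larger window gives the model more context per step, which is the likely reason for the accuracy difference, but the text appears later and in bigger jumps.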


By the way, you can try the Android APK for NeMo streaming CTC models at

https://k2-fsa.github.io/sherpa/onnx/android/apk.html


APKs for the non-streaming NeMo CTC models can be found at
https://k2-fsa.github.io/sherpa/onnx/vad/apk-asr.html
