Export NeMo FastConformer Hybrid Transducer Large Streaming to ONNX #844
Conversation
Hi, running
$ python show-onnx-transudcer.py
prints:
=========encoder==========
NodeArg(name='audio_signal', type='tensor(float)', shape=['audio_signal_dynamic_axes_1', 80, 'audio_signal_dynamic_axes_2'])
NodeArg(name='length', type='tensor(int64)', shape=['length_dynamic_axes_1'])
-----
NodeArg(name='outputs', type='tensor(float)', shape=['outputs_dynamic_axes_1', 512, 'outputs_dynamic_axes_2'])
NodeArg(name='encoded_lengths', type='tensor(int64)', shape=['encoded_lengths_dynamic_axes_1'])
=========decoder==========
NodeArg(name='targets', type='tensor(int32)', shape=['targets_dynamic_axes_1', 'targets_dynamic_axes_2'])
NodeArg(name='target_length', type='tensor(int32)', shape=['target_length_dynamic_axes_1'])
NodeArg(name='states.1', type='tensor(float)', shape=[1, 'states.1_dim_1', 640])
NodeArg(name='onnx::LSTM_3', type='tensor(float)', shape=[1, 1, 640])
-----
NodeArg(name='outputs', type='tensor(float)', shape=['outputs_dynamic_axes_1', 640, 'outputs_dynamic_axes_2'])
NodeArg(name='prednet_lengths', type='tensor(int32)', shape=['prednet_lengths_dynamic_axes_1'])
NodeArg(name='states', type='tensor(float)', shape=[1, 'states_dynamic_axes_1', 640])
NodeArg(name='74', type='tensor(float)', shape=[1, 'LSTM74_dim_1', 640])
=========joiner==========
NodeArg(name='encoder_outputs', type='tensor(float)', shape=['encoder_outputs_dynamic_axes_1', 512, 'encoder_outputs_dynamic_axes_2'])
NodeArg(name='decoder_outputs', type='tensor(float)', shape=['decoder_outputs_dynamic_axes_1', 640, 'decoder_outputs_dynamic_axes_2'])
-----
NodeArg(name='outputs', type='tensor(float)', shape=['outputs_dynamic_axes_1', 'outputs_dynamic_axes_2', 'outputs_dynamic_axes_3', 1025])
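For context, a listing like the one above can be produced by printing each InferenceSession's inputs and outputs with onnxruntime; a minimal sketch (the filenames are placeholders):

```python
# Prints every input and output NodeArg of an ONNX model, which yields
# exactly the NodeArg(...) lines shown above.
import onnxruntime as ort

def show(filename: str, title: str) -> None:
    session = ort.InferenceSession(filename, providers=["CPUExecutionProvider"])
    print(f"========={title}==========")
    for node in session.get_inputs():
        print(node)
    print("-----")
    for node in session.get_outputs():
        print(node)

for part in ("encoder", "decoder", "joiner"):
    show(f"{part}.onnx", part)
```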
I am missing these meta-data entries:
"cache_last_channel_dim1": cache_last_channel_dim1,
"cache_last_channel_dim2": cache_last_channel_dim2,
"cache_last_channel_dim3": cache_last_channel_dim3,
"cache_last_time_dim1": cache_last_time_dim1,
"cache_last_time_dim2": cache_last_time_dim2,
"cache_last_time_dim3": cache_last_time_dim3,
What could be the reason? Also, why do we need this information in the meta-data? Is it necessary and important for the sherpa decoder to support it? Thank you!
I just added streaming CTC support for the NeMo hybrid FastConformer transducer+CTC model. I think you can use it as a reference to add streaming transducer support.
You can find how they are used in sherpa-onnx/sherpa-onnx/csrc/online-nemo-ctc-model.cc, lines 229 to 232 and lines 239 to 240 (at commit 46e4e5b).
Also make sure you have followed sherpa-onnx/scripts/nemo/fast-conformer-hybrid-transducer-ctc/export-onnx-transducer.py, line 88 (at commit 46e4e5b): you have to use the setting shown there in order to export a streaming model.
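A sketch of the idea behind that setting; the assumption here is that NeMo's set_export_config() API is the mechanism used to turn on cache support before export, and the pretrained model name is also an assumption (check the actual script at the referenced line for the exact call):

```python
# A sketch of the idea only, not the exact content of line 88 of
# export-onnx-transducer.py.
import nemo.collections.asr as nemo_asr

# Assumed pretrained streaming hybrid model name.
model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.from_pretrained(
    "stt_en_fastconformer_hybrid_large_streaming_80ms"
)
model.eval()

# Without cache support, the exported encoder has no cache inputs/outputs
# and none of the cache_last_channel_* / cache_last_time_* metadata,
# i.e. it behaves like an offline model.
model.set_export_config({"cache_support": "True"})

model.export("model.onnx")
```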
That's such a detailed explanation, thank you so much @csukuangfj. I think that for offline decoding I had set-
Hello! @sangeet2020 @csukuangfj Thanks!
Yes, you can. Please follow how we use streaming transducers in sherpa-onnx; all you need to do is use the model filenames of the fast conformer transducer.
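For example, assuming the sherpa-onnx Python API with OnlineRecognizer.from_transducer() available in your version, streaming decoding with the exported files could look roughly like this (filenames and audio are placeholders):

```python
# A sketch of streaming transducer decoding with sherpa-onnx's Python API.
# Model/token filenames are placeholders for the exported NeMo files.
import numpy as np
import sherpa_onnx

recognizer = sherpa_onnx.OnlineRecognizer.from_transducer(
    tokens="tokens.txt",
    encoder="encoder.onnx",
    decoder="decoder.onnx",
    joiner="joiner.onnx",
)

stream = recognizer.create_stream()
samples = np.zeros(16000, dtype=np.float32)  # replace with real 16 kHz audio
stream.accept_waveform(16000, samples)

while recognizer.is_ready(stream):
    recognizer.decode_stream(stream)

print(recognizer.get_result(stream))
```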
Following up on #843, this PR handles the transducer part.
CC @tempops @sangeet2020
Also CC @titu1994
NeMo fuses the decoder + joiner into a single model, decoder_joint. The disadvantage of the fusion is that it increases the computation overhead during decoding.
(I don't see any benefits of the fusion.)
This PR instead exports the decoder and joiner separately.
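As an illustration of what exporting the joiner separately means (a sketch only, not this PR's actual script), here is a self-contained example with a dummy joint network; the 512/640/1025 dimensions only echo the shapes in the listing above, and the axis order is illustrative:

```python
# Exporting a stand-alone joiner with torch.onnx.export, so the runtime can
# call encoder, decoder and joiner independently instead of a fused
# decoder_joint model. DummyJoiner is a stand-in for the real joint network.
import torch

class DummyJoiner(torch.nn.Module):
    def __init__(self, enc_dim=512, dec_dim=640, vocab_size=1025):
        super().__init__()
        self.enc_proj = torch.nn.Linear(enc_dim, 640)
        self.dec_proj = torch.nn.Linear(dec_dim, 640)
        self.out = torch.nn.Linear(640, vocab_size)

    def forward(self, encoder_out, decoder_out):
        # (N, T, enc_dim) + (N, U, dec_dim) -> (N, T, U, vocab_size)
        x = self.enc_proj(encoder_out).unsqueeze(2) + self.dec_proj(decoder_out).unsqueeze(1)
        return self.out(torch.relu(x))

joiner = DummyJoiner().eval()
torch.onnx.export(
    joiner,
    (torch.rand(1, 10, 512), torch.rand(1, 3, 640)),
    "joiner.onnx",
    input_names=["encoder_outputs", "decoder_outputs"],
    output_names=["outputs"],
    dynamic_axes={
        "encoder_outputs": {0: "N", 1: "T"},
        "decoder_outputs": {0: "N", 1: "U"},
        "outputs": {0: "N", 1: "T", 2: "U"},
    },
)
```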