
Export NeMo FastConformer Hybrid Transducer Large Streaming to ONNX #844

Merged 4 commits on May 8, 2024

Conversation

@csukuangfj (Collaborator) commented May 8, 2024

Following #843

This PR handles the transducer part.


CC @tempops @sangeet2020

Also CC @titu1994

NeMo fuses the decoder + joiner into a single model decoder_joint.

The disadvantage of this fusion is that it increases the computation overhead during decoding.

(I don't see any benefit to the fusion.)

This PR instead exports the decoder and joiner separately.
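To illustrate the overhead argument, here is a toy Python sketch (the call-counting logic and the emit-on-every-4th-frame rule are illustrative assumptions, not NeMo's actual implementation). With separate models, greedy transducer search can cache the decoder (prediction network) output and rerun the decoder only when a non-blank token is emitted; a fused decoder_joint reruns the decoder on every joiner invocation.

```python
# Toy model of greedy transducer search, counting decoder invocations.
decoder_calls = 0

def run_decoder(token):
    """Stand-in for the prediction network; we only count how often it runs."""
    global decoder_calls
    decoder_calls += 1
    return ("dec", token)

def run_joiner(t, dec_out):
    """Toy joiner: emits a non-blank token on every 4th frame."""
    return "tok" if t % 4 == 0 else "blank"

def greedy_search(num_frames, fused):
    global decoder_calls
    decoder_calls = 0
    last_token = "<sos>"
    dec_out = run_decoder(last_token)  # initial decoder pass
    for t in range(num_frames):
        if fused:
            # fused decoder_joint: the decoder runs inside every joiner call
            dec_out = run_decoder(last_token)
        sym = run_joiner(t, dec_out)
        if sym != "blank":
            last_token = sym
            if not fused:
                # separate models: recompute the decoder only on emission
                dec_out = run_decoder(last_token)
    return decoder_calls

fused_calls = greedy_search(16, fused=True)      # decoder runs every frame
separate_calls = greedy_search(16, fused=False)  # decoder runs only on emissions
```

In this toy run the fused variant invokes the decoder once per frame, while the separate variant invokes it only for the handful of emitted tokens, which is the saving the separate export targets.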

@sangeet2020 (Contributor)

Hi,
Thanks for the PR.
I tried adding the metadata and then exporting the model using your script. However, the exported models don't look as expected, specifically the encoder part.

```
$ python show-onnx-transudcer.py
=========encoder==========
NodeArg(name='audio_signal', type='tensor(float)', shape=['audio_signal_dynamic_axes_1', 80, 'audio_signal_dynamic_axes_2'])
NodeArg(name='length', type='tensor(int64)', shape=['length_dynamic_axes_1'])
-----
NodeArg(name='outputs', type='tensor(float)', shape=['outputs_dynamic_axes_1', 512, 'outputs_dynamic_axes_2'])
NodeArg(name='encoded_lengths', type='tensor(int64)', shape=['encoded_lengths_dynamic_axes_1'])
=========decoder==========
NodeArg(name='targets', type='tensor(int32)', shape=['targets_dynamic_axes_1', 'targets_dynamic_axes_2'])
NodeArg(name='target_length', type='tensor(int32)', shape=['target_length_dynamic_axes_1'])
NodeArg(name='states.1', type='tensor(float)', shape=[1, 'states.1_dim_1', 640])
NodeArg(name='onnx::LSTM_3', type='tensor(float)', shape=[1, 1, 640])
-----
NodeArg(name='outputs', type='tensor(float)', shape=['outputs_dynamic_axes_1', 640, 'outputs_dynamic_axes_2'])
NodeArg(name='prednet_lengths', type='tensor(int32)', shape=['prednet_lengths_dynamic_axes_1'])
NodeArg(name='states', type='tensor(float)', shape=[1, 'states_dynamic_axes_1', 640])
NodeArg(name='74', type='tensor(float)', shape=[1, 'LSTM74_dim_1', 640])
=========joiner==========
NodeArg(name='encoder_outputs', type='tensor(float)', shape=['encoder_outputs_dynamic_axes_1', 512, 'encoder_outputs_dynamic_axes_2'])
NodeArg(name='decoder_outputs', type='tensor(float)', shape=['decoder_outputs_dynamic_axes_1', 640, 'decoder_outputs_dynamic_axes_2'])
-----
NodeArg(name='outputs', type='tensor(float)', shape=['outputs_dynamic_axes_1', 'outputs_dynamic_axes_2', 'outputs_dynamic_axes_3', 1025])
```

I am missing this metadata:

```python
        "cache_last_channel_dim1": cache_last_channel_dim1,
        "cache_last_channel_dim2": cache_last_channel_dim2,
        "cache_last_channel_dim3": cache_last_channel_dim3,
        "cache_last_time_dim1": cache_last_time_dim1,
        "cache_last_time_dim2": cache_last_time_dim2,
        "cache_last_time_dim3": cache_last_time_dim3,
```

What could be the reason? Also, why do we need this information in the metadata? Is it necessary and important for the sherpa-onnx decoder to support?

Thank you

@csukuangfj (Collaborator, Author)

I just added streaming CTC support for the NeMo hybrid FastConformer transducer+CTC model.
Please see #857

I think you can use it as a reference to add the streaming transducer support.


> I am missing this metadata

You can find their usage at:

```cpp
std::array<int64_t, 4> cache_last_channel_shape{1, cache_last_channel_dim1_,
                                                cache_last_channel_dim2_,
                                                cache_last_channel_dim3_};

std::array<int64_t, 4> cache_last_time_shape{
    1, cache_last_time_dim1_, cache_last_time_dim2_, cache_last_time_dim3_};
```
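To make the purpose of those entries concrete, here is a minimal Python sketch of what the metadata is needed for: it supplies the shapes of the zero-initialized cache states that the streaming encoder expects on the first chunk. The metadata values below are made-up toy numbers, not the real dimensions of the exported model.

```python
import math

def init_cache_states(meta):
    """Build zero-initialized cache tensors from model metadata,
    mirroring the C++ shapes {1, dim1, dim2, dim3} shown above."""
    ch_shape = (1,
                meta["cache_last_channel_dim1"],
                meta["cache_last_channel_dim2"],
                meta["cache_last_channel_dim3"])
    t_shape = (1,
               meta["cache_last_time_dim1"],
               meta["cache_last_time_dim2"],
               meta["cache_last_time_dim3"])
    zeros = lambda shape: [0.0] * math.prod(shape)  # flat zero buffer
    return {"cache_last_channel": (ch_shape, zeros(ch_shape)),
            "cache_last_time": (t_shape, zeros(t_shape))}

# Toy metadata values for illustration only; the real values come from the
# exported model's metadata.
meta = {
    "cache_last_channel_dim1": 17,
    "cache_last_channel_dim2": 70,
    "cache_last_channel_dim3": 512,
    "cache_last_time_dim1": 17,
    "cache_last_time_dim2": 512,
    "cache_last_time_dim3": 8,
}
caches = init_cache_states(meta)
```

Without the metadata, the runtime has no way to know how large these initial cache tensors must be, which is why the keys are required.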


> I tried adding the metadata and then exporting the model using your script.

Make sure you have followed

```python
asr_model.set_export_config({"decoder_type": "rnnt", "cache_support": True})
```

You have to use `"cache_support": True` in order to export a streaming model.
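As a toy illustration of why the cache_* metadata was missing, consider the sketch below. `ToyASRModel` and its behavior are a simplified stand-in, not NeMo's real implementation: the point is only that a cache-supported (streaming) export carries the cache dimensions, while an offline export does not.

```python
# Simplified stand-in for NeMo's export flow (an assumption for illustration):
# cache_* metadata appears only when "cache_support" is enabled.

class ToyASRModel:
    def __init__(self):
        self.export_config = {}

    def set_export_config(self, cfg):
        # mirrors the asr_model.set_export_config(...) call shown above
        self.export_config.update(cfg)

    def export_metadata(self):
        meta = {"decoder_type": self.export_config.get("decoder_type", "rnnt")}
        if self.export_config.get("cache_support"):
            # streaming export: record cache dimensions so the runtime
            # (e.g. sherpa-onnx) can allocate the initial cache states
            meta.update({
                "cache_last_channel_dim1": 17,  # toy values, not real dims
                "cache_last_channel_dim2": 70,
                "cache_last_channel_dim3": 512,
                "cache_last_time_dim1": 17,
                "cache_last_time_dim2": 512,
                "cache_last_time_dim3": 8,
            })
        return meta

streaming = ToyASRModel()
streaming.set_export_config({"decoder_type": "rnnt", "cache_support": True})
streaming_meta = streaming.export_metadata()

offline = ToyASRModel()
offline.set_export_config({"decoder_type": "rnnt", "cache_support": False})
offline_meta = offline.export_metadata()
```

With `"cache_support": False` the exported metadata has no cache_* keys at all, which matches the symptom reported above.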

@csukuangfj (Collaborator, Author)

[Screenshot 2024-05-10 at 16:47:56: the scripts for exporting streaming and non-streaming models]

In case you have any confusion, please see the above screenshot for the scripts that export streaming and non-streaming models.

@sangeet2020 (Contributor)

That is a very detailed explanation. Thank you so much, @csukuangfj.

For offline decoding I had set `"cache_support": False`; that was the problem. Setting it to `True` solved it.
Thanks again!

@FawazCL commented Sep 17, 2024

Hello @sangeet2020 @csukuangfj,
I wanted to know whether the FastConformer model scripts are available for microphone inference? I only see online inference from a file.

Thanks!

@csukuangfj (Collaborator, Author)

> I wanted to know whether the FastConformer model scripts are available for microphone inference? I only see online inference from a file.

Yes, you can.

Please follow how we use streaming transducers in sherpa-onnx.

All you need to do is use the model filenames for the fast conformer transducers.
