Is there an example, like in Kaldi, of implementing online decoding, or of loading audio from memory or an IO stream instead of from disk? If not, do you have any advice for loading audio from memory or an IO stream with the K2 framework?
The pretrained.py example uses torchaudio.load to read audio files from disk. From my own digging, I don't think torchaudio supports reading audio from memory or an IO stream.
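As a workaround sketch (not part of icefall, and assuming plain 16-bit PCM mono WAV), the standard-library `wave` module can parse WAV data already held in memory via `io.BytesIO`, so no temporary file is needed:

```python
import io
import struct
import wave

def wav_bytes_to_samples(wav_bytes: bytes):
    """Decode 16-bit PCM WAV bytes (e.g. received over a socket)
    into a list of floats in [-1, 1], plus the sample rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        assert wf.getsampwidth() == 2, "this sketch expects 16-bit PCM"
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    n = len(raw) // 2
    ints = struct.unpack("<%dh" % n, raw)  # little-endian signed 16-bit
    return [s / 32768.0 for s in ints], rate

# Build a tiny 4-sample WAV entirely in memory to demonstrate the round trip.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(struct.pack("<4h", 0, 16384, -16384, 32767))

samples, rate = wav_bytes_to_samples(buf.getvalue())
```

The resulting float list could then be wrapped as `torch.tensor(samples).unsqueeze(0)` to roughly match the `(channels, time)` shape that torchaudio.load returns, though that equivalence is an assumption to verify against your model's expected input.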
Background:
I have built a working K2-based backend that, upon a trigger signal, reads in Mandarin audio files and replies with transcripts, using the model and example from the AISHELL conformer-ctc recipe. Currently, my frontend still has to save the audio to a .wav file. I am exploring having my frontend pass the raw WAV audio directly to my K2-based backend over a TCP socket connection.
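Since TCP is a byte stream with no message boundaries, the frontend needs some framing so the backend knows where one utterance's WAV bytes end. A minimal sketch (the function names here are hypothetical, not from any existing API) is a 4-byte length prefix before each payload:

```python
import socket
import struct

def send_wav(sock: socket.socket, wav_bytes: bytes) -> None:
    """Frontend side: prefix the WAV payload with its 4-byte
    little-endian length so the backend knows where it ends."""
    sock.sendall(struct.pack("<I", len(wav_bytes)) + wav_bytes)

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv may return less."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def recv_wav(sock: socket.socket) -> bytes:
    """Backend side: read the length header, then that many bytes."""
    (length,) = struct.unpack("<I", _recv_exact(sock, 4))
    return _recv_exact(sock, length)

# Demonstrate with an in-process socket pair standing in for the
# real frontend/backend TCP connection.
a, b = socket.socketpair()
payload = b"RIFF...fake wav payload..."
send_wav(a, payload)
received = recv_wav(b)
a.close(); b.close()
```

The received bytes could then be handed to an in-memory WAV decoder instead of being written to disk first.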
We are not going to work on that for a few months, I think. Right now we are focusing on the core of the online-decoding problem, particularly RNN-T, which is inherently easier to adapt to online decoding than a transformer decoder. After that is done, we will consider productization aspects. But I'm not sure when we will open-source the part responsible for ingesting the wav file. (For now that's not an issue, since we haven't written it.)
I see. I remember it being mentioned somewhere that online decoding would not be icefall's priority for a few months, but I couldn't find the exact statement to check whether those "few months" had passed.
Thank you also for being frank about the uncertainty around open-source support for online decoding. I'll keep watching this GitHub repo for further news and decisions.