Mismatch in fbank computation parameters causing deletion errors #514
Thanks for pointing it out! We had never noticed the mismatch. I just checked the code and found that lhotse uses -400 as the default value for high_freq. Could you help by making PRs to sherpa to hardcode the value of high_freq to -400?
Hi @ipmedenn, this is a great catch, thank you! I have also been fighting for some time with a high number of deletions in streaming long-form recognition. Did the proposed change completely fix the problem for you? I quickly tested with -400 and don't see much difference; sometimes it even gets worse. Maybe part of the problem is somewhere else. Do you see the deletion issue in offline mode or in streaming mode? One thing I noticed is that resetting the encoder state helps, but I have not tested that thoroughly.
* Use high_freq -400 in computing fbank features. See also k2-fsa/sherpa-onnx#514
* Release v2.1.5
Hi @nshmyrev,
@ipmedenn Great, thanks for the information!
Hello,
I found that there is a mismatch in fbank computation parameters.
Icefall models are trained with high_freq = -400 (class FbankConfig in lhotse/features/kaldi/extractors.py), but in kaldi-native-fbank (used by sherpa-onnx) the default high_freq value is 0.
In my experiments, this causes a large number of deletion errors in sherpa-onnx for some audio files.
Also, a similar mismatch in kaldifeat affects zipformer/streaming_decode.py in icefall.
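For anyone following along, here is a minimal sketch of why the two values are not equivalent. In the Kaldi convention, a high_freq that is zero or negative is treated as an offset from the Nyquist frequency, so -400 and 0 place the top of the mel filterbank at different frequencies. The helper names below are hypothetical and are not part of lhotse or kaldi-native-fbank; the 16 kHz sample rate, 80 mel bins, and low_freq of 20 Hz are assumptions chosen for illustration only.

```python
import math


def effective_high_freq(high_freq: float, sample_rate: float = 16000.0) -> float:
    """Resolve a Kaldi-style high_freq to an absolute cutoff in Hz.

    Kaldi convention: a value <= 0 is an offset from the Nyquist frequency.
    """
    nyquist = sample_rate / 2.0
    return nyquist + high_freq if high_freq <= 0 else high_freq


def hz_to_mel(hz: float) -> float:
    # Mel scale formula used by Kaldi-compatible feature extractors.
    return 1127.0 * math.log(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    # Inverse of hz_to_mel.
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def mel_bin_centers(
    high_freq: float,
    sample_rate: float = 16000.0,  # assumed sample rate
    low_freq: float = 20.0,        # assumed lower cutoff
    num_bins: int = 80,            # assumed number of mel bins
) -> list[float]:
    """Center frequency (Hz) of each triangular mel filter."""
    mel_low = hz_to_mel(low_freq)
    mel_high = hz_to_mel(effective_high_freq(high_freq, sample_rate))
    step = (mel_high - mel_low) / (num_bins + 1)
    return [mel_to_hz(mel_low + (i + 1) * step) for i in range(num_bins)]


if __name__ == "__main__":
    print(effective_high_freq(-400.0))  # 7600.0 (training-time setting)
    print(effective_high_freq(0.0))     # 8000.0 (default with high_freq = 0)

    trained = mel_bin_centers(-400.0)
    default = mel_bin_centers(0.0)
    # Every filter is shifted, not only the topmost one.
    for i in (0, 39, 79):
        print(i, round(trained[i], 1), round(default[i], 1))
```

Because the bins are spaced evenly on the mel scale between the two cutoffs, changing the upper cutoff moves every filter center, so features computed with high_freq = 0 differ from the training-time features across the whole spectrum. How much that affects recognition is model- and audio-dependent, as the comments in this thread suggest.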