Mismatch in fbank computation parameters causing deletion errors #514

Closed
ipmedenn opened this issue Jan 3, 2024 · 5 comments

Comments


ipmedenn commented Jan 3, 2024

Hello,

I found a mismatch in the fbank computation parameters.
Icefall models are trained with high_freq = -400 (see class FbankConfig in lhotse/features/kaldi/extractors.py), but kaldi-native-fbank, which sherpa-onnx uses, defaults to high_freq = 0.
In my experiments, this causes a large number of deletion errors in sherpa-onnx for some audio files.
A similar mismatch in kaldifeat also affects zipformer/streaming_decode.py in icefall.
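
To make the effect of the mismatch concrete, here is a minimal sketch of how a Kaldi-style high_freq option is resolved: values less than or equal to zero are interpreted as an offset from the Nyquist frequency. The function names, the 16 kHz sample rate, the 80 mel bins, and low_freq = 20 are assumptions chosen for illustration, not code taken from lhotse or kaldi-native-fbank.

```python
import math


def effective_high_freq(sample_rate: float, high_freq: float) -> float:
    """Resolve a Kaldi-style high_freq option to an absolute cutoff in Hz.

    Values <= 0 are treated as an offset from the Nyquist frequency.
    """
    nyquist = 0.5 * sample_rate
    return high_freq if high_freq > 0 else nyquist + high_freq


def mel(hz: float) -> float:
    """Kaldi mel scale."""
    return 1127.0 * math.log(1.0 + hz / 700.0)


def inverse_mel(m: float) -> float:
    return 700.0 * (math.exp(m / 1127.0) - 1.0)


def bin_centers(sample_rate: float, num_bins: int,
                low_freq: float, high_freq: float) -> list:
    """Center frequency (Hz) of each triangular mel filter."""
    mel_lo = mel(low_freq)
    mel_hi = mel(effective_high_freq(sample_rate, high_freq))
    # Filters are evenly spaced on the mel scale between the two cutoffs.
    delta = (mel_hi - mel_lo) / (num_bins + 1)
    return [inverse_mel(mel_lo + (i + 1) * delta) for i in range(num_bins)]


# Illustrative values only (assumed, not read from any library).
trained = bin_centers(16000, 80, 20.0, -400.0)  # cutoff at 7600 Hz
default = bin_centers(16000, 80, 20.0, 0.0)     # cutoff at 8000 Hz
print(effective_high_freq(16000, -400.0), effective_high_freq(16000, 0.0))
print(trained[-1], default[-1])
```

Because the filters are spaced evenly on the mel scale between the low and high cutoffs, changing the upper cutoff shifts every filter, not only the top one, so the features fed to the model differ from those seen in training across all bins.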

@csukuangfj
Collaborator

Thanks for pointing it out! We had never noticed the mismatch.

I just checked the code and found that lhotse uses -400 as the default value for high_freq.

Could you help make PRs to sherpa to hardcode the value of high_freq to -400?
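
One way to catch this class of mismatch before it reaches users is to compare the feature options used in training against those used at inference. The sketch below is hypothetical: the MelOptions dataclass and diff_options helper are invented for illustration and do not correspond to any sherpa, lhotse, or kaldi-native-fbank API.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MelOptions:
    """Hypothetical container for the options that must match."""
    sample_rate: float = 16000.0
    num_bins: int = 80
    low_freq: float = 20.0
    high_freq: float = 0.0


def diff_options(train: MelOptions, infer: MelOptions) -> dict:
    """Return {field: (train_value, infer_value)} for every differing field."""
    a, b = asdict(train), asdict(infer)
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


train_opts = MelOptions(high_freq=-400.0)  # value reported for training
infer_opts = MelOptions()                  # inference left at high_freq = 0
mismatches = diff_options(train_opts, infer_opts)
print(mismatches)
```

Setting high_freq to -400 on the inference side makes the returned dictionary empty, which is the state the requested PRs aim for.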

Contributor

nshmyrev commented Jan 4, 2024

Hi @ipmedenn

This is a great catch, thank you! I have also been fighting a high number of deletions in streaming long-form recognition for some time.

Did the proposed change completely fix the problem for you? I quickly tested with -400 and don't see much difference; sometimes it even gets worse. Maybe there is a problem somewhere else too. Do you see the deletion issue in offline mode or in streaming mode?

One thing I noticed is that resetting the encoder state helps, though this is not well tested:

alphacep@d6ae830

Collaborator

csukuangfj commented Jan 4, 2024

csukuangfj added a commit to k2-fsa/sherpa-ncnn that referenced this issue Jan 4, 2024
* Use high_freq -400 in computing fbank features.

See also k2-fsa/sherpa-onnx#514

* Release v2.1.5
csukuangfj added a commit to k2-fsa/icefall that referenced this issue Jan 4, 2024
Author

ipmedenn commented Jan 4, 2024

> Hi @ipmedenn
>
> This is a great catch, thank you! I have also been fighting a high number of deletions in streaming long-form recognition for some time.
>
> Did the proposed change completely fix the problem for you? I quickly tested with -400 and don't see much difference; sometimes it even gets worse. Maybe there is a problem somewhere else too. Do you see the deletion issue in offline mode or in streaming mode?
>
> One thing I noticed is that resetting the encoder state helps, though this is not well tested

Hi @nshmyrev,
I see the deletion issue in streaming mode, and -400 completely solves it.

Contributor

nshmyrev commented Jan 4, 2024

@ipmedenn Great, thanks for the information!

rouseabout added a commit to rouseabout/knf that referenced this issue Jan 15, 2024