Bug report - Regression of VAD quality between 3.1 and 4.0 (speech detected on perfect silence) #396
Comments
It is a known issue with near-zero signals. The newer VAD tries to suppress spurious activations caused by subtle speech in the background, and suppressing background noise while also working on perfectly silent audio are mutually exclusive goals. In any case, post-processing hyperparameters should be tuned for each domain. Maybe a VAD should have a flag.
Since the noise is really near zero, could this be prevented by a preprocessing filter that cuts long stretches falling below a set threshold, for example 20 dB below the average RMS?
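For illustration, a minimal sketch of such a preprocessing filter. It is not part of silero-vad. It assumes float samples in [-1, 1], and the names `frame_rms` and `quiet_runs` as well as the defaults (512-sample frames, 20 dB margin, 10-frame minimum run) are hypothetical choices made for this example.

```python
import math

def frame_rms(samples, frame_size=512):
    """RMS of each consecutive, non-overlapping frame (a trailing partial frame is dropped)."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_size]) / frame_size)
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]

def quiet_runs(samples, frame_size=512, margin_db=20.0, min_frames=10):
    """Return (start_sample, end_sample) spans of at least `min_frames` frames
    whose RMS is more than `margin_db` below the overall RMS of the signal."""
    rms = frame_rms(samples, frame_size)
    if not rms:
        return []
    # Overall RMS of the framed signal, then the relative threshold in linear scale.
    overall = math.sqrt(sum(r * r for r in rms) / len(rms))
    threshold = overall * 10 ** (-margin_db / 20)
    runs, start = [], None
    # The infinite sentinel closes a quiet run that reaches the end of the signal.
    for i, r in enumerate(rms + [float("inf")]):
        if r < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:
                runs.append((start * frame_size, i * frame_size))
            start = None
    return runs
```

The returned spans could be cut from the audio or simply excluded from the VAD output. Note that a relative threshold flags nothing on a file that is silent from start to finish, because the overall RMS is then zero as well.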
To be solved with a V5 release
Great job!
Any updates on this?
Pumped for next version of silero_vad.onnx!
You're right. I tried computing the RMS per frame and applying a threshold of RMS*1000 < 10, and it works well.
Good. By frame, do you mean the 512/1024/1536 samples?
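A rough sketch of the per-frame RMS gate described above, for anyone who wants to try it. The thread does not say which frame size or sample scale was used, so the 512-sample frame and float samples in [-1, 1] are assumptions, and the function names are hypothetical. The `probs` argument stands for per-frame speech probabilities from any VAD.

```python
import math

def silent_frame_mask(samples, frame_size=512, scaled_threshold=10.0):
    """Mark each non-overlapping frame as silent when RMS * 1000 < scaled_threshold."""
    mask = []
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[i:i + frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        mask.append(rms * 1000 < scaled_threshold)
    return mask

def gate_speech_probs(probs, mask):
    """Force the speech probability to 0.0 on frames flagged as silent."""
    return [0.0 if silent else p for p, silent in zip(probs, mask)]
```

With samples in [-1, 1], the condition RMS*1000 < 10 corresponds to an RMS below 0.01, or about -40 dBFS. For 16-bit integer samples the threshold would need to be rescaled.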
The new VAD version was just released: #2 (comment). It was designed with this issue in mind, and performance on edge cases like this was significantly improved (see https://github.com/snakers4/silero-vad/wiki/Quality-Metrics). Can you please re-run your tests and, if the issue persists, open a new issue?
🐛 Bug
On some audio, the quality of the VAD is much worse in the latest version, v4.0, than it was in v3.1.
More precisely, v4.0 detects speech on "quasi perfect" silent periods.
Another user reports the same experience (and pointed me to v3): linto-ai/whisper-timestamped#74 (comment)
I am also having trouble reverting to v3.1. See the comment here: linto-ai/whisper-timestamped#142 (comment)
Maybe I missed something about how to handle versioning with silero-vad.
Any help on this PR would be very much appreciated.
To Reproduce
Audio to reproduce : https://github.com/linto-ai/whisper-timestamped/files/11220341/jon.zip