Voice Activity Detection #1468
Comments
Voice Activity Detection has been important for my work on TTS and STT. It's useful for segmenting large audio files before training. I'd love a basic VAD implementation akin to:
I don't think there is one perfect algorithm for VAD because you need to make all sorts of assumptions about the SNR. So I prefer a VAD that can be tuned by hand for different situations (no noise, minimal background noise, medium background noise, etc.) and speaker levels (whisper, conversational, speech, etc.). The one currently in
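To make the idea of a hand-tunable VAD concrete, here is a minimal energy-threshold sketch. It is not the torchaudio implementation or any algorithm linked in this thread; the function names (`frame_energy_db`, `energy_vad`) and the parameters (`frame_ms`, `threshold_db`, `hangover_frames`) are hypothetical and chosen only for illustration. It assumes samples are floats scaled to roughly [-1, 1].

```python
import math


def frame_energy_db(frame):
    """Mean-square energy of one frame in dB relative to full scale (1.0)."""
    if not frame:
        return -math.inf
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else -math.inf


def energy_vad(samples, sample_rate, frame_ms=20.0, threshold_db=-40.0,
               hangover_frames=5):
    """Return (start, end) sample indices of regions judged to be speech.

    threshold_db:    raise for noisy recordings, lower for whispered speech.
    hangover_frames: number of quiet frames tolerated inside one segment,
                     so short pauses do not split an utterance.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000.0))
    n_frames = len(samples) // frame_len  # trailing partial frame is ignored
    segments = []
    start = None  # start sample of the currently open segment, if any
    quiet = 0     # consecutive quiet frames since the last active frame

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        if frame_energy_db(frame) >= threshold_db:
            if start is None:
                start = i * frame_len
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet > hangover_frames:
                # Close the segment right after the last active frame.
                segments.append((start, (i - quiet + 1) * frame_len))
                start = None
                quiet = 0

    if start is not None:
        # Audio ended while a segment was open; drop trailing quiet frames.
        segments.append((start, (n_frames - quiet) * frame_len))
    return segments
```

In this sketch, tuning for a given recording condition amounts to picking `threshold_db` from the expected noise floor and `hangover_frames` from how long a pause should be before a segment is cut.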
Great response @PetrochukM, and thanks a lot for the input :) Have you also tried this one?
(Quick note: some elements of the algorithm you suggested are also similar to one of the pitch detection algorithms we have.)
Oh, I didn't. Thanks for the tip! It looks pretty close to what I was describing :) Also, another issue that we had with VAD was memory. For some reason, I actually tried to use the pitch detection algorithm in
@PetrochukM, have you tried the Kaldi voice activity detection? It was also mentioned in the PyTorch forum.
I considered it, but I decided against it during my research. I vaguely remember being overwhelmed by the complexity of it. I hope my POV helps!
Definitely, thank you :)
I'd love to hear from our users about the impact of their favorite VAD algorithm on their model's performance during both training and inference (offline, online, streaming).
We currently have in torchaudio
cc @mthrok @astaff