Voice Activity Detection #1468

vincentqb · 2021-04-21T15:25:50Z

I'd love to hear from our users what impact their favorite VAD algorithm has on their model's performance during both training and inference (offline, online, streaming).

Is there one we should add in torchaudio?
Does anyone use webrtc's vad implementation mentioned in comment?

We currently have in torchaudio

cc @mthrok @astaff

PetrochukM · 2021-04-21T15:50:20Z

Voice Activity Detection has been important for my work on TTS and STT. It's useful for segmenting large audio files before training. I'd love a basic VAD implementation akin to:
https://maelfabien.github.io/project/Speech_proj/#pros-and-cons

webrtc didn't work for me because it didn't expose enough parameters for tuning. The default version of webrtc didn't work well for clean datasets... It tended to overcorrect for noise. It'd cut off unvoiced consonants because it thought they were noise... I tried all of the available settings in py-webrtcvad to correct for this.

I don't think there is one perfect algorithm for VAD because you need to make all sorts of assumptions about the SNR. So, I prefer a VAD that can be tuned by hand for different situations (no noise, minimal background noise, medium background noise, etc.) and speaker levels (whisper, conversational, speech, etc.).
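
To make the idea of a hand-tunable VAD concrete, here is a minimal sketch of a loudness-threshold detector. It is not torchaudio's or webrtc's algorithm, and every name and default in it (`detect_speech`, `threshold_db`, `min_silence_ms`, `pad_ms`, the 20 ms frames) is an assumption chosen for illustration. It is written in plain Python for readability; a real implementation would likely vectorize the framing with numpy or torch.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms_db(samples: Sequence[float], frame_len: int, hop_len: int) -> List[float]:
    """Return the RMS level of each frame in dB relative to full scale (1.0)."""
    if not samples:
        return []
    levels = []
    for start in range(0, max(len(samples) - frame_len, 0) + 1, hop_len):
        frame = samples[start:start + frame_len]
        mean_square = sum(x * x for x in frame) / len(frame)
        # Floor the value so digitally silent frames map to -100 dB, not -inf.
        levels.append(10.0 * math.log10(max(mean_square, 1e-10)))
    return levels


def detect_speech(
    samples: Sequence[float],
    sample_rate: int,
    threshold_db: float = -40.0,    # hypothetical default: lower it for whispered speech
    frame_ms: float = 20.0,
    hop_ms: float = 10.0,
    min_silence_ms: float = 200.0,  # gaps up to this length are bridged
    pad_ms: float = 50.0,           # context kept around each region
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of regions louder than threshold_db."""
    frame_len = max(int(sample_rate * frame_ms / 1000), 1)
    hop_len = max(int(sample_rate * hop_ms / 1000), 1)
    levels = frame_rms_db(samples, frame_len, hop_len)
    duration = len(samples) / sample_rate

    spans: List[List[float]] = []
    for i, level in enumerate(levels):
        if level < threshold_db:
            continue
        start = i * hop_len / sample_rate
        end = (i * hop_len + frame_len) / sample_rate
        if spans and start - spans[-1][1] <= min_silence_ms / 1000:
            spans[-1][1] = end  # extend the current region across a short gap
        else:
            spans.append([start, end])

    # Pad each region, clamped to the signal bounds. Padded regions cannot
    # overlap as long as pad_ms is at most half of min_silence_ms.
    pad = pad_ms / 1000
    return [(max(s - pad, 0.0), min(e + pad, duration)) for s, e in spans]
```

The padding and gap-bridging parameters are one way to soften the failure described above, where quiet unvoiced consonants at word edges get cut: a lower `threshold_db` or a larger `pad_ms` keeps more of the low-energy edges, at the cost of letting more background noise through.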

The one currently in torchaudio didn't work for me because it is focused on audio trimming instead of detecting voice throughout the audio.

vincentqb · 2021-04-21T16:14:41Z

great response @PetrochukM and thanks a lot for the input :) have you also tried this one?

vincentqb · 2021-04-21T17:05:17Z

(quick note: some elements of the algorithm you suggested are also similar to one of the pitch detection algorithms we have)

PetrochukM · 2021-04-21T19:14:45Z

Oh. I didn't. Thanks for the tip! It looks pretty close to what I was describing :)

Also, another issue that we had with VAD was memory. For some reason, scipy doesn't support memmap. This made it particularly difficult to work with files longer than 1 hour.
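
For illustration, one way to keep memory bounded on long recordings without memmap is to stream the file in fixed-size blocks. The sketch below uses Python's standard-library `wave` module and reports a loudness level per block. The function name `iter_block_levels`, the one-second block size, and the restriction to 16-bit PCM are assumptions for this example, not something used in this thread.

```python
import math
import struct
import wave
from typing import Iterator, Tuple


def iter_block_levels(path: str, block_seconds: float = 1.0) -> Iterator[Tuple[float, float]]:
    """Yield (start_time_s, rms_dbfs) for each block of a 16-bit PCM WAV file.

    Only one block of audio is held in memory at a time.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        block_frames = max(int(rate * block_seconds), 1)
        position = 0  # frames read so far
        while True:
            raw = wav.readframes(block_frames)
            if not raw:
                break
            count = len(raw) // 2  # number of 16-bit samples across all channels
            samples = struct.unpack("<%dh" % count, raw)
            # Normalize to full scale (32768) and floor to avoid log10(0).
            mean_square = sum(s * s for s in samples) / count / 32768.0 ** 2
            yield position / rate, 10.0 * math.log10(max(mean_square, 1e-10))
            position += count // channels
```

Because the generator yields one block at a time, a multi-hour file can be scanned for loud regions first, and only the regions of interest need to be loaded in full afterwards.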

I actually tried to use the pitch detection algorithm in torchaudio but it was just too slow compared to other algorithms. I don't have too many details right now but it was much slower than something like loudness detection. I can give more details, in another thread, sometime later!

vincentqb · 2021-06-08T22:04:59Z

@PetrochukM, have you tried the Kaldi voice activity detection? It was also mentioned in the PyTorch forum.

PetrochukM · 2021-06-10T02:58:51Z

I considered it but I decided against it during my research. I vaguely remember being overwhelmed by the complexity of it. I hope my POV helps!

vincentqb · 2021-06-10T14:48:45Z

> I hope my POV helps!

Definitely, thank you :)

vincentqb mentioned this issue Apr 21, 2021

F.vad batch consistency behaviour #1348

Open

astaff mentioned this issue May 18, 2021

Update VAD docstring and check for input shape length #1513

Merged

mthrok closed this as completed Jul 22, 2021
