VAD not working with go on a particular wav file (48 kHz, stereo) #536

fear-the-reaper · 2024-09-04T19:49:11Z


❓ Questions and Help

I've been trying to run Silero VAD from Go. I first tried it on Windows, but onnxruntime wasn't working, so I switched to WSL. The runtime works there, but the VAD doesn't detect speech correctly. I'm using the example from the docs:

	package main

	import (
		"log"
		"os"

		"github.com/go-audio/wav"
		"github.com/streamer45/silero-vad-go/speech"
	)

	func main() {
		log.Println("Working with vad 5.0")

		// Create the detector; SampleRate describes the audio passed to Detect.
		sd, err := speech.NewDetector(speech.DetectorConfig{
			ModelPath:            "./silero_vad.onnx",
			SampleRate:           16000,
			Threshold:            0.5,
			MinSilenceDurationMs: 10,
			SpeechPadMs:          30,
		})
		if err != nil {
			log.Fatalf("failed to create speech detector: %s", err)
		}

		if len(os.Args) != 2 {
			log.Fatalf("invalid arguments provided: expecting one file path")
		}

		f, err := os.Open(os.Args[1])
		if err != nil {
			log.Fatalf("failed to open sample audio file: %s", err)
		}
		defer f.Close()

		dec := wav.NewDecoder(f)

		if ok := dec.IsValidFile(); !ok {
			log.Fatalf("invalid WAV file")
		}

		// Decode the whole file into memory.
		buf, err := dec.FullPCMBuffer()
		if err != nil {
			log.Fatalf("failed to get PCM buffer: %s", err)
		}

		pcmBuf := buf.AsFloat32Buffer()

		// Run detection over the decoded samples.
		segments, err := sd.Detect(pcmBuf.Data)
		if err != nil {
			log.Fatalf("Detect failed: %s", err)
		}

		for _, s := range segments {
			log.Printf("speech starts at %0.2fs", s.SpeechStartAt)
			if s.SpeechEndAt > 0 {
				log.Printf("speech ends at %0.2fs", s.SpeechEndAt)
			}
		}

		// Release the detector's resources.
		err = sd.Destroy()
		if err != nil {
			log.Fatalf("failed to destroy detector: %s", err)
		}
	}

I'm using onnxruntime v1.19.0 and the v5.0 Silero VAD model.

It works for an example .wav file I found here, but it doesn't work for any of my own recordings, which are about 14 to 30 seconds long. Those same recordings do work in Python for some reason.

If anyone can help me out, thanks!

@streamer45 @snakers4

Answered by streamer45

Sep 4, 2024

fear-the-reaper (Author) · 2024-09-04T19:52:04Z

I even tried decreasing the threshold, but that didn't work either. Also, when I try another model version such as 4.0, I get this error:
Detect failed: infer failed: failed to run: Invalid input name: state

streamer45 · 2024-09-04T22:50:13Z

@fear-the-reaper If you could share a file sample that doesn't work, I could have a better look.

fear-the-reaper (Author) · 2024-09-04T22:59:24Z

Sure @streamer45, here's a Drive link to test.wav, since I can't upload .wav files here. The link. Ty!

fear-the-reaper (Author) · 2024-09-04T23:00:45Z

Also, another question: can we run silero-vad-go on Windows? I tried, but it kept giving me errors.

streamer45 · 2024-09-04T23:04:07Z

> Also another question can we run silero-vad-go on windows? I tried but it kept giving me errors.

Not sure as I don't have a Windows laptop myself so haven't tried. I'd suggest opening an issue at https://github.com/streamer45/silero-vad-go/issues with the failing logs.

streamer45 · 2024-09-04T23:13:58Z

@fear-the-reaper The file you shared has a couple of issues:

- The sample rate is 48 kHz; the library assumes 16 kHz.
- The audio is stereo; the library assumes mono.

After fixing the input file with ffmpeg -i test.wav -ar 16000 -ac 1 test_fixed.wav, I can get it to work:

go run ./cmd/main.go test_fixed.wav 
2024/09/04 17:12:47 speech starts at 0.93s
2024/09/04 17:12:47 speech ends at 2.46s
2024/09/04 17:12:47 speech starts at 2.91s
2024/09/04 17:12:47 speech ends at 4.67s
2024/09/04 17:12:47 speech starts at 4.86s
2024/09/04 17:12:47 speech ends at 5.18s
2024/09/04 17:12:47 speech starts at 5.50s
2024/09/04 17:12:47 speech ends at 5.76s
2024/09/04 17:12:47 speech starts at 5.89s
2024/09/04 17:12:47 speech ends at 6.08s
2024/09/04 17:12:47 speech starts at 6.08s
2024/09/04 17:12:47 speech ends at 7.26s
2024/09/04 17:12:47 speech starts at 7.84s
2024/09/04 17:12:47 speech ends at 8.13s
2024/09/04 17:12:47 speech starts at 8.80s
2024/09/04 17:12:47 speech ends at 9.82s
2024/09/04 17:12:47 speech starts at 9.82s
2024/09/04 17:12:47 speech ends at 10.46s
2024/09/04 17:12:47 speech starts at 10.88s
2024/09/04 17:12:47 speech ends at 14.11s
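
A pre-flight check along these lines could catch the same mismatch before the samples reach the detector. This is only a minimal sketch using the Go standard library: `parseWavFormat`, `checkForDetector`, and `exampleHeader` are hypothetical helpers, not part of silero-vad-go or go-audio, and the 16 kHz mono target is assumed from the detector config in the question.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// wavFormat holds the two header fields that matter for the detector.
type wavFormat struct {
	Channels   uint16
	SampleRate uint32
}

// parseWavFormat walks the RIFF chunks in data and returns the channel count
// and sample rate stored in the "fmt " chunk.
func parseWavFormat(data []byte) (wavFormat, error) {
	if len(data) < 12 || string(data[0:4]) != "RIFF" || string(data[8:12]) != "WAVE" {
		return wavFormat{}, errors.New("not a RIFF/WAVE file")
	}
	for off := 12; off+8 <= len(data); {
		id := string(data[off : off+4])
		size := int(binary.LittleEndian.Uint32(data[off+4 : off+8]))
		body := off + 8
		if id == "fmt " {
			if size < 16 || body+16 > len(data) {
				return wavFormat{}, errors.New("truncated fmt chunk")
			}
			return wavFormat{
				Channels:   binary.LittleEndian.Uint16(data[body+2 : body+4]),
				SampleRate: binary.LittleEndian.Uint32(data[body+4 : body+8]),
			}, nil
		}
		off = body + size + size%2 // chunks are padded to an even length
	}
	return wavFormat{}, errors.New("fmt chunk not found")
}

// checkForDetector returns an error unless the audio is 16 kHz mono
// (assumed target, taken from the config in the question).
func checkForDetector(f wavFormat) error {
	if f.SampleRate != 16000 || f.Channels != 1 {
		return fmt.Errorf("got %d Hz, %d channel(s); want 16000 Hz mono", f.SampleRate, f.Channels)
	}
	return nil
}

// exampleHeader builds a 44-byte 16-bit PCM WAV header with no audio data,
// so the sketch can run without a file on disk.
func exampleHeader(channels uint16, rate uint32) []byte {
	h := make([]byte, 44)
	copy(h[0:], "RIFF")
	binary.LittleEndian.PutUint32(h[4:], 36) // file size minus 8
	copy(h[8:], "WAVE")
	copy(h[12:], "fmt ")
	binary.LittleEndian.PutUint32(h[16:], 16) // fmt chunk size
	binary.LittleEndian.PutUint16(h[20:], 1)  // audio format: PCM
	binary.LittleEndian.PutUint16(h[22:], channels)
	binary.LittleEndian.PutUint32(h[24:], rate)
	binary.LittleEndian.PutUint32(h[28:], rate*uint32(channels)*2) // byte rate
	binary.LittleEndian.PutUint16(h[32:], channels*2)              // block align
	binary.LittleEndian.PutUint16(h[34:], 16)                      // bits per sample
	copy(h[36:], "data")                                           // data size stays 0
	return h
}

func main() {
	f, err := parseWavFormat(exampleHeader(2, 48000))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(checkForDetector(f))
}
```

With a real recording, the bytes would come from something like `os.ReadFile` instead of `exampleHeader`. If go-audio's decoder is already in use, reading its `SampleRate` and `NumChans` fields after `IsValidFile` is probably the simpler route.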

1 reply

snakers4 Sep 5, 2024
Maintainer

Copied the issue to discussions and marked the answer so that people could find it

fear-the-reaper (Author) · 2024-09-04T23:55:34Z

@streamer45 OH! I feel like an idiot, I read that 1000 times. Noob mistake on my part, tysm. But in Python it just works, so do they auto-convert it?

fear-the-reaper (Author) · 2024-09-04T23:58:41Z

@streamer45 I'll add logs to silero-vad-go for windows support. Will close the issue now.

snakers4 (Maintainer) · 2024-09-05T03:23:45Z

> in python it just works so do they auto convert it?

As for various formats, in Python we use torchaudio to read files, which converts to mono, normalizes, and handles the codecs under the hood with sox or ffmpeg. As for 48 kHz, we just do naive undersampling: take every third sample, or average.
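
The downmix and naive undersampling described above can be sketched in Go roughly as follows. This is an illustration only, not the code used by the Python package or by silero-vad-go: `downmixToMono` and `decimateByAveraging` are hypothetical helper names, the input is assumed to be interleaved float32 PCM, and there is no low-pass filter, so aliasing is possible.

```go
package main

import "fmt"

// downmixToMono averages the channels of each interleaved frame into a
// single sample. Any trailing partial frame is ignored.
func downmixToMono(interleaved []float32, channels int) []float32 {
	if channels <= 1 {
		return interleaved
	}
	frames := len(interleaved) / channels
	mono := make([]float32, frames)
	for i := 0; i < frames; i++ {
		var sum float32
		for c := 0; c < channels; c++ {
			sum += interleaved[i*channels+c]
		}
		mono[i] = sum / float32(channels)
	}
	return mono
}

// decimateByAveraging lowers the sample rate by an integer factor by
// averaging each group of `factor` samples (factor 3 for 48 kHz -> 16 kHz).
// A trailing partial group is dropped.
func decimateByAveraging(samples []float32, factor int) []float32 {
	if factor <= 1 {
		return samples
	}
	out := make([]float32, 0, len(samples)/factor)
	for i := 0; i+factor <= len(samples); i += factor {
		var sum float32
		for _, s := range samples[i : i+factor] {
			sum += s
		}
		out = append(out, sum/float32(factor))
	}
	return out
}

func main() {
	// Six stereo frames (L, R interleaved), standing in for 48 kHz audio.
	stereo := []float32{0.75, 0.75, 0, 0, 0, 0, 0.5, -0.5, -1, -0.5, 0, 0}
	mono := downmixToMono(stereo, 2)
	fmt.Println(decimateByAveraging(mono, 3)) // prints [0.25 -0.25]
}
```

For anything beyond a quick experiment, a proper resampler such as the ffmpeg command from the accepted answer is likely the safer choice.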
