VAD not working with Go on a particular WAV file (48 kHz, stereo) #536
-
❓ Questions and Help

I've been trying to run Silero VAD from Go. I first tried it on Windows, but ONNX Runtime just wasn't working there, so I switched to WSL. The runtime works correctly in WSL, but the VAD doesn't. I'm using the example from the docs:

```go
package main

import (
	"log"
	"os"

	"github.com/go-audio/wav"
	"github.com/streamer45/silero-vad-go/speech"
)

func main() {
	log.Println("Working with vad 5.0")

	// SampleRate describes the audio passed to Detect; the input is expected to be mono at this rate.
	sd, err := speech.NewDetector(speech.DetectorConfig{
		ModelPath:            "./silero_vad.onnx",
		SampleRate:           16000,
		Threshold:            0.5,
		MinSilenceDurationMs: 10,
		SpeechPadMs:          30,
	})
	if err != nil {
		log.Fatalf("failed to create speech detector: %s", err)
	}

	if len(os.Args) != 2 {
		log.Fatalf("invalid arguments provided: expecting one file path")
	}

	f, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatalf("failed to open sample audio file: %s", err)
	}
	defer f.Close()

	dec := wav.NewDecoder(f)
	if ok := dec.IsValidFile(); !ok {
		log.Fatalf("invalid WAV file")
	}

	// Decode the whole file; multichannel samples stay interleaved.
	buf, err := dec.FullPCMBuffer()
	if err != nil {
		log.Fatalf("failed to get PCM buffer: %s", err)
	}

	pcmBuf := buf.AsFloat32Buffer()
	segments, err := sd.Detect(pcmBuf.Data)
	if err != nil {
		log.Fatalf("Detect failed: %s", err)
	}

	for _, s := range segments {
		log.Printf("speech starts at %0.2fs", s.SpeechStartAt)
		if s.SpeechEndAt > 0 {
			log.Printf("speech ends at %0.2fs", s.SpeechEndAt)
		}
	}

	// Release the ONNX Runtime resources held by the detector.
	err = sd.Destroy()
	if err != nil {
		log.Fatalf("failed to destroy detector: %s", err)
	}
}
```

I'm using ONNX Runtime v1.19.0 and the Silero VAD v5.0 model. It works for the example .wav file I found here, but not for any of my own recordings, which are about 14 to 30 seconds long. Those same recordings work in Python, for some reason. If anyone can help me out, thanks!
-
I even tried decreasing the threshold, but that didn't work either. Also, when I try another model version like 4.0, it gives me this error:
-
@fear-the-reaper If you could share a sample file that doesn't work, I could take a closer look.
-
Sure @streamer45, I'll send it. Here's a
-
Also, another question: can we run
-
Not sure, as I don't have a Windows laptop myself, so I haven't tried. I'd suggest opening an issue at https://github.com/streamer45/silero-vad-go/issues with the failing logs.
-
@fear-the-reaper The file you shared has a couple of issues:

- It is sampled at 48 kHz, while the detector is configured with `SampleRate: 16000`.
- It is stereo, while the detector expects mono audio.

After fixing the input file with `ffmpeg -i test.wav -ar 16000 -ac 1 test_fixed.wav`, I can get it to work:

```
$ go run ./cmd/main.go test_fixed.wav
2024/09/04 17:12:47 speech starts at 0.93s
2024/09/04 17:12:47 speech ends at 2.46s
2024/09/04 17:12:47 speech starts at 2.91s
2024/09/04 17:12:47 speech ends at 4.67s
2024/09/04 17:12:47 speech starts at 4.86s
2024/09/04 17:12:47 speech ends at 5.18s
2024/09/04 17:12:47 speech starts at 5.50s
2024/09/04 17:12:47 speech ends at 5.76s
2024/09/04 17:12:47 speech starts at 5.89s
2024/09/04 17:12:47 speech ends at 6.08s
2024/09/04 17:12:47 speech starts at 6.08s
2024/09/04 17:12:47 speech ends at 7.26s
2024/09/04 17:12:47 speech starts at 7.84s
2024/09/04 17:12:47 speech ends at 8.13s
2024/09/04 17:12:47 speech starts at 8.80s
2024/09/04 17:12:47 speech ends at 9.82s
2024/09/04 17:12:47 speech starts at 9.82s
2024/09/04 17:12:47 speech ends at 10.46s
2024/09/04 17:12:47 speech starts at 10.88s
2024/09/04 17:12:47 speech ends at 14.11s
```
-
@streamer45 Oh! I feel like an idiot; I read that 1000 times. Noob mistake on my part, thanks so much! But in Python it just works, so does it convert the audio automatically?
-
@streamer45 I'll add logs to silero-vad-go for Windows support. Closing this issue now.
-
As for the various formats: in Python we use torchaudio to read files. It converts to mono, normalizes, and decodes the codecs under the hood with sox or ffmpeg. As for 48 kHz, we just do naive downsampling: take every third sample, or average.
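Here is a minimal Go sketch of that naive preprocessing, for anyone who wants to mirror it instead of converting files beforehand. It assumes the samples have already been decoded into interleaved signed 16-bit PCM. The helper names are hypothetical, and this is not torchaudio's implementation. It only covers the steps described above: normalize, downmix to mono, and decimate by 3 (48 kHz to 16 kHz), either by picking every third sample or by averaging.

```go
package main

import "fmt"

// int16ToFloat32 normalizes signed 16-bit PCM to float32 in [-1, 1).
func int16ToFloat32(pcm []int16) []float32 {
	out := make([]float32, len(pcm))
	for i, s := range pcm {
		out[i] = float32(s) / 32768
	}
	return out
}

// downmixToMono averages the channels of each interleaved frame.
func downmixToMono(samples []float32, channels int) []float32 {
	frames := len(samples) / channels
	mono := make([]float32, frames)
	for i := 0; i < frames; i++ {
		var sum float32
		for c := 0; c < channels; c++ {
			sum += samples[i*channels+c]
		}
		mono[i] = sum / float32(channels)
	}
	return mono
}

// decimate lowers the sample rate by an integer factor, either by keeping
// every factor-th sample or by averaging each group of factor samples.
// Averaging is only a crude low-pass filter; neither mode is a proper
// anti-aliased resampler.
func decimate(samples []float32, factor int, average bool) []float32 {
	n := len(samples) / factor
	out := make([]float32, n)
	for i := 0; i < n; i++ {
		if !average {
			out[i] = samples[i*factor]
			continue
		}
		var sum float32
		for j := 0; j < factor; j++ {
			sum += samples[i*factor+j]
		}
		out[i] = sum / float32(factor)
	}
	return out
}

func main() {
	// One second of hypothetical 48 kHz stereo input (all zeros here).
	stereo48k := make([]int16, 48000*2)
	mono16k := decimate(downmixToMono(int16ToFloat32(stereo48k), 2), 48000/16000, true)
	fmt.Println(len(mono16k)) // prints 16000
}
```

The resulting slice could then be passed to `sd.Detect` with `SampleRate: 16000`. For anything beyond a quick test, a proper resampler (for example ffmpeg with `-ar 16000 -ac 1`) is likely to preserve more of the signal than naive decimation.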