Getting inconsistent outputs when passing multiple audio chunks in parallel #358
Problem: I'm using async websocket connections to pass audio chunks directly to silero VAD, and then on to an ASR service once the VAD detects speech. It works fine with a single client connection. However, if I increase the number of concurrent websocket clients to, say, 4, I get gibberish transcriptions from the ASR service. My hunch: it's related to how I'm loading the model. Code to load the model:
Replies: 2 comments 5 replies
Hi,
Looks like the explanation is simple in this case.
Each client connection should have its own instance of the VAD, since the VAD is not stateless: when detecting speech in a streaming fashion, it keeps internal state. That is also why it has a reset_states() method.
There are ways to invoke the VAD in a batched fashion while keeping the state, but judging by user feedback, we discourage that use case because of its sheer complexity and the errors it causes.
A workaround for threads / sockets / workers may be as follows: you can store the state and pass it back to a unified worker, as we basically do in the ONNX example.
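The store-and-restore workaround can be sketched like this. Names and the state representation are illustrative (the state is a plain float here for simplicity); with the real silero ONNX model the saved state would be the model's recurrent state tensors:

```python
# Sketch of the "pass the state back to a unified worker" pattern:
# one shared worker processes chunks from many clients, and each
# client's VAD state is saved after its chunk and restored before
# the next one, so interleaved streams never corrupt each other.

states = {}  # client_id -> saved VAD state


def vad_step(state, chunk):
    """Hypothetical pure VAD step: (state, chunk) -> (new_state, is_speech).

    Toy smoothed-amplitude logic; the real silero model would take and
    return its recurrent state tensors here instead of a float.
    """
    new_state = 0.8 * state + 0.2 * max(chunk, default=0.0)
    return new_state, new_state > 0.5


def worker(client_id, chunk):
    state = states.get(client_id, 0.0)      # restore this client's state
    state, is_speech = vad_step(state, chunk)
    states[client_id] = state               # save it for the next chunk
    return is_speech


# Interleaved chunks from two clients, handled by the one worker:
for _ in range(5):
    worker("client_a", [1.0])  # speaking client
    worker("client_b", [0.0])  # silent client
```

Because the state travels with the client id rather than living inside a shared model call path, a single worker (or a pool of them) can serve all websocket connections without the cross-client contamination described above.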