Getting inconsistent outputs when passing multiple audio chunks in parallel #358
Problem: I'm using async websocket connections to pass audio chunks directly to silero VAD, and then on to an ASR service once the VAD detects speech. It works fine with a single client connection. However, if I increase the number of concurrent websocket clients to, say, 4, I get gibberish transcriptions from the ASR service. My hunch: it's related to how I'm loading the model. Code to load the model:
Replies: 2 comments 5 replies
Hi,
Looks like the explanation is simple in this case.
Each client connection should have its own instance of the VAD, since the VAD is not stateless: when detecting speech in a streaming fashion, it keeps internal state. That is also why it has a reset_states() method.
There are ways to invoke the VAD in a batched fashion while keeping the state, but judging by user feedback, we discourage that use case because of its sheer complexity and the errors it causes.
A workaround for threads / sockets / workers may be as follows: you can store the state and pass it back to a unified worker, as we basically do in the ONNX example.
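The store-and-restore workaround can be sketched like this. Names and the state representation are illustrative (the state is a plain float here for simplicity); with the real silero ONNX model the saved state would be the model's recurrent state tensors:

```python
# Sketch of the "pass the state back to a unified worker" pattern:
# one shared worker processes chunks from many clients, and each
# client's VAD state is saved after its chunk and restored before
# the next one, so interleaved streams never corrupt each other.

states = {}  # client_id -> saved VAD state


def vad_step(state, chunk):
    """Hypothetical pure VAD step: (state, chunk) -> (new_state, is_speech).

    Toy smoothed-amplitude logic; the real silero model would take and
    return its recurrent state tensors here instead of a float.
    """
    new_state = 0.8 * state + 0.2 * max(chunk, default=0.0)
    return new_state, new_state > 0.5


def worker(client_id, chunk):
    state = states.get(client_id, 0.0)      # restore this client's state
    state, is_speech = vad_step(state, chunk)
    states[client_id] = state               # save it for the next chunk
    return is_speech


# Interleaved chunks from two clients, handled by the one worker:
for _ in range(5):
    worker("client_a", [1.0])  # speaking client
    worker("client_b", [0.0])  # silent client
```

Because the state travels with the client id rather than living inside a shared model call path, a single worker (or a pool of them) can serve all websocket connections without the cross-client contamination described above.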