How to handle interruptions better while building speech to speech pipeline? #156

mehul-fabrichq · 2024-12-02T13:33:28Z

Any examples of end to end speech to speech pipeline for better latency and interruption handling?

KoljaB · 2024-12-04T13:21:28Z

That's a pretty general question. Interruption handling is tricky because it often requires echo cancellation of the voice agent's TTS output, so the agent doesn't end up hearing itself. Latency, on the other hand, is a balancing act across the pipeline stages. A fast STT system should transcribe in under 100 ms on a strong GPU, and a decent TTS system adds around 200 ms. The rest of the delay comes from LLM generation or speech end detection.

Most basic speech endpoint detection methods rely on waiting for a certain amount of silence, which naturally adds latency.
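To make the balancing concrete, here is a rough sketch of a latency budget. The 100 ms STT and 200 ms TTS figures come from the comment above; the LLM and endpoint-wait numbers in the usage note are made-up placeholders, and `latency_budget` is a hypothetical helper, not part of any library.

```python
def latency_budget(stages):
    """Return (total latency in ms, name of the slowest stage).

    `stages` maps a stage name to its latency in milliseconds.
    """
    total = sum(stages.values())
    bottleneck = max(stages, key=stages.get)
    return total, bottleneck
```

With placeholder values such as `{"stt": 100, "llm": 300, "tts": 200, "endpoint_wait": 700}`, the silence wait would dominate the budget, which is why endpoint detection is worth tuning first.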

For better latency:

Make sure your STT, LLM, and TTS are as fast as possible.

Use a more advanced speech endpoint detection method, like adjusting the silence threshold based on real-time transcription (e.g., detecting end punctuation) or analyzing frequency changes. People often lower their pitch when finishing a thought or raise it for questions.
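As an illustration of the punctuation idea, the sketch below picks a silence timeout from the latest real-time transcript. The function name `silence_timeout` and all three durations are assumptions for illustration, not tuned values; how the result gets applied to the recorder's silence setting is left out.

```python
def silence_timeout(partial_text, short=0.4, medium=1.0, long=2.0):
    """Choose how many seconds of silence to wait before ending the turn.

    - Trailing ellipsis: the speaker is probably still thinking, wait longer.
    - Sentence-final punctuation: the thought looks complete, wait less.
    - Anything else: fall back to the medium default.
    """
    text = partial_text.rstrip()
    if not text:
        return medium
    if text.endswith(("...", "\u2026")):
        return long
    if text[-1] in ".!?":
        return short
    return medium
```

The design choice here is that a complete-looking sentence earns a short wait, while an unfinished one keeps the longer default, so latency drops only when the transcript suggests the speaker is done.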

For interruption handling:

Remove TTS feedback from the input and apply volume-based thresholds afterward.
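A minimal sketch of the volume-threshold part, assuming 16-bit mono PCM frames in native byte order. The helper names and both threshold values are hypothetical and would need tuning per microphone and speaker setup; the idea is simply to demand a louder input before treating it as an interruption while TTS is playing.

```python
import array
import math


def rms_int16(frame):
    """Root-mean-square level of a frame of 16-bit PCM samples.

    `frame` must contain a whole number of 2-byte samples.
    """
    samples = array.array("h")
    samples.frombytes(frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_user_speech(frame, tts_playing,
                   idle_threshold=500.0, playback_threshold=3000.0):
    """Gate a frame by level, using a stricter threshold during playback."""
    threshold = playback_threshold if tts_playing else idle_threshold
    return rms_int16(frame) > threshold
```

This only covers the thresholding step. Residual TTS echo that is louder than the playback threshold would still get through, which is why removing the feedback first matters.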

adhambadr · 2024-12-24T18:27:55Z

First of all, this is an insanely well done and robust library. Second, any more clues on where to look to 'Remove TTS feedback from the input'?
Right now I'm having the issue that the generated TTS is being picked up by the mic and fed back into the STT pipeline.

KoljaB · 2024-12-24T19:01:12Z

Easy way: mute the mic when TTS is running. Like this:

When TTS starts:

recorder.abort()
recorder.stop()

When TTS stops:

recorder.clear_audio_queue()
recorder.recording_stop_time = 0
recorder.wakeup()
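One possible way to package the two steps above is a context manager, so the mic is always re-enabled even if playback raises. This is only a sketch: `mic_muted` is a hypothetical helper, and it relies solely on the recorder calls quoted above.

```python
from contextlib import contextmanager


@contextmanager
def mic_muted(recorder):
    """Stop the recorder for the duration of the block, then resume it."""
    # When TTS starts: drop the current recording and stop listening.
    recorder.abort()
    recorder.stop()
    try:
        yield recorder
    finally:
        # When TTS stops: discard audio captured meanwhile and resume.
        recorder.clear_audio_queue()
        recorder.recording_stop_time = 0
        recorder.wakeup()
```

Usage would look like `with mic_muted(recorder): play_tts(text)`, where `play_tts` stands in for whatever blocking TTS playback call you use.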

Harder way: add echo cancellation. Haven’t nailed this reliably yet, but would be awesome to see some working code. This would allow you to interrupt the voice agent mid-response.
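For anyone who wants a starting point, below is a toy sketch of one classic approach, a normalized LMS (NLMS) adaptive filter. It is not something from this library and not a solution I'd call reliable: it assumes the mic signal and the TTS reference are sample-aligned lists of floats at the same rate, and it ignores double-talk and delay estimation, which production echo cancellers (for example the ones in WebRTC or Speex DSP) have to handle. The function name and parameter defaults are illustrative.

```python
def nlms_echo_cancel(mic, reference, taps=32, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the `reference` echo from `mic`.

    Returns the residual signal, one output sample per input sample.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # recent reference samples, newest first
    residual = []
    for m, r in zip(mic, reference):
        history.insert(0, r)
        history.pop()
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = m - echo_estimate
        # Normalize the step by the reference energy in the window.
        norm = sum(x * x for x in history) + eps
        step = mu * error / norm
        weights = [w + step * x for w, x in zip(weights, history)]
        residual.append(error)
    return residual
```

The residual is what you would feed to the volume threshold or VAD. A pure-Python loop like this is far too slow for real-time audio, so treat it as a description of the algorithm rather than something to deploy.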
