Unreliable BotStopped/BotStarted frame emission in DeepgramTTSService #453

amacapri · 2024-09-12T05:32:14Z

The DeepgramTTSService often exhibits the following pattern when processing a single LLM response:

BotSpeakingFrame#998
BotSpeakingFrame#999
BotSpeakingFrame#1000
BotStoppedSpeakingFrame#9
BotStartedSpeakingFrame#10
BotSpeakingFrame#1001
BotSpeakingFrame#1002
BotSpeakingFrame#1003

As shown above, in the middle of a single utterance the service emits a BotStoppedSpeakingFrame followed immediately by a BotStartedSpeakingFrame. This makes it difficult to reliably detect when the bot has actually stopped speaking. The behavior does not occur with Cartesia TTS, where the same frames would instead be emitted as:

BotSpeakingFrame#998
BotSpeakingFrame#999
BotSpeakingFrame#1000
BotSpeakingFrame#1001
BotSpeakingFrame#1002
BotSpeakingFrame#1003

gregoryhermann · 2024-09-19T00:30:34Z

The same is true for Azure TTS. Because of downstream processors, it would be really helpful to know when the bot's turn is 'over'.
This may be antithetical to some design goals you're executing against; if so, please let us know what we should expect protocol-wise and I can work around it.
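
One possible downstream workaround, until the expected protocol is clarified, is to debounce the stop/start pair: hold each BotStoppedSpeakingFrame back and discard it if a BotStartedSpeakingFrame arrives within a short grace period. The sketch below is only an illustration under stated assumptions. It works on plain `(timestamp, frame_name)` tuples rather than real frame objects, the function name `collapse_spurious_stops` and the `grace_s` value are hypothetical, and it does not use any pipecat API. A live processor would need a timer to release the held stop frame instead of waiting for the next event.

```python
from typing import Iterable, List, Optional, Tuple

# Assumption: each event is (timestamp in seconds, frame class name).
Event = Tuple[float, str]

STOPPED = "BotStoppedSpeakingFrame"
STARTED = "BotStartedSpeakingFrame"


def collapse_spurious_stops(events: Iterable[Event], grace_s: float = 0.5) -> List[Event]:
    """Drop every stopped/started pair that is separated by at most grace_s.

    Hypothetical helper: a stop is held back until the next event shows
    whether it was a real end of turn or an immediate restart.
    """
    out: List[Event] = []
    pending_stop: Optional[Event] = None  # stop frame not yet released

    for ts, name in events:
        if pending_stop is not None:
            if name == STARTED and ts - pending_stop[0] <= grace_s:
                # Restart inside the grace period: swallow both frames.
                pending_stop = None
                continue
            # Anything else means the stop was genuine: release it first.
            out.append(pending_stop)
            pending_stop = None

        if name == STOPPED:
            pending_stop = (ts, name)
        else:
            out.append((ts, name))

    # A stop at the very end of the stream is a real end of turn.
    if pending_stop is not None:
        out.append(pending_stop)
    return out


if __name__ == "__main__":
    trace = [
        (0.00, "BotSpeakingFrame"),
        (0.02, "BotSpeakingFrame"),
        (0.04, STOPPED),
        (0.05, STARTED),
        (0.06, "BotSpeakingFrame"),
        (2.00, STOPPED),
    ]
    for event in collapse_spurious_stops(trace):
        print(event)
```

The trade-off of this approach is latency: a genuine end of turn is only reported after the grace period has passed, so `grace_s` would need tuning against how quickly downstream processors must react.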
