Unreliable BotStopped/BotStarted frame emission in DeepgramTTSService #453

Open
amacapri opened this issue Sep 12, 2024 · 1 comment

@amacapri

The DeepgramTTSService often exhibits the following pattern when processing a single LLM response:

BotSpeakingFrame#998
BotSpeakingFrame#999
BotSpeakingFrame#1000
BotStoppedSpeakingFrame#9
BotStartedSpeakingFrame#10
BotSpeakingFrame#1001
BotSpeakingFrame#1002
BotSpeakingFrame#1003

As you can see, it emits a BotStoppedSpeakingFrame immediately followed by a BotStartedSpeakingFrame while the bot is still in the middle of speaking. This is problematic because it makes it difficult to reliably detect when the bot has actually finished speaking. This does not happen with Cartesia TTS, for example, where the same stretch of speech produces an uninterrupted run of bot speaking frames:

BotSpeakingFrame#998
BotSpeakingFrame#999
BotSpeakingFrame#1000
BotSpeakingFrame#1001
BotSpeakingFrame#1002
BotSpeakingFrame#1003
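The spurious pair can be detected mechanically in a frame log. Below is a minimal sketch that assumes frames are logged as `Name#id` strings, exactly as in the listings above. This is an offline log-processing helper, not Pipecat's API, and the function name `coalesce_spurious_restarts` is hypothetical. It drops any BotStoppedSpeakingFrame that is immediately followed by a BotStartedSpeakingFrame, which turns the Deepgram sequence into the Cartesia one.

```python
def coalesce_spurious_restarts(frames: list[str]) -> list[str]:
    """Drop BotStoppedSpeakingFrame/BotStartedSpeakingFrame pairs that occur back-to-back.

    Assumes each entry is a log string of the form "FrameName#id"
    (hypothetical log format mirroring the issue's listings).
    """
    out: list[str] = []
    i = 0
    while i < len(frames):
        name = frames[i].split("#", 1)[0]
        nxt = frames[i + 1].split("#", 1)[0] if i + 1 < len(frames) else None
        if name == "BotStoppedSpeakingFrame" and nxt == "BotStartedSpeakingFrame":
            i += 2  # skip both halves of the mid-response stop/start pair
            continue
        out.append(frames[i])
        i += 1
    return out


if __name__ == "__main__":
    deepgram_log = [
        "BotSpeakingFrame#1000",
        "BotStoppedSpeakingFrame#9",
        "BotStartedSpeakingFrame#10",
        "BotSpeakingFrame#1001",
    ]
    print(coalesce_spurious_restarts(deepgram_log))
```

A stop that is not immediately followed by a start (a genuine end of the bot's turn) is left in place.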

@gregoryhermann

Same is true for Azure TTS. For our downstream processors, it'd be really helpful to know when the bot's turn is actually 'over'.
This may be antithetical to some design goals you're executing against; if so, please let us know what we should expect protocol-wise and I can work around it.
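One possible client-side workaround is to debounce the stop signal. Below is a minimal sketch that assumes a timestamp in seconds is available for each frame and uses an arbitrary 0.5 s grace window. Both are assumptions, and the class and method names are hypothetical, not part of Pipecat. The turn counts as over only if no BotStartedSpeakingFrame follows a BotStoppedSpeakingFrame within the window. Wiring this into an actual frame processor is not shown.

```python
from typing import Optional


class TurnEndDebouncer:
    """Treat the bot's turn as over only after a BotStoppedSpeakingFrame has
    not been followed by a BotStartedSpeakingFrame for `grace` seconds.

    Hypothetical helper; timestamps are supplied by the caller.
    """

    def __init__(self, grace: float = 0.5) -> None:
        self.grace = grace  # assumed tuning value, not taken from any TTS service
        self._stopped_at: Optional[float] = None

    def on_started(self) -> None:
        # A restart cancels any pending end-of-turn.
        self._stopped_at = None

    def on_stopped(self, t: float) -> None:
        # Remember when the (possibly spurious) stop was seen.
        self._stopped_at = t

    def poll(self, now: float) -> bool:
        """Return True exactly once, when a pending stop has outlived the grace window."""
        if self._stopped_at is not None and now - self._stopped_at >= self.grace:
            self._stopped_at = None
            return True
        return False
```

The grace window trades latency for robustness. It must be longer than any mid-response stop/start gap the TTS service produces, and every genuine end of turn is reported that much later.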