The same is true for Azure TTS. Because of downstream processors, it would be really helpful to know when the bot's turn is over.
This may be antithetical to some design goals you're working toward. If so, please let us know what we should expect protocol-wise and I can work around it.
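For reference, the kind of workaround I have in mind is a debounce on the stop frame: only treat the turn as over if no new start arrives within a short quiet period. This is only a sketch under stated assumptions. `TurnEndDebouncer`, its method names, and the 0.5 s default are hypothetical and not part of the framework; a downstream processor would call them when it sees a `BotStartedSpeakingFrame` or `BotStoppedSpeakingFrame`.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class TurnEndDebouncer:
    """Hypothetical helper: reports the end of a bot turn only after a
    quiet period has passed since the last stop event."""

    def __init__(
        self,
        on_turn_end: Callable[[], Awaitable[None]],
        quiet_secs: float = 0.5,  # assumed value, tune per TTS service
    ) -> None:
        self._on_turn_end = on_turn_end
        self._quiet_secs = quiet_secs
        self._pending: Optional[asyncio.Task] = None

    def _cancel_pending(self) -> None:
        # Drop any timer that has not fired yet.
        if self._pending is not None:
            self._pending.cancel()
            self._pending = None

    def bot_started_speaking(self) -> None:
        # A start right after a stop means the turn is still in progress.
        self._cancel_pending()

    def bot_stopped_speaking(self) -> None:
        # Restart the quiet-period timer on every stop event.
        # Must be called from within a running event loop.
        self._cancel_pending()
        self._pending = asyncio.create_task(self._fire_after_quiet())

    async def _fire_after_quiet(self) -> None:
        await asyncio.sleep(self._quiet_secs)
        self._pending = None
        await self._on_turn_end()
```

The trade-off is added latency: the end of the turn is reported `quiet_secs` late, and a gap between audio chunks longer than that would still be misread as the end of the turn.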
The DeepgramTTSService often exhibits the following pattern when processing a single LLM response:
As you can see, while speaking, it emits a BotStoppedSpeakingFrame followed immediately by a BotStartedSpeakingFrame. This is problematic because it makes it difficult to reliably detect when the bot has actually stopped speaking. For example, this behavior does not occur with Cartesia TTS, where the bot speaking frames would instead be emitted as: