feat: Add voice APIs to communicate with stt and tts through the wyoming protocol #4637
What type of PR is this?
What this PR does / why we need it:
This pull request adds 2 new API endpoints:
Both APIs call out to a configured server using the wyoming protocol, so they do not depend on a specific speech to text or text to speech model.
I'm adding these APIs to support the use of a chatbot within Mealie. In particular, I'm aiming to support a "Cook Along" service that lets users ask a chatbot questions about the recipe while they're cooking, so they don't need to search through the recipe for what they're looking for. Speech to text and text to speech are important for implementing such a feature.
Changes in detail
New library
I'm adding the wyoming protocol library for communicating with tts and stt services.
Dev container
I've added the docker run argument:
--add-host=host.docker.internal:host-gateway
to the dev container so that mealie can communicate with tcp services on the host (specifically the wyoming protocol servers).
Services
Adds TTSService and STTService for providing text to speech and speech to text. Initialization sets up the connection to the wyoming protocol servers. TTSService has a synthesize method that turns text into speech audio. STTService has a transcribe method that turns speech audio into text.
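For reviewers unfamiliar with the protocol, here is a minimal standard-library sketch of the kind of exchange a transcribe call involves. It is not the code in this PR and it does not use the wyoming library. It assumes, based on my reading of the protocol, that each event is a JSON header line followed by an optional binary payload, and that a speech to text request is a transcribe event followed by audio-start, audio-chunk and audio-stop events. The names Event, write_event, read_event and stt_request_events are hypothetical helpers made up for this illustration.

```python
import io
import json
from dataclasses import dataclass, field
from typing import Any, BinaryIO, List, Optional


@dataclass
class Event:
    """One event: a type, a small JSON data dict, and an optional binary payload."""
    type: str
    data: dict = field(default_factory=dict)
    payload: bytes = b""


def write_event(event: Event, stream: BinaryIO) -> None:
    """Write a JSON header line, then the raw payload bytes (if any)."""
    header: dict = {"type": event.type, "data": event.data}
    if event.payload:
        header["payload_length"] = len(event.payload)
    stream.write(json.dumps(header).encode("utf-8") + b"\n")
    stream.write(event.payload)


def read_event(stream: BinaryIO) -> Optional[Event]:
    """Read one event, or return None when the stream is exhausted."""
    line = stream.readline()
    if not line:
        return None
    header = json.loads(line)
    data: dict = dict(header.get("data") or {})
    # Assumption: extra JSON data may follow the header when data_length is set.
    data_length = header.get("data_length") or 0
    if data_length:
        data.update(json.loads(stream.read(data_length)))
    payload = stream.read(header.get("payload_length") or 0)
    return Event(header["type"], data, payload)


def stt_request_events(
    pcm: bytes, rate: int, width: int, channels: int, chunk_bytes: int = 1024
) -> List[Event]:
    """Build the event sequence for one transcription request from raw PCM audio."""
    fmt = {"rate": rate, "width": width, "channels": channels}
    events = [Event("transcribe"), Event("audio-start", dict(fmt))]
    for start in range(0, len(pcm), chunk_bytes):
        events.append(Event("audio-chunk", dict(fmt), pcm[start:start + chunk_bytes]))
    events.append(Event("audio-stop"))
    return events


if __name__ == "__main__":
    # Round-trip through an in-memory buffer instead of a real TCP socket.
    buffer = io.BytesIO()
    for event in stt_request_events(b"\x00\x01" * 2048, rate=16000, width=2, channels=1):
        write_event(event, buffer)
    buffer.seek(0)
    while (event := read_event(buffer)) is not None:
        print(event.type, len(event.payload))
```

In the real service the stream would be a TCP connection to the configured server, and the server would be expected to answer with a transcript event carrying the recognized text. Text to speech runs the other way: a synthesize event carrying the text goes out and audio events come back.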
New Routes
Adds 2 new routes
New Settings
Which issue(s) this PR fixes:
Partially implements: discussion 4636, specifically tts and stt
Special notes for your reviewer:
I also have a pull request I'm working on to implement an LLM recipe assistant API which this is meant to support. See the linked discussion for full details.
I haven't worked on any UI for this. I may be working with @miah120 on a chat interface for this in the future.
I also haven't made pull requests to mealie before, so please let me know if there are any improvements I can make to fit better with the existing code.
To actually use these APIs, you need wyoming tts and stt services running, and you need to pass the appropriate environment variables to mealie so it can connect to them.
Testing
Manual testing