Add audio to text pipeline #103
This commit contains a quick proof of concept to showcase how easy it is to add a new pipeline.
I reviewed this PR; initial comments are below. My tests included: speech-mp3-lowbitrate.mp3 (worked), speech-aac-lowbitrate.m4a (worked), vp8-opus.webm (worked), vp8-vorbis.webm (worked), vp9-vorbis.webm (worked). The h264 and h265 variants in mp4 files did not work, for reasons that are still unclear. Audio extracted from those mp4 files was processed successfully.
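For anyone reproducing the mp4 workaround, here is a minimal sketch of pulling the audio track out of a video file before submitting it. It assumes `ffmpeg` is installed and on `PATH`, and that the audio stream can be copied into the chosen output container (for example AAC into `.m4a`). The function names `build_extract_command` and `extract_audio` are hypothetical and not part of this PR.

```python
import subprocess
from pathlib import Path
from typing import List


def build_extract_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that copies the audio stream out of a video file."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", src,       # input video (e.g. an h264/h265 mp4)
        "-vn",           # drop the video stream
        "-c:a", "copy",  # copy the audio stream without re-encoding
        dst,
    ]


def extract_audio(src: str, dst: str) -> Path:
    """Run ffmpeg (assumed to be installed) and return the output path."""
    subprocess.run(build_extract_command(src, dst), check=True)
    return Path(dst)
```

Calling `extract_audio("clip.mp4", "clip.m4a")` would then produce a file that can be sent to the pipeline in place of the original mp4. If the audio codec does not fit the target container, re-encoding instead of `copy` may be needed.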
I've updated the runner to handle these errors. The ai-runner responds with a 400 Bad Request, and the ai-runner, the gateway, and the orchestrator each log the failure.
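The behaviour described here, turning a processing failure into a 400 Bad Request and logging it, can be sketched roughly as follows. This is an illustration under assumptions, not the runner's actual code: `AudioProcessingError`, `handle_audio_request`, and the response body shape are all hypothetical, and the real route would sit behind the runner's web framework.

```python
import logging
from typing import Callable, Dict, Tuple

logger = logging.getLogger("audio_to_text_sketch")


class AudioProcessingError(Exception):
    """Hypothetical error raised when the uploaded file cannot be processed."""


def handle_audio_request(
    audio: bytes, transcribe: Callable[[bytes], str]
) -> Tuple[int, Dict[str, str]]:
    """Return (status_code, body); map processing failures to 400."""
    try:
        text = transcribe(audio)
    except AudioProcessingError as exc:
        # Log on the runner side so the failure is visible to operators.
        logger.error("audio-to-text failed: %s", exc)
        return 400, {"detail": str(exc)}
    return 200, {"text": text}
```

Returning a client error rather than a server error signals to the gateway that the input file, not the runner, is the likely cause.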
Rename pipeline speech-to-text to audio-to-text
I've added error handling to the AI Runner if the model experiences any error while processing the file. It specifically checks for
That's correct, I don't see a purpose for the seed parameter with this model. I think it's unlikely this pipeline will need it.
I've added this to |
This commit introduces support for the Stable Diffusion 3 Medium model from Hugging Face: [https://huggingface.co/stabilityai/stable-diffusion-3-medium](https://huggingface.co/stabilityai/stable-diffusion-3-medium). Please be aware that this model has restrictive licensing at the time of writing and is not yet advised for public use. Ensure you read and understand the [licensing terms](https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE) before enabling this model on your orchestrator.
This commit applies several code improvements to the audio-to-text codebase. It also restructures the utility functions in the pipelines module.
This commit ensures that both audio-to-text routes have known responses.
Speech to text pipeline poc n review
Add audio to text pipeline --------- Co-authored-by: Rick Staa
This change adds the audio-to-text pipeline to the AI Runner