Support gpt-4o-audio-preview for input (not output) #608
Comments
You can manually add models using this mechanism: https://llm.datasette.io/en/stable/openai-models.html#adding-more-openai-models
But you'd need to write custom Python code to get it working with audio attachments.
Adding audio input shouldn't be too hard thanks to the new attachments mechanism. Adding audio output support will require more work in core, since I need a way to decode and store the returned audio files.
Related research: https://simonwillison.net/2024/Oct/18/openai-audio/
OK, this seems to work for both Example files (same thing in both formats):
llm -m gpt-4o-audio-preview -a https://static.simonwillison.net/static/2024/pelican-joke-request.mp3 '.'
Note that I need to provide a prompt of '.' (as in the command above).
This works now:
llm -m gpt-4o-audio-preview \
-a https://static.simonwillison.net/static/2024/pelican-joke-request.mp3
Interestingly the system prompt does not seem to be very effective with audio attachments. My
Note how the "transcribe this audio" system prompts are ignored, but the "add the word walrus at the end" system prompt is obeyed. That shows system prompts are getting through; they are just being overridden by whatever is in the audio.
For the moment you can try this out by installing the most recent commit of LLM like this:
llm install https://github.com/simonw/llm/archive/0cc4072bcd9af4e4c9f030955179e7614dcd9d00.zip
Or if that doesn't work (the Homebrew version doesn't like attempts to upgrade itself) you could run it using uvx:
uvx --with 'https://github.com/simonw/llm/archive/0cc4072bcd9af4e4c9f030955179e7614dcd9d00.zip' \
llm -m gpt-4o-audio-preview \
-a https://static.simonwillison.net/static/2024/pelican-joke-request.mp3
Thanks, I installed it using Perhaps using JSON mode can improve that. But 4o also often returned this BS:
Gemini models, on the other hand, work great. Even OpenAI's Advanced Voice Mode is significantly worse than Gemini Live with my Iranian accent and unstable internet connection. I guess OpenAI's only advantage right now is their o1 model; all of their other offerings are no longer SOTA.
I'm trying to use Gemini models with audio through OpenRouter, and I'm wondering about the configuration. Since OpenRouter works like OpenAI's API, I guess I need to add the model to the extra-openai-models.yaml file in my Application Support directory, but how do I specify which attachments are supported?
How do I manually add this to the model list so that llm knows it supports audio files? (asking for future reference)