Support multi-modal models #746
Came here looking for this, to see whether the discussion around it had begun. Edit: progress is being made upstream in llama.cpp to support this.
The PR @ryansereno mentioned is merged and in master now. How can we run this in ollama?
I could successfully run
It would be good to have a file-reader command in the prompt, like /read file.jpg, for this.
Could you elaborate on how to map an image within ollama?
I would like to know as well. Thanks
It seems a couple of interface design decisions are at play: 1) how to represent this in the HTTP API, and 2) what the user/CLI interface should be. I want to highlight that the folks hacking on iTerm2 have done some work that may be relevant in the CLI context: https://iterm2.com/documentation-images.html For the HTTP interface, taking some inspiration from how OpenAI folds in image data may be useful. I did a bit of protocol decoding, and the TL;DR of how they do it is: upload to a blob store, then include a special message type in the completion message list. There's also the consideration of whether it's an ollama concern to allow annotating an incoming image to highlight part of it. That feels a bit out of scope to start, but perhaps the design should keep it in mind.
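For reference, the OpenAI approach mentioned above looks roughly like this: a sketch of OpenAI's vision chat format, not anything ollama exposes, with the model id and image data as placeholders.

    # Sketch of OpenAI's chat payload with an image folded into the
    # message list as a typed content part (all values are placeholders).
    openai_style_request = {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What is in this image?"},
                    {
                        "type": "image_url",
                        # either an https URL to an uploaded blob,
                        # or an inline base64 data URI
                        "image_url": {"url": "data:image/jpeg;base64,<base64-bytes>"},
                    },
                ],
            }
        ],
    }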
Me too. Can someone explain how to map an image within ollama?
Love that this is marked as closed but everyone's still clueless over here lol
@marscod thanks for importing the model. Could you add an example API call on the model page?
So I figured out how to use it; here's the code snippet:
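(What follows is a minimal sketch of the approach rather than the exact snippet: it assumes Ollama's /api/generate endpoint with its base64 `images` field, and the model name and image path are placeholders.)

    import base64

    import requests

    # Read and base64-encode the image, as the Ollama API expects.
    with open("photo.jpg", "rb") as f:
        img_base64 = base64.b64encode(f.read()).decode("utf-8")

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llava",
            "prompt": "What is in this picture?",
            "images": [img_base64],
            "stream": False,  # return a single JSON object instead of a stream
        },
    )
    print(resp.json()["response"])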
However, it also throws this error: For reference, I prefer using llama.cpp directly with bakllava-1 (way more precise), and the syntax there looks like this:
This is taken from: https://github.com/mangiucugna/local_multimodal_ai Hope this helps!
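For comparison, here is a sketch of that llama.cpp invocation, assuming the `llava-cli` example binary with its `-m`, `--mmproj`, and `--image` flags (paths and prompt are placeholders; wrapped in `subprocess` to keep the snippet in Python):

    import subprocess

    # Assumption: llama.cpp has been built with the llava-cli example, and
    # both the quantized model and its multimodal projector are on disk.
    subprocess.run([
        "./llava-cli",
        "-m", "models/bakllava-1/ggml-model-q4_k.gguf",
        "--mmproj", "models/bakllava-1/mmproj-model-f16.gguf",
        "--image", "photo.jpg",
        "-p", "Describe this image in detail.",
    ])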
@mangiucugna thank you, will give it a try.
I imported bakllava-1 locally and did some tests, and it performs so badly compared to the llama.cpp implementation that it is unusable. Happy to share my Modelfile and a link to the gguf for anyone who wants to reproduce this.
https://github.com/Mozilla-Ocho/llamafile llamafile supports llava-1.5; it would be nice if ollama supported it too.
Since this is now added, I can't figure out how to upload an image to the model. When I follow the instructions at https://github.com/jmorganca/ollama/releases/tag/v0.1.15, it describes something completely different from what was in the picture. I'm on Linux.
You probably haven't updated to the latest version of Ollama if you're getting a bunch of Chinese characters as the output.
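Once updated, the CLI usage those release notes describe amounts to passing the image path inside the prompt. A sketch, assuming a recent build where multimodal models pick up an image path embedded in the prompt text (path is a placeholder; wrapped in `subprocess` to stay in Python):

    import subprocess

    # Assumption: Ollama is installed and a llava model has been pulled.
    subprocess.run(["ollama", "run", "llava", "describe this image: ./photo.jpg"])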
I guess that we can consider this issue as completed :)
When I try this, I get:
And I'm using the latest version of ollama:
@prologic llama2 isn't a multimodal model. You should try:
Ahh! Thanks. When I tried to search for multimodal models, the search turned up empty, which is why I wasn't able to figure this out so easily :/ There should be a way to list and search for multimodal models, even with
If you want to use it with langchain, here is what you need to add to the HumanMessage (import shown for completeness; `prompt` and `img_base64` are assumed to be defined already):

    from langchain.schema.messages import HumanMessage

    # Text and image travel together as a content list on one message.
    message = HumanMessage(
        content=[
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": f"data:image/jpeg;base64,{img_base64}",
            },
        ]
    )
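A usage sketch, assuming langchain's ChatOllama wrapper pointed at a locally served multimodal model (the model name is a placeholder):

    from langchain.chat_models import ChatOllama

    # Assumption: Ollama is running locally and serving a bakllava model.
    chat = ChatOllama(model="bakllava")
    response = chat.invoke([message])  # `message` is the HumanMessage built above
    print(response.content)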
This is one of the best open-source multimodal models based on LLaMA 7B currently. It would be nice to be able to host it in ollama.
https://llava-vl.github.io/