
llava-cli: improve llava-cli and the API for using LLaVA #6027

Open
phymbert opened this issue Mar 12, 2024 · 4 comments
Labels
enhancement (New feature or request) · good first issue (Good for newcomers) · help wanted (Extra attention is needed) · llava (LLaVa and multimodal)

Comments

@phymbert
Collaborator

From:

  1. cleaning up the clip/llava libs and improving the API (see the sketch below)
  2. in the old implementation, many internal objects were exposed to the server and the memory management was dubious
  3. there was no obvious path for supporting parallel multimodal slots
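
To make point 1 more concrete, below is a rough sketch of what a cleaned-up interface could look like. These names are hypothetical and do not exist in the current llava.h/clip.h; the sketch only illustrates the direction: opaque handles instead of internal objects leaking into the server, explicit ownership with matching free functions, and a per-sequence decode call that leaves room for parallel multimodal slots.

```c
// Hypothetical sketch only -- none of these symbols exist in the current headers.
#include <stdbool.h>
#include <stdint.h>
#include "llama.h"

struct llava_context;       // opaque: wraps the CLIP encoder + multimodal projector
struct llava_image_tokens;  // opaque: the projected embeddings of a single image

// load / free the multimodal projector; mmap-friendly loading would live behind this call
struct llava_context * llava_init_from_file(const char * mmproj_path, int n_threads);
void                   llava_free(struct llava_context * ctx);

// encode one RGB image into embeddings; the caller owns the result and must free it
struct llava_image_tokens * llava_encode_image(struct llava_context * ctx,
                                               const uint8_t * rgb, int nx, int ny);
void                        llava_image_tokens_free(struct llava_image_tokens * toks);

// append the image embeddings to the given sequence of a llama_context, advancing *n_past;
// one call per slot/sequence is what would make parallel multimodal slots possible
bool llava_decode_image(struct llama_context            * ctx_llama,
                        const struct llava_image_tokens * toks,
                        llama_seq_id                      seq_id,
                        int                               n_batch,
                        int                             * n_past);
```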
phymbert added the "enhancement" label Mar 12, 2024
@phymbert
Collaborator Author

@ggerganov please tell me how I can help with this.

@phymbert
Collaborator Author

Ping @damian0815, as you originally started llava.h.

phymbert mentioned this issue Mar 22, 2024
phymbert added the "llava", "help wanted", and "good first issue" labels Mar 22, 2024
@JoanFM
Contributor

JoanFM commented Jun 11, 2024

Hello,

Is there any progress here? I wonder if I could be of any help.

I think it would be nice to make multimodality much more of a first-class citizen in llama.cpp. I would be interested in supporting the jina-clip-v1 model after the refactoring.

@ngxson
Collaborator

ngxson commented Jun 19, 2024

I've recently been playing around with the current llava implementation.

Currently, a clip model has its own clip_model_load, which does not use mmap. And while clip_image_batch_encode exists and could be used to process parallel slots, it is not used by llava.cpp. One idea I have in mind is to somehow reuse llama_load_model_from_file to load the model and llama_decode to decode batches of patches/images.

But that's only a very rough idea, and probably too complicated to implement at the moment. @ggerganov, what do you think about this?
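
For reference, here is roughly what the current single-image path looks like when the functions mentioned above are stitched together. This is only a sketch: the signatures are the ones in examples/llava/clip.h and llava.h at the time of writing and should be verified against the headers. Note that clip_image_batch_encode is the batch counterpart of the per-image encode, but llava.cpp never calls it, which is part of why there is no obvious path to parallel slots today.

```c
// Sketch of the existing flow: clip_model_load() loads the mmproj (no mmap),
// llava_image_embed_make_with_filename() runs the CLIP encoder + projector, and
// llava_eval_image_embed() pushes the embeddings through llama_decode() in
// n_batch-sized chunks. Signatures per examples/llava at the time -- verify locally.
#include "clip.h"
#include "llava.h"
#include "llama.h"

static bool eval_one_image(struct llama_context * ctx_llama,
                           const char * mmproj_path, const char * image_path,
                           int n_threads, int n_batch, int * n_past) {
    struct clip_ctx * ctx_clip = clip_model_load(mmproj_path, /*verbosity*/ 1);
    if (!ctx_clip) {
        return false;
    }

    struct llava_image_embed * embed =
        llava_image_embed_make_with_filename(ctx_clip, n_threads, image_path);
    if (!embed) {
        clip_free(ctx_clip);
        return false;
    }

    // internally this builds a llama_batch with the `embd` field set and calls
    // llama_decode, which is why reusing llama_decode directly for batches of
    // images/patches does not seem far-fetched
    const bool ok = llava_eval_image_embed(ctx_llama, embed, n_batch, n_past);

    llava_image_embed_free(embed);
    clip_free(ctx_clip);
    return ok;
}
```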
