[Feature]: Inquiry about Multi-modal Support in VLLM for MiniCPM-V2.6 #7546
Comments
Multi-image input is currently supported for MiniCPM-V specifically (#7122), with some caveats. We are actively working on extending the support for multi-image input - please refer to #4194 for details.
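For context, here is a minimal sketch of offline multi-image inference with MiniCPM-V, assuming a vLLM build that includes #7122. The model name, image paths, image limit, and sampling settings are illustrative; the `(<image>./</image>)` placeholders come from the model's own chat template.

```python
# Minimal sketch: offline multi-image inference with MiniCPM-V in vLLM.
# Assumes a vLLM build that includes multi-image support for this model
# (#7122); file paths and generation settings are placeholders.
from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "openbmb/MiniCPM-V-2_6"

llm = LLM(
    model=MODEL,
    trust_remote_code=True,
    limit_mm_per_prompt={"image": 2},  # allow up to two images per prompt
)

# Build the prompt through the model's own chat template; MiniCPM-V marks
# each image position with a (<image>./</image>) placeholder.
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
messages = [{
    "role": "user",
    "content": "(<image>./</image>)\n(<image>./</image>)\n"
               "What are the differences between these two images?",
}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

images = [Image.open("image_1.jpg"), Image.open("image_2.jpg")]

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": images}},
    SamplingParams(temperature=0.2, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```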
Thank you for your assistance and for taking the time to help me out. I look forward to exploring more features of vLLM and potentially contributing to its development in the future.
Are you sure that building the main branch supports multi-image input over the OpenAI API? The line https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/chat_utils.py#L179 is still in the main branch.
I was referring to multi-modal support for MiniCPM-V specifically, not for multi-modal models (+OpenAI server) in general.
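For reference, a sketch of the request shape that multi-image input over the OpenAI-compatible server would take once #4194 lands. At the time of this thread, the check in chat_utils.py cited above rejects more than one image_url part, so this is what the tracked feature enables rather than what the current server accepts; the server address, model name, and image URLs are placeholders.

```python
# Sketch of a multi-image chat request against vLLM's OpenAI-compatible
# server. At the time of this thread the server rejects multiple
# image_url parts (see the chat_utils.py line cited above); this only
# shows the shape such a request would take. URLs are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openbmb/MiniCPM-V-2_6",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What differs between these images?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/image_1.jpg"}},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/image_2.jpg"}},
        ],
    }],
    max_tokens=128,
)
print(response.choices[0].message.content)
```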
🚀 The feature, motivation and pitch
I am currently exploring the capabilities of the vLLM library and am interested in understanding its support for multi-modal inputs, particularly for models like MiniCPM-V2.6. I would like to know whether vLLM is designed to handle multi-image and video inputs for such models.
Alternatives
Multiple 'image_url' inputs and a list value in 'image_url' are currently not supported.
Questions
Additional context