
[Feature]: Inquiry about Multi-modal Support in VLLM for MiniCPM-V2.6 #7546

Closed
Dong148 opened this issue Aug 15, 2024 · 4 comments

Dong148 commented Aug 15, 2024

🚀 The feature, motivation and pitch

I am currently exploring the capabilities of the VLLM library and am interested in understanding its support for multi-modal inputs, particularly for models like MiniCPM-V2.6. I would like to know if VLLM is designed to handle multi-image and video inputs for such models.

Alternatives

  1. Model of Interest: MiniCPM-V2.6
  2. Types of Input: Multi-image and video
  3. Current Understanding:
    • I have reviewed the documentation and initial examples provided with VLLM.
    • It seems that neither multiple 'image_url' entries nor a list value for 'image_url' is currently supported.
    • However, I am not sure whether it supports processing multiple images or videos as input to a model like MiniCPM-V2.6.

Questions

  1. Does VLLM support the integration of MiniCPM-V2.6 for processing multi-image and video inputs?
  2. If yes, could you provide an example or a guide on how to set up and use this feature?
  3. If not, are there any plans to extend VLLM's capabilities to support such inputs in the future?

Additional context

[screenshot attached]

DarkLight1337 (Member) commented Aug 15, 2024

Multi-image input is currently supported for MiniCPM-V specifically (#7122), with some caveats:

  • It only works in offline inference, not the OpenAI API-compatible server.
  • Until the next release, you have to build from source (main branch) to use it.

We are actively working on extending the support for multi-image input - please refer to #4194 for details.
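
For illustration, a minimal offline-inference sketch along the lines of the vLLM multi-image examples might look like the following. The image paths, the `(<image>./</image>)` placeholder format (taken from the model card), and the `limit_mm_per_prompt` setting are assumptions and may differ between versions; see #7122 and the official examples for the exact usage.

```python
# Minimal sketch: offline multi-image inference with MiniCPM-V-2.6 in vLLM.
# Assumptions: local image paths, the "(<image>./</image>)" placeholder format
# from the model card, and the limit_mm_per_prompt argument; check PR #7122
# and the vLLM examples for the exact, version-specific usage.
from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "openbmb/MiniCPM-V-2_6"

llm = LLM(
    model=model_name,
    trust_remote_code=True,
    limit_mm_per_prompt={"image": 2},  # allow up to two images per prompt
)

images = [Image.open("photo_1.jpg"), Image.open("photo_2.jpg")]

# Build a prompt with one image placeholder per image, then apply the
# model's chat template.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
placeholders = "\n".join("(<image>./</image>)" for _ in images)
messages = [{"role": "user", "content": f"{placeholders}\nDescribe the two images."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = llm.generate(
    {
        "prompt": prompt,
        "multi_modal_data": {"image": images},  # list of images = multi-image input
    },
    sampling_params=SamplingParams(temperature=0.0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```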

Dong148 (Author) commented Aug 15, 2024

> Multi-image input is currently supported for MiniCPM-V specifically (#7122), with some caveats:
>
>   • It only works in offline inference, not the OpenAI API-compatible server.
>   • Until the next release, you have to build from source (main branch) to use it.
>
> We are actively working on extending the support for multi-image input - please refer to #4194 for details.

Thank you for your assistance and for taking the time to help me out. I look forward to exploring more features of VLLM and potentially contributing to its development in the future.

Dong148 closed this as completed Aug 15, 2024
Patrick10203 commented

>   • It only works in offline inference, not the OpenAI API-compatible server.
>   • Until the next release, you have to build from source (main branch) to use it.

Are you sure that building the main branch supports multi-image input over the OpenAI API? The check at https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/chat_utils.py#L179 is still in the main branch.

DarkLight1337 (Member) commented Aug 23, 2024

> >   • It only works in offline inference, not the OpenAI API-compatible server.
> >   • Until the next release, you have to build from source (main branch) to use it.
>
> Are you sure that building the main branch supports multi-image input over the OpenAI API? The check at https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/chat_utils.py#L179 is still in the main branch.

I was referring to multi-modal support for MiniCPM-V specifically, not for multi-modal models (+OpenAI server) in general.
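
For reference, the kind of multi-image request being discussed for the OpenAI-compatible server would look roughly like the sketch below (the host, port, and image URLs are placeholders). As clarified above, such requests were not yet accepted at the time of this thread because of the single-image check in chat_utils.py.

```python
# Sketch of a multi-image chat completion request against a vLLM
# OpenAI-compatible server (placeholder host/port and image URLs).
# At the time of this thread, the server rejects prompts with more than
# one image (see the chat_utils.py check referenced above).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openbmb/MiniCPM-V-2_6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two images."},
                {"type": "image_url", "image_url": {"url": "https://example.com/a.jpg"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/b.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```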
