[Model][VLM] Add multi-video support for LLaVA-Onevision #8905

litianjian · 2024-09-27T13:15:50Z

Add multi-video support for LLaVA-Onevision models.

Example

import av
import numpy as np
from huggingface_hub import hf_hub_download
from vllm import LLM, SamplingParams

MODEL = "llava-hf/llava-onevision-qwen2-7b-ov-hf"

# One <video> placeholder per video passed in multi_modal_data.
text_prompt = "<|im_start|>user <video> <video>\nWhat’s the difference between these two videos?<|im_end|><|im_start|>assistant\n"

def read_video_pyav(container, indices):
    '''
    Decode the video with PyAV decoder.
    Args:
        container (`av.container.input.InputContainer`): PyAV container.
        indices (`List[int]`): List of frame indices to decode.
    Returns:
        result (np.ndarray): np array of decoded frames of shape (num_frames, height, width, 3).
    '''
    frames = []
    container.seek(0)
    start_index = indices[0]
    end_index = indices[-1]
    for i, frame in enumerate(container.decode(video=0)):
        if i > end_index:
            break
        if i >= start_index and i in indices:
            frames.append(frame)
    return np.stack([x.to_ndarray(format="rgb24") for x in frames])


video_path = hf_hub_download(repo_id="raushan-testing-hf/videos-test", filename="sample_demo_1.mp4", repo_type="dataset")
container = av.open(video_path)
total_frames = container.streams.video[0].frames
indices = np.arange(0, total_frames, total_frames / 32).astype(int)
video = read_video_pyav(container, indices)

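# Raise the per-prompt video limit so that two videos are accepted
# (vLLM allows one item per modality by default).
llm = LLM(model=MODEL, tensor_parallel_size=1, limit_mm_per_prompt={"video": 2})
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=100)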
outputs = llm.generate(
    {
        "prompt": text_prompt,
        "multi_modal_data": {
            # The same clip is passed twice to exercise the multi-video path.
            "video": [video, video]
        }
    },
    sampling_params=sampling_params)

generated_text = ""
for o in outputs:
    generated_text += o.outputs[0].text
print(f"LLM output:{generated_text}")
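
The example above picks frame indices with `np.arange(0, total_frames, total_frames / 32)`. With a float step, the length of the result can be off by one because of floating-point rounding, and the expression does not handle a container that reports zero frames. A minimal sketch of a sturdier sampler follows; `sample_frame_indices` is a hypothetical helper name used for illustration and is not part of this PR or of vLLM.

```python
import numpy as np


def sample_frame_indices(total_frames: int, num_frames: int = 32) -> np.ndarray:
    """Return evenly spaced, strictly increasing frame indices.

    Hypothetical helper (an assumption for illustration, not from the PR).
    At most `num_frames` indices are returned; short videos yield fewer.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # Never ask for more frames than the video contains.
    count = min(num_frames, total_frames)
    # endpoint=False keeps every index below total_frames and starts at 0,
    # which mirrors the arange-based sampling in the example.
    return np.linspace(0, total_frames, num=count, endpoint=False).astype(int)
```

In the example this would replace the `indices = ...` line with `indices = sample_frame_indices(total_frames)`, leaving the call to `read_video_pyav` unchanged.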

github-actions · 2024-09-27T13:16:03Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add the ready label to the PR.
Enable auto-merge.

🚀

DarkLight1337 · 2024-09-27T13:28:04Z

FYI I found some problems with the chat template earlier (see #8874). Please double check that you're using the correct chat templates for examples/tests in this PR.
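
For reference, a minimal sketch of how a multi-video prompt can be assembled so that the number of `<video>` placeholders always matches the number of videos. The layout is copied from the example in the PR description and is an assumption here; it has not been checked against the model's official chat template, which is the concern raised in #8874. `build_multi_video_prompt` is a hypothetical helper name.

```python
def build_multi_video_prompt(question: str, num_videos: int) -> str:
    """Build a prompt with one <video> placeholder per video.

    Hypothetical helper: the surrounding template mirrors the PR example
    and is an assumption, not a verified chat template.
    """
    if num_videos < 1:
        raise ValueError("num_videos must be at least 1")
    # Placeholders are separated by single spaces, as in the PR example.
    placeholders = " ".join(["<video>"] * num_videos)
    return (
        f"<|im_start|>user {placeholders}\n"
        f"{question}<|im_end|><|im_start|>assistant\n"
    )
```

Generating the placeholders from `num_videos` keeps the prompt and the `multi_modal_data["video"]` list in step when the number of inputs changes.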

DarkLight1337 · 2024-10-07T08:13:36Z

Sorry for the long wait! Can you add a test with multi-video input so it is easy to verify that this works?

litianjian · 2024-10-09T07:31:49Z

> Sorry for the long wait! Can you add a test with multi-video input so it is easy to verify that this works?

Sorry for the late response. I added a test with multi-video input.
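
As an illustration of what a multi-video input has to satisfy, here is a small pre-flight check. It is a sketch under stated assumptions and not the test added in this PR: `validate_multi_video_request` is a hypothetical helper, and it assumes the request shape used in the PR example (a `prompt` string plus `multi_modal_data["video"]` holding frame arrays of shape `(num_frames, height, width, 3)`).

```python
import numpy as np


def validate_multi_video_request(request: dict) -> int:
    """Check placeholder count and frame-array shapes; return the video count.

    Hypothetical helper for illustration only (not part of vLLM).
    """
    prompt = request["prompt"]
    videos = request["multi_modal_data"]["video"]
    # A single video may be passed as a bare array; normalise to a list.
    if isinstance(videos, np.ndarray):
        videos = [videos]
    num_placeholders = prompt.count("<video>")
    if num_placeholders != len(videos):
        raise ValueError(
            f"prompt has {num_placeholders} <video> placeholder(s) "
            f"but {len(videos)} video(s) were supplied"
        )
    for i, frames in enumerate(videos):
        # Expect (num_frames, height, width, 3) RGB frames, as produced
        # by read_video_pyav in the PR example.
        if frames.ndim != 4 or frames.shape[-1] != 3:
            raise ValueError(f"video {i} has unexpected shape {frames.shape}")
    return len(videos)
```

Such a check could run on dummy arrays before any model is loaded, which makes mismatched inputs cheap to catch.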

litianjian · 2024-10-28T10:59:10Z

I have updated the tests. Could you please review them when you're available? @DarkLight1337

DarkLight1337 · 2024-10-28T11:10:01Z

Meanwhile let's see if the tests pass CI.

DarkLight1337

I ran the tests locally and they pass. Let's run the CI again.

Thanks for your patience!

DarkLight1337 requested a review from ywang96 September 27, 2024 13:28

DarkLight1337 reviewed Sep 27, 2024

vllm/model_executor/models/llava_onevision.py (outdated)

DarkLight1337 reviewed Sep 27, 2024

vllm/multimodal/video.py

DarkLight1337 reviewed Sep 29, 2024

vllm/model_executor/models/llava_onevision.py (outdated)

DarkLight1337 changed the title ~~[Bugfix][VLM]Add multi-video support for LLaVA-Onevision model~~ [Model][VLM] Add multi-video support for LLaVA-Onevision model Sep 30, 2024

DarkLight1337 changed the title ~~[Model][VLM] Add multi-video support for LLaVA-Onevision model~~ [Model][VLM] Add multi-video support for LLaVA-Onevision Sep 30, 2024

litianjian mentioned this pull request Oct 9, 2024

[Model][VLM] Add LLaVA-Onevision model support #8486

Merged

3 tasks

ywang96 mentioned this pull request Oct 15, 2024

[RFC]: Support for video input #7558

Closed

litianjian added 6 commits October 28, 2024 08:09

add multi video support for llava-ov

149dbd2

llava-onevision multi batch support

6bb1881

update

2d6e8e5

update

88bb43f

update llava-onevision test

ba8ac97

update

737e0ea

litianjian force-pushed the multi-video-support branch from 229d531 to 737e0ea Compare October 28, 2024 10:17

format

c3be867

DarkLight1337 reviewed Oct 28, 2024

vllm/model_executor/models/llava_onevision.py (outdated)

vllm/model_executor/models/llava_onevision.py (outdated)

litianjian and others added 4 commits October 28, 2024 19:11

Update vllm/model_executor/models/llava_onevision.py

fd2e540

Co-authored-by: Cyrus Leung <[email protected]>

remove unnecessary code

68d8126

remove unnecessary code

f831f9e

Remove large gpu test decorator

b28ebc8

DarkLight1337 approved these changes Oct 28, 2024

DarkLight1337 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 28, 2024

DarkLight1337 enabled auto-merge (squash) October 28, 2024 13:51

DarkLight1337 merged commit 5f8d807 into vllm-project:main Oct 28, 2024
59 checks passed

This was referenced Nov 1, 2024

[RFC]: Multi-modality Support Refactoring #4194

Open

[Doc] Update multi-input support #9906

Merged

lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Nov 4, 2024

[Model][VLM] Add multi-video support for LLaVA-Onevision (vllm-projec…

0c526ad

…t#8905) Co-authored-by: litianjian <[email protected]> Co-authored-by: DarkLight1337 <[email protected]>
