[Frontend] re-enable multi-modality input in the new beam search implementation #9427

FerdinandZhong · 2024-10-16T16:31:47Z

Changes in this PR:

This PR introduces the following changes based on the updated beam search implementation:

Re-enable multi-modality input:
Multi-modality input is supported again for beam search through the OpenAI-compatible endpoints.
Logprobs handling in ChatCompletionRequest:
Added validation that disables logprobs when use_beam_search=True. Beam search ranks candidates by cumulative logprob, and the number of logprobs computed at each step is set by beam_width, so it ignores the top_logprobs and logprobs parameters passed in with the request.
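To illustrate why beam_width, rather than top_logprobs, drives the number of per-step logprobs, here is a minimal sketch of a single beam search step. It is not the vLLM implementation: the Beam class, the beam_step function, and the shape of step_logprobs are hypothetical names chosen for this example.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Beam:
    """Hypothetical stand-in for one beam: its tokens and running score."""
    tokens: List[int] = field(default_factory=list)
    cum_logprob: float = 0.0


def beam_step(beams: List[Beam],
              step_logprobs: List[Dict[int, float]],
              beam_width: int) -> List[Beam]:
    """Extend every beam by one token and keep the best `beam_width` results.

    `step_logprobs[i]` maps token id -> logprob for `beams[i]`.
    """
    candidates = []
    for beam, logprobs in zip(beams, step_logprobs):
        # Only the top `beam_width` tokens of a beam can survive the pruning
        # below, so that is how many logprobs are needed per step.
        top = heapq.nlargest(beam_width, logprobs.items(), key=lambda kv: kv[1])
        for token, logprob in top:
            candidates.append(
                Beam(beam.tokens + [token], beam.cum_logprob + logprob))
    # Rank all extensions by cumulative logprob and prune to the beam width.
    return heapq.nlargest(beam_width, candidates, key=lambda b: b.cum_logprob)
```

Because the pruning depends only on cumulative scores and beam_width, a caller-supplied top_logprobs value has no effect on which candidates are kept in this sketch.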

Unit Test

Added two test cases in tests/entrypoints/openai/test_vision.py.

Manual Testing

The following command was used to launch the server for manual testing:

vllm serve microsoft/Phi-3.5-vision-instruct --api-key token-abc123 --trust-remote-code --max-model-len 4096 --limit-mm-per-prompt image=2

Client script used to test the changes:

import asyncio

import openai


# Async client pointed at the locally launched vLLM server
client = openai.AsyncOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123"
)


# Image URLs
img_urls = [
    "https://upload.wikimedia.org/wikipedia/commons/c/cb/Brachiosaurus_DB_flipped.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/3/3d/Allosaurus_Revised.jpg"
]

# Define the messages for the chat completion
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": img_urls[0]
                }
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": img_urls[1]
                }
            },
            {
                "type": "text",
                "text": "what are the animals in the images?"
            }
        ]
    }
]

async def make_request():
    try:
        response = await client.chat.completions.create(
            model="microsoft/Phi-3.5-vision-instruct",
            max_tokens=32,
            temperature=0,
            messages=messages,
            n=2,
            extra_body={"use_beam_search": True}
        )
        for choice in response.choices:
            print(choice.message.content)

    except openai.BadRequestError as e:
        print(f"Error: {e.code}")

asyncio.run(make_request())

The manual test above confirmed that multi-image input is handled correctly and that beam search returns the expected responses.

github-actions · 2024-10-16T16:31:59Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

FerdinandZhong · 2024-10-16T23:42:27Z

Hi @simon-mo @khluu could you please add me to your Buildkite org to unblock the full CI run?

simon-mo · 2024-10-17T00:04:22Z

Added your email to our buildkite org.

DarkLight1337 · 2024-10-17T06:08:53Z

vllm/engine/protocol.py

+        tokenizer = await self.get_tokenizer()
+        self.input_preprocessor = InputPreprocessor(model_config,
+                                                    self.tokenizer)


Btw why is this function defined inside a protocol? Perhaps we should move this to LLMEngine? Then we can make use of the existing input_preprocessor defined there. @youkaichao

FerdinandZhong · 2024-10-17

Sorry, I made a mistake here. I just pushed another commit that changes self.input_preprocessor and self.tokenizer to input_preprocessor and tokenizer.

By the way, in the current release, v0.6.3, this function is inside AsyncLLMEngine; it was moved here in a recent PR, #9296.

FerdinandZhong · 2024-10-18T03:46:49Z

Hi @DarkLight1337, thank you for the review!

I noticed the changes in PR #9473 and will merge the latest code once that PR is merged. Regarding the conflicts between the two PRs:

Logprobs: I'll align with the fix in #9473. In my PR, I directly prevent the return of logprobs when using beam search, as the number of logprobs in each step is determined by beam_width.
PromptType: I've kept it as the input type for handling multi-modality data. To correctly process the prompt passed to the function, I use InputPreprocessor to parse the content, as suggested by @DarkLight1337. Additionally, I set prompt_text (parsed from the prompt) as RequestOutput.prompt, which resolves the error related to PromptType being the input.
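As a rough illustration of the PromptType point, the sketch below separates a prompt into its plain text and its multi-modal payload, so that only the text would be reported as the output's prompt. The TypedDicts here are simplified local stand-ins, not vLLM's own definitions, and split_prompt is a hypothetical helper rather than the code in this PR.

```python
from typing import List, Optional, Tuple, TypedDict, Union


class TextPrompt(TypedDict, total=False):
    """Simplified stand-in: prompt text plus optional multi-modal data."""
    prompt: str
    multi_modal_data: dict


class TokensPrompt(TypedDict, total=False):
    """Simplified stand-in: pre-tokenized prompt plus optional multi-modal data."""
    prompt_token_ids: List[int]
    multi_modal_data: dict


# Assumed shape of the union type for this example only.
PromptType = Union[str, TextPrompt, TokensPrompt]


def split_prompt(prompt: PromptType) -> Tuple[Optional[str], Optional[dict]]:
    """Return (prompt_text, multi_modal_data) for any supported prompt form."""
    if isinstance(prompt, str):
        # A bare string carries text only.
        return prompt, None
    # Dict-style prompts may lack text (token prompts) or multi-modal data.
    return prompt.get("prompt"), prompt.get("multi_modal_data")
```

With this split, the text half is always a string or None, which is the kind of value a prompt field typed as optional text can accept.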

@njhill, could you please review these changes?

njhill · 2024-10-18T05:58:49Z

Thanks @FerdinandZhong, I'll review tomorrow (Friday US time)

FerdinandZhong · 2024-10-20T08:32:23Z

Thanks @njhill. May I know if changes look okay to you?

njhill · 2024-10-23T02:01:09Z

@FerdinandZhong sorry for the delay, bit behind with things.

The changes look good to me, thanks. The InputPreprocessor change makes sense.

Re the logprobs, it's a good point that the number returned will be based on the beam width rather than how many are actually requested. I think we can improve this to request the max of these two and truncate as needed. But no need to change that for this PR.
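The "request the max and truncate" idea could look roughly like the sketch below. It only illustrates the suggestion and is not part of this PR; logprobs_to_request and truncate_logprobs are hypothetical helper names.

```python
from typing import Dict, Optional


def logprobs_to_request(beam_width: int, top_logprobs: Optional[int]) -> int:
    """Ask for enough per-step logprobs to serve both beam search and the caller."""
    return max(beam_width, top_logprobs or 0)


def truncate_logprobs(step_logprobs: Dict[int, float],
                      top_logprobs: Optional[int]) -> Dict[int, float]:
    """Keep only the `top_logprobs` most likely tokens for the response."""
    if not top_logprobs:
        # The caller did not ask for logprobs, so return none.
        return {}
    ranked = sorted(step_logprobs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_logprobs])
```

Beam search would use the full set returned for logprobs_to_request(...), while the response would carry only the truncated view.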

I think we can improve the impl quite a bit more overall in some follow-on updates including:

Support most/all params.. I don't see any reason we can't, this should actually be easier with the "external" impl
Remove the separate beam search API, we can retain the function of the existing beam_search parameters, just have this layer intercept those. I'm not sure that we actually need separate BeamSearchParams.
Move the impl out of protocol.py .. at a minimum we can have it in beam_search.py and just call it from the abstract base class

FerdinandZhong · 2024-10-23T03:27:59Z

Hi @njhill, thank you for your comment!

I agree that taking the maximum of the beam_width and top_logprobs can be a good idea, and we can implement that change in the following PR. I'm also aligned with the action points you mentioned for improving beam search, and I'd be happy to collaborate on these enhancements moving forward.

youkaichao · 2024-10-24T03:20:55Z

I'm surprised there is so much effort going into adding various features for beam search ...

We will work on implementing beam search in another way so that all features of normal generation just work.

FerdinandZhong · 2024-10-24T03:47:07Z

Hi @youkaichao, thank you for the feedback. I understand your concerns, and I agree that, in the long run, beam search can be properly designed and implemented. In the short term, I’m happy to continue providing feedback and contributing commits from a user’s perspective.

In the meantime, @DarkLight1337, could we consider merging this PR first, as it addresses the fix for #9577?

DarkLight1337 · 2024-10-24T04:15:34Z

@njhill do we need to rebuild this? ~~Seem that you cancelled some of the tests.~~ Nevermind, looks like some containers died, I'll just rerun those tests.

FerdinandZhong · 2024-10-29T10:02:05Z

Hi @DarkLight1337 , I’ve merged the latest main branch and rerun the tests from my end. May I ask for your advice on the next steps to do with this PR for merging? Additionally, can I check with you if rebase is needed to add "sign-off" for each commit? Thank you.

DarkLight1337 · 2024-10-29T10:05:20Z

Sorry for missing this! You don't have to worry about signing-off for this PR as we can manually set that to pass.

DarkLight1337 · 2024-10-29T10:05:37Z

I have enabled auto-merge which should run all the tests and merge if they pass.

FerdinandZhong · 2024-10-29T10:07:43Z

@DarkLight1337 got it, thanks!

FerdinandZhong added 6 commits October 15, 2024 23:28

update of beam search function

08ab78e

update of testing

cac55e1

Merge remote-tracking branch 'upstream/main' into beam_search_multi_modality

358f89c

fix error in implementation

2dde695

add checking for logprobs and add more test cases

eb92b7d

formatting

014d753

FerdinandZhong requested review from DarkLight1337, robertgshaw2-neuralmagic and simon-mo as code owners October 16, 2024 16:31

Merge remote-tracking branch 'upstream/main' into beam_search_multi_modality

eae5b9b

DarkLight1337 reviewed Oct 17, 2024

DarkLight1337 reviewed Oct 17, 2024

FerdinandZhong added 2 commits October 17, 2024 05:51

update BeamSequence, prompt preprocess and adding stop_reason

5f0e1cd

Merge branch 'beam_search_multi_modality' of https://github.com/FerdinandZhong/vllm into beam_search_multi_modality

6e29318

DarkLight1337 reviewed Oct 17, 2024

FerdinandZhong added 2 commits October 17, 2024 06:34

fix the wrong declaration

5a256cb

formatting

b01a615

DarkLight1337 mentioned this pull request Oct 18, 2024

[BugFix] Typing fixes to RequestOutput.prompt and beam search #9473

Merged

DarkLight1337 requested a review from njhill October 18, 2024 02:51

FerdinandZhong added 6 commits October 18, 2024 09:05

Merge branch 'main' of https://github.com/vllm-project/vllm into beam_search_multi_modality

bc74931

remove checking for logprobs

8291a80

format

a682b63

Merge branch 'main' of https://github.com/vllm-project/vllm into beam_search_multi_modality

8940743

output beam's logprobs to Output's logprobs

bb53cbd

Merge branch 'main' of https://github.com/vllm-project/vllm into beam_search_multi_modality

c275ae3

update calling of beam_search from serving_completion based on latest main

3b7ab92

Merge branch 'main' of github.com:vllm-project/vllm into beam_search_multi_modality

f96fa9a

njhill approved these changes Oct 23, 2024

Merge branch 'main' of https://github.com/vllm-project/vllm into beam_search_multi_modality

314a31e

mergify bot added the frontend label Oct 29, 2024

Merge branch 'main' of https://github.com/vllm-project/vllm into beam_search_multi_modality

8705266

Signed-off-by: Qishuai [email protected]

DarkLight1337 enabled auto-merge (squash) October 29, 2024 10:04

github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 29, 2024

DarkLight1337 merged commit ef7865b into vllm-project:main Oct 29, 2024
75 checks passed

rasmith pushed a commit to rasmith/vllm that referenced this pull request Oct 30, 2024

[Frontend] re-enable multi-modality input in the new beam search implementation (vllm-project#9427)

db6a92e

Signed-off-by: Qishuai [email protected] Signed-off-by: Randall Smith <[email protected]>

lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Nov 4, 2024

[Frontend] re-enable multi-modality input in the new beam search implementation (vllm-project#9427)

86015a9

Signed-off-by: Qishuai [email protected]

lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Nov 4, 2024

[Frontend] re-enable multi-modality input in the new beam search implementation (vllm-project#9427)

2e83f46

Signed-off-by: Qishuai [email protected] Signed-off-by: Linkun Chen <[email protected]>

JC1DA pushed a commit to JC1DA/vllm that referenced this pull request Nov 11, 2024

[Frontend] re-enable multi-modality input in the new beam search implementation (vllm-project#9427)

2b40322

Signed-off-by: Qishuai [email protected] Signed-off-by: Loc Huynh <[email protected]>

sumitd2 pushed a commit to sumitd2/vllm that referenced this pull request Nov 14, 2024

[Frontend] re-enable multi-modality input in the new beam search implementation (vllm-project#9427)

a9748c4

Signed-off-by: Qishuai [email protected] Signed-off-by: Sumit Dubey <[email protected]>
