[Frontend] Refactor prompt processing #4028
Conversation
It seems that #4032 fixed the LoRA bugs, however …
Update: I found that it was due to a bug in my refactored parsing, my bad. I have fixed it just now.
I'm updating …
Looks like this line also needs to be removed from the tokenization test.
I've moved the logging out to a separate class.
I have finished addressing your comments.
Thanks @DarkLight1337!
Co-authored-by: Roger Wang <[email protected]>
Signed-off-by: Alvant <[email protected]>
This PR refactors various parts of the OpenAI-compatible server (see the illustrative sketch after this list):

- The `_validate_prompt_and_tokenize` method has been decomposed so that `prompt` and `prompt_ids` are processed separately.
- The logging of `prompt` and `prompt_ids` has been moved from `vllm.AsyncLLMEngine` to `vllm.entrypoints.logger.RequestLogger`, such that redundant data is no longer passed into the core engine. This also enables logging for the tokenization endpoints.
- The `request_id` is now prefixed based on the endpoint type:
  - `cmpl-*` (as before)
  - `chat-*`
  - `embd-*`
  - `tokn-*`
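To make the description above concrete, here is a minimal, self-contained sketch of the three ideas: separate handling of text prompts vs. pre-tokenized prompts, logging request inputs at the API-server layer instead of inside the engine, and per-endpoint request-ID prefixes. This is not the actual vLLM implementation; names such as `parse_and_tokenize`, `RequestLogger.log_inputs`, `random_uuid`, and `make_request_id` are illustrative assumptions.

```python
# Hedged sketch only -- NOT the real vLLM code; all helper names are assumed.
import logging
import uuid
from typing import List, Optional

logger = logging.getLogger(__name__)


def random_uuid() -> str:
    return uuid.uuid4().hex


def parse_and_tokenize(
    tokenizer,
    prompt: Optional[str] = None,
    prompt_ids: Optional[List[int]] = None,
) -> List[int]:
    """Handle text prompts and pre-tokenized prompts along separate code
    paths instead of one combined validate-and-tokenize method."""
    if (prompt is None) == (prompt_ids is None):
        raise ValueError("Provide exactly one of `prompt` or `prompt_ids`.")
    if prompt_ids is not None:
        return list(prompt_ids)
    return tokenizer.encode(prompt)


class RequestLogger:
    """Logs request inputs at the entrypoint layer, so the core engine never
    receives data that exists purely for logging purposes."""

    def __init__(self, max_log_len: Optional[int] = None) -> None:
        # Optionally truncate long prompts/token lists in the logs.
        self.max_log_len = max_log_len

    def log_inputs(
        self,
        request_id: str,
        prompt: Optional[str],
        prompt_token_ids: Optional[List[int]],
    ) -> None:
        if self.max_log_len is not None:
            if prompt is not None:
                prompt = prompt[: self.max_log_len]
            if prompt_token_ids is not None:
                prompt_token_ids = prompt_token_ids[: self.max_log_len]
        logger.info(
            "Received request %s: prompt=%r, prompt_token_ids=%s",
            request_id, prompt, prompt_token_ids,
        )


# Request IDs carry a prefix identifying the originating endpoint type.
def make_request_id(endpoint: str) -> str:
    prefixes = {
        "completions": "cmpl",
        "chat": "chat",
        "embeddings": "embd",
        "tokenization": "tokn",
    }
    return f"{prefixes[endpoint]}-{random_uuid()}"
```

With a split like this, an endpoint handler can log exactly what it received and tag the request with a typed ID before anything is handed to the engine.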