Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Frontend] Refactor prompt processing #4028

Merged
merged 139 commits into from
Jul 22, 2024

Conversation

DarkLight1337
Copy link
Member

@DarkLight1337 DarkLight1337 commented Apr 12, 2024

This PR refactors various parts of the OpenAI-compatible server:

  • The _validate_prompt_and_tokenize method has been decomposed so that prompt and prompt_ids are processed separately.
  • The logging of prompt and prompt_ids has been moved from vllm.AsyncLLMEngine to vllm.entrypoints.logger.RequestLogger such that redundant data is no longer passed into the core engine. This also enables logging for tokenization endpoints.
  • Explicitly handle LoRA and prompt adapters for embedding endpoints. Previously, they were silently ignored.
  • Use a different prefix for request_id based on the endpoint type:
    • Completion API: cmpl-* (as before)
    • Chat Completion API: chat-*
    • Embeddings API: embd-*
    • Tokenization API: tokn-*
  • Fixed various type errors

@DarkLight1337 DarkLight1337 changed the title [mypy] Improve type annotations in vllm.entrypoints.openai [Frontend] Refactor prompt parsing Apr 12, 2024
@njhill njhill self-requested a review April 12, 2024 10:26
@DarkLight1337
Copy link
Member Author

@njhill I see that you are working on #3512. I suppose it would be better to get that merged before continuing with this refactor.

@DarkLight1337
Copy link
Member Author

DarkLight1337 commented Apr 13, 2024

Seems that #4032 fixed the LoRA bugs, however entrypoints-test is still failing.

@DarkLight1337
Copy link
Member Author

Update: I found that it is due to a bug in my refactored parsing, my bad. I have fixed it just now.

@DarkLight1337
Copy link
Member Author

@njhill The tests now pass. To move on with #3978, perhaps we can merge this PR first?

@DarkLight1337
Copy link
Member Author

Update: I found that it is due to a bug in my refactored parsing, my bad. I have fixed it just now.

I'm updating test_batch_completions to better catch this error. Previously, the error only occurred in test_echo_logprob_completion which misled me from the real cause.

@njhill
Copy link
Member

njhill commented Jul 19, 2024

Looks like this line also needs to be removed from the tokenization test.

@DarkLight1337
Copy link
Member Author

DarkLight1337 commented Jul 19, 2024

Looks like this line also needs to be removed from the tokenization test.

Actually, let me add add_special_tokens to Completions API (defaulting to True to maintain the existing behaviour) as well to make things consistent. It's a vLLM feature originally introduced by #5278.

@DarkLight1337
Copy link
Member Author

DarkLight1337 commented Jul 19, 2024

My only concern is whether some folks might make use of the logging in the engine with different front ends.

I've moved out the logging to a separate class vllm.entrypoints.logger.RequestLogger, so it should be easy for them to add code to log the requests if need be.

vllm/entrypoints/openai/protocol.py Show resolved Hide resolved
vllm/entrypoints/openai/protocol.py Outdated Show resolved Hide resolved
@DarkLight1337
Copy link
Member Author

I have finished addressing your comments.

Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @DarkLight1337!

@njhill njhill merged commit 739b61a into vllm-project:main Jul 22, 2024
72 checks passed
@njhill njhill deleted the openai-typing branch July 22, 2024 17:14
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024
gnpinkert pushed a commit to gnpinkert/vllm that referenced this pull request Jul 26, 2024
cduk pushed a commit to cduk/vllm-pascal that referenced this pull request Aug 6, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready ONLY add when PR is ready to merge/full CI is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants