[Frontend] Refactor prompt processing #4028
Conversation
It seems that #4032 fixed the LoRA bugs, however …
Update: I found that it was due to a bug in my refactored parsing, my bad. I have fixed it just now.
I'm updating …
Looks like this line also needs to be removed from the tokenization test.
I've moved the logging out to a separate class.
I have finished addressing your comments.
Thanks @DarkLight1337!
Co-authored-by: Roger Wang <[email protected]>
Signed-off-by: Alvant <[email protected]>
This PR refactors various parts of the OpenAI-compatible server (see the illustrative sketch after this list):

- The `_validate_prompt_and_tokenize` method has been decomposed so that `prompt` and `prompt_ids` are processed separately.
- The logging of `prompt` and `prompt_ids` has been moved from `vllm.AsyncLLMEngine` to `vllm.entrypoints.logger.RequestLogger`, such that redundant data is no longer passed into the core engine. This also enables logging for the tokenization endpoints.
- The `request_id` is now prefixed based on the endpoint type:
  - `cmpl-*` (as before)
  - `chat-*`
  - `embd-*`
  - `tokn-*`
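To make the description above concrete, here is a minimal, self-contained sketch of the three ideas: separate handling of text prompts vs. pre-tokenized prompts, logging request inputs at the API-server layer instead of inside the engine, and per-endpoint request-ID prefixes. This is not the actual vLLM implementation; names such as `parse_and_tokenize`, `RequestLogger.log_inputs`, `random_uuid`, and `make_request_id` are illustrative assumptions.

```python
# Hedged sketch only -- NOT the real vLLM code; all helper names are assumed.
import logging
import uuid
from typing import List, Optional

logger = logging.getLogger(__name__)


def random_uuid() -> str:
    return uuid.uuid4().hex


def parse_and_tokenize(
    tokenizer,
    prompt: Optional[str] = None,
    prompt_ids: Optional[List[int]] = None,
) -> List[int]:
    """Handle text prompts and pre-tokenized prompts along separate code
    paths instead of one combined validate-and-tokenize method."""
    if (prompt is None) == (prompt_ids is None):
        raise ValueError("Provide exactly one of `prompt` or `prompt_ids`.")
    if prompt_ids is not None:
        return list(prompt_ids)
    return tokenizer.encode(prompt)


class RequestLogger:
    """Logs request inputs at the entrypoint layer, so the core engine never
    receives data that exists purely for logging purposes."""

    def __init__(self, max_log_len: Optional[int] = None) -> None:
        # Optionally truncate long prompts/token lists in the logs.
        self.max_log_len = max_log_len

    def log_inputs(
        self,
        request_id: str,
        prompt: Optional[str],
        prompt_token_ids: Optional[List[int]],
    ) -> None:
        if self.max_log_len is not None:
            if prompt is not None:
                prompt = prompt[: self.max_log_len]
            if prompt_token_ids is not None:
                prompt_token_ids = prompt_token_ids[: self.max_log_len]
        logger.info(
            "Received request %s: prompt=%r, prompt_token_ids=%s",
            request_id, prompt, prompt_token_ids,
        )


# Request IDs carry a prefix identifying the originating endpoint type.
def make_request_id(endpoint: str) -> str:
    prefixes = {
        "completions": "cmpl",
        "chat": "chat",
        "embeddings": "embd",
        "tokenization": "tokn",
    }
    return f"{prefixes[endpoint]}-{random_uuid()}"
```

With a split like this, an endpoint handler can log exactly what it received and tag the request with a typed ID before anything is handed to the engine.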