-
-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Frontend] API support for beam search for MQLLMEngine #9117
Changes from 5 commits
5add5d7
1375b59
b57bff2
8384451
f9843df
a33ea39
ac5520c
b3c5d05
b22e8bd
097479d
44334ee
4918f6a
e65ed31
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,6 +9,7 @@ | |
|
||
from vllm.config import ModelConfig | ||
from vllm.engine.async_llm_engine import AsyncLLMEngine | ||
from vllm.engine.multiprocessing.client import MQLLMEngineClient | ||
from vllm.engine.protocol import EngineClient | ||
from vllm.entrypoints.logger import RequestLogger | ||
# yapf conflicts with isort for this block | ||
|
@@ -150,15 +151,16 @@ async def create_completion( | |
log_tracing_disabled_warning() | ||
|
||
if isinstance(sampling_params, BeamSearchParams): | ||
if not isinstance(self.engine_client, AsyncLLMEngine): | ||
raise ValueError( | ||
"Beam search in the API server is only supported" | ||
" with AsyncLLMEngine. please add " | ||
"`--disable-frontend-multiprocessing` to " | ||
"use beam search.") | ||
assert isinstance(self.engine_client, | ||
(AsyncLLMEngine, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On second thought, you don't actually need the You can There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think the base There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It does not. But the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I cannot find There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it's this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. okay, it should be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. EngineClient is a protocol. MQLLMEngine should inherit from this. If it doesn’t, I’ll submit a PR to make it (since we support the full API). On train so AFK Either way, we are about to collapse MQLLMEngine and AsyncLLMEngine once we have PP working, so the concept of an EngineClient will be removed once this is done There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sounds good. I'll go ahead and merge this pr after it is ready. and after you make MQLLMEngine inherit from EngineClient, we can merge separate beam search implementation in one place. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Given that it's currently a I agree with @robertgshaw2-neuralmagic that this method should just be added to Not directly related to this PR but I also think we should consider renaming it to something like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. changes are welcome on this part! |
||
MQLLMEngineClient)), \ | ||
"Beam search is only supported with" \ | ||
"AsyncLLMEngine and MQLLMEngineClient." | ||
generator = self.engine_client.beam_search( | ||
prompt_inputs["prompt_token_ids"], request_id_item, | ||
sampling_params) | ||
prompt_inputs["prompt_token_ids"], | ||
request_id, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what's the difference between There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ahh yeah it's supposed to be |
||
sampling_params, | ||
) | ||
else: | ||
generator = self.engine_client.generate( | ||
{ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add some sort of kwarg to show why it is
None
here