
[Bug] vLLM server crashes upon assertions instead of throwing errors to client (e.g. fails when requests with different temperatures are sent) #29

Closed
cglagovichTT opened this issue Oct 29, 2024 · 4 comments
Labels: bug (Something isn't working), P0

cglagovichTT commented Oct 29, 2024


You can repro by starting up the server example and sending requests with different temperatures. The failure looks like:

```
    async for i, res in result_generator:
  File "/home/cglagovich/vllm/vllm/utils.py", line 506, in merge_async_iterators
    item = await d
  File "/home/cglagovich/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
    raise request_output
  File "/home/cglagovich/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
    raise request_output
AssertionError: Currently only supporting same temperature for all sequences in batch
```
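
A minimal repro sketch, assuming the OpenAI-compatible server example is running locally on the default port (the model name, host, and port below are placeholders, not from the report). The two requests are sent concurrently so the scheduler is likely to put them into the same batch with different temperatures, which is what trips the assertion:

```python
# Repro sketch: model name, host, and port are placeholders.
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/v1/completions"

def send(temperature: float) -> int:
    resp = requests.post(
        URL,
        json={
            "model": "my-model",         # placeholder model name
            "prompt": "Hello, world",
            "max_tokens": 16,
            "temperature": temperature,  # differs per request
        },
    )
    return resp.status_code

with ThreadPoolExecutor(max_workers=2) as pool:
    # Two in-flight requests with different temperatures.
    print(list(pool.map(send, [0.0, 1.0])))
```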
@tstescoTT tstescoTT added the bug Something isn't working label Nov 13, 2024
@skhorasganiTT skhorasganiTT changed the title [Bug] vLLM server fails when requests with different temperatures are sent [Bug] vLLM server crashes upon assertions instead of throwing errors to client (e.g. fails when requests with different temperatures are sent) Dec 6, 2024
@skhorasganiTT

Also reported in #39

@cglagovichTT (Author)

Is #39 really a duplicate of this?

@skhorasganiTT

#39 is reporting that the server crashes upon asserts (rather than the specific assert itself), which is the same issue as reported here.

@skhorasganiTT

Addressed invalid request parameters in #41 so that invalid-request errors are returned to the client instead of the server crashing. For the case of different top-p/top-k parameters in the same batch, the same solution cannot be applied, since any error or assertion raised during step execution will crash the server; that assertion was therefore changed to a warning. Created a separate issue for adding support for different top-p/top-k parameters in the same batch: #42
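
For reference, a minimal sketch of the pattern described above, under stated assumptions (this is not the actual #41 diff; the helper names are hypothetical): parameters that can be checked per request are validated on arrival and raise a ValueError, which the API layer is assumed to translate into an invalid-request response, while the batch-level top-p/top-k constraint, which only surfaces during step execution, is logged as a warning instead of asserting:

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical per-request validation: raising here lets the API layer
# return an invalid-request error instead of the engine asserting later.
def validate_sampling_params(temperature: float, top_p: float, top_k: int) -> None:
    if temperature < 0.0:
        raise ValueError(f"temperature must be non-negative, got {temperature}")
    if not 0.0 < top_p <= 1.0:
        raise ValueError(f"top_p must be in (0, 1], got {top_p}")
    if top_k < -1 or top_k == 0:
        raise ValueError(f"top_k must be -1 (disabled) or >= 1, got {top_k}")

# Hypothetical batch-level check inside step execution: an assert here
# would take down the whole server, so the mixed-parameter case is only
# warned about until #42 adds real support.
def check_batch_sampling_params(top_ps: list[float], top_ks: list[int]) -> None:
    if len(set(top_ps)) > 1 or len(set(top_ks)) > 1:
        logger.warning(
            "Mixed top-p/top-k values in one batch are not yet supported "
            "(see #42); proceeding anyway."
        )
```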
