You can repro by starting up the server example and sending requests with different temperatures (a minimal repro sketch follows the traceback). The failure should look like:
```
    async for i, res in result_generator:
  File "/home/cglagovich/vllm/vllm/utils.py", line 506, in merge_async_iterators
    item = await d
  File "/home/cglagovich/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
    raise request_output
  File "/home/cglagovich/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
    raise request_output
AssertionError: Currently only supporting same temperature for all sequences in batch
```
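For reference, here is a minimal repro sketch. It assumes the server example exposes an OpenAI-compatible `/v1/completions` endpoint on `localhost:8000` and that the served model is named `"model"`; both names are assumptions, so adjust them for your setup. Sending two requests with different temperatures in quick succession (so they land in the same batch) should trigger the assertion above.

```python
# Hypothetical repro: two concurrent completion requests with different temperatures.
import threading
import requests

URL = "http://localhost:8000/v1/completions"  # assumed endpoint of the server example

def send(temperature: float) -> None:
    resp = requests.post(URL, json={
        "model": "model",          # assumed model name; replace with the served model
        "prompt": "Hello, my name is",
        "max_tokens": 16,
        "temperature": temperature,
    })
    print(f"temperature={temperature} -> HTTP {resp.status_code}")

# Fire both requests at (nearly) the same time so they are batched together.
threads = [threading.Thread(target=send, args=(t,)) for t in (0.0, 1.0)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```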
skhorasganiTT changed the title from "[Bug] vLLM server fails when requests with different temperatures are sent" to "[Bug] vLLM server crashes upon assertions instead of throwing errors to client (e.g. fails when requests with different temperatures are sent)" on Dec 6, 2024.
Addressed invalid request parameters in #41, so that invalid-request errors are returned to the client instead of the server crashing. For the case of different top-p/top-k parameters in the same batch, the same solution cannot be applied, since any error or assertion raised during step execution will crash the server; that assertion was therefore changed to a warning. Created a separate issue for adding support for different top-p/top-k parameters in the same batch: #42
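A minimal sketch of the validate-early pattern described above, not vLLM's actual internals; the function and parameter names here are hypothetical. The idea is to reject bad sampling parameters at request-admission time, where an exception can be returned to the client as an error response, while batch-level limitations discovered during step execution are only logged as warnings, since any exception raised there would bring down the whole server.

```python
# Illustrative only: names below are assumptions, not vLLM's real API.
import logging

class InvalidRequestError(ValueError):
    """Raised at admission time; the API layer can map this to an error response."""

def validate_sampling_params(temperature: float, top_p: float, top_k: int) -> None:
    # Runs when the request is admitted, before it reaches the batching/step code,
    # so a bad parameter becomes a client-facing error instead of a server crash.
    if temperature < 0.0:
        raise InvalidRequestError(f"temperature must be >= 0, got {temperature}")
    if not 0.0 < top_p <= 1.0:
        raise InvalidRequestError(f"top_p must be in (0, 1], got {top_p}")
    if top_k < -1 or top_k == 0:
        raise InvalidRequestError(f"top_k must be -1 or a positive integer, got {top_k}")

def check_batch(requests_in_batch) -> None:
    # Inside step execution, a limitation such as mixed top-p/top-k values in one
    # batch is only logged as a warning, so it cannot take the server down.
    values = {(r.top_p, r.top_k) for r in requests_in_batch}
    if len(values) > 1:
        logging.warning("Batch contains different top-p/top-k values; "
                        "this combination is not yet supported by the backend.")
```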