You can repro by starting up the server example and sending requests with different temperatures (a minimal repro sketch follows the traceback). The failure should look like:
```
    async for i, res in result_generator:
  File "/home/cglagovich/vllm/vllm/utils.py", line 506, in merge_async_iterators
    item = await d
  File "/home/cglagovich/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
    raise request_output
  File "/home/cglagovich/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
    raise request_output
AssertionError: Currently only supporting same temperature for all sequences in batch
```
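For reference, here is a minimal repro sketch. It assumes the server example exposes an OpenAI-compatible `/v1/completions` endpoint on `localhost:8000` and that the served model is named `"model"`; both names are assumptions, so adjust them for your setup. Sending two requests with different temperatures in quick succession (so they land in the same batch) should trigger the assertion above.

```python
# Hypothetical repro: two concurrent completion requests with different temperatures.
import threading
import requests

URL = "http://localhost:8000/v1/completions"  # assumed endpoint of the server example

def send(temperature: float) -> None:
    resp = requests.post(URL, json={
        "model": "model",          # assumed model name; replace with the served model
        "prompt": "Hello, my name is",
        "max_tokens": 16,
        "temperature": temperature,
    })
    print(f"temperature={temperature} -> HTTP {resp.status_code}")

# Fire both requests at (nearly) the same time so they are batched together.
threads = [threading.Thread(target=send, args=(t,)) for t in (0.0, 1.0)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```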
skhorasganiTT changed the title from "[Bug] vLLM server fails when requests with different temperatures are sent" to "[Bug] vLLM server crashes upon assertions instead of throwing errors to client (e.g. fails when requests with different temperatures are sent)" on Dec 6, 2024.
Addressed invalid request parameters in #41, so that invalid-request errors are returned to the client instead of the server crashing. For the case of different top-p/top-k parameters in the same batch, the same solution cannot be applied, since any error or assertion raised during step execution will crash the server; that assertion was therefore changed to a warning. Created a separate issue for adding support for different top-p/top-k parameters in the same batch: #42
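A minimal sketch of the validate-early pattern described above, not vLLM's actual internals; the function and parameter names here are hypothetical. The idea is to reject bad sampling parameters at request-admission time, where an exception can be returned to the client as an error response, while batch-level limitations discovered during step execution are only logged as warnings, since any exception raised there would bring down the whole server.

```python
# Illustrative only: names below are assumptions, not vLLM's real API.
import logging

class InvalidRequestError(ValueError):
    """Raised at admission time; the API layer can map this to an error response."""

def validate_sampling_params(temperature: float, top_p: float, top_k: int) -> None:
    # Runs when the request is admitted, before it reaches the batching/step code,
    # so a bad parameter becomes a client-facing error instead of a server crash.
    if temperature < 0.0:
        raise InvalidRequestError(f"temperature must be >= 0, got {temperature}")
    if not 0.0 < top_p <= 1.0:
        raise InvalidRequestError(f"top_p must be in (0, 1], got {top_p}")
    if top_k < -1 or top_k == 0:
        raise InvalidRequestError(f"top_k must be -1 or a positive integer, got {top_k}")

def check_batch(requests_in_batch) -> None:
    # Inside step execution, a limitation such as mixed top-p/top-k values in one
    # batch is only logged as a warning, so it cannot take the server down.
    values = {(r.top_p, r.top_k) for r in requests_in_batch}
    if len(values) > 1:
        logging.warning("Batch contains different top-p/top-k values; "
                        "this combination is not yet supported by the backend.")
```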