Your current environment

Using the latest vLLM off of main.

🐛 Describe the bug
When running the online server with a model that has an MLP speculator, sending a request that requests prompt logprobs causes the server to crash with an AssertionError.
Stacktrace:
Traceback (most recent call last):
File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/entrypoints/openai/rpc/server.py", line 125, in generate
async for request_output in results_generator:
File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 1054, in generate
async for output in await self.add_request(
File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 114, in generator
raise result
File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 55, in _log_task_completion
return_value = task.result()
^^^^^^^^^^^^^
File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 920, in run_engine_loop
result = task.result()
^^^^^^^^^^^^^
File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 863, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 332, in step_async
output = await self.model_executor.execute_model_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/executor/gpu_executor.py", line 170, in execute_model_async
output = await make_async(self.driver_worker.execute_model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/my-vllm/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/spec_decode/spec_decode_worker.py", line 387, in execute_model
return self._run_no_spec(execute_model_req,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/spec_decode/spec_decode_worker.py", line 481, in _run_no_spec
self.previous_hidden_states.update(
File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/sequence.py", line 1199, in update
assert len(seq_group_metadata_list) == len(hidden_states)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
To Reproduce
Run a server with an MLP speculator, e.g. one of IBM's granite models:
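A launch command along these lines should work (the model/speculator pair is only an illustration, and flag names may vary slightly between vLLM versions; any base model paired with its MLP speculator should reproduce the crash):

```bash
# Launch the OpenAI-compatible server with a base model and its MLP speculator
# (illustrative model names; substitute any matching base/speculator pair).
vllm serve ibm-granite/granite-3b-code-instruct \
    --speculative-model ibm-granite/granite-3b-code-instruct-accelerator \
    --use-v2-block-manager \
    --port 8000
```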
Send an echo request with logprobs requested for the prompt tokens:
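A minimal request sketch against the completions endpoint (the model name and prompt are placeholders; the relevant part is echo=true combined with logprobs, which asks the server for prompt logprobs):

```bash
# Ask for the prompt to be echoed back with logprobs; this triggers
# prompt-logprob computation and the AssertionError above.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ibm-granite/granite-3b-code-instruct",
        "prompt": "Hello, my name is",
        "max_tokens": 16,
        "echo": true,
        "logprobs": 1
      }'
```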