
[Bug]: Requesting Prompt Logprobs with an MLP Speculator Crashes the Server #7742

Closed
tjohnson31415 opened this issue Aug 21, 2024 · 1 comment · Fixed by #8047
Labels
bug Something isn't working

Comments

@tjohnson31415
Contributor

Your current environment

Using the latest vLLM off of main.

🐛 Describe the bug

When running the online server with an MLP speculator configured, sending a request that asks for prompt logprobs causes the server to crash with an AssertionError.

Stacktrace:

Traceback (most recent call last):
  File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/entrypoints/openai/rpc/server.py", line 125, in generate
    async for request_output in results_generator:
  File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 1054, in generate
    async for output in await self.add_request(
  File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 114, in generator
    raise result
  File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 55, in _log_task_completion
    return_value = task.result()
                   ^^^^^^^^^^^^^
  File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 920, in run_engine_loop
    result = task.result()
             ^^^^^^^^^^^^^
  File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 863, in engine_step
    request_outputs = await self.engine.step_async(virtual_engine)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 332, in step_async
    output = await self.model_executor.execute_model_async(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/executor/gpu_executor.py", line 170, in execute_model_async
    output = await make_async(self.driver_worker.execute_model
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/my-vllm/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/spec_decode/spec_decode_worker.py", line 387, in execute_model
    return self._run_no_spec(execute_model_req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/spec_decode/spec_decode_worker.py", line 481, in _run_no_spec
    self.previous_hidden_states.update(
  File "/workspace/my-vllm/lib64/python3.11/site-packages/vllm/sequence.py", line 1199, in update
    assert len(seq_group_metadata_list) == len(hidden_states)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
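
From the traceback, the failing check is the length assertion in `HiddenStates.update` (`vllm/sequence.py`). Below is a rough sketch of the invariant as I understand it (simplified pseudocode, not the actual vLLM implementation; the shapes are illustrative guesses):

```python
import torch

def update(seq_group_metadata_list, hidden_states: torch.Tensor) -> None:
    # The speculator's hidden-state tracking assumes exactly one hidden-state
    # row per sequence group (the hidden state of each sequence's last token).
    assert len(seq_group_metadata_list) == len(hidden_states)

# Without prompt logprobs: 1 sequence group, hidden_states of shape
# [1, hidden_size] -> the assertion passes.
#
# With prompt logprobs requested, the worker appears to receive a hidden state
# for every prompt token instead, e.g. shape [num_prompt_tokens, hidden_size],
# so 1 != num_prompt_tokens and the assertion fires (my working theory).
```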

To Reproduce

Run a server with an MLP speculator, e.g. one of IBM's Granite models:

vllm serve ibm-granite/granite-3b-code-instruct --speculative-model ibm-granite/granite-3b-code-instruct-accelerator --use-v2-block-manager --enforce-eager

Send an echo request with logprobs requested for the prompt tokens:

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
      "model": "ibm-granite/granite-3b-code-instruct",
      "prompt": "Hello World",
      "echo": 1,
      "logprobs": 1,
      "temperature": 0
  }'
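
The same crash can also be hit without the HTTP server. A minimal offline sketch (untested, for illustration), assuming the `LLM` constructor forwards these engine arguments the same way the `vllm serve` flags above do:

```python
from vllm import LLM, SamplingParams

# Mirror the `vllm serve` flags from the reproduction above.
llm = LLM(
    model="ibm-granite/granite-3b-code-instruct",
    speculative_model="ibm-granite/granite-3b-code-instruct-accelerator",
    use_v2_block_manager=True,
    enforce_eager=True,
)

# prompt_logprobs corresponds to the "echo" + "logprobs" combination in the
# completions request; requesting it is what triggers the AssertionError.
params = SamplingParams(temperature=0, max_tokens=16, prompt_logprobs=1)
outputs = llm.generate(["Hello World"], params)
print(outputs[0].prompt_logprobs)
```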
tjohnson31415 added the bug (Something isn't working) label on Aug 21, 2024
@tjohnson31415
Contributor Author

I wanted to create an issue to document the crash and the reproduction steps, but I am also investigating a fix and will push up a PR soon.
