vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. #3295
Labels: duplicate
I got the following error when running a request with a long prompt/output on a fine-tuned Mistral that otherwise works great.

Request params:
{
    "max_tokens": 9000,
    "temperature": 0.0,
    "n": 1,
    "best_of": 5,
    "use_beam_search": true
}
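For reference, a minimal sketch of how these params map onto a request, assuming the vLLM ~v0.3.x offline API (the original request went through the async server; the model path and prompt below are placeholders, not the ones from the report):

# Hypothetical reproduction sketch, assuming vLLM ~v0.3.x where
# SamplingParams still accepts use_beam_search. Model path and prompt
# are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/fine-tuned-mistral")  # placeholder model path
params = SamplingParams(
    max_tokens=9000,
    temperature=0.0,   # beam search requires temperature == 0
    n=1,
    best_of=5,         # 5 beams, return the single best sequence
    use_beam_search=True,
)
outputs = llm.generate(["<long prompt here>"], params)
print(outputs[0].outputs[0].text)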
INFO 03-09 07:34:14 metrics.py:213] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 37.4 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 15.9%, CPU KV cache usage: 0.0%
INFO 03-09 07:34:17 async_llm_engine.py:133] Aborted request cmpl-00d201404782417f91da55952303060e-0.
Exception in callback functools.partial(<function _raise_exception_on_finish at 0x7f9bdd35c160>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7f9bd324d660>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x7f9bdd35c160>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7f9bd324d660>)>
Traceback (most recent call last):
  File "/workspace/vllm/engine/async_llm_engine.py", line 29, in _raise_exception_on_finish
    task.result()
  File "/workspace/vllm/engine/async_llm_engine.py", line 414, in run_engine_loop
    has_requests_in_progress = await self.engine_step()
  File "/workspace/vllm/engine/async_llm_engine.py", line 393, in engine_step
    request_outputs = await self.engine.step_async()
  File "/workspace/vllm/engine/async_llm_engine.py", line 203, in step_async
    return self._process_model_outputs(output, scheduler_outputs)
  File "/workspace/vllm/engine/llm_engine.py", line 756, in _process_model_outputs
    self._process_sequence_group_outputs(seq_group, outputs)
  File "/workspace/vllm/engine/llm_engine.py", line 608, in _process_sequence_group_outputs
    self.scheduler.free_seq(parent)
  File "/workspace/vllm/core/scheduler.py", line 399, in free_seq
    self.block_manager.free(seq)
  File "/workspace/vllm/core/block_manager.py", line 314, in free
    self._free_block_table(block_table)
  File "/workspace/vllm/core/block_manager.py", line 305, in _free_block_table
    self.gpu_allocator.free(block)
  File "/workspace/vllm/core/block_manager.py", line 45, in free
    raise ValueError(f"Double free! {block} is already freed.")
ValueError: Double free! PhysicalTokenBlock(device=Device.GPU, block_number=1875, ref_count=0) is already freed.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/workspace/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
    raise exc
  File "/workspace/vllm/engine/async_llm_engine.py", line 33, in _raise_exception_on_finish
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
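For context on the inner error: the traceback points at a ref-counted free in the GPU block allocator, reached via free_seq(parent) while processing the sequence group. With use_beam_search=True and best_of=5, sibling sequences in a group share physical KV-cache blocks, and it appears a parent sequence's block table is released twice. Roughly, the invariant being violated looks like the sketch below (paraphrased from the trace; this is NOT the exact vLLM source):

# Illustrative sketch of the ref-counted free path implied by the
# traceback above; names mirror the trace, but this is not the real
# vLLM implementation.
class PhysicalTokenBlock:
    def __init__(self, block_number: int) -> None:
        self.block_number = block_number
        self.ref_count = 0

class BlockAllocator:
    def free(self, block: PhysicalTokenBlock) -> None:
        if block.ref_count == 0:
            # A block whose ref_count already reached zero is being
            # released a second time -- the "Double free!" seen above.
            raise ValueError(f"Double free! {block} is already freed.")
        block.ref_count -= 1
        # (In the real allocator, a block that drops to zero would be
        # returned to the free list here.)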