
[Bug]: v0.5.5 crash: "AssertionError: expected running sequences" #8016

Open

zoltan-fedor opened this issue Aug 30, 2024 · 32 comments · Fixed by #8059
Labels: bug (Something isn't working)

Comments


zoltan-fedor commented Aug 30, 2024

Your current environment

Running the standard v0.5.5 Docker image from your Docker Hub repo, with nothing additional added to it.

🐛 Describe the bug

When running the Llama 3.1 70B AWQ model on 4× A10G 24 GB GPUs with the following args (a launch sketch follows the flag list):

--model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
--tensor-parallel-size 4
--gpu-memory-utilization 0.95
--enforce-eager
--trust-remote-code
--worker-use-ray
--enable-prefix-caching
--num-scheduler-steps 8
--dtype half
--max-model-len 32768
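
For reference, a launch sketch of this setup (assumptions: the vllm/vllm-openai image from Docker Hub and the default port 8000; the flags are exactly the ones listed above):

# A minimal sketch, assuming the vllm/vllm-openai image and default port 8000
docker run --gpus all -p 8000:8000 vllm/vllm-openai:v0.5.5 \
    --model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.95 \
    --enforce-eager \
    --trust-remote-code \
    --worker-use-ray \
    --enable-prefix-caching \
    --num-scheduler-steps 8 \
    --dtype half \
    --max-model-len 32768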

vLLM crashes and requires a full restart. Error:

INFO 08-29 19:33:37 server.py:222] vLLM ZMQ RPC Server was interrupted.
Future exception was never retrieved
future: <Future finished exception=AssertionError('expected running sequences')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 111, in generate
    async for request_output in results_generator:
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 1064, in generate
    async for output in await self.add_request(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 113, in generator
    raise result
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 55, in _log_task_completion
    return_value = task.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 930, in run_engine_loop
    result = task.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 873, in engine_step
    request_outputs = await self.engine.step_async(virtual_engine)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 356, in step_async
    request_outputs = self._process_model_outputs(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 1232, in _process_model_outputs
    self.output_processor.process_outputs(seq_group, outputs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/output_processor/multi_step.py", line 73, in process_outputs
    assert seqs, "expected running sequences"
AssertionError: expected running sequences
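
For context, the assertion at the bottom of the trace lives in multi-step output processing: when a step's outputs arrive, the processor expects the sequence group to still contain RUNNING sequences. A paraphrased sketch of the failing check (not the exact v0.5.5 source; names follow the traceback above):

# Paraphrased sketch of the check behind multi_step.py:73
from vllm.sequence import SequenceStatus

def process_outputs(self, sequence_group, outputs):
    # Collect the sequences the scheduler still considers RUNNING.
    seqs = sequence_group.get_seqs(status=SequenceStatus.RUNNING)
    # If the group finished or was aborted between scheduler steps
    # (e.g. on a client disconnect), this list is empty and the
    # assertion below is what crashes the engine loop.
    assert seqs, "expected running sequences"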

The issue is random; the same query does NOT reproduce it.

We upgraded 6 hours ago, and it has happened 3 times since.

We now need to downgrade and consider v0.5.5 a buggy release.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
zoltan-fedor added the bug label on Aug 30, 2024
@WoosukKwon
Collaborator

@zoltan-fedor Thanks for reporting the bug. Could you please try running without --num-scheduler-steps 8? I think there were several bug fixes for it after v0.5.5.
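
That is, the same launch with only that flag removed (a sketch; all other flags unchanged from the report above):

--model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
--tensor-parallel-size 4
--gpu-memory-utilization 0.95
--enforce-eager
--trust-remote-code
--worker-use-ray
--enable-prefix-caching
--dtype half
--max-model-len 32768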

@zoltan-fedor
Author

zoltan-fedor commented Aug 31, 2024

Thanks @WoosukKwon. Unfortunately, even without the --num-scheduler-steps 8 flag it still failed (although with a different error):

ERROR 08-30 19:19:10 client.py:412]     await self.check_health(socket=socket)
ERROR 08-30 19:19:10 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 431, in check_health
ERROR 08-30 19:19:10 client.py:412]     await self._send_one_way_rpc_request(
ERROR 08-30 19:19:10 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 266, in _send_one_way_rpc_request
ERROR 08-30 19:19:10 client.py:412]     raise response
ERROR 08-30 19:19:10 client.py:412] vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop is stopped.
ERROR 08-30 19:19:10 client.py:265] Got Unhealthy response from RPC Server
ERROR 08-30 19:19:10 client.py:412] AsyncEngineDeadError('Background loop is stopped.')
ERROR 08-30 19:19:10 client.py:412] Traceback (most recent call last):
ERROR 08-30 19:19:10 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 409, in generate
ERROR 08-30 19:19:10 client.py:412]     await self.check_health(socket=socket)
ERROR 08-30 19:19:10 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 431, in check_health
ERROR 08-30 19:19:10 client.py:412]     await self._send_one_way_rpc_request(
ERROR 08-30 19:19:10 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 266, in _send_one_way_rpc_request
ERROR 08-30 19:19:10 client.py:412]     raise response
ERROR 08-30 19:19:10 client.py:412] vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop is stopped.
CRITICAL 08-30 19:19:10 launcher.py:82] AsyncLLMEngine has failed, terminating server process
INFO:     10.94.90.10:51000 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 08-30 19:19:10 launcher.py:82] AsyncLLMEngine has failed, terminating server process
INFO:     10.94.88.168:38032 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
[2024-08-30 19:19:10,138 E 64 3526] logging.cc:115: Stack trace:
 /usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0x10b96aa) [0x7fc380d896aa] ray::operator<<()
/usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0x10bc932) [0x7fc380d8c932] ray::TerminateHandler()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa37c) [0x7fc4ce2cc37c]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3e7) [0x7fc4ce2cc3e7]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa36f) [0x7fc4ce2cc36f]
/usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so(+0xe5ab35) [0x7fc4807e0b35] c10d::ProcessGroupNCCL::ncclCommWatchdog()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6df4) [0x7fc4ce2f8df4]
/usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7fc4cf4c9609] start_thread
/usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fc4cf603353] __clone

*** SIGABRT received at time=1725070750 on cpu 13 ***
PC: @     0x7fc4cf52700b  (unknown)  raise
    @     0x7fc4cf527090       3216  (unknown)
    @     0x7fc4ce2cc37c  (unknown)  (unknown)
    @     0x7fc4ce2cc090  (unknown)  (unknown)
[2024-08-30 19:19:10,139 E 64 3526] logging.cc:440: *** SIGABRT received at time=1725070750 on cpu 13 ***
[2024-08-30 19:19:10,139 E 64 3526] logging.cc:440: PC: @     0x7fc4cf52700b  (unknown)  raise
[2024-08-30 19:19:10,139 E 64 3526] logging.cc:440:     @     0x7fc4cf527090       3216  (unknown)
[2024-08-30 19:19:10,140 E 64 3526] logging.cc:440:     @     0x7fc4ce2cc37c  (unknown)  (unknown)
[2024-08-30 19:19:10,140 E 64 3526] logging.cc:440:     @     0x7fc4ce2cc090  (unknown)  (unknown)
Fatal Python error: Aborted

Back to version v0.5.4 again.

@zoltan-fedor
Author

zoltan-fedor commented Aug 31, 2024

Two minutes later the next error:

    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/sampler.py", line 110, in forward
    self._init_sampling_tensors(logits, sampling_metadata)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/sampler.py", line 87, in _init_sampling_tensors
    do_min_p) = SamplingTensors.from_sampling_metadata(
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/sampling_metadata.py", line 520, in from_sampling_metadata
    sampling_tensors = SamplingTensors.from_lists(
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/sampling_metadata.py", line 564, in from_lists
    temperatures_t = torch.tensor(
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 67, in _log_task_completion
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
ERROR 08-30 19:24:28 client.py:265] Got Unhealthy response from RPC Server
ERROR 08-30 19:24:28 client.py:412] AsyncEngineDeadError('Background loop is stopped.')
ERROR 08-30 19:24:28 client.py:412] Traceback (most recent call last):
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 409, in generate
ERROR 08-30 19:24:28 client.py:412]     await self.check_health(socket=socket)
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 431, in check_health
ERROR 08-30 19:24:28 client.py:412]     await self._send_one_way_rpc_request(
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 266, in _send_one_way_rpc_request
ERROR 08-30 19:24:28 client.py:412]     raise response
ERROR 08-30 19:24:28 client.py:412] vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop is stopped.
CRITICAL 08-30 19:24:28 launcher.py:82] AsyncLLMEngine has failed, terminating server process
INFO:     10.94.90.10:46306 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
ERROR 08-30 19:24:28 client.py:265] Got Unhealthy response from RPC Server
ERROR 08-30 19:24:28 client.py:412] AsyncEngineDeadError('Background loop is stopped.')
ERROR 08-30 19:24:28 client.py:412] Traceback (most recent call last):
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 409, in generate
ERROR 08-30 19:24:28 client.py:412]     await self.check_health(socket=socket)
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 431, in check_health
ERROR 08-30 19:24:28 client.py:412]     await self._send_one_way_rpc_request(
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 266, in _send_one_way_rpc_request
ERROR 08-30 19:24:28 client.py:412]     raise response
ERROR 08-30 19:24:28 client.py:412] vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop is stopped.
ERROR 08-30 19:24:28 client.py:265] Got Unhealthy response from RPC Server
ERROR 08-30 19:24:28 client.py:412] AsyncEngineDeadError('Background loop is stopped.')
ERROR 08-30 19:24:28 client.py:412] Traceback (most recent call last):
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 409, in generate
ERROR 08-30 19:24:28 client.py:412]     await self.check_health(socket=socket)
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 431, in check_health
ERROR 08-30 19:24:28 client.py:412]     await self._send_one_way_rpc_request(
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 266, in _send_one_way_rpc_request
ERROR 08-30 19:24:28 client.py:412]     raise response
ERROR 08-30 19:24:28 client.py:412] vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop is stopped.
CRITICAL 08-30 19:24:28 launcher.py:82] AsyncLLMEngine has failed, terminating server process
INFO:     10.94.91.181:55960 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 08-30 19:24:28 launcher.py:82] AsyncLLMEngine has failed, terminating server process
INFO:     10.94.88.168:48038 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
[2024-08-30 19:24:28,686 E 65 3518] logging.cc:115: Stack trace:
 /usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0x10b96aa) [0x7f7b2e5c76aa] ray::operator<<()
/usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0x10bc932) [0x7f7b2e5ca932] ray::TerminateHandler()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa37c) [0x7f7c7bb0a37c]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3e7) [0x7f7c7bb0a3e7]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa36f) [0x7f7c7bb0a36f]
/usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so(+0xe5ab35) [0x7f7c2e01eb35] c10d::ProcessGroupNCCL::ncclCommWatchdog()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6df4) [0x7f7c7bb36df4]
/usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f7c7cd07609] start_thread
/usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f7c7ce41353] __clone

*** SIGABRT received at time=1725071068 on cpu 16 ***
PC: @     0x7f7c7cd6500b  (unknown)  raise
    @     0x7f7c7cd65090       3216  (unknown)
    @     0x7f7c7bb0a37c  (unknown)  (unknown)
    @     0x7f7c7bb0a090  (unknown)  (unknown)
[2024-08-30 19:24:28,688 E 65 3518] logging.cc:440: *** SIGABRT received at time=1725071068 on cpu 16 ***
[2024-08-30 19:24:28,688 E 65 3518] logging.cc:440: PC: @     0x7f7c7cd6500b  (unknown)  raise
[2024-08-30 19:24:28,688 E 65 3518] logging.cc:440:     @     0x7f7c7cd65090       3216  (unknown)
[2024-08-30 19:24:28,688 E 65 3518] logging.cc:440:     @     0x7f7c7bb0a37c  (unknown)  (unknown)
[2024-08-30 19:24:28,689 E 65 3518] logging.cc:440:     @     0x7f7c7bb0a090  (unknown)  (unknown)
Fatal Python error: Aborted

This seems to be the same illegal memory access issue as #8025.
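
As the CUDA message in the log notes, kernel errors are reported asynchronously, so the Python stack above may not point at the real faulting kernel. A debugging sketch (CUDA_LAUNCH_BLOCKING comes straight from the error text; the docker invocation mirrors the assumed launch sketch earlier in the thread):

# Sketch: re-run with synchronous CUDA launches so the traceback lands on the faulting kernel
docker run --gpus all -e CUDA_LAUNCH_BLOCKING=1 -p 8000:8000 vllm/vllm-openai:v0.5.5 \
    --model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 \
    --tensor-parallel-size 4 --gpu-memory-utilization 0.95 \
    --enforce-eager --trust-remote-code --worker-use-ray \
    --enable-prefix-caching --dtype half --max-model-len 32768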

@ashgold

ashgold commented Aug 31, 2024

> @zoltan-fedor Thanks for reporting the bug. Could you please try running without --num-scheduler-steps 8? I think there were several bug fixes for it after v0.5.5.

Hi.

I had the exact same issue.

There is an obvious behavior that triggers this issue.
It doesn't matter how long a steady load is maintained; but if I stop the load in the middle of a request, so the connection is dropped between sending the request and receiving the response, the issue appears. (A minimal repro sketch follows.)
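
A minimal repro sketch of that pattern (assumptions: an OpenAI-compatible endpoint on localhost:8000; the model name matches the --model path below): start a streaming completion, read a few chunks, then drop the connection while generation is still in flight.

# Hypothetical repro sketch: abort a streaming /v1/completions request mid-response
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # assumed endpoint
    json={
        "model": "/data/models/llama-3-1-70b-instruct/base",  # as served below
        "prompt": "Write a very long story.",
        "max_tokens": 2048,
        "stream": True,
    },
    stream=True,
)
for i, chunk in enumerate(resp.iter_lines()):
    if i >= 5:
        resp.close()  # simulates the client disconnect described above
        break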

Below are the options I started vLLM with.

    - args:
      - --model
      - /data/models/llama-3-1-70b-instruct/base
      - --tensor-parallel-size
      - "4"
      - --load-format
      - "auto"
      - --max-model-len
      - "16384"
      - --disable-log-requests
      - --uvicorn-log-level
      - "warning"
      - --gpu-memory-utilization
      - "0.9"
      - --enable-prefix-caching
      - --num-scheduler-steps
      - "8"

Below is the log when the bug occurred.

ERROR 08-30 22:11:29 async_llm_engine.py:65] Engine background task failed
ERROR 08-30 22:11:29 async_llm_engine.py:65] Traceback (most recent call last):
ERROR 08-30 22:11:29 async_llm_engine.py:65]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 55, in _log_task_completion
ERROR 08-30 22:11:29 async_llm_engine.py:65]     return_value = task.result()
ERROR 08-30 22:11:29 async_llm_engine.py:65]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 930, in run_engine_loop
ERROR 08-30 22:11:29 async_llm_engine.py:65]     result = task.result()
ERROR 08-30 22:11:29 async_llm_engine.py:65]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 873, in engine_step
ERROR 08-30 22:11:29 async_llm_engine.py:65]     request_outputs = await self.engine.step_async(virtual_engine)
ERROR 08-30 22:11:29 async_llm_engine.py:65]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 356, in step_async
ERROR 08-30 22:11:29 async_llm_engine.py:65]     request_outputs = self._process_model_outputs(
ERROR 08-30 22:11:29 async_llm_engine.py:65]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 1232, in _process_model_outputs
ERROR 08-30 22:11:29 async_llm_engine.py:65]     self.output_processor.process_outputs(seq_group, outputs)
ERROR 08-30 22:11:29 async_llm_engine.py:65]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/output_processor/multi_step.py", line 73, in process_outputs
ERROR 08-30 22:11:29 async_llm_engine.py:65]     assert seqs, "expected running sequences"
ERROR 08-30 22:11:29 async_llm_engine.py:65] AssertionError: expected running sequences
ERROR:asyncio:Exception in callback functools.partial(<function _log_task_completion at 0x7f00de437be0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f00c643a320>>)
handle: <Handle functools.partial(<function _log_task_completion at 0x7f00de437be0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f00c643a320>>)>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 55, in _log_task_completion
    return_value = task.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 930, in run_engine_loop
    result = task.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 873, in engine_step
    request_outputs = await self.engine.step_async(virtual_engine)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 356, in step_async
    request_outputs = self._process_model_outputs(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 1232, in _process_model_outputs
    self.output_processor.process_outputs(seq_group, outputs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/output_processor/multi_step.py", line 73, in process_outputs
    assert seqs, "expected running sequences"
AssertionError: expected running sequences

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 67, in _log_task_completion
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
ERROR 08-30 22:11:29 client.py:265] Got Unhealthy response from RPC Server
ERROR 08-30 22:11:29 client.py:412] AsyncEngineDeadError('Background loop is stopped.')
ERROR 08-30 22:11:29 client.py:412] Traceback (most recent call last):
ERROR 08-30 22:11:29 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 409, in generate
ERROR 08-30 22:11:29 client.py:412]     await self.check_health(socket=socket)
ERROR 08-30 22:11:29 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 431, in check_health
ERROR 08-30 22:11:29 client.py:412]     await self._send_one_way_rpc_request(
ERROR 08-30 22:11:29 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 266, in _send_one_way_rpc_request
ERROR 08-30 22:11:29 client.py:412]     raise response
ERROR 08-30 22:11:29 client.py:412] vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop is stopped.
ERROR 08-30 22:11:29 client.py:265] Got Unhealthy response from RPC Server
ERROR 08-30 22:11:29 client.py:412] AsyncEngineDeadError('Background loop is stopped.')
ERROR 08-30 22:11:29 client.py:412] Traceback (most recent call last):
ERROR 08-30 22:11:29 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 409, in generate
ERROR 08-30 22:11:29 client.py:412]     await self.check_health(socket=socket)
ERROR 08-30 22:11:29 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 431, in check_health
ERROR 08-30 22:11:29 client.py:412]     await self._send_one_way_rpc_request(
ERROR 08-30 22:11:29 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 266, in _send_one_way_rpc_request
ERROR 08-30 22:11:29 client.py:412]     raise response
ERROR 08-30 22:11:29 client.py:412] vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop is stopped.
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 265, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 238, in listen_for_disconnect
    message = await receive()
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
    await self.message_event.wait()
  File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f55771d0ca0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 754, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 774, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 295, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 75, in app
    await response(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 258, in __call__
    async with anyio.create_task_group() as task_group:
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
    raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
ERROR 08-30 22:11:29 client.py:265] Got Unhealthy response from RPC Server
ERROR 08-30 22:11:29 client.py:412] AsyncEngineDeadError('Background loop is stopped.')
ERROR 08-30 22:11:29 client.py:412] Traceback (most recent call last):
ERROR 08-30 22:11:29 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 409, in generate
ERROR 08-30 22:11:29 client.py:412]     await self.check_health(socket=socket)
ERROR 08-30 22:11:29 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 431, in check_health
ERROR 08-30 22:11:29 client.py:412]     await self._send_one_way_rpc_request(
ERROR 08-30 22:11:29 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 266, in _send_one_way_rpc_request
ERROR 08-30 22:11:29 client.py:412]     raise response
ERROR 08-30 22:11:29 client.py:412] vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop is stopped.
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 265, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 238, in listen_for_disconnect
    message = await receive()
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
    await self.message_event.wait()
  File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f557ebc4580


@robertgshaw2-neuralmagic
Collaborator

robertgshaw2-neuralmagic commented Aug 31, 2024

The source of AssertionError: expected running sequences is that aborting a request is not yet supported with multi-step scheduling. Multi-step scheduling is a new feature we are still working on, so I would not yet recommend using it in production until the feature is finalized. The tracking issue for multi-step scheduling development is here:
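
For reference, the assertion in the traceback comes from the multi-step output processor. A simplified sketch of the failing path (paraphrased from the v0.5.5 source, not a verbatim copy):

```python
from vllm.sequence import SequenceStatus

def process_outputs(seq_group, outputs):
    # Multi-step scheduling buffers the outputs of several model steps and
    # applies them to the sequence group here in one go. If the request was
    # aborted in between (e.g. the client disconnected), the scheduler has
    # already moved the group's sequences out of RUNNING, so this lookup
    # comes back empty.
    seqs = seq_group.get_seqs(status=SequenceStatus.RUNNING)
    assert seqs, "expected running sequences"  # multi_step.py:73 in the traceback
    for seq in seqs:
        ...  # append the buffered tokens from each step to the sequence
```

Until the abort path is handled there, any client-side cancellation while running with --num-scheduler-steps > 1 can take down the whole engine loop, which is what the AsyncEngineDeadError cascade above shows.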

@zoltan-fedor re: the illegal memory access crashes you are seeing on v0.5.4 / v0.5.5, we have seen intermittent reports of this with --enable-prefix-caching and have been working on reproducing the issue. If possible, sharing:

  • the full logs
  • anything you can about your access patterns (the client code that triggers the issue; a sketch of one such pattern follows below)

would help us a lot in reproducing and resolving it.
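
For what it's worth, one access pattern that exercises the abort path discussed above is a streaming completion the client abandons mid-generation. A minimal, hypothetical reproducer sketch (host, port, model, and prompt are placeholders) using plain requests against the OpenAI-compatible endpoint:

```python
import requests

# Hypothetical endpoint; adjust host/port to match your deployment.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
        "prompt": "Write a long story about a dragon.",
        "max_tokens": 2048,
        "stream": True,
    },
    stream=True,
)

# Read a few SSE chunks, then drop the connection. The server observes the
# disconnect and aborts the request -- the code path that is not yet
# supported under multi-step scheduling.
for i, line in enumerate(resp.iter_lines()):
    if i >= 5:
        resp.close()
        break
```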

@robertgshaw2-neuralmagic
Collaborator

robertgshaw2-neuralmagic commented Aug 31, 2024

Two minutes later the next error:

    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/sampler.py", line 110, in forward
    self._init_sampling_tensors(logits, sampling_metadata)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/sampler.py", line 87, in _init_sampling_tensors
    do_min_p) = SamplingTensors.from_sampling_metadata(
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/sampling_metadata.py", line 520, in from_sampling_metadata
    sampling_tensors = SamplingTensors.from_lists(
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/sampling_metadata.py", line 564, in from_lists
    temperatures_t = torch.tensor(
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 67, in _log_task_completion
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
ERROR 08-30 19:24:28 client.py:265] Got Unhealthy response from RPC Server
ERROR 08-30 19:24:28 client.py:412] AsyncEngineDeadError('Background loop is stopped.')
ERROR 08-30 19:24:28 client.py:412] Traceback (most recent call last):
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 409, in generate
ERROR 08-30 19:24:28 client.py:412]     await self.check_health(socket=socket)
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 431, in check_health
ERROR 08-30 19:24:28 client.py:412]     await self._send_one_way_rpc_request(
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 266, in _send_one_way_rpc_request
ERROR 08-30 19:24:28 client.py:412]     raise response
ERROR 08-30 19:24:28 client.py:412] vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop is stopped.
CRITICAL 08-30 19:24:28 launcher.py:82] AsyncLLMEngine has failed, terminating server process
INFO:     10.94.90.10:46306 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
ERROR 08-30 19:24:28 client.py:265] Got Unhealthy response from RPC Server
ERROR 08-30 19:24:28 client.py:412] AsyncEngineDeadError('Background loop is stopped.')
ERROR 08-30 19:24:28 client.py:412] Traceback (most recent call last):
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 409, in generate
ERROR 08-30 19:24:28 client.py:412]     await self.check_health(socket=socket)
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 431, in check_health
ERROR 08-30 19:24:28 client.py:412]     await self._send_one_way_rpc_request(
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 266, in _send_one_way_rpc_request
ERROR 08-30 19:24:28 client.py:412]     raise response
ERROR 08-30 19:24:28 client.py:412] vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop is stopped.
ERROR 08-30 19:24:28 client.py:265] Got Unhealthy response from RPC Server
ERROR 08-30 19:24:28 client.py:412] AsyncEngineDeadError('Background loop is stopped.')
ERROR 08-30 19:24:28 client.py:412] Traceback (most recent call last):
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 409, in generate
ERROR 08-30 19:24:28 client.py:412]     await self.check_health(socket=socket)
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 431, in check_health
ERROR 08-30 19:24:28 client.py:412]     await self._send_one_way_rpc_request(
ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 266, in _send_one_way_rpc_request
ERROR 08-30 19:24:28 client.py:412]     raise response
ERROR 08-30 19:24:28 client.py:412] vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop is stopped.
CRITICAL 08-30 19:24:28 launcher.py:82] AsyncLLMEngine has failed, terminating server process
INFO:     10.94.91.181:55960 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 08-30 19:24:28 launcher.py:82] AsyncLLMEngine has failed, terminating server process
INFO:     10.94.88.168:48038 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
[2024-08-30 19:24:28,686 E 65 3518] logging.cc:115: Stack trace:
 /usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0x10b96aa) [0x7f7b2e5c76aa] ray::operator<<()
/usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0x10bc932) [0x7f7b2e5ca932] ray::TerminateHandler()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa37c) [0x7f7c7bb0a37c]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3e7) [0x7f7c7bb0a3e7]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa36f) [0x7f7c7bb0a36f]
/usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so(+0xe5ab35) [0x7f7c2e01eb35] c10d::ProcessGroupNCCL::ncclCommWatchdog()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6df4) [0x7f7c7bb36df4]
/usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f7c7cd07609] start_thread
/usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f7c7ce41353] __clone

*** SIGABRT received at time=1725071068 on cpu 16 ***
PC: @     0x7f7c7cd6500b  (unknown)  raise
    @     0x7f7c7cd65090       3216  (unknown)
    @     0x7f7c7bb0a37c  (unknown)  (unknown)
    @     0x7f7c7bb0a090  (unknown)  (unknown)
[2024-08-30 19:24:28,688 E 65 3518] logging.cc:440: *** SIGABRT received at time=1725071068 on cpu 16 ***
[2024-08-30 19:24:28,688 E 65 3518] logging.cc:440: PC: @     0x7f7c7cd6500b  (unknown)  raise
[2024-08-30 19:24:28,688 E 65 3518] logging.cc:440:     @     0x7f7c7cd65090       3216  (unknown)
[2024-08-30 19:24:28,688 E 65 3518] logging.cc:440:     @     0x7f7c7bb0a37c  (unknown)  (unknown)
[2024-08-30 19:24:28,689 E 65 3518] logging.cc:440:     @     0x7f7c7bb0a090  (unknown)  (unknown)
Fatal Python error: Aborted

This seems to be the same illegal memory access issue as #8025

Are these logs from v0.5.4 or v0.5.5?

Update: looks like v0.5.5, based on the line numbers in the traceback

│ ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 431, in check_health                                                                                                                                                                                                                                                 │
│ ERROR 08-30 19:24:28 client.py:412]     await self._send_one_way_rpc_request(                                                                                                                                                                                                                                                                                                                         │
│ ERROR 08-30 19:24:28 client.py:412]   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 266, in _send_one_way_rpc_request                                                                                                                                                                                                                                    │
│ ERROR 08-30 19:24:28 client.py:412]     raise response                                                                                                                                                                                                                                                                                                                                                │
│ ERROR 08-30 19:24:28 client.py:412] vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop is stopped.                                                                                                                                                                                                                                                                                    │
│ CRITICAL 08-30 19:24:28 launcher.py:82] AsyncLLMEngine has failed, terminating server process                                                                                                                                                                                                                                                                                                         │
│ INFO:     10.94.91.181:55960 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error                                                                                                                                                                                                                                                                                                              │
│ CRITICAL 08-30 19:24:28 launcher.py:82] AsyncLLMEngine has failed, terminating server process                                                                                                                                                                                                                                                                                                         │
│ INFO:     10.94.88.168:48038 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error                                                                                                                                                                                                                                                                                                              │
│ [2024-08-30 19:24:28,686 E 65 3518] logging.cc:115: Stack trace:                                                                                                                                                                                                                                                                                                                                      │
│  /usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0x10b96aa) [0x7f7b2e5c76aa] ray::operator<<()                                                                                                                                                                                                                                                                                                │
│ /usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0x10bc932) [0x7f7b2e5ca932] ray::TerminateHandler()                                                                                                                                                                                                                                                                                           │
│ /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa37c) [0x7f7c7bb0a37c]                                                                                                                                                                                                                                                                                                                                   │
│ /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3e7) [0x7f7c7bb0a3e7]                                                                                                                                                                                                                                                                                                                                   │
│ /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa36f) [0x7f7c7bb0a36f]                                                                                                                                                                                                                                                                                                                                   │
│ /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so(+0xe5ab35) [0x7f7c2e01eb35] c10d::ProcessGroupNCCL::ncclCommWatchdog()                                                                                                                                                                                                                                                             │
│ /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6df4) [0x7f7c7bb36df4]                                                                                                                                                                                                                                                                                                                                   │
│ /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f7c7cd07609] start_thread                                                                                                                                                                                                                                                                                                                      │
│ /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f7c7ce41353] __clone                                                                                                                                                                                                                                                                                                                              │
│                                                                                                                                                                                                                                                                                                                                                                                                       │
│ *** SIGABRT received at time=1725071068 on cpu 16 ***                                                                                                                                                                                                                                                                                                                                                 │
│ PC: @     0x7f7c7cd6500b  (unknown)  raise                                                                                                                                                                                                                                                                                                                                                            │
│     @     0x7f7c7cd65090       3216  (unknown)                                                                                                                                                                                                                                                                                                                                                        │
│     @     0x7f7c7bb0a37c  (unknown)  (unknown)                                                                                                                                                                                                                                                                                                                                                        │
│     @     0x7f7c7bb0a090  (unknown)  (unknown)                                                                                                                                                                                                                                                                                                                                                        │
│ [2024-08-30 19:24:28,688 E 65 3518] logging.cc:440: *** SIGABRT received at time=1725071068 on cpu 16 ***                                                                                                                                                                                                                                                                                             │
│ [2024-08-30 19:24:28,688 E 65 3518] logging.cc:440: PC: @     0x7f7c7cd6500b  (unknown)  raise                                                                                                                                                                                                                                                                                                        │
│ [2024-08-30 19:24:28,688 E 65 3518] logging.cc:440:     @     0x7f7c7cd65090       3216  (unknown)                                                                                                                                                                                                                                                                                                    │
│ [2024-08-30 19:24:28,688 E 65 3518] logging.cc:440:     @     0x7f7c7bb0a37c  (unknown)  (unknown)                                                                                                                                                                                                                                                                                                    │
│ [2024-08-30 19:24:28,689 E 65 3518] logging.cc:440:     @     0x7f7c7bb0a090  (unknown)  (unknown)                                                                                                                                                                                                                                                                                                    │
│ Fatal Python error: Aborted         

This seems to be the same illegal memory access issue as #8025

Are these logs from v0.5.4 or v0.5.5?

Update: looks like v0.5.5 based on line numbers

Update: I am able to reproduce the issue sporadically

@robertgshaw2-neuralmagic
Collaborator

Update on AssertionError: expected running sequences:

@robertgshaw2-neuralmagic
Collaborator

@zoltan-fedor - I reproed the issue once, but have not been able to retrigger with CUDA_LAUNCH_BLOCKING=1 after running for several hours. Will leave on overnight, but any data / request pattern you can share would help a lot
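For anyone else chasing this, a minimal sketch of launching the stock image with synchronous kernel launches (the image tag and flags below are illustrative, not a confirmed repro setup):

docker run --gpus all -e CUDA_LAUNCH_BLOCKING=1 vllm/vllm-openai:v0.5.5 \
    --model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 \
    --tensor-parallel-size 4

This serializes every kernel launch, so the illegal access is reported at the offending call rather than asynchronously, at the cost of throughput.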

@zoltan-fedor
Author

@robertgshaw2-neuralmagic , sorry, we do not have a way to reproduce it either.

@robertgshaw2-neuralmagic
Collaborator

No worries. I am sure it will occur soon + I can look into it further.

@robertgshaw2-neuralmagic
Collaborator

Is there anything more you can share about your environment?

E.g. can you run collect_env.py?
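For reference, the script can be fetched and run standalone (assuming its usual location at the repo root):

wget https://raw.githubusercontent.com/vllm-project/vllm/main/collect_env.py
python collect_env.py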

@zoltan-fedor
Author

There isn't much to share.
We are using your docker image from Dockerhub: https://hub.docker.com/r/vllm/vllm-openai/tags

No modification, we run it as-is.
At the top of this ticket you can see the parameters we use and the GPUs it is running on.

@robertgshaw2-neuralmagic
Collaborator

sounds good. thanks

@robertgshaw2-neuralmagic
Collaborator

robertgshaw2-neuralmagic commented Sep 1, 2024

Indeed it reproduced. As expected, it is an illegal memory access in flash attention due to prefix caching. I will dig in further. Took about 4 hours to trigger it.

@zoltan-fedor
Author

zoltan-fedor commented Sep 3, 2024

The AssertionError: expected running sequences occurs because abort is not yet supported with multi-step scheduling. Multi-step scheduling is a new feature we are still working on - I would not yet recommend using multi-step in production use cases until the feature is finalized. The tracking issue for development of multi-step scheduling is here:

* [[Tracking issue] [Help wanted]: Multi-step scheduling follow-ups #7528](https://github.com/vllm-project/vllm/issues/7528)

@zoltan-fedor re: the issues you are seeing with illegal memory access on v0.5.4 / v0.5.5, we have seen intermittent reports of this with --enable-prefix-caching. We have been working on trying to reproduce the issue. If possible, sharing:

* the full logs

* anything you can re: access patterns (the client code which generates the issue)

would help us a lot to reproduce and resolve the issue.

@robertgshaw2-neuralmagic , I have also seen the same illegal memory access error with v0.5.4 WITHOUT the --enable-prefix-caching flag!

    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 204, in create_completion
    generator = await openai_serving_completion.create_completion(
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/serving_completion.py", line 170, in create_completion
    async for i, res in result_generator:
  File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 346, in consumer
    raise e
  File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 337, in consumer
    raise item
  File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 312, in producer
    async for item in iterator:
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 216, in generate
    raise request_output
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

[2024-09-02 12:11:54,463 E 61 3464] logging.cc:115: Stack trace:
 /usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0x10b96aa) [0x7f3da67f26aa] ray::operator<<()
/usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0x10bc932) [0x7f3da67f5932] ray::TerminateHandler()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa37c) [0x7f3eecc6e37c]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3e7) [0x7f3eecc6e3e7]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa36f) [0x7f3eecc6e36f]
/usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so(+0xe5ab35) [0x7f3e9f182b35] c10d::ProcessGroupNCCL::ncclCommWatchdog()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6df4) [0x7f3eecc9adf4]
/usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f3eede5c609] start_thread
/usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f3eedf96353] __clone

*** SIGABRT received at time=1725279114 on cpu 19 ***
PC: @     0x7f3eedeba00b  (unknown)  raise
    @     0x7f3eedeba090       3216  (unknown)
    @     0x7f3eecc6e37c  (unknown)  (unknown)
    @     0x7f3eecc6e090  (unknown)  (unknown)
[2024-09-02 12:11:54,465 E 61 3464] logging.cc:440: *** SIGABRT received at time=1725279114 on cpu 19 ***
[2024-09-02 12:11:54,465 E 61 3464] logging.cc:440: PC: @     0x7f3eedeba00b  (unknown)  raise
[2024-09-02 12:11:54,465 E 61 3464] logging.cc:440:     @     0x7f3eedeba090       3216  (unknown)
[2024-09-02 12:11:54,465 E 61 3464] logging.cc:440:     @     0x7f3eecc6e37c  (unknown)  (unknown)
[2024-09-02 12:11:54,466 E 61 3464] logging.cc:440:     @     0x7f3eecc6e090  (unknown)  (unknown)
Fatal Python error: Aborted
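For context on the multi-step abort issue described earlier in the thread: the assertion fires when a request is aborted, for example by a client disconnecting mid-stream, while multi-step scheduling still has steps in flight for that sequence. A hypothetical sketch of hammering that path (endpoint, model name, and counts are placeholders, not a confirmed trigger):

import asyncio

import httpx


async def abort_midstream(client: httpx.AsyncClient) -> None:
    # Start a streaming completion, read a single chunk, then drop the
    # connection; the server-side abort this causes is the code path that
    # multi-step scheduling does not yet handle.
    payload = {
        "model": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
        "prompt": "Hello",
        "max_tokens": 512,
        "stream": True,
    }
    async with client.stream("POST", "/v1/completions", json=payload) as resp:
        async for _ in resp.aiter_lines():
            break  # disconnect while tokens are still being generated


async def main() -> None:
    async with httpx.AsyncClient(base_url="http://localhost:8000",
                                 timeout=None) as client:
        # Many concurrent aborts raise the odds of one landing mid-batch.
        await asyncio.gather(*(abort_midstream(client) for _ in range(64)))


asyncio.run(main())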

@robertgshaw2-neuralmagic
Collaborator

@robertgshaw2-neuralmagic , I have also seen the same error with v0.5.4 WITHOUT the --enable-prefix-caching flag!

So, with the following command?:

--model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
--tensor-parallel-size 4
--gpu-memory-utilization 0.95
--enforce-eager
--trust-remote-code
--worker-use-ray
--dtype half
--max-model-len 32768

@zoltan-fedor
Author

  - "32768"

The source of AssertionError: expected running sequences is due to abort not yet being supported with multi-step scheduling. multi-step scheduling is a new feature we are still working on - I would not yet recommend using multi-step in production use cases until the feature is finalized. The tracking issue for development of multi-step scheduling is here:

* [[Tracking issue] [Help wanted]: Multi-step scheduling follow-ups #7528](https://github.com/vllm-project/vllm/issues/7528)

@zoltan-fedor re: the issues you are seeing with illegal memory access on v0.5.4 / v0.5.5, we have seen intermittent reports of this with --enable-prefix-caching. We have been working on trying to reproduce the issue. If possible, sharing:

* the full logs

* anything you can re: access patterns (the client code which generates the issue)

Would help us a lot of reproduce and resolve the issue

@robertgshaw2-neuralmagic , I have also seen the same error with v0.5.4 WITHOUT the --enable-prefix-caching flag!

So, with the following command?:

--model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
--tensor-parallel-size 4
--gpu-memory-utilization 0.95
--enforce-eager
--trust-remote-code
--worker-use-ray
--dtype half
--max-model-len 32768

That is correct.
That was the command, so no --enable-prefix-caching

@robertgshaw2-neuralmagic
Collaborator

robertgshaw2-neuralmagic commented Sep 3, 2024

  - "32768"

The source of AssertionError: expected running sequences is due to abort not yet being supported with multi-step scheduling. multi-step scheduling is a new feature we are still working on - I would not yet recommend using multi-step in production use cases until the feature is finalized. The tracking issue for development of multi-step scheduling is here:

* [[Tracking issue] [Help wanted]: Multi-step scheduling follow-ups #7528](https://github.com/vllm-project/vllm/issues/7528)

@zoltan-fedor re: the issues you are seeing with illegal memory access on v0.5.4 / v0.5.5, we have seen intermittent reports of this with --enable-prefix-caching. We have been working on trying to reproduce the issue. If possible, sharing:

* the full logs

* anything you can re: access patterns (the client code which generates the issue)

Would help us a lot of reproduce and resolve the issue

@robertgshaw2-neuralmagic , I have also seen the same error with v0.5.4 WITHOUT the --enable-prefix-caching flag!

So, with the following command?:

--model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
--tensor-parallel-size 4
--gpu-memory-utilization 0.95
--enforce-eager
--trust-remote-code
--worker-use-ray
--dtype half
--max-model-len 32768

That is correct. That was the command, so no --enable-prefix-caching

Thanks. Will run this in the background today and see if I can reproduce. The illegal memory access I had before seemed to occur in attention, so perhaps it is something related to chunked prefill rather than prefix caching (since chunked prefill is on by default if max-model-len > 32k)
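As a paraphrase of the default just described (an illustration only, not the actual vLLM source; the exact threshold and whether the comparison is strict may differ by version):

# Illustration of the rule described above, not actual vLLM source.
LONG_CONTEXT_THRESHOLD = 32 * 1024  # assumed cutoff per the comment above

def chunked_prefill_enabled(max_model_len: int, explicit_flag=None) -> bool:
    # An explicit --enable-chunked-prefill setting always wins; otherwise
    # long-context models get chunked prefill automatically.
    if explicit_flag is not None:
        return bool(explicit_flag)
    return max_model_len > LONG_CONTEXT_THRESHOLD

Note that --max-model-len 32768 sits exactly at 32 * 1024, so whether it crosses the cutoff depends on the version; the resolved setting should be visible in the server startup logs.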

@liulisi16323

liulisi16323 commented Sep 5, 2024

  - "32768"

The source of AssertionError: expected running sequences is due to abort not yet being supported with multi-step scheduling. multi-step scheduling is a new feature we are still working on - I would not yet recommend using multi-step in production use cases until the feature is finalized. The tracking issue for development of multi-step scheduling is here:

* [[Tracking issue] [Help wanted]: Multi-step scheduling follow-ups #7528](https://github.com/vllm-project/vllm/issues/7528)

@zoltan-fedor re: the issues you are seeing with illegal memory access on v0.5.4 / v0.5.5, we have seen intermittent reports of this with --enable-prefix-caching. We have been working on trying to reproduce the issue. If possible, sharing:

* the full logs

* anything you can re: access patterns (the client code which generates the issue)

Would help us a lot of reproduce and resolve the issue

@robertgshaw2-neuralmagic , I have also seen the same error with v0.5.4 WITHOUT the --enable-prefix-caching flag!

So, with the following command?:

--model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
--tensor-parallel-size 4
--gpu-memory-utilization 0.95
--enforce-eager
--trust-remote-code
--worker-use-ray
--dtype half
--max-model-len 32768

That is correct. That was the command, so no --enable-prefix-caching

Thanks. Will run this in the background today and see if I can reproduce. The illegal memory access I had before seemed to occur in attention, so perhaps it is something related to chunked prefill rather than prefix caching (since chunked prefill is on by default if max-model-len > 32k)

I've encountered this error too.
Error message:
Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered.
It's very likely to occur under high-concurrency conditions. In my case, chunked prefill is disabled.
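Since high concurrency seems to matter, a minimal load sketch that could help others try to reproduce (endpoint, model name, and prompt are placeholders):

import asyncio

import httpx


async def one_request(client: httpx.AsyncClient, i: int) -> None:
    # Plain non-streaming completion; the point is sustained parallel load.
    payload = {
        "model": "qwen",
        "prompt": f"Request {i}: write a short story.",
        "max_tokens": 256,
    }
    resp = await client.post("/v1/completions", json=payload)
    resp.raise_for_status()


async def main(concurrency: int = 128) -> None:
    async with httpx.AsyncClient(base_url="http://localhost:8000",
                                 timeout=None) as client:
        while True:  # run until the server crashes or you stop it
            await asyncio.gather(*(one_request(client, i)
                                   for i in range(concurrency)))


asyncio.run(main())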

@robertgshaw2-neuralmagic
Collaborator

  - "32768"

The source of AssertionError: expected running sequences is due to abort not yet being supported with multi-step scheduling. multi-step scheduling is a new feature we are still working on - I would not yet recommend using multi-step in production use cases until the feature is finalized. The tracking issue for development of multi-step scheduling is here:

* [[Tracking issue] [Help wanted]: Multi-step scheduling follow-ups #7528](https://github.com/vllm-project/vllm/issues/7528)

@zoltan-fedor re: the issues you are seeing with illegal memory access on v0.5.4 / v0.5.5, we have seen intermittent reports of this with --enable-prefix-caching. We have been working on trying to reproduce the issue. If possible, sharing:

* the full logs

* anything you can re: access patterns (the client code which generates the issue)

Would help us a lot of reproduce and resolve the issue

@robertgshaw2-neuralmagic , I have also seen the same error with v0.5.4 WITHOUT the --enable-prefix-caching flag!

So, with the following command?:

--model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
--tensor-parallel-size 4
--gpu-memory-utilization 0.95
--enforce-eager
--trust-remote-code
--worker-use-ray
--dtype half
--max-model-len 32768

That is correct. That was the command, so no --enable-prefix-caching

Thanks. Will run this in the background today and see if I can reproduce. The illegal memory access I had before seemed to occur in attention, so perhaps it is something related to chunked-prefill rather than prefix caching (since chunked-prefill is on by default is max-len >32k)

I've encountered this error too.
Error message:
Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered.
It's very likely to occur under high-concurrency conditions. In my case, chunked prefill is disabled.

@liulisi16323 can you share your launch command? I ran @zoltan-fedor’s launch command for about 1 day (without prefix caching) and have not been able to trigger this issue.

@zoltan-fedor - can you share driver and CUDA version? I will try to make an env that more closely matches yours.

@zoltan-fedor
Author

zoltan-fedor commented Sep 5, 2024

@robertgshaw2-neuralmagic

can you share driver and CUDA version? I will try to make an env that more closely matches yours.

Driver Version: 535.183.01 CUDA Version: 12.4
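Both values are from the banner line of nvidia-smi, for example:

NVIDIA-SMI 535.183.01              Driver Version: 535.183.01    CUDA Version: 12.4

Note that the CUDA version printed there is the highest version the driver supports, not necessarily the toolkit version inside the container.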

@liulisi16323

  - "32768"

The source of AssertionError: expected running sequences is due to abort not yet being supported with multi-step scheduling. multi-step scheduling is a new feature we are still working on - I would not yet recommend using multi-step in production use cases until the feature is finalized. The tracking issue for development of multi-step scheduling is here:

* [[Tracking issue] [Help wanted]: Multi-step scheduling follow-ups #7528](https://github.com/vllm-project/vllm/issues/7528)

@zoltan-fedor re: the issues you are seeing with illegal memory access on v0.5.4 / v0.5.5, we have seen intermittent reports of this with --enable-prefix-caching. We have been working on trying to reproduce the issue. If possible, sharing:

* the full logs

* anything you can re: access patterns (the client code which generates the issue)

Would help us a lot of reproduce and resolve the issue

@robertgshaw2-neuralmagic , I have also seen the same error with v0.5.4 WITHOUT the --enable-prefix-caching flag!

So, with the following command?:

--model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
--tensor-parallel-size 4
--gpu-memory-utilization 0.95
--enforce-eager
--trust-remote-code
--worker-use-ray
--dtype half
--max-model-len 32768

That is correct. That was the command, so no --enable-prefix-caching

Thanks. Will run this in the background today and see if I can reproduce. The illegal memory access I had before seemed to occur in attention, so perhaps it is something related to chunked-prefill rather than prefix caching (since chunked-prefill is on by default is max-len >32k)

I've encountered this error too.
Error message:
Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered.
It's very likely to occur under high concurrency conditions. In my case, chunked-prefill is disable.

@liulisi16323 can you share your launch command? I ran @zoltan-fedor’s launch command for about 1 day (without prefix caching) and have not been able to trigger this issue

@zoltan-fedor - can you share driver and CUDA version? I will try to make an env that more closely matches yours.

NVIDIA A800, Driver Version: 525.105.17, Docker image: vllm/vllm-openai:v0.6.0, launch command: --model /Qwen2-72B-Instruct-GPTQ-Int4 --served-model-name qwen --host 0.0.0.0 --port 8000 --gpu-memory-utilization 0.65 --swap-space 0 --tensor-parallel-size 2 --enable-prefix-caching
max-model-len is left at the default 32k.

@ashgold

ashgold commented Sep 13, 2024

problem solved in v0.6.1.post1.

@zoltan-fedor
Author

Thanks @ashgold, I have upgraded to this latest version and will monitor whether the issue arises again.

@yaronr

yaronr commented Sep 15, 2024

I just encountered the same issue (I think) on 0.6.1.post2. I only see the log output below, no stack trace.

 vLLM ZMQ RPC Server was interrupted.
INFO 09-15 04:17:46 async_llm_engine.py:55] Engine is gracefully shutting down.
ERROR 09-15 04:17:49 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 215 died, exit code: -15
INFO 09-15 04:17:49 multiproc_worker_utils.py:123] Killing local vLLM worker processes
[root@llmatrix-nvda-5f57e4b2-629d-4b74-8a56-f1678de39a46 multicloud]# 
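When the engine exits "gracefully" like this with no traceback, the underlying worker error is sometimes only visible at higher verbosity. Assuming these variables still exist in your version (a hedged suggestion, not a confirmed fix), restarting with:

VLLM_LOGGING_LEVEL=DEBUG CUDA_LAUNCH_BLOCKING=1

set in the container environment may surface the actual worker failure in the logs.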

@TangJiakai

problem solved in v0.6.1.post1.

No, I still get the error CUDA error: an illegal memory access was encountered

@ashgold

ashgold commented Sep 16, 2024

problem solved in v0.6.1.post1.

No, I still get the error CUDA error: an illegal memory access was encountered

Can you share the details of your test environment and of the issue when it occurs? If possible, I would like to reproduce it.
I ran a long-running test for more than 48 hours, but no issues occurred.

@TangJiakai

@ashgold
I was executing requests concurrently on more than 3 GPU cards. It started off fine, but soon began to throw errors:

Exception in callback _log_task_completion(error_callback=<bound method...7fc9b858ee10>>)(<Task finishe...sertions.\n')>) at /data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py:38
handle: <Handle _log_task_completion(error_callback=<bound method...7fc9b858ee10>>)(<Task finishe...sertions.\n')>) at /data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py:38>
Traceback (most recent call last):
  File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/vllm/worker/model_runner_base.py", line 112, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1579, in execute_model
    logits = self.model.compute_logits(hidden_or_intermediate_states,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/vllm/model_executor/models/llama.py", line 457, in compute_logits
    logits = self.logits_processor(self.lm_head, hidden_states,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/vllm/lora/layers.py", line 1211, in forward
    return type(self.base_layer).forward(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/vllm/model_executor/layers/logits_processor.py", line 72, in forward
    logits = _apply_logits_processors(logits, sampling_metadata)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/vllm/model_executor/layers/logits_processor.py", line 142, in _apply_logits_processors
    logits_row = logits_processor(past_tokens_ids,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/lmformatenforcer/integrations/vllm.py", line 29, in __call__
    self.mask[allowed_tokens] = 0
    ~~~~~~~~~^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
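
As the last two lines of the traceback suggest, CUDA reports illegal accesses asynchronously, so the Python frames above (the lmformatenforcer mask update) may not be where the fault actually happened. A minimal debugging sketch using the standard PyTorch environment variable; <your usual flags> is a placeholder for your real launch arguments:

# Synchronous kernel launches make the traceback point at the faulting kernel.
# This slows inference significantly, so use it only while reproducing the bug.
CUDA_LAUNCH_BLOCKING=1 python -m vllm.entrypoints.openai.api_server <your usual flags>

Rerunning the same concurrent workload under this setting should produce a trace that names the kernel that actually faulted rather than a downstream op.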

@TangJiakai

At one point, the LLM on one card crashed and exited outright, and at the same time the requests I was sending to the other LLM instances also stopped working. It's very strange.

@ashgold

ashgold commented Sep 23, 2024

@ashgold I was executing requests concurrently on more than 3 GPU cards. It started off fine, but soon began to throw errors:

This seems to be a different issue to the one I'm experiencing, and I'd suggest opening a separate bug to follow up on it.

@liulisi16323

On v0.6.1.post2 I still get the error under high-concurrency conditions:

[rank1]:[E924 15:10:22.264419715 ProcessGroupNCCL.cpp:1515] [PG 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f620d546f86 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f620d4f5d10 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f620d621f08 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f620e83e3e6 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f620e843600 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f620e84a2ba in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7f620e84c6fc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xd6df4 (0x7f625bff0df4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: + 0x8609 (0x7f625d210609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0)
frame #9: clone + 0x43 (0x7f625d34a353 in /usr/lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f620d546f86 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f620d4f5d10 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f620d621f08 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f620e83e3e6 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f620e843600 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f620e84a2ba in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7f620e84c6fc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xd6df4 (0x7f625bff0df4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: + 0x8609 (0x7f625d210609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0)
frame #9: clone + 0x43 (0x7f625d34a353 in /usr/lib/x86_64-linux-gnu/libc.so.6)

Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1521 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f620d546f86 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: + 0xe5aa84 (0x7f620e4d5a84 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #2: + 0xd6df4 (0x7f625bff0df4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #3: + 0x8609 (0x7f625d210609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0)
frame #4: clone + 0x43 (0x7f625d34a353 in /usr/lib/x86_64-linux-gnu/libc.so.6)
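
For these watchdog crashes under high concurrency, verbose NCCL logging can at least narrow down which collective dies before the next repro. A minimal sketch; NCCL_DEBUG and NCCL_DEBUG_SUBSYS are standard NCCL environment variables, not vLLM-specific ones:

# Per-rank NCCL lifecycle and error logs, restricted to collective operations.
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=COLL

Relaunch the same vLLM command with these set and capture the per-rank output from just before the crash.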
