INFO 11-16 10:37:50 metrics.py:449] Avg prompt throughput: 5941.0 tokens/s, Avg generation throughput: 16.5 tokens/s, Running: 3 reqs, Swapped: 0 reqs, Pending: 13 reqs, GPU KV cache usage: 1.5%, CPU KV cache usage: 0.0%.
INFO 11-16 10:37:50 metrics.py:465] Prefix cache hit rate: GPU: 94.87%, CPU: 0.00%
INFO: ::1:59242 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 11-16 10:37:53 model_runner_base.py:120] Writing input of failed execution to /tmp/err_execute_model_input_20241116-103753.pkl...
WARNING 11-16 10:37:53 model_runner_base.py:143] Failed to pickle inputs of failed execution: CUDA error: an illegal memory access was encountered
WARNING 11-16 10:37:53 model_runner_base.py:143] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
WARNING 11-16 10:37:53 model_runner_base.py:143] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
WARNING 11-16 10:37:53 model_runner_base.py:143] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
WARNING 11-16 10:37:53 model_runner_base.py:143]
CRITICAL 11-16 10:37:53 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO: ::1:59242 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 11-16 10:37:53 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO: ::1:59468 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR 11-16 10:37:53 engine.py:135] RuntimeError('Error in model execution: CUDA error: an illegal memory access was encountered\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n')
ERROR 11-16 10:37:53 engine.py:135] Traceback (most recent call last):
ERROR 11-16 10:37:53 engine.py:135] File "/opt/conda/lib/python3.11/site-packages/vllm/worker/model_runner_base.py", line 116, in _wrapper
ERROR 11-16 10:37:53 engine.py:135] return func(*args, **kwargs)
ERROR 11-16 10:37:53 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^
ERROR 11-16 10:37:53 engine.py:135] File "/opt/conda/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1687, in execute_model
ERROR 11-16 10:37:53 engine.py:135] logits = self.model.compute_logits(hidden_or_intermediate_states,
ERROR 11-16 10:37:53 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-16 10:37:53 engine.py:135] File "/opt/conda/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 478, in compute_logits
ERROR 11-16 10:37:53 engine.py:135] logits = self.logits_processor(self.lm_head, hidden_states,
ERROR 11-16 10:37:53 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-16 10:37:53 engine.py:135] File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 11-16 10:37:53 engine.py:135] return self._call_impl(*args, **kwargs)
ERROR 11-16 10:37:53 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-16 10:37:53 engine.py:135] File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 11-16 10:37:53 engine.py:135] return forward_call(*args, **kwargs)
ERROR 11-16 10:37:53 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-16 10:37:53 engine.py:135] File "/opt/conda/lib/python3.11/site-packages/vllm/model_executor/layers/logits_processor.py", line 74, in forward
ERROR 11-16 10:37:53 engine.py:135] logits = _apply_logits_processors(logits, sampling_metadata)
ERROR 11-16 10:37:53 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-16 10:37:53 engine.py:135] File "/opt/conda/lib/python3.11/site-packages/vllm/model_executor/layers/logits_processor.py", line 150, in _apply_logits_processors
ERROR 11-16 10:37:53 engine.py:135] logits_row = logits_processor(past_tokens_ids,
ERROR 11-16 10:37:53 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-16 10:37:53 engine.py:135] File "/opt/conda/lib/python3.11/site-packages/vllm/model_executor/guided_decoding/outlines_logits_processors.py", line 87, in __call__
ERROR 11-16 10:37:53 engine.py:135] allowed_tokens = torch.tensor(allowed_tokens, device=scores.device)
ERROR 11-16 10:37:53 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-16 10:37:53 engine.py:135] RuntimeError: CUDA error: an illegal memory access was encountered
ERROR 11-16 10:37:53 engine.py:135] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 11-16 10:37:53 engine.py:135] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
ERROR 11-16 10:37:53 engine.py:135] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR 11-16 10:37:53 engine.py:135]
ERROR 11-16 10:37:53 engine.py:135]
ERROR 11-16 10:37:53 engine.py:135] The above exception was the direct cause of the following exception:
ERROR 11-16 10:37:53 engine.py:135]
ERROR 11-16 10:37:53 engine.py:135] Traceback (most recent call last):
ERROR 11-16 10:37:53 engine.py:135] File "/opt/conda/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 133, in start
ERROR 11-16 10:37:53 engine.py:135] self.run_engine_loop()
ERROR 11-16 10:37:53 engine.py:135] File "/opt/conda/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 196, in run_engine_loop
ERROR 11-16 10:37:53 engine.py:135] request_outputs = self.engine_step()
ERROR 11-16 10:37:53 engine.py:135] ^^^^^^^^^^^^^^^^^^
ERROR 11-16 10:37:53 engine.py:135] File "/opt/conda/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 214, in engine_step
ERROR 11-16 10:37:53 engine.py:135] raise e
ERROR 11-16 10:37:53 engine.py:135] File "/opt/conda/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 205, in engine_step
ERROR 11-16 10:37:53 engine.py:135] return self.engine.step()
ERROR 11-16 10:37:53 engine.py:135] ^^^^^^^^^^^^^^^^^^
ERROR 11-16 10:37:53 engine.py:135] File "/opt/conda/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 1454, in step
ERROR 11-16 10:37:53 engine.py:135] outputs = self.model_executor.execute_model(
ERROR 11-16 10:37:53 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-16 10:37:53 engine.py:135] File "/opt/conda/lib/python3.11/site-packages/vllm/executor/gpu_executor.py", line 125, in execute_model
ERROR 11-16 10:37:53 engine.py:135] output = self.driver_worker.execute_model(execute_model_req)
ERROR 11-16 10:37:53 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-16 10:37:53 engine.py:135] File "/opt/conda/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 343, in execute_model
ERROR 11-16 10:37:53 engine.py:135] output = self.model_runner.execute_model(
ERROR 11-16 10:37:53 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-16 10:37:53 engine.py:135] File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 11-16 10:37:53 engine.py:135] return func(*args, **kwargs)
ERROR 11-16 10:37:53 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^
ERROR 11-16 10:37:53 engine.py:135] File "/opt/conda/lib/python3.11/site-packages/vllm/worker/model_runner_base.py", line 146, in _wrapper
ERROR 11-16 10:37:53 engine.py:135] raise type(err)(f"Error in model execution: "
ERROR 11-16 10:37:53 engine.py:135] RuntimeError: Error in model execution: CUDA error: an illegal memory access was encountered
ERROR 11-16 10:37:53 engine.py:135] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 11-16 10:37:53 engine.py:135] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
ERROR 11-16 10:37:53 engine.py:135] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR 11-16 10:37:53 engine.py:135]
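The innermost frame of the traceback is the outlines guided-decoding logits processor, which converts the list of allowed token ids into a tensor on the GPU (`torch.tensor(allowed_tokens, device=scores.device)`) and uses it to mask the logits row. Because CUDA errors are reported asynchronously, that frame is not necessarily where the illegal access happened. As a rough illustration of what the masking step does, here is a plain-Python sketch using a hypothetical `apply_allowed_mask` helper (not vLLM's code). It also rejects token ids that fall outside the logits row; an out-of-range id is only one hypothesis worth ruling out, and nothing in the log confirms it as the cause.

```python
import math
from typing import Iterable, List


def apply_allowed_mask(scores: List[float], allowed_tokens: Iterable[int]) -> List[float]:
    """Return a copy of `scores` with every disallowed token set to -inf.

    Hypothetical helper illustrating guided-decoding masking; it is not
    vLLM's or outlines' implementation.
    """
    vocab_size = len(scores)
    allowed = list(allowed_tokens)
    # Validate ids up front: on the GPU a bad index would only show up
    # later as an asynchronous CUDA error at some unrelated call.
    bad = [t for t in allowed if not 0 <= t < vocab_size]
    if bad:
        raise ValueError(
            f"allowed token ids out of range for vocab size {vocab_size}: {bad}"
        )
    # Start with everything forbidden, then re-enable the allowed ids.
    mask = [-math.inf] * vocab_size
    for t in allowed:
        mask[t] = 0.0
    # Adding 0.0 keeps a score; adding -inf removes the token from sampling.
    return [s + m for s, m in zip(scores, mask)]
```

For example, `apply_allowed_mask([1.0, 2.0, 3.0], [0, 2])` keeps tokens 0 and 2 and sets token 1 to `-inf`.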
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [618]
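The log repeatedly suggests rerunning with `CUDA_LAUNCH_BLOCKING=1`, which makes kernel launches synchronous so the Python traceback points at the call that actually faulted (at a noticeable throughput cost). The variable has to be set before CUDA is initialized, so the simplest route is to set it in the environment of the server process. The sketch below assumes the server is started as vLLM's OpenAI-compatible entrypoint (`python -m vllm.entrypoints.openai.api_server --model ...`); the report does not show the actual launch command, so `server_argv` and its arguments are placeholders to adapt.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def debug_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy an environment (default: the current one) with blocking CUDA launches."""
    env = dict(os.environ if base is None else base)
    # Each kernel launch now waits for completion, so a CUDA error is raised
    # at the call that caused it instead of at a later API call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def server_argv(model: str, *extra_args: str) -> List[str]:
    # Placeholder command: substitute the exact arguments used in the
    # failing deployment.
    return [sys.executable, "-m", "vllm.entrypoints.openai.api_server",
            "--model", model, *extra_args]


def launch_server(model: str, *extra_args: str) -> subprocess.Popen:
    """Start the server in a child process with the debug environment."""
    return subprocess.Popen(server_argv(model, *extra_args), env=debug_env())
```

From a shell, prefixing the existing command with `CUDA_LAUNCH_BLOCKING=1` has the same effect.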
Before submitting a new issue...
Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Your current environment
The output of `python collect_env.py`
Model Input Dumps
err_execute_model_input_20241116-081810.zip
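The attached archive is named for an earlier failure (08:18:10) than the one in the log above (10:37:53), where vLLM tried to write the inputs to `/tmp/err_execute_model_input_20241116-103753.pkl` but pickling itself failed. A minimal sketch for looking inside such an archive with the standard library follows; `list_dump_members` and `load_pickled_members` are hypothetical helper names. Unpickling vLLM's dump likely needs a compatible vLLM/torch install, and `pickle` can execute arbitrary code, so only load archives you trust.

```python
import pickle
import zipfile
from typing import Any, BinaryIO, Dict, List, Union

Archive = Union[str, BinaryIO]  # a path or an open binary file


def list_dump_members(archive: Archive) -> List[str]:
    """Return the names of the files stored in the dump archive."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


def load_pickled_members(archive: Archive) -> Dict[str, Any]:
    """Unpickle every .pkl member of the archive, keyed by member name."""
    loaded: Dict[str, Any] = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith(".pkl"):
                # ZipFile.open returns a file-like object pickle can read from.
                with zf.open(name) as fh:
                    loaded[name] = pickle.load(fh)
    return loaded
```

Listing the members first is cheap and safe; only call `load_pickled_members` once the environment matches the one that produced the dump.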
🐛 Describe the bug
command