[Bug]: Running on a single machine with multiple GPUs error #9875

Wiselnn570 · 2024-10-31T08:26:23Z

Your current environment

Name: vllm
Version: 0.6.3.post2.dev171+g890ca360

Model Input Dumps

No response

🐛 Describe the bug

I used the interface from this vllm repository to load the model and ran eval scripts on vlmevalkit(https://github.com/open-compass/VLMEvalKit)

torchrun --nproc-per-node=8 run.py --data Video-MME --model Qwen2_VL-M-RoPE-80k

for evaluation, but I got the error

RuntimeError: world_size (8) is not equal to tensor_model_parallel_size (1) x pipeline_model_parallel_size (1).

Could you please advise on how to resolve this?
Here is the interface

from vllm import LLM
llm = LLM("/mnt/hwfile/mllm/weixilin/cache/Qwen2-VL-7B-Instruct", 
            max_model_len=100000,
            limit_mm_per_prompt={"video": 10},
            )

vllm/vllm/distributed/parallel_state.py

Line 1025 in 3ea2dc2

raise RuntimeError(

Seems that the error occur at this assertion, so what change should I make to fit the assertion, thanks.

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

The text was updated successfully, but these errors were encountered:

DarkLight1337 · 2024-10-31T14:18:13Z

vLLM has its own multiprocessing setup for TP/PP. You should avoid using torchrun with vLLM.

DarkLight1337 · 2024-10-31T14:18:38Z

cc @youkaichao

youkaichao · 2024-11-01T17:27:53Z

yeah we don't support torchrun , but it would be good to provide some scripts to run multiple vllm instances with a proxy erver using litellm .

youkaichao · 2024-11-01T17:28:07Z

@Wiselnn570 contribution welcome!

youkaichao · 2024-11-01T17:28:23Z

see https://docs.litellm.ai/docs/simple_proxy#load-balancing---multiple-instances-of-1-model for how to use litellm

Wiselnn570 · 2024-11-03T11:45:12Z

@youkaichao @DarkLight1337 Sure, I'm glad to contribute to this community when I have time! One more question, recently, I encountered an issue while modifying the positional encoding in the mrope_input_positions section of the Qwen2-VL code, and I try but don't know how to resolve it. In short, I'm aiming to explore the model's performance when extrapolating to a 60k context on the Qwen2-VL 7B model, using video data for testing. I tried replacing this section (

vllm/vllm/worker/model_runner.py

Line 672 in 3bb4bef

MRotaryEmbedding.get_input_positions(

) with vanilla-ROPE（That is, placing image, video, and text tokens all on the main diagonal of the M-RoPE.）, which caused the max value of the mrope_input_positions up to approximately 59k, but it eventually led to an error.

../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [8320,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && in
dex < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [8320,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && in
dex < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [8320,0,0], thread: [66,0,0] Assertion `-sizes[i] <= index && in
dex < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [8320,0,0], thread: [67,0,0] Assertion `-sizes[i] <= index && in
dex < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [8320,0,0], thread: [68,0,0] Assertion `-sizes[i] <= index && in
dex < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [8320,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && in
dex < sizes[i] && "index out of bounds"` failed.
...
INFO 11-03 18:58:15 model_runner_base.py:120] Writing input of failed execution to /tmp/err_execute_model_input_20241103-185815.pkl...
WARNING 11-03 18:58:15 model_runner_base.py:143] Failed to pickle inputs of failed execution: CUDA error: device-side assert triggered
WARNING 11-03 18:58:15 model_runner_base.py:143] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
WARNING 11-03 18:58:15 model_runner_base.py:143] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
WARNING 11-03 18:58:15 model_runner_base.py:143] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
WARNING 11-03 18:58:15 model_runner_base.py:143] 
RuntimeError: Error in model execution: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I have already tested the original M-RoPE, which outputs correctly with a 60k context, and the maximum mrope_input_positions value is around 300. So, I am wondering if the position value is too large, causing it to exceed the index. How should I modify it to support vanilla-RoPE (Or perhaps some other 3D positional encoding, where the positional encoding values are quite large.) for evaluation? Thanks!

p.s. I noticed that this function (

vllm/vllm/worker/model_runner.py

Line 637 in 3bb4bef

def _compute_multi_modal_input(self, inter_data: InterDataForSeqGroup,

) was called several times before inferring on my provided video test data, and I’m wondering if this might be related.

Wiselnn570 added the bug Something isn't working label Oct 31, 2024

Wiselnn570 mentioned this issue Nov 3, 2024

../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [8320,0,0], thread: [64,0,0] Assertion -sizes[i] <= index && in dex < sizes[i] && "index out of bounds" failed. #9965

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Running on a single machine with multiple GPUs error #9875

[Bug]: Running on a single machine with multiple GPUs error #9875

Wiselnn570 commented Oct 31, 2024 •

edited

Loading

DarkLight1337 commented Oct 31, 2024 •

edited

Loading

DarkLight1337 commented Oct 31, 2024

youkaichao commented Nov 1, 2024

youkaichao commented Nov 1, 2024

youkaichao commented Nov 1, 2024

Wiselnn570 commented Nov 3, 2024

[Bug]: Running on a single machine with multiple GPUs error #9875

[Bug]: Running on a single machine with multiple GPUs error #9875

Comments

Wiselnn570 commented Oct 31, 2024 • edited Loading

Your current environment

Model Input Dumps

🐛 Describe the bug

Before submitting a new issue...

DarkLight1337 commented Oct 31, 2024 • edited Loading

DarkLight1337 commented Oct 31, 2024

youkaichao commented Nov 1, 2024

youkaichao commented Nov 1, 2024

youkaichao commented Nov 1, 2024

Wiselnn570 commented Nov 3, 2024

Wiselnn570 commented Oct 31, 2024 •

edited

Loading

DarkLight1337 commented Oct 31, 2024 •

edited

Loading