[Bug]: Running on a single machine with multiple GPUs error #9875
Comments
vLLM has its own multiprocessing setup for TP/PP. You should avoid using
cc @youkaichao
yeah we don't support
@Wiselnn570 contribution welcome!
see https://docs.litellm.ai/docs/simple_proxy#load-balancing---multiple-instances-of-1-model for how to use litellm
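As a concrete illustration of the advice above: rather than forking your own worker processes, you can let vLLM drive the GPUs itself via tensor parallelism. A minimal sketch (the model name and GPU count are placeholders; adjust to your setup):

```shell
# Let vLLM's own multiprocessing handle tensor parallelism across the GPUs
# on one machine, instead of wrapping vLLM in an external multiprocessing pool.
vllm serve Qwen/Qwen2-VL-7B-Instruct --tensor-parallel-size 2
```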
@youkaichao @DarkLight1337 Sure, I'd be glad to contribute to this community when I have time! One more question: I recently ran into an issue while modifying the positional encoding in the mrope_input_positions section of the Qwen2-VL code, and I haven't been able to resolve it. In short, I want to explore the model's performance when extrapolating to a 60k context on Qwen2-VL 7B, using video data for testing. I tried replacing this section (vllm/vllm/worker/model_runner.py, Line 672 in 3bb4bef), and got:
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [8320,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [8320,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [8320,0,0], thread: [66,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [8320,0,0], thread: [67,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [8320,0,0], thread: [68,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [8320,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
...
INFO 11-03 18:58:15 model_runner_base.py:120] Writing input of failed execution to /tmp/err_execute_model_input_20241103-185815.pkl...
WARNING 11-03 18:58:15 model_runner_base.py:143] Failed to pickle inputs of failed execution: CUDA error: device-side assert triggered
WARNING 11-03 18:58:15 model_runner_base.py:143] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
WARNING 11-03 18:58:15 model_runner_base.py:143] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
WARNING 11-03 18:58:15 model_runner_base.py:143] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
WARNING 11-03 18:58:15 model_runner_base.py:143]
RuntimeError: Error in model execution: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I have already tested the original M-RoPE, which outputs correctly with a 60k context, and the maximum mrope_input_positions value is around 300. So I am wondering whether the position value is too large, causing it to exceed the index range. How should I modify the code to support vanilla RoPE (or some other 3D positional encoding where the position values are quite large) for evaluation? Thanks!

p.s. I noticed that this function (vllm/vllm/worker/model_runner.py, Line 637 in 3bb4bef)
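For intuition on the assert: a plain-Python sketch (no torch; `max_position_embeddings` follows the usual HF config field name, and the bounds check is illustrative, not vLLM's actual code) of why position ids that exceed the rotary cos/sin cache length trip the `index out of bounds` assertion:

```python
def check_positions(position_ids, max_position_embeddings):
    """Return the position ids that would index past a rotary cos/sin cache
    built with max_position_embeddings entries (the out-of-bounds ones)."""
    return [p for p in position_ids if p < 0 or p >= max_position_embeddings]

# Original M-RoPE on a 60k video context: max position id is only ~300,
# well inside a 32768-entry cache, so indexing succeeds.
assert check_positions([0, 150, 300], 32768) == []

# Vanilla RoPE on the same context: position ids grow to ~60000, which
# exceeds the cache length and reproduces the device-side assert.
assert check_positions([0, 40000, 59999], 32768) == [40000, 59999]
```

If this is indeed the cause, the cache would need to be built (or interpolated/extrapolated) to cover the largest position id you feed in.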
Your current environment
Name: vllm
Version: 0.6.3.post2.dev171+g890ca360
Model Input Dumps
No response
🐛 Describe the bug
I used the interface from this vLLM repository to load the model and ran eval scripts on VLMEvalKit (https://github.com/open-compass/VLMEvalKit) for evaluation, but I got the error above. Could you please advise on how to resolve this?
Here is the interface
vllm/vllm/distributed/parallel_state.py
Line 1025 in 3ea2dc2
It seems that the error occurs at this assertion, so what change should I make to satisfy the assertion? Thanks.
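The warnings in the CUDA log above also hint at the quickest way to localize a device-side assert: re-run with synchronous kernel launches so the Python stacktrace points at the actual failing op. A sketch (the eval command is a placeholder for your VLMEvalKit invocation):

```shell
# Force synchronous CUDA launches so the reported stacktrace is accurate,
# as the warning in the log suggests.
CUDA_LAUNCH_BLOCKING=1 python run.py ...
```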