Multi GPU ROCm6 issues, and workarounds #2794
Comments
@BKitor Have you found any solution for distributed inference? Thank you very much in advance! Best regards, Shuyue |
Sorry, haven't poked this in a while (lost access to multi-node system). |
@BKitor Benjamin, I am using a single node with multiple GPUs, but there is a problem with Ray. Do you have any idea how to solve it? Thank you very much, and have a nice day!
Best regards, Shuyue |
What I'm suggesting is to not use Ray. One of the arguments when instantiating a model is distributed_execution_backend, where the options include 'ray' or 'mp'. I'm not sure how you're launching your model; you might have to insert distributed_execution_backend="mp" where you create the llm, i.e. from vllm import LLM; llm = LLM(<whatever your args already are>, distributed_execution_backend="mp"). Otherwise, some of the provided helper scripts let you specify --distributed-execution-backend on the command line, but this isn't universal, so YMMV. |
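For concreteness, here is a minimal sketch of the suggestion above. Treat it as hedged: the exact keyword name varies between vLLM versions (the working example later in this thread, on 0.4.3, uses distributed_executor_backend), and the model name and parallel size below are placeholders rather than values from this thread.

```python
from vllm import LLM, SamplingParams

# Sketch: use the multiprocessing executor instead of Ray for tensor parallelism.
# On vLLM 0.4.3 the keyword is `distributed_executor_backend`; check
# vllm/engine/arg_utils.py for the exact spelling in your installed version.
llm = LLM(
    model="facebook/opt-125m",          # placeholder model, not from this thread
    tensor_parallel_size=2,             # one worker process per GPU
    distributed_executor_backend="mp",  # "mp" = multiprocessing workers, no Ray
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```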
@BKitor Benjamin, it seems that there is no distributed_execution_backend argument in LLM: https://github.com/vllm-project/vllm/blob/main/vllm/engine/llm_engine.py. However, there is one in the AsyncLLMEngine: https://github.com/vllm-project/vllm/blob/main/vllm/engine/async_llm_engine.py, which is for serving. May I know which vLLM version you are using? Thank you very much, and have a nice day! Best regards, Shuyue |
The file you're looking for is arg_utils.py, and it's present in 0.4.3:
https://github.com/vllm-project/vllm/blob/1197e02141df1a7442f21ff6922c98ec0bba153e/vllm/engine/arg_utils.py#L38
|
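As a quick way to act on the pointer above, here is a hedged sketch for checking which backend argument your installed vLLM actually accepts. It assumes EngineArgs is a dataclass defined in vllm/engine/arg_utils.py, as it is in 0.4.3.

```python
from dataclasses import fields

from vllm.engine.arg_utils import EngineArgs

# List every EngineArgs field name, then filter for backend-related ones,
# so you can see the exact spelling your installed version expects.
names = [f.name for f in fields(EngineArgs)]
print([n for n in names if "backend" in n])
# On 0.4.3 this should include 'distributed_executor_backend'.
```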
Thank you very much, Benjamin! It really helps. Now my multi-GPU inference runs smoothly. For other researchers' reference, I use vLLM 0.4.3 (https://github.com/vllm-project/vllm/releases/tag/v0.4.3):

```python
llm = LLM(
    model=save_dir,
    tokenizer=model_name,
    dtype='bfloat16',
    distributed_executor_backend="mp",
    tensor_parallel_size=num_gpus_vllm,
    gpu_memory_utilization=gpu_utilization_vllm,
    enable_lora=False,
)
sampling_params = SamplingParams(
    temperature=0,
    top_p=1,
    max_tokens=max_new_tokens,
    stop=stop_tokens,
)
completions = llm.generate(
    prompts,
    sampling_params,
)
```

@BKitor However, the GPU memory cannot be released on any GPU except the first-initialized one, even with the following cleanup:

```python
import gc
import torch
from vllm.distributed.parallel_state import destroy_model_parallel
# Delete the llm object and free the memory
destroy_model_parallel()
del llm.llm_engine.model_executor.driver_worker
del llm
gc.collect()
torch.cuda.empty_cache()
print("Successfully delete the llm pipeline and free the GPU memory.") Do you have suggestions on releasing all the GPUs' memory? Thank you very much, and have a nice day! Best regards, Shuyue |
This issue should be closed, as the current main branch supports multi-GPU on ROCm 6.1.x. |
I ran into a series of issues trying to get vLLM stood up on a system with multiple MI210s. I figured I'd document my issues and workarounds so that someone could pick up the baton later, or at least to save someone some debugging time.
Calling torch.cuda.set_device() with anything other than 0 would fail. I tweaked worker.py to always use 0, but I don't think this is a viable long-term fix.
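For anyone who wants to reproduce the first reported failure, here is a minimal hedged sketch of the call described above; on a healthy multi-GPU setup it should simply select and use the second device.

```python
import torch

# The report above: on a multi-MI210 node with ROCm 6.x, selecting any
# device index other than 0 failed inside vLLM's worker setup.
print(torch.cuda.device_count())    # expect >= 2 on a multi-GPU node
torch.cuda.set_device(1)            # the call that was reported to fail
x = torch.ones(1, device="cuda")    # allocate on the newly selected device
print(x.device)                     # expect cuda:1 when the call succeeds
```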