Remove vllm dependency when using ray to run vllm #1637
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #1637      +/-   ##
==========================================
- Coverage   80.33%   80.04%   -0.30%
==========================================
  Files          95      116      +21
  Lines        6602     8230    +1628
==========================================
+ Hits         5304     6588    +1284
- Misses       1298     1642     +344

☔ View full report in Codecov by Sentry.
Force-pushed from c32f2c0 to 7c94047
superduperdb/ext/llm/vllm.py (Outdated)
if not ray.is_initialized():
    ray.init(address=self.ray_address, runtime_env=runtime_env)

LLM = ray.remote(LLM).remote
if self.vllm_kwargs.get('tensor_parallel_size') == 1:
    # must set num_gpus to 1 to avoid error
Maybe add an assertion at the config level so that the user knows num_gpus should be set to 1; otherwise the user might be under the false impression that a setting like num_gpus = 4 with tensor_parallel_size = 1 will work.
I’m still a bit unclear about this. Are you suggesting that we should inform users about this behavior? Should this information be communicated through documentation or some specific configuration settings?
assert ray_config.get('num_gpus') == 1 when self.vllm_kwargs.get('tensor_parallel_size') == 1, right?
yes! something like this
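For reference, a minimal sketch of the suggested check, assuming ray_config and vllm_kwargs are plain dicts mirroring the attributes discussed above (the helper name is hypothetical, not code from this PR):

def check_single_gpu_config(ray_config: dict, vllm_kwargs: dict) -> None:
    # Hypothetical helper: fail fast when the Ray resource request cannot
    # match a tensor_parallel_size of 1.
    if vllm_kwargs.get("tensor_parallel_size") == 1:
        assert ray_config.get("num_gpus") == 1, (
            "tensor_parallel_size is 1, so num_gpus must be 1; "
            f"got num_gpus={ray_config.get('num_gpus')}"
        )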
once done we can merge this pr :) @jieguangzhou
Ok, will do it later
Done. I changed it to print a warning with an explanation and to help the user by setting num_gpus to 1.
The reason for not using an assertion directly is that users may still set num_gpus themselves, and the warning approach is more fault-tolerant.
@kartik4949
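A rough sketch of this warning-based approach, illustrative only and not the exact code merged in the PR (the helper name and dict arguments are assumptions):

import logging

logger = logging.getLogger(__name__)

def normalize_ray_config(ray_config: dict, vllm_kwargs: dict) -> dict:
    # Hypothetical helper: instead of asserting, warn and correct num_gpus.
    if vllm_kwargs.get("tensor_parallel_size") == 1 and ray_config.get("num_gpus") != 1:
        logger.warning(
            "tensor_parallel_size is 1, so each replica can use only one GPU; "
            "overriding num_gpus=%s with num_gpus=1.",
            ray_config.get("num_gpus"),
        )
        ray_config["num_gpus"] = 1
    return ray_config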
superduperdb/ext/llm/vllm.py (Outdated)
    self.ray_config["num_gpus"] = 1
    LLM = ray.remote(**self.ray_config)(_VLLMCore).remote
else:
    # Don't know why using config will block the process, need to figure out
Can you explain this branch, i.e. what happens when 'tensor_parallel_size' is greater than 1?
When tensor_parallel_size is greater than one, vLLM's built-in Ray integration is used, so the task runs directly on the Ray cluster and the Ray configuration is managed by vLLM.
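A simplified sketch of that branching, assuming ray_config and vllm_kwargs are plain dicts; the _VLLMActor wrapper below is a hypothetical stand-in for the PR's _VLLMCore class, not the actual PR code:

import ray

class _VLLMActor:
    # Hypothetical wrapper: vllm is imported inside the actor, so it only
    # needs to be installed on the Ray workers, not on the client.
    def __init__(self, **vllm_kwargs):
        from vllm import LLM  # deferred import; runs on the Ray worker
        self.llm = LLM(**vllm_kwargs)

    def generate(self, prompts):
        return self.llm.generate(prompts)

def build_llm(ray_config: dict, vllm_kwargs: dict):
    # Sketch of the branching described above, not the exact PR code.
    if vllm_kwargs.get("tensor_parallel_size", 1) > 1:
        # Multi-GPU: vLLM's built-in Ray integration manages placement itself;
        # passing our own Ray resource config here could conflict with it.
        from vllm import LLM
        return LLM(**vllm_kwargs)
    # Single GPU: run the wrapper as a Ray actor that we configure ourselves.
    ray_config["num_gpus"] = 1
    return ray.remote(**ray_config)(_VLLMActor).remote(**vllm_kwargs)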
This can be regarded as the number of GPUs the model is split across.
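For context, tensor_parallel_size is a standard vLLM engine argument; a minimal illustrative example, where the model name is a placeholder and not taken from this PR:

from vllm import LLM

# Splits the model across 4 GPUs via vLLM's tensor parallelism.
llm = LLM(model="facebook/opt-13b", tensor_parallel_size=4)
outputs = llm.generate(["What is Ray?"])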
Manually configuring Ray here might conflict with vLLM's own configuration. When I have time, I will keep examining vLLM's Ray-related code. The usage described on the official vLLM website involves starting a Ray cluster locally or connecting as a worker; it does not use the ray_address parameter, and using that parameter can trigger a deadlock bug.
Therefore, I made some adaptations on the non-vLLM side to make it compatible with both remote multi-GPU and single-GPU setups.
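For the single-GPU path that is managed outside vLLM's own Ray integration, the connection pattern from the diff above looks roughly like this; the cluster address and the pip spec are illustrative placeholders:

import ray

# Connect to an existing Ray cluster and ship the vllm dependency via
# runtime_env, so the local client process does not need vllm installed.
ray.init(
    address="ray://head-node:10001",
    runtime_env={"pip": ["vllm"]},
)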
Issues related to:
Force-pushed from 2f546e6 to 672407b
Description

Related Issues

Checklist

- Did make unit-testing and make integration-testing run successfully?

Additional Notes or Comments