-
-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] replace narrow-usage RayWorkerVllm to general WorkerWrapper to reduce code duplication #4024
Conversation
Can we fix #4029 by supporting choose |
@youkaichao @zhuohan123 any chance we could get #3466 merged first? I have been rebasing it for a couple of months now.
|
def update_environment_variables(self, envs: Dict[str, str]) -> None: | ||
key = 'CUDA_VISIBLE_DEVICES' | ||
if key in envs and key in os.environ: | ||
# overwriting CUDA_VISIBLE_DEVICES is desired behavior | ||
# suppress the warning in `update_environment_variables` | ||
del os.environ[key] | ||
update_environment_variables(envs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a high-level, we should discourage passing configuration / data via environment variables because it allows for a lot of complexity that isn't subject to documentation / visibility / common abstractions that a config object would be subject to. Basically, we don't want to give the ability for external contributors to add additional configuration space very easily. An example of this is the Ray DAG vLLM integration, which is very hidden because it's not in a config object.
vllm/vllm/executor/ray_gpu_executor.py
Lines 24 to 27 in e95cd87
# If the env var is set, it uses the Ray's compiled DAG API | |
# which optimizes the control plane overhead. | |
# Run vLLM with VLLM_USE_RAY_COMPILED_DAG=1 to enable it. | |
USE_RAY_COMPILED_DAG = bool(os.getenv("VLLM_USE_RAY_COMPILED_DAG", 0)) |
Another example is that CUDA_VISIBLE_DEVICES
semantics may only work for a subset of backends; the way each backend configures which devices to use can be different. Then we have a cambrian explosion of env vars to manage devices..
Alternatives:
- Allow a whitelist of env vars that we're confident apply for all vLLM backends, e.g. MPI's
RANK
/LOCAL_RANK
/WORLD_SIZE
/etc - Take standard vLLM config and allow each backend to interpret / apply it (e.g.
set_visible_devices
API instead ofupdate_environment_variables
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I explicitly use environment variable, because it is not coupled with vllm internal data structure. see #3904 for example. It is quite common in distributed setting to set these common environment variables and global data structure. They are meat to be global. It is quite a pain if we want to pass then from function arguments, because each modification will need to pass them from the very top function to the very bottom function, and is unnecessarily tedious. See https://learningsystems.slack.com/archives/C05AGDSRXU5/p1712893317174879?thread_ts=1712890806.922209&cid=C05AGDSRXU5 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree! and I won't block the PR on it -- but to raise vLLM quality as we have more backends, we should be careful with global state https://softwareengineering.stackexchange.com/questions/148108/why-is-global-state-so-evil
OK to have a small set of allowed env vars, but it's a bad idea to support arbitrary ones at interface level. fine to improve later.
# if the driver worker also execute methods, | ||
# exceptions in the rest worker may cause deadlock in rpc like ray | ||
# see https://github.com/vllm-project/vllm/issues/3455 | ||
# print the error and inform the user to solve the error | ||
msg = (f"Error executing method {method}. " | ||
"This might cause deadlock in distributed execution.") | ||
logger.exception(msg) | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until we have another example of this happening, this should live in the Ray-specific worker wrapper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in general the case for rpc framework, where we have the driver worker both running models and check health/get return result from other workers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK makes sense!
vllm/utils.py
Outdated
for k, v in envs.items(): | ||
if k in os.environ: | ||
logger.warning(f"Overwriting environment variable {k} " | ||
f"from {os.environ[k]} to {v}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do {os.environ[k]=}
and {v=}
so whitespace is obvious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add single quotes to make it clear. {os.environ[k]=}
will print os.environ[k]
to users, which is confusing and not informative.
@@ -81,3 +87,54 @@ def remove_lora(self, lora_id: int) -> bool: | |||
|
|||
def list_loras(self) -> List[int]: | |||
raise ValueError(f"{type(self)} does not support LoRA") | |||
|
|||
|
|||
class WorkerWrapperBase: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should strive for composition over inheritance to limit the coupling of the implementations to the interface; otherwise we invite undesired coupling which makes some changes difficult.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it. Can you elaborate on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thinking goes like this:
- The purpose of an interface is to minimally couple different implementations.
- Different implementations may have common logic, and this common logic can be put in the abstract class
- It is good to avoid this because it violates (1); the coupling goes beyond the minimal interface definition
- The alternative to inheritance is to have utility classes / functions and compose their usage within the different implementations. example
It is not a hard rule as it's impossible to completely decouple different components but a good design principle.
I'll still approve this PR without it; however the benefit is that each worker implementation is more cleanly separated from each other, making future changes easier.
del os.environ[key] | ||
update_environment_variables(envs) | ||
|
||
def init_worker(self, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some docstrings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 21be004.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a comment, docstrings look like this (this degree of formality is overkill for this method..)
Python doc tooling will then take the first statement (if it's a string) and expose it for auto doc generation
example:
vllm/vllm/core/block/cpu_gpu_block_allocator.py
Lines 155 to 167 in d150e4f
def fork(self, last_block: Block) -> List[Block]: | |
"""Creates a new sequence of blocks that shares the same underlying | |
memory as the original sequence. | |
Args: | |
last_block (Block): The last block in the original sequence. | |
Returns: | |
List[Block]: A new list of blocks that shares the same memory as the | |
original sequence. | |
""" | |
allocator = self._block_ids_to_allocator[last_block.block_id] | |
return allocator.fork(last_block) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some nits and suggestions, otherwise LGTM
# if the driver worker also execute methods, | ||
# exceptions in the rest worker may cause deadlock in rpc like ray | ||
# see https://github.com/vllm-project/vllm/issues/3455 | ||
# print the error and inform the user to solve the error | ||
msg = (f"Error executing method {method}. " | ||
"This might cause deadlock in distributed execution.") | ||
logger.exception(msg) | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK makes sense!
del os.environ[key] | ||
update_environment_variables(envs) | ||
|
||
def init_worker(self, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a comment, docstrings look like this (this degree of formality is overkill for this method..)
Python doc tooling will then take the first statement (if it's a string) and expose it for auto doc generation
example:
vllm/vllm/core/block/cpu_gpu_block_allocator.py
Lines 155 to 167 in d150e4f
def fork(self, last_block: Block) -> List[Block]: | |
"""Creates a new sequence of blocks that shares the same underlying | |
memory as the original sequence. | |
Args: | |
last_block (Block): The last block in the original sequence. | |
Returns: | |
List[Block]: A new list of blocks that shares the same memory as the | |
original sequence. | |
""" | |
allocator = self._block_ids_to_allocator[last_block.block_id] | |
return allocator.fork(last_block) |
@@ -81,3 +87,54 @@ def remove_lora(self, lora_id: int) -> bool: | |||
|
|||
def list_loras(self) -> List[int]: | |||
raise ValueError(f"{type(self)} does not support LoRA") | |||
|
|||
|
|||
class WorkerWrapperBase: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thinking goes like this:
- The purpose of an interface is to minimally couple different implementations.
- Different implementations may have common logic, and this common logic can be put in the abstract class
- It is good to avoid this because it violates (1); the coupling goes beyond the minimal interface definition
- The alternative to inheritance is to have utility classes / functions and compose their usage within the different implementations. example
It is not a hard rule as it's impossible to completely decouple different components but a good design principle.
I'll still approve this PR without it; however the benefit is that each worker implementation is more cleanly separated from each other, making future changes easier.
def update_environment_variables(self, envs: Dict[str, str]) -> None: | ||
key = 'CUDA_VISIBLE_DEVICES' | ||
if key in envs and key in os.environ: | ||
# overwriting CUDA_VISIBLE_DEVICES is desired behavior | ||
# suppress the warning in `update_environment_variables` | ||
del os.environ[key] | ||
update_environment_variables(envs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree! and I won't block the PR on it -- but to raise vLLM quality as we have more backends, we should be careful with global state https://softwareengineering.stackexchange.com/questions/148108/why-is-global-state-so-evil
OK to have a small set of allowed env vars, but it's a bad idea to support arbitrary ones at interface level. fine to improve later.
Thanks for the review!
I'm thinking of adding a
I didn't accept the change, because |
…oject#4024) [Core] replace narrow-usage RayWorkerVllm to general WorkerWrapper to reduce code duplication (vllm-project#4024)
…oject#4024) [Core] replace narrow-usage RayWorkerVllm to general WorkerWrapper to reduce code duplication (vllm-project#4024)
…oject#4024) [Core] replace narrow-usage RayWorkerVllm to general WorkerWrapper to reduce code duplication (vllm-project#4024)
…oject#4024) [Core] replace narrow-usage RayWorkerVllm to general WorkerWrapper to reduce code duplication (vllm-project#4024)
…oject#4024) [Core] replace narrow-usage RayWorkerVllm to general WorkerWrapper to reduce code duplication (vllm-project#4024)
Currently,
set_cuda_visible_devices
is one step of worker initialization. It is too specific, and usage is very narrow.Meanwhile, some information during initialization is essentially lost after initialization, e.g.
local_rank
. We cannot retrievelocal_rank
information fromparallel_state
after initialization. (We can retrieve it fromworker
, but it is difficult to get the worker instance inside many files)This PR first replaces narrow-usage
set_cuda_visible_devices
to generalupdate_environment_variables
. I would also like to passrank
/local_rank
etc. information through environment variables, so that we can easily retrieve the information anywhere inside the code, without passing the information all the way from function to function. However, this change is larger, and I would like to hear more opinions.