
[Core] replace narrow-usage RayWorkerVllm to general WorkerWrapper to reduce code duplication #4024

Merged: 40 commits merged into vllm-project:main from the update_env branch on Apr 17, 2024

Conversation

@youkaichao (Member) commented on Apr 12, 2024

Currently, set_cuda_visible_devices is one step of worker initialization. It is too specific, and its usage is very narrow.

Meanwhile, some information used during initialization is essentially lost afterwards, e.g. local_rank. We cannot retrieve local_rank from parallel_state after initialization. (We can retrieve it from the worker, but it is difficult to get the worker instance inside many files.)

This PR first replaces the narrow-usage set_cuda_visible_devices with a general update_environment_variables. I would also like to pass rank/local_rank etc. through environment variables, so that we can easily retrieve the information anywhere in the code without passing it all the way from function to function. However, that change is larger, and I would like to hear more opinions.
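To illustrate the idea (a minimal sketch; the helper and the specific variable values are illustrative, not the exact code in this PR): once the executor pushes per-worker settings into the process environment, any module can read them from os.environ without threading arguments through every call site.

import os
from typing import Dict


def update_environment_variables(envs: Dict[str, str]) -> None:
    # Apply the given variables to the current process's environment.
    for k, v in envs.items():
        os.environ[k] = v


# The executor could push per-worker settings once, at worker startup
# (values here are purely illustrative):
update_environment_variables({
    "CUDA_VISIBLE_DEVICES": "0,1",
    "LOCAL_RANK": "0",
})

# ...and any module can later read them directly, without the value being
# passed down from the top-level function to the bottom-level one:
local_rank = int(os.environ["LOCAL_RANK"])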

@youkaichao youkaichao requested a review from zhuohan123 April 12, 2024 03:48
@rkooo567 assigned and then unassigned rkooo567 on Apr 12, 2024
@esmeetu (Collaborator) commented on Apr 12, 2024

Can we fix #4029 by supporting device selection in LLM? Users could pass devices='0,1' to use those two devices.
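For illustration only (devices is a proposed parameter, not an existing LLM argument), the suggestion amounts to restricting the visible GPUs before engine initialization, which on CUDA backends could be done roughly like this:

import os


def select_devices(devices: str) -> None:
    # Hypothetical helper: restrict the GPUs visible to this process.
    # Equivalent to launching with `CUDA_VISIBLE_DEVICES=0,1 python ...`.
    os.environ["CUDA_VISIBLE_DEVICES"] = devices


select_devices("0,1")  # use only GPUs 0 and 1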

@njhill (Member) commented on Apr 12, 2024

@youkaichao @zhuohan123 any chance we could get #3466 merged first? I have been rebasing it for a couple of months now.

Looks like it needs another small rebase; I can do that shortly. (Edit: now rebased again.)

Comment on lines +109 to +115
    def update_environment_variables(self, envs: Dict[str, str]) -> None:
        key = 'CUDA_VISIBLE_DEVICES'
        if key in envs and key in os.environ:
            # overwriting CUDA_VISIBLE_DEVICES is desired behavior
            # suppress the warning in `update_environment_variables`
            del os.environ[key]
        update_environment_variables(envs)
Collaborator

At a high level, we should discourage passing configuration / data via environment variables, because it allows a lot of complexity that isn't subject to the documentation / visibility / common abstractions that a config object would be subject to. Basically, we don't want to make it very easy for external contributors to add additional configuration space. An example of this is the Ray DAG vLLM integration, which is very hidden because it's not in a config object.

# If the env var is set, it uses the Ray's compiled DAG API
# which optimizes the control plane overhead.
# Run vLLM with VLLM_USE_RAY_COMPILED_DAG=1 to enable it.
USE_RAY_COMPILED_DAG = bool(os.getenv("VLLM_USE_RAY_COMPILED_DAG", 0))

Another example is that CUDA_VISIBLE_DEVICES semantics may only work for a subset of backends; the way each backend configures which devices to use can differ. Then we get a Cambrian explosion of env vars to manage devices.

Alternatives:

  • Allow a whitelist of env vars that we're confident apply to all vLLM backends, e.g. MPI's RANK/LOCAL_RANK/WORLD_SIZE/etc. (see the sketch after this list)
  • Take standard vLLM config and allow each backend to interpret / apply it (e.g. set_visible_devices API instead of update_environment_variables)
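A rough sketch of the first (whitelist) alternative, under illustrative names; this is not code from the PR:

import os
from typing import Dict

# Hypothetical whitelist of variables that every backend is expected to honor.
ALLOWED_ENV_VARS = {"RANK", "LOCAL_RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT"}


def update_environment_variables(envs: Dict[str, str]) -> None:
    unknown = set(envs) - ALLOWED_ENV_VARS
    if unknown:
        raise ValueError(f"Refusing to set non-whitelisted env vars: {unknown}")
    os.environ.update(envs)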

Member Author

I explicitly use environment variables because they are not coupled with vLLM's internal data structures; see #3904 for an example. It is quite common in distributed settings to set these common environment variables and global data structures. They are meant to be global. It is quite a pain if we pass them as function arguments, because each modification needs to be threaded from the very top function to the very bottom one, which is unnecessarily tedious. See https://learningsystems.slack.com/archives/C05AGDSRXU5/p1712893317174879?thread_ts=1712890806.922209&cid=C05AGDSRXU5 .

Collaborator

Agreed! And I won't block the PR on it -- but to raise vLLM quality as we add more backends, we should be careful with global state: https://softwareengineering.stackexchange.com/questions/148108/why-is-global-state-so-evil

It's OK to have a small set of allowed env vars, but it's a bad idea to support arbitrary ones at the interface level. Fine to improve later.

Comment on lines +133 to +140
            # if the driver worker also executes methods,
            # exceptions in the other workers may cause deadlocks in RPC
            # frameworks like Ray
            # see https://github.com/vllm-project/vllm/issues/3455
            # print the error and inform the user how to solve it
            msg = (f"Error executing method {method}. "
                   "This might cause deadlock in distributed execution.")
            logger.exception(msg)
            raise e
Collaborator

Until we have another example of this happening, this should live in the Ray-specific worker wrapper.

Member Author

This is in general the case for RPC frameworks, where the driver worker both runs the model and checks health / gets return results from the other workers.
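As a rough sketch of the pattern being discussed (illustrative names; not the exact WorkerWrapperBase code from this PR), the wrapper proxies method calls to the wrapped worker and logs loudly before re-raising, since a silent failure on the driver can leave the remote workers blocked on collective calls:

import importlib
import logging

logger = logging.getLogger(__name__)


class WorkerWrapper:
    """Sketch: lazily construct a worker and proxy method calls to it."""

    def __init__(self, worker_module_name: str, worker_class_name: str):
        self.worker_module_name = worker_module_name
        self.worker_class_name = worker_class_name
        self.worker = None

    def init_worker(self, *args, **kwargs):
        # Import the worker class only inside the worker process.
        module = importlib.import_module(self.worker_module_name)
        worker_cls = getattr(module, self.worker_class_name)
        self.worker = worker_cls(*args, **kwargs)

    def execute_method(self, method: str, *args, **kwargs):
        try:
            return getattr(self.worker, method)(*args, **kwargs)
        except Exception as e:
            # Surface the error loudly: if the driver dies quietly, the
            # remote workers may keep waiting and the job deadlocks.
            logger.exception("Error executing method %s. This might cause "
                             "deadlock in distributed execution.", method)
            raise e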

Collaborator

OK makes sense!

vllm/utils.py Outdated
    for k, v in envs.items():
        if k in os.environ:
            logger.warning(f"Overwriting environment variable {k} "
                           f"from {os.environ[k]} to {v}")
Collaborator

nit: do {os.environ[k]=} and {v=} so whitespace is obvious

Member Author

I added single quotes to make it clear. {os.environ[k]=} would print the literal expression text os.environ[k]= to users, which is confusing and not informative.
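For reference, the difference being discussed (a small standalone example, not code from the PR):

import os

os.environ["FOO"] = "bar "  # note the trailing whitespace
k, v = "FOO", "baz"

# The `=` specifier echoes the expression text itself, which exposes
# internals rather than the variable name:
print(f"{os.environ[k]=} {v=}")
# -> os.environ[k]='bar ' v='baz'

# Explicit single quotes keep the message readable and still make
# stray whitespace visible:
print(f"Overwriting environment variable {k} from '{os.environ[k]}' to '{v}'")
# -> Overwriting environment variable FOO from 'bar ' to 'baz'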

@@ -81,3 +87,54 @@ def remove_lora(self, lora_id: int) -> bool:

    def list_loras(self) -> List[int]:
        raise ValueError(f"{type(self)} does not support LoRA")


class WorkerWrapperBase:
Collaborator

We should strive for composition over inheritance to limit the coupling of the implementations to the interface; otherwise we invite undesired coupling which makes some changes difficult.

Member Author

I don't get it. Can you elaborate on this?

Collaborator

The thinking goes like this:

  1. The purpose of an interface is to minimally couple different implementations.
  2. Different implementations may have common logic, and this common logic can be put in the abstract class
  3. It is good to avoid this because it violates (1); the coupling goes beyond the minimal interface definition
  4. The alternative to inheritance is to have utility classes / functions and compose their usage within the different implementations. example

It is not a hard rule, since it's impossible to completely decouple different components, but it is a good design principle.

I'll still approve this PR without it; however, the benefit is that each worker implementation is more cleanly separated from the others, making future changes easier.
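A tiny illustration of the distinction (hypothetical class and function names, not code from vLLM):

from typing import List


# Inheritance: shared logic lives in the base class, so every subclass is
# coupled to whatever the base happens to implement.
class WorkerBase:
    def split_memory(self, total: int, n: int) -> List[int]:
        return [total // n] * n


class GpuWorker(WorkerBase):
    pass  # inherits split_memory whether or not that split suits this backend


# Composition: the shared logic is a free-standing utility, and each
# implementation opts in explicitly where it makes sense.
def split_memory(total: int, n: int) -> List[int]:
    return [total // n] * n


class CpuWorker:
    def plan_memory(self, total: int, n: int) -> List[int]:
        return split_memory(total, n)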

            del os.environ[key]
        update_environment_variables(envs)

    def init_worker(self, *args, **kwargs):
Collaborator

Can we add some docstrings?

Member Author

fixed in 21be004.

Collaborator

This is a comment; docstrings look like this (this degree of formality is overkill for this method...).

Python doc tooling will then take the first statement (if it's a string) and expose it for auto doc generation.

example:

def fork(self, last_block: Block) -> List[Block]:
    """Creates a new sequence of blocks that shares the same underlying
    memory as the original sequence.

    Args:
        last_block (Block): The last block in the original sequence.

    Returns:
        List[Block]: A new list of blocks that shares the same memory as the
            original sequence.
    """
    allocator = self._block_ids_to_allocator[last_block.block_id]
    return allocator.fork(last_block)

@youkaichao youkaichao requested a review from cadedaniel April 16, 2024 22:21
@cadedaniel (Collaborator) left a comment

some nits and suggestions, otherwise LGTM

@youkaichao (Member Author)

Thanks for the review!

OK to have a small set of allowed env vars, but it's a bad idea to support arbitrary ones at interface level. fine to improve later.

I'm thinking of adding a vllm/flags.py to hold all the global flags in a central place. Will do that later.
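As a rough illustration of that idea (hypothetical sketch; vllm/flags.py does not exist in this PR), such a module would read every vLLM-specific environment variable in one place so the full flag surface is documented:

# vllm/flags.py -- hypothetical central home for global flags.
import os

# The existing Ray DAG flag quoted above, re-homed here so that it is
# discoverable alongside any future flags.
VLLM_USE_RAY_COMPILED_DAG: bool = os.getenv("VLLM_USE_RAY_COMPILED_DAG", "0") == "1"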

the benefit is that each worker implementation is more cleanly separated from each other, making future changes easier

I didn't accept the change, because self.worker_module_name and self.worker_class_name are how WorkerWrapperBase works. It would sound very strange to put this code into utils.

@youkaichao youkaichao enabled auto-merge (squash) April 17, 2024 06:37
@youkaichao youkaichao merged commit 8438e05 into vllm-project:main Apr 17, 2024
46 checks passed
@youkaichao youkaichao deleted the update_env branch April 17, 2024 15:00
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Apr 21, 2024
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request Apr 22, 2024
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Apr 26, 2024
alexeykondrat pushed a commit to alexeykondrat/ci-vllm that referenced this pull request May 1, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024