[Core] replace narrow-usage RayWorkerVllm to general WorkerWrapper to reduce code duplication #4024

youkaichao · 2024-04-12T03:42:36Z

Currently, set_cuda_visible_devices is a dedicated step of worker initialization. It is too specific, and its usage is very narrow.

Meanwhile, some information that is available during initialization, e.g. local_rank, is essentially lost afterwards. We cannot retrieve local_rank from parallel_state after initialization. (We can retrieve it from the worker, but it is difficult to get the worker instance in many files.)

This PR first replaces the narrow-usage set_cuda_visible_devices with a general update_environment_variables. I would also like to pass rank, local_rank and similar information through environment variables, so that it can be retrieved anywhere in the code without passing it from function to function. However, that change is larger, and I would like to hear more opinions first.
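A minimal sketch of the proposed pattern, assuming the executor places RANK, LOCAL_RANK and WORLD_SIZE in each worker's environment before the worker is initialized. The helpers launch_env and get_local_rank are hypothetical and not part of vLLM.

```python
import os
from typing import Dict


def launch_env(rank: int, local_rank: int, world_size: int) -> Dict[str, str]:
    # Hypothetical: the executor computes these values once per worker
    # and hands them to the worker process as environment variables.
    return {
        "RANK": str(rank),
        "LOCAL_RANK": str(local_rank),
        "WORLD_SIZE": str(world_size),
    }


def get_local_rank() -> int:
    # Any module can read the value without it being threaded through
    # every function signature.
    return int(os.environ["LOCAL_RANK"])


if __name__ == "__main__":
    os.environ.update(launch_env(rank=3, local_rank=1, world_size=4))
    print(get_local_rank())  # prints 1
```

The trade-off, raised later in the review, is that such values become global state that is invisible in function signatures.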


esmeetu · 2024-04-12T13:14:41Z

Can we fix #4029 by supporting device selection in LLM? Users could pass devices='0,1' to use those two devices.
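A sketch of how such an option might be applied, assuming a hypothetical devices argument (not an existing LLM parameter) that is translated into CUDA_VISIBLE_DEVICES. This only has an effect if it runs before the process initializes CUDA, because CUDA reads the variable at initialization.

```python
import os
from typing import Optional


def apply_device_selection(devices: Optional[str]) -> None:
    """Hypothetical: map an LLM(devices="0,1")-style argument onto
    CUDA_VISIBLE_DEVICES before any CUDA context is created."""
    if devices is None:
        return  # keep whatever the environment already specifies
    ids = [d.strip() for d in devices.split(",") if d.strip()]
    # This sketch accepts only integer device indices.
    if not ids or not all(d.isdigit() for d in ids):
        raise ValueError(f"invalid device list: {devices!r}")
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(ids)
```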

njhill · 2024-04-12T18:28:44Z

@youkaichao @zhuohan123 any chance we could get #3466 merged first? I have been rebasing it for a couple of months now.

~~Looks like it needs another small rebase, I can do that shortly.~~ now rebased again


cadedaniel · 2024-04-16T21:25:04Z

vllm/worker/worker_base.py

+    def update_environment_variables(self, envs: Dict[str, str]) -> None:
+        key = 'CUDA_VISIBLE_DEVICES'
+        if key in envs and key in os.environ:
+            # overwriting CUDA_VISIBLE_DEVICES is desired behavior
+            # suppress the warning in `update_environment_variables`
+            del os.environ[key]
+        update_environment_variables(envs)


At a high-level, we should discourage passing configuration / data via environment variables because it allows for a lot of complexity that isn't subject to documentation / visibility / common abstractions that a config object would be subject to. Basically, we don't want to give the ability for external contributors to add additional configuration space very easily. An example of this is the Ray DAG vLLM integration, which is very hidden because it's not in a config object.

vllm/vllm/executor/ray_gpu_executor.py, lines 24 to 27 in e95cd87:

# If the env var is set, it uses the Ray's compiled DAG API
# which optimizes the control plane overhead.
# Run vLLM with VLLM_USE_RAY_COMPILED_DAG=1 to enable it.
USE_RAY_COMPILED_DAG = bool(os.getenv("VLLM_USE_RAY_COMPILED_DAG", 0))

Another example: CUDA_VISIBLE_DEVICES semantics may only apply to a subset of backends, since each backend can configure which devices to use differently. We would then face a Cambrian explosion of env vars for managing devices.

Alternatives:

1. Allow a whitelist of env vars that we're confident apply to all vLLM backends, e.g. MPI's RANK/LOCAL_RANK/WORLD_SIZE, etc.

2. Take the standard vLLM config and allow each backend to interpret / apply it (e.g. a set_visible_devices API instead of update_environment_variables).
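A sketch of the first alternative, with a hypothetical allowlist containing only the rank-related variables mentioned above; update_allowed_environment_variables is an illustrative name, not a vLLM function.

```python
import os
from typing import Dict, FrozenSet

# Hypothetical allowlist; the exact set would need agreement across backends.
ALLOWED_ENV_VARS: FrozenSet[str] = frozenset({"RANK", "LOCAL_RANK", "WORLD_SIZE"})


def update_allowed_environment_variables(envs: Dict[str, str]) -> None:
    # Reject anything outside the allowlist instead of silently widening
    # the configuration space.
    unknown = set(envs) - ALLOWED_ENV_VARS
    if unknown:
        raise ValueError(f"unsupported environment variables: {sorted(unknown)}")
    os.environ.update(envs)
```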

I deliberately use environment variables because they are not coupled to vLLM's internal data structures; see #3904 for an example. In distributed settings it is quite common to set these common environment variables and global data structures. They are meant to be global. Passing them as function arguments is painful: every modification has to thread them from the topmost function down to the bottom one, which is unnecessarily tedious. See https://learningsystems.slack.com/archives/C05AGDSRXU5/p1712893317174879?thread_ts=1712890806.922209&cid=C05AGDSRXU5 .

Agreed, and I won't block the PR on it. But to raise vLLM quality as we add more backends, we should be careful with global state: https://softwareengineering.stackexchange.com/questions/148108/why-is-global-state-so-evil

It's OK to have a small set of allowed env vars, but supporting arbitrary ones at the interface level is a bad idea. Fine to improve later.
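As an aside on the quoted USE_RAY_COMPILED_DAG snippet: os.getenv returns a string whenever the variable is set, and bool() of any non-empty string is True, so VLLM_USE_RAY_COMPILED_DAG=0 would still enable the feature. A stricter parse, using a hypothetical env_flag helper, could look like this:

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    # Hypothetical helper: interpret common truthy spellings explicitly
    # instead of relying on bool() of a raw string.
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


os.environ["VLLM_USE_RAY_COMPILED_DAG"] = "0"
print(bool(os.getenv("VLLM_USE_RAY_COMPILED_DAG", 0)))  # True: "0" is non-empty
print(env_flag("VLLM_USE_RAY_COMPILED_DAG"))  # False
```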

cadedaniel · 2024-04-16T21:26:11Z

vllm/worker/worker_base.py

+            # if the driver worker also execute methods,
+            # exceptions in the rest worker may cause deadlock in rpc like ray
+            # see https://github.com/vllm-project/vllm/issues/3455
+            # print the error and inform the user to solve the error
+            msg = (f"Error executing method {method}. "
+                   "This might cause deadlock in distributed execution.")
+            logger.exception(msg)
+            raise e


Until we have another example of this happening, this should live in the Ray-specific worker wrapper.

This is generally the case for RPC frameworks in which the driver worker both runs the model and checks health / collects return values from the other workers.

OK makes sense!
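A minimal sketch of the pattern in the diff above, assuming a worker object whose methods are invoked by name; WorkerWrapperSketch is a hypothetical name. Logging at the point of failure matters because the driver may otherwise block waiting on the other workers, so the error might never surface.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


class WorkerWrapperSketch:
    """Hypothetical wrapper: forwards a named method call to the wrapped
    worker and logs any failure before re-raising it."""

    def __init__(self, worker: Any) -> None:
        self.worker = worker

    def execute_method(self, method: str, *args: Any, **kwargs: Any) -> Any:
        try:
            return getattr(self.worker, method)(*args, **kwargs)
        except Exception:
            # Log where the error happened; the bare `raise` then
            # propagates the original exception with its traceback.
            logger.exception("Error executing method %s. This might cause "
                             "deadlock in distributed execution.", method)
            raise
```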

cadedaniel · 2024-04-16T21:27:25Z

vllm/utils.py

+    for k, v in envs.items():
+        if k in os.environ:
+            logger.warning(f"Overwriting environment variable {k} "
+                           f"from {os.environ[k]} to {v}")


nit: do {os.environ[k]=} and {v=} so whitespace is obvious

I added single quotes to make it clear. {os.environ[k]=} would print the expression text os.environ[k] to users, which is confusing and not informative.
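A sketch of the warning with quoted values, assuming the function then assigns each value into os.environ (the diff excerpt stops before that line). The quotes make leading or trailing whitespace in a value visible in the log.

```python
import logging
import os
from typing import Dict

logger = logging.getLogger(__name__)


def update_environment_variables(envs: Dict[str, str]) -> None:
    for k, v in envs.items():
        if k in os.environ:
            # Single quotes make whitespace in the old and new values obvious.
            logger.warning("Overwriting environment variable %s from '%s' to '%s'",
                           k, os.environ[k], v)
        # Assumed: the new value is applied after the warning.
        os.environ[k] = v
```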

cadedaniel · 2024-04-16T21:29:44Z

vllm/worker/worker_base.py

@@ -81,3 +87,54 @@ def remove_lora(self, lora_id: int) -> bool:

    def list_loras(self) -> List[int]:
        raise ValueError(f"{type(self)} does not support LoRA")
+
+
+class WorkerWrapperBase:


We should strive for composition over inheritance to limit the coupling of the implementations to the interface; otherwise we invite undesired coupling which makes some changes difficult.

I don't get it. Can you elaborate on this?

The thinking goes like this:

1. The purpose of an interface is to minimally couple different implementations.

2. Different implementations may have common logic, and this common logic can be put in the abstract class.

3. It is good to avoid this because it violates (1); the coupling goes beyond the minimal interface definition.

4. The alternative to inheritance is to have utility classes / functions and compose their usage within the different implementations (example).

It is not a hard rule as it's impossible to completely decouple different components but a good design principle.

I'll still approve this PR without it; however the benefit is that each worker implementation is more cleanly separated from each other, making future changes easier.
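A sketch of the composition alternative under illustrative names (apply_env, GPUWorkerSketch and CPUWorkerSketch are hypothetical): the shared logic is a plain function that each implementation calls, so one implementation can diverge without affecting the others. Dropping CUDA_VISIBLE_DEVICES in the CPU variant is only an example of such divergence.

```python
import os
from typing import Dict


def apply_env(envs: Dict[str, str]) -> None:
    # Shared logic lives in a plain utility instead of a base class.
    os.environ.update(envs)


class GPUWorkerSketch:
    def update_environment_variables(self, envs: Dict[str, str]) -> None:
        apply_env(envs)


class CPUWorkerSketch:
    def update_environment_variables(self, envs: Dict[str, str]) -> None:
        # Free to diverge (here: ignore a GPU-only variable) without
        # touching a shared parent class.
        apply_env({k: v for k, v in envs.items() if k != "CUDA_VISIBLE_DEVICES"})
```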

cadedaniel · 2024-04-16T21:30:04Z

vllm/worker/worker_base.py

+            del os.environ[key]
+        update_environment_variables(envs)
+
+    def init_worker(self, *args, **kwargs):


Can we add some docstrings?

fixed in 21be004.

This is a comment. Docstrings look like the example below (this degree of formality is overkill for this method):

Python doc tooling takes the first statement of the body (if it's a string) and exposes it for automatic documentation generation.

example:

vllm/vllm/core/block/cpu_gpu_block_allocator.py, lines 155 to 167 in d150e4f:

def fork(self, last_block: Block) -> List[Block]:
    """Creates a new sequence of blocks that shares the same underlying
    memory as the original sequence.

    Args:
        last_block (Block): The last block in the original sequence.

    Returns:
        List[Block]: A new list of blocks that shares the same memory as the
            original sequence.
    """
    allocator = self._block_ids_to_allocator[last_block.block_id]
    return allocator.fork(last_block)
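To illustrate the point about tooling: the string literal that is the first statement of the function body becomes its __doc__ and is what inspect.getdoc returns, while a # comment is not retained. The class name and docstring wording here are hypothetical.

```python
import inspect


class DocstringDemo:
    def init_worker(self, *args, **kwargs):
        """Initialize the wrapped worker; arguments are forwarded to its
        constructor."""
        # A comment here is invisible to documentation tools.
        pass


print(inspect.getdoc(DocstringDemo.init_worker))
```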

cadedaniel

Some nits and suggestions; otherwise LGTM.


youkaichao · 2024-04-17T06:30:45Z

Thanks for the review!

OK to have a small set of allowed env vars, but it's a bad idea to support arbitrary ones at interface level. fine to improve later.

I'm thinking of adding a vllm/flags.py to hold all the global flags in a central place. Will do that later.
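A sketch of what such a central module might contain, assuming each flag is declared once with a default and a description. The Flag and get_flag names are hypothetical; reading an undeclared name fails loudly instead of silently creating new configuration space.

```python
import os
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Flag:
    default: str
    doc: str


# Hypothetical registry: every global flag is declared, documented and
# given a default in one place instead of ad hoc os.getenv calls.
FLAGS: Dict[str, Flag] = {
    "VLLM_USE_RAY_COMPILED_DAG": Flag("0", "Use Ray's compiled DAG API."),
}


def get_flag(name: str) -> str:
    if name not in FLAGS:
        raise KeyError(f"undeclared flag: {name}")
    return os.environ.get(name, FLAGS[name].default)
```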

the benefit is that each worker implementation is more cleanly separated from each other, making future changes easier

I didn't adopt this change, because self.worker_module_name and self.worker_class_name are central to how WorkerWrapperBase works. It would be very strange to move this code into utils.
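A sketch of why those attributes matter, under the assumption that the wrapper is constructed with only a module name and a class name and imports the worker class lazily in the target process (for example, inside a Ray actor). WorkerWrapperSketch is a hypothetical stand-in, not vLLM's WorkerWrapperBase.

```python
import importlib
from typing import Any, Optional


class WorkerWrapperSketch:
    """Stores only the worker's module/class names; the class is imported
    and instantiated when init_worker runs in the target process."""

    def __init__(self, worker_module_name: str, worker_class_name: str) -> None:
        self.worker_module_name = worker_module_name
        self.worker_class_name = worker_class_name
        self.worker: Optional[Any] = None

    def init_worker(self, *args: Any, **kwargs: Any) -> None:
        mod = importlib.import_module(self.worker_module_name)
        worker_class = getattr(mod, self.worker_class_name)
        self.worker = worker_class(*args, **kwargs)

    def execute_method(self, method: str, *args: Any, **kwargs: Any) -> Any:
        # Methods defined on the wrapper itself (e.g. init_worker) take
        # precedence; everything else is forwarded to the worker.
        target = self if hasattr(self, method) else self.worker
        return getattr(target, method)(*args, **kwargs)
```

Usage, with a standard-library class standing in for a worker: WorkerWrapperSketch("fractions", "Fraction") followed by execute_method("init_worker", 1, 2) builds Fraction(1, 2) lazily, and later calls are forwarded to it.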


youkaichao added 2 commits April 11, 2024 20:35

f7a6356 replace narrow-usage set_cuda_visible_devices to general update_environment_variables
bbdfc69 add warning when env is overwritten

youkaichao requested a review from zhuohan123 April 12, 2024 03:48

zhuohan123 reviewed Apr 12, 2024

vllm/utils.py (outdated, resolved)

tests/distributed/test_pynccl.py (resolved)

youkaichao added 5 commits April 11, 2024 21:49

1e62614 use logger.warning
37eb344 fix env copy
6f64b48 avoid overwritten warning in ray
0499106 fix lint
d26672f allow heterogeneous args in _run_workers; move update_environment_variables to base worker

rkooo567 assigned rkooo567 and unassigned rkooo567 Apr 12, 2024

youkaichao added 9 commits April 12, 2024 00:02

3a01337 unified init worker
c85d040 fix recursion
5e49b98 on the fly local rank calculation
37ed6c9 post update kwargs
b654ee2 add remote
e11448e fix update_environment_variables in ray worker
97e6601 use staticmethod
fd2cbe2 fix dummy worker local_rank
a8d7504 fix dummy worker rank

youkaichao added 9 commits April 12, 2024 10:01

e659635 add WorkerWrapperBase
778fb3f add all_args to _run_workers
d295107 refactor
7ca22a4 fix dangling self
5f6c8f3 fix execute_method in driver worker
13de66e withdraw changes in many workers
32ef3bb no need for init_worker in workerbase
221f626 unify worker_node_and_gpu_ids
0087773 use id rather than ip

cadedaniel reviewed Apr 16, 2024

youkaichao added 5 commits April 16, 2024 14:43

a164219 Merge remote-tracking branch 'origin' into update_env
eb27be9 fix mypy typing
74deb44 move init hf decision to each worker
3bd2c98 use quotes to address white space in env var values
21be004 add docstring

youkaichao requested a review from cadedaniel April 16, 2024 22:21

youkaichao added 3 commits April 16, 2024 16:40

1aee6a0 add config
4337ac6 Merge remote-tracking branch 'origin' into update_env
40d4560 fix _run_workers_async

cadedaniel approved these changes Apr 17, 2024

youkaichao added 2 commits April 16, 2024 23:26

2509db4 move duplicate code to utils
d1bda36 add docstring
1e30d89 use docstring

youkaichao enabled auto-merge (squash) April 17, 2024 06:37

youkaichao merged commit 8438e05 into vllm-project:main Apr 17, 2024
46 checks passed

youkaichao deleted the update_env branch April 17, 2024 15:00

njhill mentioned this pull request Apr 19, 2024

[Core] Some simplification of WorkerWrapper changes #4183

Merged

dtrifiro mentioned this pull request May 15, 2024

bump ubi base image tag opendatahub-io/vllm#24

Merged
