[RFC]: Make device agnostic for diverse hardware support #9268

Open · 8 of 26 tasks
wangshuai09 opened this issue Oct 11, 2024 · 15 comments
wangshuai09 (Contributor) commented Oct 11, 2024:

Motivation.

vLLM has already been adapted to many hardware devices, such as GPU, TPU, and XPU. However, adapting these backends requires implementing separate Worker/Executor/Model Runner frameworks for each, which leads to code redundancy and maintenance difficulties.
In fact, these per-hardware framework codes can be abstracted at the device layer, forming a unified framework. This way, only one set of code would need to be maintained, and each backend would only need to implement the device-layer interfaces and any device-specific logic where necessary.
I have also found that some new features are only implemented in the GPU-related code. These features are often applicable to other hardware as well, but it is difficult for other backends to notice and follow such updates.

Proposed Change.

This RFC is intended to establish a unified framework.
Integrating each hardware-specific framework into a common framework may be difficult, but it makes sense to work in this direction. The diagram below illustrates a proposed solution:

[Figure 1: proposed unified framework diagram]

Taking the Executor as an example: third-party hardware devices based on the PyTorch ecosystem already support the basic torch interfaces well, so after abstracting away the device-related hard coding (such as torch.cuda and torch.xpu), the GPU Executor could be used as the common Executor for all third-party devices.

Following #6080, different hardware backends can put their own device-specific code in a NewBackendPlatform, so that the framework becomes device-agnostic through current_platform. For example, torch.cuda.synchronize could be replaced by current_platform.synchronize.
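As a minimal sketch of what such a call-site change could look like (the synchronize method follows the proposal above; is_xpu stands in for the existing utility check, and the exact vLLM API may differ):

```python
import torch

from vllm.platforms import current_platform
from vllm.utils import is_xpu  # existing-style helper, used here only for illustration


# Before: every call site branches on the backend.
def wait_for_device_old() -> None:
    if is_xpu():
        torch.xpu.synchronize()
    else:
        torch.cuda.synchronize()


# After: the backend is resolved once behind current_platform,
# and call sites stay device-agnostic.
def wait_for_device_new() -> None:
    current_platform.synchronize()
```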

Feedback Period.

Realizing this idea will touch many files, so the following steps are planned to achieve the goal above (a rough sketch of the resulting platform interface follows the list):

  • BackendPlatform
    • Neuron
    • Openvino
  • Backend Type Check
    • is_cpu -> current_platform.is_cpu
    • is_xpu -> current_platform.is_xpu
    • is_openvino -> current_platform.is_openvino
    • is_neuron -> current_platform.is_neuron
    • is_hip -> current_platform.is_rocm
  • Backend Related Func
    • seed_everything -> current_platform.seed_everything
    • is_pin_memory_available -> current_platform.is_pin_memory_available
    • DeviceMemoryProfiler -> current_platform.memory_profiler
    • wrap_device -> current_platform.wrap_device
  • Backend Related Hard Coding
    • torch.xxx.get_device_name -> current_platform.get_device
    • torch.xxx.Event -> current_platform.Event
    • torch.xxx.synchronize -> current_platform.synchronize
    • torch.xxx.Stream -> current_platform.Stream
    • torch.xxx.stream -> current_platform.stream
    • torch.xxx.empty_cache -> current_platform.empty_cache
    • torch.xxx.device_count -> current_platform.device_count
    • torch.xxx.memory_allocated -> current_platform.memory_allocated
    • torch.xxx.set_device -> current_platform.set_device
    • torch.xxx.current_device -> current_platform.current_device
    • torch.xxx.get_device_capability -> current_platform.get_device_capability
  • Try to unify the hardware frameworks; the CPU-related framework may be difficult to integrate.
    • gpu(neuron,openvino,tpu,xpu,..)_executor -> common_backend_executor
    • gpu(neuron,openvino,tpu,xpu,..)_worker -> common_backend_worker
    • gpu(neuron,openvino,tpu,xpu,..)_model_runner -> common_backend_model_runner
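As a rough sketch of where this list is heading (method names are taken from the items above; this is not the final vLLM Platform API, and the real interface may differ), each backend would subclass a common Platform and map these calls onto its own torch module:

```python
import torch


class Platform:
    """Hypothetical common interface; each backend overrides what it supports."""

    def is_cpu(self) -> bool:
        return False

    def is_cuda(self) -> bool:
        return False

    def device_count(self) -> int:
        raise NotImplementedError

    def get_device_capability(self, device_id: int = 0) -> tuple[int, int]:
        raise NotImplementedError

    def memory_allocated(self, device_id: int = 0) -> int:
        raise NotImplementedError

    def set_device(self, device_id: int) -> None:
        raise NotImplementedError


class CudaPlatform(Platform):
    """Maps the generic calls onto torch.cuda."""

    def is_cuda(self) -> bool:
        return True

    def device_count(self) -> int:
        return torch.cuda.device_count()

    def get_device_capability(self, device_id: int = 0) -> tuple[int, int]:
        return torch.cuda.get_device_capability(device_id)

    def memory_allocated(self, device_id: int = 0) -> int:
        return torch.cuda.memory_allocated(device_id)

    def set_device(self, device_id: int) -> None:
        torch.cuda.set_device(device_id)


# current_platform would be selected once, based on the detected hardware.
current_platform: Platform = CudaPlatform()
```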

There will certainly be omissions or difficulties in the actual implementation; this list will keep being updated.

CC List.

@youkaichao @WoosukKwon

Any Other Things.

No response

youkaichao self-assigned this Oct 11, 2024

youkaichao (Member) commented:

we can do it step by step.

is_cpu -> current_platform.is_cpu
is_xpu -> current_platform.is_xpu
is_openvino -> current_platform.is_openvino
is_neuron -> current_platform.is_neuron

this can be the first step, and should be easy to do.

the rest might need some case-by-case discussion.

MengqingCao (Contributor) commented:

Just FYI: the refactoring of the Neuron backend check is done in #9374.

youkaichao (Member) commented:

> Although I think this device-agnostic framework is wonderful, it is actually quite challenging to find a balance between high-level abstraction and low-level performance.

Can you elaborate on that?

NickLucche (Contributor) commented:

Hey, I think the idea is very interesting, and the problem has surely been tackled many times across many projects.
Personally, I think the unified interface proposed here is still a bit too granular, i.e. the worker needs to call into too many accelerator-specific functions to carry out its logic.

To bring one example to the table, ONNX Runtime (https://onnxruntime.ai/docs/execution-providers/) has the concept of an "ExecutionProvider", but its interface is simple enough to group common operations into higher-level, framework-specific abstractions, so you don't have to implement dozens of functions. TFLite had delegates, but I don't think that example is as good.

Some pain points off the top of my head: execution on CPU will likely implement only a small subset of all the ops, so the executor/worker/interface logic has to have good defaults. Calling into a closed-source accelerator library that may not implement all the functions (not applicable here, but e.g. CoreML) raises the same point.
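One way to handle the "good defaults" concern (a minimal sketch only, not an existing vLLM API; the method names follow the RFC's proposed interface) is to give the base Platform safe defaults, so backends that support only a subset of operations still work:

```python
import random

import numpy as np
import torch


class Platform:
    """Hypothetical base class with safe defaults for backends that only
    support a subset of the operations."""

    def seed_everything(self, seed: int) -> None:
        # Framework-level seeding that applies to every backend.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)

    def synchronize(self) -> None:
        # Default: nothing to do, e.g. for a purely synchronous CPU backend.
        pass

    def empty_cache(self) -> None:
        # Default: backends without a caching allocator can ignore this.
        pass


class CudaPlatform(Platform):
    def seed_everything(self, seed: int) -> None:
        super().seed_everything(seed)
        torch.cuda.manual_seed_all(seed)

    def synchronize(self) -> None:
        torch.cuda.synchronize()

    def empty_cache(self) -> None:
        torch.cuda.empty_cache()
```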

wangshuai09 (Contributor, Author) commented:

Each backend in ONNX Runtime implements its own ExecutionProvider based on IExecutionProvider; I think that is similar to the relationship between xxxPlatform and Platform in vLLM. For CPU, it is also easy to remove or reimplement some ops on top of Platform.

youkaichao (Member) commented:

I think this part is fine:

[screenshot of part of the task list above]

for the rest, we should have more discussion before we take action:

[screenshot of the remaining part of the task list]

wangshuai09 (Contributor, Author) commented Oct 30, 2024:

Of course, a full discussion is necessary.
Thanks for your discussion and help. We have finished the first step, Backend Type Check, and are ready to work on Backend Related Func.
This second step aims to remove the backend-related functions that need if...else... logic to handle different hardware. The Platform will provide an interface with the same function name, and each xxxPlatform can supply its own implementation.
Can you give me some advice on the second step? Thanks.

MengqingCao (Contributor) commented:


@youkaichao could you give some advice on this?

youkaichao (Member) commented:

You can try to find some code like this, with very long if/else branching logic based on current_platform. It can be unified by current_platform.get_default_attn_backend or something like that.

[screenshot of code with long if/else branching on current_platform]
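For illustration only (the backend names and the get_default_attn_backend method are placeholders following the suggestion above, not the actual selector code), such a consolidation would look roughly like this:

```python
from vllm.platforms import current_platform


# Before: backend selection via a long if/elif chain (simplified).
def choose_attn_backend_old() -> str:
    if current_platform.is_rocm():
        return "ROCM_FLASH"
    elif current_platform.is_cpu():
        return "TORCH_SDPA"
    elif current_platform.is_tpu():
        return "PALLAS"
    else:
        return "FLASH_ATTN"


# After: each Platform subclass answers for itself.
def choose_attn_backend_new() -> str:
    return current_platform.get_default_attn_backend()
```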

youkaichao (Member) commented:

We should start by sorting candidates by the number of if/else branches.

If there are more than 3 branches, it means at least 3 backends support the feature, and we can move it inside platforms/.

If not, we can just keep it as it is for now.

MengqingCao (Contributor) commented:


Thanks! I got what you mean; I'll do this work step by step.

MengqingCao (Contributor) commented Nov 15, 2024:

@youkaichao I have listed below the remaining methods involving multiple backend branches, and will implement them one by one in follow-up PRs. If you have any suggestions, please let me know.

| code path | func | func-refactor | related backends | other info |
| --- | --- | --- | --- | --- |
| vllm/config.py | ModelConfig._verify_quantization | current_platform | rocm/tpu/neuron | |
| vllm/config.py | DeviceConfig.__init__ | current_platform.device_config_init | cuda_like/neuron/hpu/openvino/tpu/cpu/xpu | |
| vllm/utils.py | is_pin_memory_available | current_platform.is_pin_memory_available | xpu/neuron/hpu/cpu/openvino | TODO: how to deal with in_wsl? Just leave it here? |
| vllm/model_executor/custom_op.py | CustomOp.dispatch_forward | current_platform.custom_forward | rocm/cpu/hpu/tpu/xpu/cuda-for-default (when enabled) | |
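Taking is_pin_memory_available from the table as a concrete example (a sketch only; the class names and WSL handling are assumptions, not the final implementation), the per-backend branches would move into the platform classes:

```python
class Platform:
    """Hypothetical base class with a conservative default."""

    def is_pin_memory_available(self) -> bool:
        # CPU, Neuron, and OpenVINO style backends would simply inherit this.
        return False


class CudaPlatform(Platform):
    def is_pin_memory_available(self) -> bool:
        # The existing in_wsl() caveat could live here: pinned memory is
        # limited under WSL, so report it as unavailable there.
        from vllm.utils import in_wsl  # existing helper, used for illustration
        return not in_wsl()
```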

youkaichao (Member) commented:

Please don't directly change DeviceConfig.__init__; instead, have current_platform.device_type be a string, and read current_platform.device_type inside DeviceConfig.__init__.

Let's do it step by step; the others need further discussion.
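A minimal sketch of that suggestion (attribute and class names as discussed above; the details are assumptions rather than the final code):

```python
import torch

from vllm.platforms import current_platform


class DeviceConfig:
    def __init__(self, device: str = "auto") -> None:
        if device == "auto":
            # No per-backend if/else here: the platform exposes its type
            # ("cuda", "cpu", "xpu", ...) as a plain string.
            self.device_type = current_platform.device_type
        else:
            # An explicit device string overrides auto-detection.
            self.device_type = device
        self.device = torch.device(self.device_type)
```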

youkaichao (Member) commented:

I added #10402 as a first step to absorb some config checking and updating code into platforms/. @wangshuai09 @MengqingCao if you are interested, you are welcome to do the same thing for the XPU executor, OpenVINO executor, etc.

MengqingCao (Contributor) commented:


Sure, I'll start this work with the XPU executor :-)
