
Fix using CuPy for eager mode #3037

Merged · 1 commit into vllm-project:main on Feb 27, 2024

Conversation

@esmeetu (Collaborator) commented Feb 26, 2024

Currently, multi-node vLLM inference is broken after #2811 introduced CuPy.
This PR supports multi-node inference in eager mode.
It likely serves as a temporary fix for #2826 and #2959 when running in eager mode.

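For reference, a minimal usage sketch of the eager-mode path this PR targets (the `enforce_eager` and `tensor_parallel_size` arguments exist on vLLM's `LLM` class; the model name and sizes below are illustrative only, and a real multi-node run would also need a Ray cluster):

```python
from vllm import LLM, SamplingParams

# Eager mode (enforce_eager=True) skips CUDA-graph capture, so the CuPy NCCL
# workaround introduced in #2811 is not needed on this path.
llm = LLM(
    model="facebook/opt-125m",   # illustrative model
    tensor_parallel_size=4,      # shard across GPUs, potentially across nodes via Ray
    enforce_eager=True,          # disable CUDA graphs -> no CuPy NCCL setup
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```
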
@esmeetu requested a review from WoosukKwon on February 26, 2024 13:58
@WoosukKwon (Collaborator)

I think the right way to do this is to correctly set up CuPy in the multi-node setting? WDYT?

@Yard1 (Collaborator) commented Feb 26, 2024

Considering CuPy is supposed to just be a workaround for now, I think it makes more sense to avoid using it unless necessary (and it should not be necessary in eager mode).

That being said, it does have to work in a multi-node setting to enable CUDA graphs there. It should be relatively straightforward to set up.

@esmeetu (Collaborator, Author) commented Feb 27, 2024

> I think the right way to do this is to correctly set up CuPy in the multi-node setting? WDYT?

@WoosukKwon My understanding of #2811 is that you fixed the memory leak issue with CUDA graphs by introducing CuPy. Eager mode doesn't have that issue, so we can keep it as it was until we find some benefit.

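A minimal sketch of the guard this change amounts to (the helper names below are hypothetical stand-ins, not vLLM's actual functions):

```python
# Hypothetical stand-ins for illustration only; not vLLM's actual code.

def init_cupy_nccl(world_size: int, rank: int) -> None:
    """Stand-in for the CuPy NCCL setup that #2811 added for CUDA-graph capture."""
    print(f"initializing CuPy NCCL communicator for rank {rank} of {world_size}")


def init_model_communication(world_size: int, rank: int, enforce_eager: bool) -> None:
    # torch.distributed (PyTorch NCCL) would be initialized here for collectives.
    # The CuPy path only works around the CUDA-graph memory leak, so it is skipped
    # entirely in eager mode, which also unblocks multi-node eager runs.
    if not enforce_eager:
        init_cupy_nccl(world_size, rank)


# Eager-mode run: the CuPy setup is never touched.
init_model_communication(world_size=4, rank=0, enforce_eager=True)
```
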
@WoosukKwon (Collaborator)

@esmeetu Yep. Makes sense! Let's merge this PR and figure out how to correctly set up CuPy in the multi-node setting.

@WoosukKwon merged commit c1c0d00 into vllm-project:main on Feb 27, 2024
20 of 22 checks passed
@esmeetu deleted the fix-eager branch on March 1, 2024 14:00
xjpang pushed a commit to xjpang/vllm that referenced this pull request Mar 4, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024

Successfully merging this pull request may close these issues.

vLLM running on a Ray Cluster Hanging on Initializing