
Fix using CuPy for eager mode #3037

Merged · 1 commit into vllm-project:main on Feb 27, 2024

Conversation

@esmeetu (Collaborator) commented Feb 26, 2024

Currently, multi-node vLLM inference is broken after #2811 introduced CuPy.
This PR supports multi-node inference in eager mode.
It likely serves as a temporary fix for #2826 and #2959 when running in eager mode.

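For reference, a minimal usage sketch of the eager-mode path this PR targets (the `enforce_eager` and `tensor_parallel_size` arguments exist on vLLM's `LLM` class; the model name and sizes below are illustrative only, and a real multi-node run would also need a Ray cluster):

```python
from vllm import LLM, SamplingParams

# Eager mode (enforce_eager=True) skips CUDA-graph capture, so the CuPy NCCL
# workaround introduced in #2811 is not needed on this path.
llm = LLM(
    model="facebook/opt-125m",   # illustrative model
    tensor_parallel_size=4,      # shard across GPUs, potentially across nodes via Ray
    enforce_eager=True,          # disable CUDA graphs -> no CuPy NCCL setup
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```
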
@esmeetu requested a review from WoosukKwon on February 26, 2024 13:58
@WoosukKwon (Collaborator)

I think the right way to do this is to correctly set up CuPy in the multi-node setting? WDYT?

@Yard1 (Collaborator) commented Feb 26, 2024

Considering CuPy is supposed to just be a workaround for now, I think it makes more sense to avoid using it unless necessary (and it should not be necessary in eager mode).

That being said, it does have to work in a multi-node setting to enable CUDA graphs there. It should be relatively straightforward to set up.

@esmeetu (Collaborator, Author) commented Feb 27, 2024

> I think the right way to do this is to correctly set up CuPy in the multi-node setting? WDYT?

@WoosukKwon My understanding of #2811 is that you fixed the memory leak issue with CUDA graphs by introducing CuPy. Eager mode doesn't have that issue, so we can keep it as it was until we find some benefit.

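A minimal sketch of the guard this change amounts to (the helper names below are hypothetical stand-ins, not vLLM's actual functions):

```python
# Hypothetical stand-ins for illustration only; not vLLM's actual code.

def init_cupy_nccl(world_size: int, rank: int) -> None:
    """Stand-in for the CuPy NCCL setup that #2811 added for CUDA-graph capture."""
    print(f"initializing CuPy NCCL communicator for rank {rank} of {world_size}")


def init_model_communication(world_size: int, rank: int, enforce_eager: bool) -> None:
    # torch.distributed (PyTorch NCCL) would be initialized here for collectives.
    # The CuPy path only works around the CUDA-graph memory leak, so it is skipped
    # entirely in eager mode, which also unblocks multi-node eager runs.
    if not enforce_eager:
        init_cupy_nccl(world_size, rank)


# Eager-mode run: the CuPy setup is never touched.
init_model_communication(world_size=4, rank=0, enforce_eager=True)
```
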
@WoosukKwon (Collaborator)

@esmeetu Yep. Makes sense! Let's merge this PR and figure out how to correctly set up CuPy in the multi-node setting.

@WoosukKwon merged commit c1c0d00 into vllm-project:main on Feb 27, 2024
20 of 22 checks passed
@esmeetu deleted the fix-eager branch on March 1, 2024 14:00
xjpang pushed a commit to xjpang/vllm that referenced this pull request Mar 4, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024

Successfully merging this pull request may close these issues.

vLLM running on a Ray Cluster Hanging on Initializing