
[Feature]: Multi-LoRA support for older NVIDIA GPUs #6123

Closed
wuisawesome opened this issue Jul 4, 2024 · 10 comments

Comments

@wuisawesome
Contributor

🚀 The feature, motivation and pitch

Currently vLLM only supports LoRA adapters on NVIDIA GPUs with compute capability >= 8.0. This request is to support >= 7.5.

The limitation here is that vLLM relies on https://github.com/punica-ai/punica for efficient LoRA, and upstream doesn't support older GPUs.

Personally I've mainly run into this problem on Kaggle, which requires you to run on T4s or older. Others seem to have run into it in other environments as well: Colab (#5199), V100s (#3826).
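
For context, the 8.0 / 7.5 line refers to the GPU's compute capability: sm_80 is A100-class hardware, sm_75 covers T4s, and the V100s mentioned above are sm_70. Below is a minimal, purely illustrative sketch of what such a check looks like with the CUDA runtime API; it is not vLLM's actual gating code.

```cuda
// Illustrative only: query a device's compute capability and report which
// side of the sm_80 / sm_75 line it falls on. Not vLLM code.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  cudaDeviceProp prop;
  if (cudaGetDeviceProperties(&prop, /*device=*/0) != cudaSuccess) {
    std::fprintf(stderr, "failed to query device 0\n");
    return 1;
  }
  // e.g. 70 for V100, 75 for T4, 80 for A100
  int cc = prop.major * 10 + prop.minor;
  std::printf("compute capability: %d.%d\n", prop.major, prop.minor);
  if (cc >= 80) {
    std::printf("covered by the current LoRA kernel support (sm_80+)\n");
  } else if (cc >= 75) {
    std::printf("what this request would enable (sm_75, e.g. T4)\n");
  } else {
    std::printf("below sm_75 (e.g. V100 at sm_70), out of scope here\n");
  }
  return 0;
}
```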

Alternatives

In some, but not all, cases this can be mitigated by using a newer GPU, or by merging the LoRA into the base model and swapping models.

Additional context

I'm willing to contribute this. I've prototyped it and verified that it's possible to do this efficiently by changing the step of vLLM's wheel build that builds the vendored Punica kernels.

@wuisawesome
Contributor Author

I noticed that when building the vendored Punica kernels, the issues were all related to bf16 arithmetic operations not being defined in CUDA 12.1. Building against a newer CUDA version (12.4), whose headers define these operations, fixed the problems.

Note that I'm not sure whether building the kernels against CUDA 12.4 is desirable or good engineering practice if we still want to support CUDA 12.1. If that's a concern, we could probably vendor the relevant code from CUDA (though I don't have a sense of how complicated that would be).
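
For what it's worth, here is a hedged sketch of the kind of guard that would let bf16 math compile on sm_75 even when the CUDA headers only provide the native bf16 operators for sm_80+, by falling back to fp32 arithmetic on older architectures. This is not code from Punica or from #5036, and the helper/kernel names (bf16_mul_add, lora_axpy_bf16) are invented for illustration.

```cuda
#include <cuda_bf16.h>

// Fused multiply-add on bf16 values: native on sm_80+, emulated in fp32 below that.
__device__ __forceinline__ __nv_bfloat16 bf16_mul_add(__nv_bfloat16 a,
                                                      __nv_bfloat16 b,
                                                      __nv_bfloat16 c) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
  // Native bf16 arithmetic is available on sm_80 and newer.
  return __hfma(a, b, c);
#else
  // Older architectures (e.g. sm_75): do the math in fp32 and convert back.
  return __float2bfloat16(__bfloat162float(a) * __bfloat162float(b) +
                          __bfloat162float(c));
#endif
}

// Toy kernel using the helper: y[i] += alpha * x[i], all in bf16.
__global__ void lora_axpy_bf16(const __nv_bfloat16* x, __nv_bfloat16* y,
                               __nv_bfloat16 alpha, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    y[i] = bf16_mul_add(alpha, x[i], y[i]);
  }
}
```

The obvious trade-off is that the fp32 fallback path is slower than native bf16 math, but it lets the same kernel source build for the older architectures without requiring a newer CUDA toolkit.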

@jeejeelee
Collaborator

#5036 is working on addressing the issue you mentioned

@shimu007

What should I do? Using a V100 keeps reporting errors.

@jeejeelee
Collaborator

> What should I do? Using a V100 keeps reporting errors.

Are you testing #5036?

@shimu007

shimu007 commented Jul 23, 2024 via email

@Cloopen-ReLiNK

> What should I do? Using a V100 keeps reporting errors.
>
> Are you testing #5036?

The vLLM installed from the #5036 source code was OK before, but 0.5.4 reports an error.
@jeejeelee
V100 + LoRA: /tmp/tmp8uy71zq3/main.c:6:23: fatal error: stdatomic.h: No such file or directory

@shimu007

shimu007 commented Aug 12, 2024 via email


This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions bot added the stale label Nov 12, 2024
@shimu007

shimu007 commented Nov 12, 2024 via email

github-actions bot added the unstale label and removed the stale label Nov 14, 2024
@shimu007

shimu007 commented Dec 4, 2024 via email
