[Feature]: multi-lora support older nvidia gpus. #6123
Comments
I noticed that when building the vendored punica kernels, the errors were all related to bf16 arithmetic operations not being defined in CUDA 12.1. Building against a newer CUDA version (12.4), whose headers define these operations, fixed the problems. Note that I'm not sure whether building the kernels against CUDA 12.4 is desirable/good engineering practice if we still want to support CUDA 12.1. If that's the case, we can probably vendor the relevant code from CUDA (though I don't have a sense of how complicated that would be).
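For illustration, here is roughly the kind of fallback that could be vendored. This is a minimal sketch, not code from vLLM, punica, or the CUDA headers, and the helper names (`bf16_mul`, `bf16_add`, `bf16_axpy`) are made up. It emulates `__nv_bfloat16` arithmetic with an fp32 round-trip, which is essentially what newer headers fall back to when no native bf16 instructions are available:

```cuda
// Hypothetical shim, not vLLM/punica code: emulate bf16 arithmetic in fp32
// so kernels compile on sm_70/sm_75 with toolkits whose headers only define
// the __nv_bfloat16 operators for __CUDA_ARCH__ >= 800.
#include <cuda_bf16.h>

__device__ __forceinline__ __nv_bfloat16 bf16_mul(__nv_bfloat16 a, __nv_bfloat16 b) {
    // Convert to fp32, multiply, convert back (round-to-nearest-even).
    return __float2bfloat16(__bfloat162float(a) * __bfloat162float(b));
}

__device__ __forceinline__ __nv_bfloat16 bf16_add(__nv_bfloat16 a, __nv_bfloat16 b) {
    return __float2bfloat16(__bfloat162float(a) + __bfloat162float(b));
}

// Toy kernel using the helpers: out[i] = alpha * x[i] + y[i].
__global__ void bf16_axpy(const __nv_bfloat16* x, const __nv_bfloat16* y,
                          __nv_bfloat16* out, __nv_bfloat16 alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = bf16_add(bf16_mul(alpha, x[i]), y[i]);
}
```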
#5036 is working on addressing the issue you mentioned.
What should I do? Running on a V100 keeps producing errors.
Are you testing #5036?
vLLM installed from the #5036 source code worked before, but 0.5.4 reports an error.
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
🚀 The feature, motivation and pitch
Currently vLLM only supports LoRA adapters on NVIDIA GPUs with compute capability >= 8.0. This request is to support >= 7.5.
The limitation here is that vLLM relies on https://github.com/punica-ai/punica for efficient LoRA, and the upstream doesn't support older GPUs.
Personally I've mainly run into this problem on Kaggle, which requires you to run on T4s or older. Others seem to have run into this problem in other environments: Colab (#5199), V100s (#3826).
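For context, a minimal sketch of the gate this request asks to relax; this is illustrative host code, not vLLM's actual check, and the thresholds simply restate the numbers above (>= 8.0 supported today, >= 7.5 requested):

```cuda
// Illustrative only: report where the current device falls relative to the
// compute-capability gate discussed in this issue.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop{};
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    int cc = prop.major * 10 + prop.minor;  // e.g. 80 = A100, 75 = T4, 70 = V100
    if (cc >= 80) {
        std::printf("sm_%d: covered by the existing multi-LoRA (punica) kernels\n", cc);
    } else if (cc >= 75) {
        std::printf("sm_%d: the target of this feature request\n", cc);
    } else {
        std::printf("sm_%d: below the >= 7.5 scope requested here (see the V100 reports)\n", cc);
    }
    return 0;
}
```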
Alternatives
In some but not all cases this can be mitigated by using a newer GPU, or by merging the LoRA into the base model and swapping models instead.
Additional context
I'm willing to contribute this. I've prototyped it and verified that it's possible to do this efficiently by changing the step of vLLM's wheel build that builds the vendored punica kernels.
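To make this concrete (an assumption on my part, not a description of the actual prototype): the build change presumably amounts to adding compute capability 7.5 to the architectures the vendored punica extension is compiled for, e.g. an extra `-gencode arch=compute_75,code=sm_75` entry (or the equivalent `TORCH_CUDA_ARCH_LIST` value), together with making the bf16 arithmetic compile there as sketched earlier. A compile-time guard like the hypothetical one below can then make an unsupported target fail loudly rather than with cryptic operator errors:

```cuda
// Hypothetical guard, not from vLLM or punica: document the supported
// architecture floor explicitly when the kernels are built per-arch.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 750
#error "These LoRA kernels are only built for compute capability >= 7.5"
#endif
```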