[Bug]: Issues with Applying LoRA in vllm on a T4 GPU #5199
Comments
Having the same issue; however, I am running it on an Azure VM with a T4 GPU using Docker.
Hi @rikitomo and @emillykkejensen, it is unfortunately the case that punica does not support the T4 or V100, per #3197. Please follow up on this in the issue on their repo, punica-ai/punica#44. Once it is addressed, we can pull the updated kernels into vLLM - thanks! On another note, perhaps this will be addressed by the recent work on using Triton for LoRA inference: #5036
Hi @jeejeelee, thanks a lot for the proposed fix. However, when I try to build from your branch I get the same error. I'm building inside a Docker container, so I don't know if that is the issue. What I did:
Once it was done building, I ran:
That gave me this output:
@emillykkejensen It seems that the error is triggered by
You should clone my repo using: git clone -b refactor-punica-kernel https://github.com/jeejeelee/vllm.git
@emillykkejensen I can run AWQ + LoRA properly on a TITAN RTX. FYI: https://github.com/vllm-project/vllm/blob/main/csrc/quantization/awq/dequantize.cuh#L18
Hi again @jeejeelee. Sorry about that, you are 100% right! If I do the above but clone the correct branch (!!), it works. Thanks for the fix; I hope it will be merged into master soon :)
So I tried to build a local Docker image using your branch. It seems to load vLLM and also load the model okay, but when I call it I get the following error:
Maybe you can try passing the LoRA path using a local absolute path, as in the sketch below.
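For context, here is a minimal sketch of what that suggestion could look like when querying a vLLM OpenAI-compatible server. It assumes the server was started with the adapter registered under an absolute local path; the adapter name, path, and base model below are placeholders, not values from this thread:

```python
# Sketch only: assumes the server was launched with something like
#   python -m vllm.entrypoints.openai.api_server \
#       --model microsoft/Phi-3-mini-4k-instruct \
#       --enable-lora \
#       --lora-modules my-lora=/abs/path/to/adapter
# "my-lora" and "/abs/path/to/adapter" are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="my-lora",        # request the LoRA adapter by its registered name
    prompt="Hello, world",
    max_tokens=32,
)
print(completion.choices[0].text)
```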
@jeejeelee Hi, thank you so much for your work! If I just want to run LoRA on a T4, which of your previous commits should I build from?
You can build from the last commit. If you have any questions, please feel free to contact me.
The same problem applying LoRA for
@naturomics Hi, you can try #5036. It should address your issue.
Your current environment
I am currently using a T4 instance on Google Colaboratory.
🐛 Describe the bug
I encounter an error when attempting to apply LoRA in vllm. Here are the details of the problem:
Below is a sample code snippet that reproduces the issue:
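(The original snippet was not preserved in this thread. The following is a hypothetical reconstruction of the kind of offline-inference code that triggers the error on a T4; the base model and adapter path are assumptions for illustration.)

```python
# Hypothetical reconstruction, not the reporter's exact code.
# Base model and adapter path are assumptions chosen for illustration.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="microsoft/Phi-3-mini-4k-instruct",  # assumed base model (the issue mentions Phi-3)
    enable_lora=True,
    dtype="half",  # T4 does not support bfloat16
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(
    ["What is the capital of France?"],
    sampling_params,
    lora_request=LoRARequest("my-lora", 1, "/content/my-lora-adapter"),  # placeholder adapter path
)
print(outputs[0].outputs[0].text)
```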
which yields the following output:
I would appreciate it if you could explain how to apply LoRA on a T4 GPU.
This time I am running it with Phi-3, but when I tried Llama 3 in a different T4×2 environment, the same error occurred.