Kernel error for running example.py #1
Hi, thanks for this amazing work! When running the given example, I got the following error:

RuntimeError: CUDA error: no kernel image is available for execution on the device (at /data/nunchaku/src/kernels/awq/gemv_awq.cu:312)

I have followed the instructions for installation, using torch 2.4.1.
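For anyone hitting the same error: it usually means the CUDA extension was not compiled for the running GPU's architecture. A minimal diagnostic sketch (plain PyTorch, not nunchaku-specific) to confirm the mismatch:

```python
import torch

# "no kernel image is available" typically means the extension was not
# compiled for this GPU's architecture. Print the compute capability
# and compare it against the archs the kernels target.
major, minor = torch.cuda.get_device_capability(0)
print(f"{torch.cuda.get_device_name(0)}: sm_{major}{minor}")

# Per the maintainers below, the kernels target sm_86 and sm_89
# (sm_80 may work with a rebuild); an H100 reports sm_90.
if (major, minor) not in [(8, 6), (8, 9)]:
    print("GPU architecture is outside the officially supported set.")
```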
Comments

Hi, may I ask which GPU you are using? We currently support sm_86 (Ampere: RTX 3090 / A6000) and sm_89 (Ada: RTX 4090). The kernel may run on sm_80 (A100), but expect a significant performance drop. If you want to try it on A100, you could edit
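The comment above is cut off, but as an illustration of the kind of edit it likely refers to: adding a `-gencode` entry for sm_80 to the build flags would compile the kernels for A100 as well. A hedged sketch, assuming the extension is built through `torch.utils.cpp_extension` in setup.py (the module name and source list below are placeholders, not the repo's actual values):

```python
# Hypothetical sketch -- module name and source list are placeholders,
# not the repo's actual setup.py contents.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="nunchaku",
    ext_modules=[
        CUDAExtension(
            name="nunchaku._C",  # placeholder extension name
            sources=["src/kernels/awq/gemv_awq.cu"],  # one of the kernel files
            extra_compile_args={
                "nvcc": [
                    "-gencode=arch=compute_80,code=sm_80",  # A100 (the added line)
                    "-gencode=arch=compute_86,code=sm_86",  # RTX 3090 / A6000
                    "-gencode=arch=compute_89,code=sm_89",  # RTX 4090
                ],
            },
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```

If the build does not hardcode its own arch flags, setting `TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9"` before `pip install -e .` achieves the same thing without editing any file.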
Thanks for your explanation. Such a pity, since I am using an H100.
Can you add an option for xformers or SDPA? If you used the AWQ kernel that supports older cards, that's all it would take. Is the weight kernel just a standard GEMV, or is it custom?
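For reference, an SDPA fallback would mean routing attention through PyTorch's built-in `torch.nn.functional.scaled_dot_product_attention`, which picks a backend at runtime and so is not tied to a specific sm architecture. A minimal sketch (generic PyTorch, not nunchaku's actual attention path):

```python
import torch
import torch.nn.functional as F

def sdpa_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim).
    # SDPA dispatches to FlashAttention, memory-efficient, or plain math
    # kernels at runtime, so it runs on older cards and on H100 alike.
    return F.scaled_dot_product_attention(q, k, v)

q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
print(sdpa_attention(q, k, v).shape)  # torch.Size([1, 8, 128, 64])
```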
May as well use a different inference engine if you're talking about using more generic kernels.
There aren't a lot of kernels to choose from, sadly. Custom kernels seem like the way to go for these transformer-based models, just like on the LLM side. Unfortunately, everyone is using Ampere as the baseline. AWQ does have kernels working for earlier architectures, and there are other attention mechanisms.