bitsandbytes-rocm rocm-5.6.0 support #681
If you use the official Docker images or have hipBLASLt installed, you could try my fork here. Also, this is probably not the best place to ask: there are several user-made ROCm ports, but none are maintained by the maintainer of this project.
@arlo-phoenix Could you make a PR? I'd love to have this working/maintained in this repo. @TimDettmers is there a reason not to accept this?
I'd love that as well, but ROCm currently makes it quite hard. First, there's a workaround needed to make it use the correct wavefront size so that 4-bit works. Second, HIP doesn't offer the same level of library support as CUDA: hipBLASLt doesn't support all the needed instructions and only targets certain architectures, so I use another workaround just to be able to include the header. I wrote more details in #47. The hipBLASLt gap doesn't matter that much, since we can simply document what isn't supported, but the hacky way of making it work can't really be merged, in my opinion. I might open a draft anyway; I think this is the easiest-to-maintain way of including ROCm in the project. The "just make the CUDA code work on HIP with ifdefs" approach seems to be popular as well; for example, the ROCm port of llama.cpp, where I took the idea from, uses it.
This being upstreamed would solve 99% of AI issues on ROCm; I hope to see progress on this one day. This is the most up-to-date fork I've found, so I'll try running some LLMs with it, maybe even sd-scripts, and post results. Is the hack something that could eventually be overcome with enough work? Would it need changes to the GPUs themselves, or to ROCm, or is it just more difficult? Sorry if I'm misunderstanding things; I'm way out of my league here.
If it can't be easily overcome, maybe just disable 4-bit support and throw an exception when it is used on ROCm. Then someone can maintain a fork that uses the 4-bit hack; I think this would be vastly better than the current situation.
I opened a draft PR for now. I just don't know yet which GPUs support a wavefront size of 32, or whether the hack causes other issues. Disabling 4-bit isn't really an option for me, as I made the port specifically to test out Microsoft Guidance among other llama stuff. There also isn't a big reason to disable it if the redefine doesn't cause issues elsewhere; I just don't have that much compute or ROCm experience myself. Since I previously just ran into a simple failing static_assert and it now works, I assume it shouldn't cause major issues, but I'd rather have it work without the hack.
Maybe it will be fixed by 5.7; from what I've read, I don't even know exactly where the issue is. The one issue I found said it lives in another repo and can't easily be avoided with a workaround because they use precompiled headers. But it's a ROCm issue; from what I've read, the hardware has definitely been ready since RDNA 1, which has been out long enough for most people running AI workloads. If you have any results on performance or support, please comment on the PR from now on. I expect it to stay open for quite a while to see whether it causes any issues or anything is missing, so any tests will be helpful!
Any plans to improve ROCm support?
Hi @BramVanroy |
@younesbelkada Good to hear, thanks!
Any chance to get the rocm branch updated to support rocm-5.6.0?