bitsandbytes-rocm rocm-5.6.0 support #681

Open

ccbadd opened this issue Aug 4, 2023 · 9 comments

Comments

@ccbadd commented Aug 4, 2023

Any chance of getting the rocm branch updated to support rocm-5.6.0?

@arlo-phoenix

If you use the official Docker images or have hipBLASLt installed, you could try my fork here. Also, this is probably not the best place to ask: there are several user-made ROCm ports, but none are by the maintainer of this project.

@hameerabbasi

@arlo-phoenix Could you make a PR? I'd love to have this working/maintained in this repo. @TimDettmers is there a reason not to accept this?

@arlo-phoenix commented Aug 8, 2023

> @arlo-phoenix Could you make a PR? I'd love to have this working/maintained in this repo. @TimDettmers is there a reason not to accept this?

I'd love that as well, but ROCm currently makes it quite hard. First, a workaround is needed to force the correct wave size so that 4-bit works. Second, HIP's libraries don't offer the same level of support as CUDA's: hipBLASLt doesn't support all the needed instructions and only targets certain architecture versions, so I use another workaround just to be able to include its header. I wrote up more details in #47. The hipBLASLt gap doesn't matter that much, since we can simply document what's not supported; the hacky way of making the wave size work can't really be merged, though, imo.
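For context, a minimal sketch of the general shape of such a wave-size workaround, assuming the kernels hard-code a 32-lane warp the way CUDA code does. The macro redefinition below is illustrative of the technique, not the fork's verbatim code, and redefining a compiler-provided macro like this is exactly the kind of hack that can trip static_asserts in ROCm headers:

```cpp
// Illustrative only: force a 32-wide wavefront on RDNA so that warp-level
// primitives (shuffles, ballots) match the 32-lane assumptions baked into
// the CUDA kernels. __AMDGCN_WAVEFRONT_SIZE is normally predefined by the
// compiler (64 on CDNA, 32 on RDNA by default); redefining it is the
// "hack" part and may conflict with ROCm library headers.
#if defined(__HIP_PLATFORM_AMD__)
  #undef  __AMDGCN_WAVEFRONT_SIZE
  #define __AMDGCN_WAVEFRONT_SIZE 32
#endif
#include <hip/hip_runtime.h>

// Kernel code can then keep using the CUDA-style constant:
constexpr int WARP_SIZE = 32;  // matches the redefine above
```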

I might make a draft PR though; I think this is the easiest-to-maintain way of including ROCm in the project. The approach of just making the CUDA code work on HIP with ifdefs seems popular as well; e.g., the ROCm port of llama.cpp, where I took the idea from, uses it.
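For readers unfamiliar with that pattern, here is a minimal sketch of the ifdef approach, modeled on llama.cpp's ROCm port; the `USE_HIP` build flag and the exact set of mappings are illustrative assumptions, not this project's actual build configuration:

```cpp
// Compile one CUDA source file for both backends by aliasing the CUDA
// runtime API to its HIP equivalents. HIP deliberately mirrors the CUDA
// API, so a handful of #defines covers most call sites.
#if defined(USE_HIP)  // hypothetical build flag
  #include <hip/hip_runtime.h>
  #define cudaError_t            hipError_t
  #define cudaSuccess            hipSuccess
  #define cudaMalloc             hipMalloc
  #define cudaFree               hipFree
  #define cudaMemcpy             hipMemcpy
  #define cudaMemcpyDeviceToHost hipMemcpyDeviceToHost
  #define cudaDeviceSynchronize  hipDeviceSynchronize
  #define cudaStream_t           hipStream_t
#else
  #include <cuda_runtime.h>
#endif
```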

@nonetrix commented Aug 30, 2023

Getting this upstreamed would solve 99% of AI issues on ROCm; I hope to see progress on this one day. This is the most up-to-date fork I've found; I'll try running some LLMs with it, maybe even try sd-scripts, and post results. Is the hack something that could eventually be overcome with enough work? Would it need changes to the GPUs themselves, or to ROCm, or is it just more difficult? Sorry if I'm misunderstanding things, I'm way out of my league here lol

@nonetrix

If it can't be easily overcome, maybe just disable 4-bit support and throw an exception when it is used on ROCm. Then someone can maintain a fork that uses the 4-bit hack; I think this would be vastly better than the current situation.
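A rough sketch of what such a guard could look like; the function name, the `BNB_ROCM_4BIT_HACK` flag, and the error message are hypothetical, since bitsandbytes' actual 4-bit entry points may differ:

```cpp
#include <stdexcept>

// Hypothetical guard: on ROCm builds that don't carry the experimental
// wave32 hack, fail loudly at the 4-bit entry point instead of silently
// miscomputing with the wrong wavefront size.
void quantize_4bit(/* ... */) {
#if defined(__HIP_PLATFORM_AMD__) && !defined(BNB_ROCM_4BIT_HACK)
    throw std::runtime_error(
        "4-bit quantization is not supported on this ROCm build; "
        "use a fork that carries the wave32 workaround.");
#else
    // ... real 4-bit quantization path ...
#endif
}
```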

@arlo-phoenix

> If it can't be easily overcome, maybe just disable 4-bit support and throw an exception when it is used on ROCm. Then someone can maintain a fork that uses the 4-bit hack; I think this would be vastly better than the current situation.

I opened a draft PR for now. I just don't know yet which GPU generations support wavefront size 32, or whether the hack causes other issues. Disabling 4-bit isn't really an option for me, since I made the port specifically to test out Microsoft Guidance, among other llama stuff. There also isn't much reason to disable it if the redefine doesn't cause issues elsewhere. I just don't have that much compute/ROCm experience myself. Since I previously only ran into a simple failing static_assert, and it works now, I assume it shouldn't cause major issues, but I'd rather have it work without the hack.

> Getting this upstreamed would solve 99% of AI issues on ROCm; I hope to see progress on this one day. This is the most up-to-date fork I've found; I'll try running some LLMs with it, maybe even try sd-scripts, and post results. Is the hack something that could eventually be overcome with enough work? Would it need changes to the GPUs themselves, or to ROCm, or is it just more difficult?

Maybe it will be fixed by ROCm 5.7; from the stuff I've read, I don't even know exactly where the issue is. The one issue I found said it was in another repo and can't easily be worked around because they use precompiled headers. But it is a ROCm issue: the hardware has been ready since RDNA 1 from what I've read, which goes back far enough for most people running AI stuff. If you have any results on performance or support, just comment on the PR from now on. I expect it to stay open for quite a while to see whether it causes any issues or anything is missing, so any tests will be helpful!

@BramVanroy
Contributor

Any plans to improve ROCm support?

@younesbelkada
Collaborator

Hi @BramVanroy
Yes, this is still the plan! Please see #990 for more details.

@BramVanroy
Contributor

@younesbelkada Good to hear, thanks!
