bitsandbytes-rocm rocm-5.6.0 support #681
If you use the official Docker images or have hipBLASLt installed, you could try my fork here. Also, this is probably not the best place to ask: there are several user-made ROCm ports, but none are maintained by the maintainer of this project.
@arlo-phoenix Could you make a PR? I'd love to have this working/maintained in this repo. @TimDettmers is there a reason not to accept this?
I'd love that as well, but ROCm currently makes it quite hard. First, there's a workaround needed to make it use the correct wavefront size so that 4-bit works. Second, HIP doesn't offer the same level of library support as CUDA: hipBLASLt doesn't support all the needed instructions and only targets certain architectures, so I use another workaround just to be able to include the header. I wrote more details in #47. The hipBLASLt gap doesn't matter that much, since we can simply document what isn't supported, but the hacky way of making it work can't really be merged, in my opinion. I might open a draft anyway; I think this is the easiest-to-maintain way of including ROCm in the project. The "just make the CUDA code work on HIP with ifdefs" approach seems to be popular as well; for example, the ROCm port of llama.cpp, where I took the idea from, uses it.
This being upstreamed would solve 99% of AI issues on ROCm; I hope to see progress on this one day. This is the most up-to-date fork I've found, so I'll try running some LLMs with it, maybe even sd-scripts, and post results. Is the hack something that could eventually be overcome with enough work? Would it need changes to the GPUs themselves, or to ROCm, or is it just more difficult? Sorry if I'm misunderstanding things; I'm way out of my league here.
If it can't be easily overcome, maybe just disable 4-bit support and throw an exception when it is used on ROCm. Then someone can maintain a fork that uses the 4-bit hack; I think this would be vastly better than the current situation.
I opened a draft PR for now. I just don't know yet which GPUs support a wavefront size of 32, or whether the hack causes other issues. Disabling 4-bit isn't really an option for me, as I made the port specifically to test out Microsoft Guidance among other llama stuff. There also isn't a big reason to disable it if the redefine doesn't cause issues elsewhere; I just don't have that much compute or ROCm experience myself. Since I previously just ran into a simple failing static_assert and it now works, I assume it shouldn't cause major issues, but I'd rather have it work without the hack.
Maybe it will be fixed by 5.7; from what I've read, I don't even know exactly where the issue is. The one issue I found said it lives in another repo and can't easily be avoided with a workaround because they use precompiled headers. But it's a ROCm issue; from what I've read, the hardware has definitely been ready since RDNA 1, which has been out long enough for most people running AI workloads. If you have any results on performance or support, please comment on the PR from now on. I expect it to stay open for quite a while to see whether it causes any issues or anything is missing, so any tests will be helpful!
Any plans to improve ROCm support?
Hi @BramVanroy |
@younesbelkada Good to hear, thanks!
Any chance to get the rocm branch updated to support rocm-5.6.0?