Feature Request: ROCm support (AMD GPU) #107
Comments
Amazing! Thank you for bringing this to my attention. I will try to get in touch with the author of the ROCm library and support AMD GPUs by default. |
that would be AMAZING! especially with you recently adding 8 bit support. I tried to make my own merge of the forks but I don't really know what I'm doing and don't think I did it correctly |
If the ROCm fork does get merged in, would the Int8 Matmul compatibility improvements also work for AMD GPUs? |
@TimDettmers, curious if AMD support any nearer to being merged? @agrocylo made a PR (#296) based somewhat on @broncotc's fork... |
EDIT: A slightly newer version branched from v0.37 available here: |
The Wikimedia Foundation is really interested in ROCm support too, since Nvidia is not viable for us due to open-source constraints. @TimDettmers we can offer help (testing/review/etc.) to get this feature merged; it would be really great for the open-source ML ecosystem. Thanks in advance! |
Hi,
File "/home/.local/lib/python3.8/site-packages/bitsandbytes/autograd/__init__.py", line 1, in <module>
from ._functions import undo_layout, get_inverse_transform_indices
File "/home/.local/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 9, in <module>
import bitsandbytes.functional as F
File "/home/.local/lib/python3.8/site-packages/bitsandbytes/functional.py", line 17, in <module>
from .cextension import COMPILED_WITH_CUDA, lib
File "/home/.local/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 74, in <module>
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
If you cannot find any issues and suspect a bug, please open an issue with details about your environment:
https://github.com/TimDettmers/bitsandbytes/issues
I use an AMD MI200 card. |
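The error above typically appears when the installed bitsandbytes binary was compiled against CUDA while the system's PyTorch is a ROCm/HIP build. A minimal diagnostic sketch (not part of bitsandbytes itself; it only probes which backend the installed torch wheel targets, and degrades gracefully when torch is absent):

```python
import importlib.util

def gpu_backend_report():
    """Report which GPU backend the installed PyTorch was built for.

    A CUDA-only bitsandbytes build cannot use a ROCm GPU even when
    torch.cuda.is_available() returns True, so knowing whether torch
    is a HIP or CUDA build is the first thing to check.
    """
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    # torch.version.hip is set on ROCm wheels, torch.version.cuda on CUDA wheels
    if getattr(torch.version, "hip", None):
        return f"ROCm/HIP build ({torch.version.hip})"
    if getattr(torch.version, "cuda", None):
        return f"CUDA build ({torch.version.cuda})"
    return "CPU-only build"

if __name__ == "__main__":
    print(gpu_backend_report())
```

If this reports a ROCm/HIP build but bitsandbytes still raises the CUDA setup error, the bitsandbytes binary itself needs to be rebuilt from one of the ROCm forks discussed below.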
Hello, I was wondering how far off ROCm support is. I'm trying to see if my 7900XTX will be useful in a project of mine. The Llama2 quick start guide makes use of bitsandbytes, and as far as I know there aren't any other alternatives. |
Found this rocm version of bitsandbytes: https://github.com/Lzy17/bitsandbytes-rocm/tree/main |
The only rocm version that worked for me on GFX900 was this one: https://github.com/agrocylo/bitsandbytes-rocm |
For anyone that needs a patch for This fork patches the Makefile for targeting Works with a RX7900XT and ROCM5.7 (along with torch-rocm5.7) installed. Anyway there should be a better way of targeting the correct amdgpu module in the build system... Edit: Probably won't work with libraries requiring version > 0.35 |
@st1vms There is a problem. |
If that fork still works for you, maybe it is ok to just change the version number. You can test if the library works with:
If that is the case, try editing the version number in the |
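The exact test command referred to above was lost in this transcript. As a sketch, one hypothetical way to smoke-test whether a bitsandbytes build actually works is a blockwise quantize/dequantize round trip (the function names assume the `bitsandbytes.functional` Python API; everything is wrapped in try/except so the check reports failure instead of crashing):

```python
def bnb_smoke_test():
    """Try a minimal bitsandbytes operation; return 'ok' or the error text.

    A hypothetical check, assuming the quantize_blockwise /
    dequantize_blockwise functions from bitsandbytes.functional.
    """
    try:
        import torch
        import bitsandbytes.functional as F

        x = torch.randn(64)
        # Quantize to 8-bit blockwise, then reconstruct
        q, state = F.quantize_blockwise(x)
        xr = F.dequantize_blockwise(q, state)
        err = (x - xr).abs().max().item()
        return "ok" if err < 0.1 else f"large reconstruction error: {err}"
    except Exception as e:
        return f"failed: {e!r}"

if __name__ == "__main__":
    print(bnb_smoke_test())
```

A crash or "failed: ..." result here usually means the compiled library does not match the installed torch backend, as discussed in the comments above.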
@st1vms I tried BNB 0.39.0. The Jupyter kernel crashed, reason: undefined |
Well, the fork is probably already obsolete for some libraries; you should look for updated ones. |
@st1vms (torch3) win@win-MS-7E02:/mnt/1df6b45e-20dc-41ca-9a04-b271fd3a4940/Learn$ /usr/bin/env /home/win/torch3/bin/python /home/win/.vscode-oss/extensions/ms-python.python-2023.20.0-universal/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher 60843 -- /mnt/1df6b45e-20dc-41ca-9a04-b271fd3a4940/Learn/finetune.py =============================================
|
Can someone post to this thread any updated forks? The lack of proper BnB support is really holding back the AMD cards. |
Looks like things may finally move forward with official support in the not too distant future! Hope with ROCm 6.x we can finally see support merged into this repo. |
Sorry for taking so long on this. I am currently onboarding more maintainers and we should see some progress on this very soon. This is one of our high-priority issues. |
Would love to see ROCM support, keep doing your good work |
if I may ask, what's the progress so far? |
If you haven't already seen it, there was a comment made in the discussions with an accompanying tracking issue for general cross-platform support rather than just AMD/ROCM support. To that end it appears it is currently in the planning phase. |
@TimDettmers @Titus-von-Koeller , we are at ~95% parity for bnb for https://github.com/ROCm/bitsandbytes/tree/rocm_enabled on Instinct-class GPUs, and working to close the gaps on Navi. At this point, we should be seriously considering upstreaming. Could you drop me an email at [email protected], and we can set up a call to discuss further. |
@amathews-amd I tired compiling ROCm version of BnB from the rocm_enabled branch, but it is failing with errors on AMD MI250x. Do you have any suggestions for how to resolve the issue? |
@chauhang Could you try with rocm 6.0? You can use this docker - rocm/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2 and install bitsandbytes directly. |
@chauhang, you can skip the hipblaslt update and install bitsandbytes directly then. Please let me know if you face any issues. |
I was using the arlo-phoenix fork: https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6/tree/rocm Should I use the ROCm fork instead? https://github.com/ROCm/bitsandbytes/tree/rocm_enabled |
Yes, it's updated for ROCm 6 |
I've often had trouble understanding the state of GPU support in ROCm. So with that said, I have some clarification questions:
I'd like to be able to help get this merged, but need to figure out the constraints. The only AMD GPUs that I have on hand (RX 570 and R9 270X) aren't going to cut it. The other issue is how far behind |
Sure, here is the official list: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-gpus. For BnB, since we are at initial enablement, it is dependent on where we are testing it (both hardware and software versions). We are currently focusing on MI250/MI300/gfx1100, and newer ROCm versions for testing. |
@amathews-amd I sent you an invite to our bnb-crossplatform slack, to the email you provided. Of course we should invite your other collaborators as well. Can we talk there and coordinate on scheduling a kickoff call? |
@amathews-amd the changes introduced through #898 are not final and weren't merged onto This means that there's ongoing work where a series of PRs onto |
is there a place where we can track the progress on the implementation of this? |
By the way, does anyone know where I can submit bug reports for https://github.com/ROCm/bitsandbytes/tree/rocm_enabled? Going to the page, there's no Issues tab. |
Maybe @pnunna93 or @amathews-amd from AMD can help with that? I'm sure they'd appreciate your report.
Right now the best place is to look at open and recently merged PRs against that branch. We should make significant progress in the next weeks and make an alpha/beta release built off of that branch available as a nightly package release relatively soon. |
We created an Issues tab - https://github.com/ROCm/bitsandbytes/issues , please feel free to open any bug reports. |
We hope that BNB for the ROCm environment will be officially released as soon as possible. For a few days I worked through a fine-tuning example in a Radeon 7900 XTX + ROCm 6.1.2 environment, but the issue of BNB not being recognized really gave me a headache. I felt like this was why everyone was buying Nvidia graphics cards. I followed this section and proceeded with it, but the BNB included in that section was not recognized properly.
success code
fail code
|
Hi @katanazero86 , sorry for the trouble you have faced. The torch version seems to be for ROCm 6.0; please install the ROCm 6.1 torch build and rebuild bitsandbytes. pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.1/ |
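Version mismatches like the one above can be caught before rebuilding bitsandbytes. A small hedged helper that compares the ROCm version the installed torch wheel targets against an expected version (the `expected_prefix` value is an assumption for this example; set it to your system's ROCm major.minor, e.g. from `/opt/rocm/.info/version`):

```python
def torch_rocm_matches(expected_prefix="6.1"):
    """Return True if the installed torch wheel targets the expected
    ROCm major.minor version, False otherwise (including when torch is
    missing or is a CUDA/CPU build).
    """
    try:
        import torch
    except ImportError:
        return False
    # torch.version.hip looks like "6.1.40091-..." on ROCm wheels
    hip = getattr(torch.version, "hip", None)
    return bool(hip) and hip.startswith(expected_prefix)

if __name__ == "__main__":
    print("torch matches ROCm 6.1:", torch_rocm_matches("6.1"))
```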
Hi @pnunna93,
I guess the versions are not compatible. It's so annoying. T.T |
Hi @katanazero86, Sorry for the trouble. I have tested your code within this Docker image (docker pull rocm/rocm-terminal:6.1.2), and both runs executed without error. Could you try using this Docker image if possible? Additionally, please install PyTorch as suggested by @pnunna93 within the docker container:
Thanks! |
Thank you for answer. When I have time later, I will try again using a Docker image :) |
Could you please add official AMD ROCm support to this library? An unofficial working port already exists:
https://github.com/broncotc/bitsandbytes-rocm
Thank you