
ROCm and 8-bit quantization #1245

Open
DavideRossi opened this issue Jun 6, 2024 · 4 comments

DavideRossi commented Jun 6, 2024

System Info

An AMD EPYC system with 3 MI210 GPUs.
Quite a complex setup: the system uses Slurm to schedule batch jobs, which usually take the form of apptainer run containers. The image I'm using has ROCm 6.0.2 on Ubuntu 22.04.

Reproduction

python -m bitsandbytes

CUDA specs: CUDASpecs(highest_compute_capability=(9, 0), cuda_version_string='61', cuda_version_tuple=(6, 1))
PyTorch settings found: CUDA_VERSION=61, Highest Compute Capability: (9, 0).
WARNING: CUDA versions lower than 11 are currently not supported for LLM.int8().
You will be only to use 8-bit optimizers and quantization routines!
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
CUDA SETUP: WARNING! CUDA runtime files not found in any environmental path.

Two issues here. First, CUDA_VERSION is not 61: that is the ROCm version (6.1), and the actual CUDA version is anybody's guess, since torch.version.cuda is None on ROCm. As a result, the "lower than 11" warning makes little sense in this case.
Second, https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx leads nowhere.
That leaves me wondering whether 8-bit quantization on ROCm is really supported or not.
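
A quick way to see what the version detection is picking up (my own check, not from the bitsandbytes docs):

import torch

# On a ROCm build of PyTorch, torch.version.cuda is None and the
# HIP/ROCm version is reported in torch.version.hip instead.
print(torch.version.cuda)         # None on ROCm builds
print(torch.version.hip)          # e.g. '6.1....' with ROCm 6.1
print(torch.cuda.is_available())  # True: HIP devices are exposed as "cuda"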

OK, let's try to run some code then:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(checkpoint, attn_implementation="eager", quantization_config=BitsAndBytesConfig(load_in_8bit=True))
outputs = model.generate(inputs)

Result:

[...]
Exception: cublasLt ran into an error!

See #538.
But now the question is: is it really the case that the existing 8-bit code is unsupported on ROCm, or is this an architecture/library mismatch and 8-bit could actually work?
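
To isolate the failure from transformers, here is a minimal check I would expect to trigger the same error if the int8 matmul itself is the problem (my own sketch; Linear8bitLt is the layer LLM.int8() uses):

import torch
import bitsandbytes as bnb

# Weights are quantized to int8 when the layer is moved to the GPU.
layer = bnb.nn.Linear8bitLt(64, 64, has_fp16_weights=False).to("cuda")

x = torch.randn(8, 64, dtype=torch.float16, device="cuda")
out = layer(x)  # if the cublasLt/hipblasLt path is the culprit, it should fail here
print(out.shape)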

Expected behavior

This might be a bug, or it might not; I've not been able to find specific documentation on this. It seems possible that 8-bit quantization could actually work but the code that detects whether the architecture is supported has issues. Or it may be that I can forget about 8-bit on ROCm. Either way, at least I would know for sure.

@mohamedyassin1

Hi @DavideRossi, I had similar errors, but 8-bit quantization is working for me on ROCm now. I have added a comment with the steps I took, with more details, in the bitsandbytes multi-backend-refactor discussion post. Hope this helps.

@DavideRossi (Author)

Thanks @mohamedyassin1, what you describe is very similar to my own setup. Can I ask you to paste the output of python -m bitsandbytes from your system?


mohamedyassin1 commented Jun 20, 2024

Thanks @mohamedyassin1, what you describe is very similar to my own setup. Can I ask you to paste the output of python -m bitsandbytes from your system?

Sure:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(11, 0), cuda_version_string='60', cuda_version_tuple=(6, 0))
PyTorch settings found: CUDA_VERSION=60, Highest Compute Capability: (11, 0).
WARNING: CUDA versions lower than 11 are currently not supported for LLM.int8().
You will be only to use 8-bit optimizers and quantization routines!
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
CUDA SETUP: WARNING! CUDA runtime files not found in any environmental path.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and CUDA is callable...
SUCCESS!
Installation was successful!

@DavideRossi
Copy link
Author

That's interesting. It says highest_compute_capability=(11, 0) whereas my output says highest_compute_capability=(9, 0). On NVIDIA hardware this depends entirely on the GPU model; on ROCm I have no idea whether it depends only on the hardware or also on the HIP/ROCm version...
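
If it helps compare setups, as far as I can tell the value comes from torch.cuda.get_device_capability(), which on ROCm is derived from the gfx architecture rather than from an NVIDIA compute capability:

import torch

for i in range(torch.cuda.device_count()):
    # On ROCm the "cuda" device API is routed to HIP; the capability
    # tuple is derived from the gfx arch (MI210 is gfx90a, for example).
    print(torch.cuda.get_device_name(i), torch.cuda.get_device_capability(i))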
