
ROCm and 8-bit quantization #1245

Open
DavideRossi opened this issue Jun 6, 2024 · 4 comments

DavideRossi commented Jun 6, 2024

System Info

An AMD EPYC system with 3 MI210 GPUs.
Quite a complex setup: the system uses Slurm to schedule batch jobs, which usually take the form of apptainer run containers. The image I'm using has ROCm 6.0.2 on Ubuntu 22.04.

Reproduction

python -m bitsandbytes

CUDA specs: CUDASpecs(highest_compute_capability=(9, 0), cuda_version_string='61', cuda_version_tuple=(6, 1))
PyTorch settings found: CUDA_VERSION=61, Highest Compute Capability: (9, 0).
WARNING: CUDA versions lower than 11 are currently not supported for LLM.int8().
You will be only to use 8-bit optimizers and quantization routines!
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
CUDA SETUP: WARNING! CUDA runtime files not found in any environmental path.

Two issues here. First, CUDA_VERSION is not 61: that is the ROCm version (6.1), and the actual CUDA version is anybody's guess, since torch.version.cuda is None on ROCm. As a result, the "lower than 11" warning makes little sense in this case.
Second, https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx leads nowhere.
That leaves me wondering whether 8-bit quantization on ROCm is really supported or not.
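
A quick way to see what the version detection is picking up (my own check, not from the bitsandbytes docs):

import torch

# On a ROCm build of PyTorch, torch.version.cuda is None and the
# HIP/ROCm version is reported in torch.version.hip instead.
print(torch.version.cuda)         # None on ROCm builds
print(torch.version.hip)          # e.g. '6.1....' with ROCm 6.1
print(torch.cuda.is_available())  # True: HIP devices are exposed as "cuda"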

OK, let's try to run some code then:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(checkpoint, attn_implementation="eager", quantization_config=BitsAndBytesConfig(load_in_8bit=True))
outputs = model.generate(inputs)

Result:

[...]
Exception: cublasLt ran into an error!

See #538.
But now the question is: is it really the case that the existing 8-bit code is unsupported on ROCm, or is this an architecture/library mismatch and 8-bit could actually work?
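
To isolate the failure from transformers, here is a minimal check I would expect to trigger the same error if the int8 matmul itself is the problem (my own sketch; Linear8bitLt is the layer LLM.int8() uses):

import torch
import bitsandbytes as bnb

# Weights are quantized to int8 when the layer is moved to the GPU.
layer = bnb.nn.Linear8bitLt(64, 64, has_fp16_weights=False).to("cuda")

x = torch.randn(8, 64, dtype=torch.float16, device="cuda")
out = layer(x)  # if the cublasLt/hipblasLt path is the culprit, it should fail here
print(out.shape)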

Expected behavior

This might be a bug, or it might not; I've not been able to find specific documentation on this. It seems possible that 8-bit quantization could actually work but the code that detects whether the architecture is supported has issues. Or it may be that I can forget about 8-bit on ROCm. Either way, at least I would know for sure.

@mohamedyassin1

Hi @DavideRossi, I had similar errors, but 8-bit quantization is working for me on ROCm now. I have added a comment with the steps I took, with more details, in the bitsandbytes multi-backend-refactor discussion post. Hope this helps.

@DavideRossi (Author)

Thanks @mohamedyassin1, what you describe is very similar to my own setup. Can I ask you to paste the output of python -m bitsandbytes from your system?


mohamedyassin1 commented Jun 20, 2024

Thanks @mohamedyassin1, what you describe is very similar to my own setup. Can I ask you to paste the output of python -m bitsandbytes from your system?

Sure:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(11, 0), cuda_version_string='60', cuda_version_tuple=(6, 0))
PyTorch settings found: CUDA_VERSION=60, Highest Compute Capability: (11, 0).
WARNING: CUDA versions lower than 11 are currently not supported for LLM.int8().
You will be only to use 8-bit optimizers and quantization routines!
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
CUDA SETUP: WARNING! CUDA runtime files not found in any environmental path.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and CUDA is callable...
SUCCESS!
Installation was successful!

@DavideRossi
Copy link
Author

That's interesting. It says highest_compute_capability=(11, 0) whereas my output says highest_compute_capability=(9, 0). On NVIDIA hardware this depends entirely on the GPU model; on ROCm I have no idea whether it depends only on the hardware or also on the HIP/ROCm version...
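
If it helps compare setups, as far as I can tell the value comes from torch.cuda.get_device_capability(), which on ROCm is derived from the gfx architecture rather than from an NVIDIA compute capability:

import torch

for i in range(torch.cuda.device_count()):
    # On ROCm the "cuda" device API is routed to HIP; the capability
    # tuple is derived from the gfx arch (MI210 is gfx90a, for example).
    print(torch.cuda.get_device_name(i), torch.cuda.get_device_capability(i))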
