[Bug] Exception: cublasLt ran into an error! during fine-tuning LLM in 8bit mode #538
Comments
I have the same issue - it occurs when running an 8bit model in the following docker container
+1ing this. I notice it with local conda on an H100 at Lambda Labs, although I'm unsure whether this is a bitsandbytes error or something to do with CUDA for the H100s.
+1
Trying to run today on an H100 instance, with a confirmed installation of 0.40.1, which I had seen was supposed to work with this GPU now. So frustrating...
Same error for me
Hello, any news? Same error here. I cannot find anything useful on how to use 8-bit quantization on H100 GPUs.
Hi @TimDettmers, do we have the fix yet?
@basteran Did you find the fix? @TimDettmers Any updates?
Are there any updates here? Am I missing something, or did they just "forget" to support H100 GPUs, and even months later this hasn't been fixed? Has anyone found a workaround? @TimDettmers?
This is actually a more complicated issue. The 8-bit implementation uses cuBLASLt, which uses special formats for 8-bit matrix multiplication. There are special formats for Ampere, Turing, and now Hopper GPUs, and Hopper GPUs do not support the Ampere or Turing formats. This means multiple CUDA kernels and the cuBLASLt integration need to be reimplemented to make 8-bit work on Hopper GPUs. I think for now, the more realistic thing is to throw an error to let the user know that this feature is currently not supported.
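The explicit error the comment above proposes could be gated on the GPU's compute capability. A minimal sketch, with hypothetical helper names (the version thresholds follow the Turing/Ampere/Hopper split described above; the real check inside bitsandbytes may differ):

```python
def supports_cublaslt_int8(compute_capability):
    """Hypothetical helper: return True if the cuBLASLt 8-bit matmul
    formats used by the int8 path exist on this GPU architecture.

    Turing (7.5) and Ampere/Ada (8.x) have those formats; Hopper (9.x)
    does not, so the int8 path would need new kernels there.
    """
    major, minor = compute_capability
    if major == 7 and minor >= 5:   # Turing
        return True
    if major == 8:                  # Ampere / Ada
        return True
    return False                    # Hopper (9.x) and older architectures


def assert_int8_supported(compute_capability):
    # Fail fast with a clear message instead of an opaque
    # "cublasLt ran into an error!" deep inside the matmul.
    if not supports_cublaslt_int8(compute_capability):
        raise RuntimeError(
            "8-bit (LLM.int8) matmul is not supported on this GPU "
            "architecture; use nf4/fp4 4-bit quantization instead."
        )
```

In a real integration, `compute_capability` would come from `torch.cuda.get_device_capability()`, so the check runs once at load time rather than failing mid-forward-pass.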
bitsandbytes did not support Windows before, but this method can support Windows. (yuhuang)
3. J:\StableDiffusion\sdwebui\py310\python.exe -m pip uninstall bitsandbytes-windows
4. J:\StableDiffusion\sdwebui\py310\python.exe -m pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.1-py3-none-win_amd64.whl
Replace the path here (J:\StableDiffusion\sdwebui\py310) with your SD venv directory (the folder containing python.exe).
Or, if you are on a Linux-style system (Ubuntu, macOS, etc.) with CUDA version 11.x: bitsandbytes can support Ubuntu. (yuhuang)
3. J:\StableDiffusion\sdwebui\py310\python.exe -m pip uninstall bitsandbytes-windows
4. J:\StableDiffusion\sdwebui\py310\python.exe -m pip install https://github.com/TimDettmers/bitsandbytes/releases/download/0.41.0/bitsandbytes-0.41.0-py3-none-any.whl
Replace the path here (J:\StableDiffusion\sdwebui\py310) with your SD venv directory (the folder containing python.exe).
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
@TimDettmers could you use https://github.com/NVIDIA/TransformerEngine ? At first sight the exposed API seems too high-level for your needs, but their building blocks are tailored for the Hopper (H100) and Ada (RTX 4090) architectures, e.g. https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/common/gemm/cublaslt_gemm.cu
This error is related to the H100. I tried loading the model on an H100 and got the error; the same 8-bit load worked fine on an A100.
Anyone able to resolve this? |
Is this still not available on H100 GPU instances?
Not yet unfortunately |
do you guys have some solution for this? |
Observing the same issue with H100, too. |
Also with H800. |
Any plan to fix this? |
The same problem comes for H20 |
The same with H800 |
Hi all, I will keep this issue open, but please be aware that, for now, 8-bit is not supported in bitsandbytes on Hopper. It is recommended to use nf4 or fp4 instead.
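Following the maintainer's recommendation, switching from 8-bit loading to 4-bit nf4 on Hopper looks roughly like this sketch using the standard `transformers` `BitsAndBytesConfig` API (the model name is illustrative, taken from the original report; exact argument availability depends on your `transformers`/`bitsandbytes` versions):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization as a Hopper-compatible alternative to LLM.int8
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # instead of load_in_8bit=True
    bnb_4bit_quant_type="nf4",              # "nf4" or "fp4"
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
    bnb_4bit_use_double_quant=True,         # optional: quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "openlm-research/open_llama_3b",        # illustrative checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```

This is the QLoRA-style loading path that several commenters report as working on H100, while `load_in_8bit=True` triggers the cublasLt error.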
Just want to add to this thread: tried on an H100 and it is not working. I really hope the bitsandbytes team can support this feature, given that more and more people are going to switch to newer GPUs.
Same for me. It does not work after changing to bf16, fp16, fp4, or anything else.
Having same issue with H100E |
Same problem |
The same with H800 and H100 |
Still having the same issue |
Still having the same issue on H100 |
Still having same issue on H100 |
Well, just came here to say I also ran into this issue using 8bit and H100. Would be very useful to have this working! |
Hi all! We are currently working on LLM.int8 support for Hopper in PR #1401. I cannot give an accurate ETA for a release at the moment, but it will be supported soon! |
same problem occurred |
Would be very appreciated to have this working on H100. |
Still get the same problem with H100. |
Problem
Hello, I'm getting this weird cublasLt error on a Lambda Labs H100 with CUDA 11.8, PyTorch 2.0.1, and Python 3.10 (Miniconda) while trying to fine-tune a 3B-parameter OpenLLaMA using LoRA with 8-bit loading. This only happens if we turn on 8-bit loading; LoRA alone or 4-bit loading (QLoRA) works.
The same commands did work 2 weeks ago and stopped working a week ago.
I've tried bitsandbytes versions 0.39.0 and 0.39.1, as prior versions don't work with the H100. Building from source gives me a different issue, as mentioned in the Env section.
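For reference, a minimal sketch of the failing setup (the model name and LoRA hyperparameters are illustrative, not taken from the axolotl config): the `load_in_8bit=True` flag is what trips the cublasLt error on the H100, while the same code with 4-bit loading works.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# load_in_8bit=True triggers the cublasLt error on H100;
# switching to 4-bit loading (QLoRA) avoids it.
model = AutoModelForCausalLM.from_pretrained(
    "openlm-research/open_llama_3b",   # illustrative 3B OpenLLaMA checkpoint
    load_in_8bit=True,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(              # illustrative LoRA hyperparameters
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The error surfaces during the first forward pass, inside bitsandbytes' int8 matmul, not at load time.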
Expected
No error
Reproduce
Setup Miniconda then follow https://github.com/OpenAccess-AI-Collective/axolotl 's readme on lambdalabs and run the default open llama lora config.
Trace
0.39.0
Env
python -m bitsandbytes
on main branch: I get the same error as in "Error named symbol not found at line 116 in file bitsandbytes/csrc/ops.cu" #382
on 0.39.0
Misc
All related issues:
Also tried installing cudatoolkit via conda.