Describe the bug

```
(gpt) ┌──(me㉿x86_64-conda-linux-gnu)-[~/GPT/text-generation-webui]
└─$ python server.py --model alpaca-native --load-in-8bit --auto-devices --cai-chat
Loading alpaca-native...
Auto-assiging --gpu-memory 5 for your GPU to try to prevent out-of-memory errors.
You can manually set other values.
CUDA SETUP: CUDA runtime path found: /home/me/anaconda3/envs/gpt/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [04:49<00:00, 96.60s/it]
Loaded the model in 297.22 seconds.
Loading the extension "gallery"... Ok.
/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/gradio/deprecation.py:40: UserWarning: The 'type' parameter has been deprecated. Use the Number component instead.
warnings.warn(value)
Running on local URL: http://127.0.0.1:7860
To create a public link, set share=True in launch().
/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/generation/utils.py:1211: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
cuBLAS API failed with status 15
A: torch.Size([383, 4096]), B: torch.Size([4096, 4096]), C: (383, 4096); (lda, ldb, ldc): (c_int(12256), c_int(131072), c_int(12256)); (m, n, k): (c_int(383), c_int(4096), c_int(4096))
Exception in thread Thread-4:
error detected
Traceback (most recent call last):
File "/home/me/anaconda3/envs/gpt/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/me/anaconda3/envs/gpt/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/home/me/GPT/text-generation-webui/modules/callbacks.py", line 65, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "/home/me/GPT/text-generation-webui/modules/text_generation.py", line 215, in generate_with_callback
shared.model.generate(**kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/generation/utils.py", line 1462, in generate
return self.sample(
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/generation/utils.py", line 2478, in sample
outputs = self(
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 765, in forward
outputs = self.model(
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 614, in forward
layer_outputs = decoder_layer(
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 309, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 209, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 377, in forward
out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/functional.py", line 1410, in igemmlt
raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!
```
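For context, the failure happens inside bitsandbytes' int8 linear layer: `Linear8bitLt.forward` calls `bnb.matmul`, which dispatches to `F.igemmlt` (the cuBLASLt int8 GEMM). Below is a minimal, hypothetical sketch of that same path, assuming only bitsandbytes' public `Linear8bitLt` API, with the shapes copied from the log (A: [383, 4096], B: [4096, 4096]); on a healthy setup it succeeds, so a failure here points at the environment rather than the op itself.

```python
# Minimal sketch of the 8-bit matmul path from the traceback above.
# Assumptions: bitsandbytes' public Linear8bitLt API; shapes copied
# from the log (A: [383, 4096], B: [4096, 4096]).
import torch
import bitsandbytes as bnb

layer = bnb.nn.Linear8bitLt(4096, 4096, bias=False, has_fp16_weights=False)
layer = layer.to("cuda")  # moving to CUDA quantizes the weight to int8

x = torch.randn(1, 383, 4096, dtype=torch.float16, device="cuda")
out = layer(x)            # bnb.matmul -> MatMul8bitLt.apply -> F.igemmlt
print(out.shape)          # torch.Size([1, 383, 4096])
```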
Is there an existing issue for this?
I have searched the existing issues
Reproduction
My laptop specs: RTX 3060 (6 GB VRAM), 16 GB RAM.
I tried loading the alpaca-native 7B model in 8-bit mode with the --auto-devices option, since the model needs at least 8 GB of VRAM to load entirely on the GPU.
The model loaded successfully.
I got this error after opening the web UI, as soon as I typed any message in the chat box; see the sketch below for what those flags amount to.
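For reference, the two flags roughly correspond to the following Hugging Face call. This is a sketch under assumptions: the standard `from_pretrained` kwargs, not the exact code path text-generation-webui takes, and the `max_memory` values are illustrative, mirroring the auto-assigned `--gpu-memory 5` in the log above.

```python
# Rough equivalent of --load-in-8bit --auto-devices (assumption: the
# standard transformers/accelerate kwargs; the memory budget mirrors
# the auto-assigned "--gpu-memory 5" from the log).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "models/alpaca-native",
    load_in_8bit=True,                       # bitsandbytes LLM.int8() weights
    device_map="auto",                       # accelerate places layers per device
    max_memory={0: "5GiB", "cpu": "11GiB"},  # illustrative split for 6 GB VRAM
)
```

With only ~5 GiB budgeted on a 6 GB card, `device_map="auto"` will offload part of a 7B int8 model (roughly 7 GB of weights) to the CPU, and bitsandbytes' int8 matmul only runs on CUDA tensors, so layers landing on the CPU are one plausible source of the `cublasLt ran into an error!` failure above.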
Screenshot
No response
Logs
```
(gpt) ┌──(me㉿x86_64-conda-linux-gnu)-[~/GPT/text-generation-webui]
└─$ ls models/alpaca-native
added_tokens.json generation_config.json pytorch_model-00002-of-00003.bin pytorch_model.bin.index.json special_tokens_map.json tokenizer.model training_args.bin
config.json pytorch_model-00001-of-00003.bin pytorch_model-00003-of-00003.bin README.md tokenizer_config.json trainer_state.json
(gpt) ┌──(me㉿x86_64-conda-linux-gnu)-[~/GPT/text-generation-webui]
└─$ python server.py --model alpaca-native --load-in-8bit --auto-devices --cai-chat
Loading alpaca-native...
Auto-assiging --gpu-memory 5 for your GPU to try to prevent out-of-memory errors.
You can manually set other values.
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /home/me/anaconda3/envs/gpt/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [04:49<00:00, 96.60s/it]
Loaded the model in 297.22 seconds.
Loading the extension "gallery"... Ok.
/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/gradio/deprecation.py:40: UserWarning: The 'type' parameter has been deprecated. Use the Number component instead.
warnings.warn(value)
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/generation/utils.py:1211: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
cuBLAS API failed with status 15
A: torch.Size([383, 4096]), B: torch.Size([4096, 4096]), C: (383, 4096); (lda, ldb, ldc): (c_int(12256), c_int(131072), c_int(12256)); (m, n, k): (c_int(383), c_int(4096), c_int(4096))
Exception in thread Thread-4:
error detected
Traceback (most recent call last):
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/home/me/GPT/text-generation-webui/modules/callbacks.py", line 65, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/home/me/GPT/text-generation-webui/modules/text_generation.py", line 215, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/generation/utils.py", line 1462, in generate
    return self.sample(
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/generation/utils.py", line 2478, in sample
    outputs = self(
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 765, in forward
    outputs = self.model(
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 614, in forward
    layer_outputs = decoder_layer(
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 309, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 209, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 377, in forward
    out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/functional.py", line 1410, in igemmlt
    raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!
```
System Info
Linux, RTX 3060 (6 GB VRAM), Intel i7, 16 GB RAM
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.