
cuBLAS API failed with status 15 #569

Closed · 1 task done
tarunchand opened this issue Mar 25, 2023 · 2 comments
Labels: bug (Something isn't working), stale

Comments

@tarunchand

Describe the bug

Loading the alpaca-native model in 8-bit mode with --auto-devices succeeds, but as soon as a message is sent in the chat UI, generation fails with "cuBLAS API failed with status 15" followed by `Exception: cublasLt ran into an error!` from bitsandbytes. The full terminal output is reproduced in the Logs section below.

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

My laptop specs: RTX 3060 with 6 GB VRAM, 16 GB RAM.
I tried loading the alpaca-native 7B model in 8-bit mode with the --auto-devices option, since the model needs at least 8 GB of VRAM to load entirely on the GPU. The model loaded successfully, but the error appears after opening the web UI, as soon as any message is typed into the chat box. A minimal sketch of the same load path is below.
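
For reference, here is a minimal sketch of what `--load-in-8bit --auto-devices` does under the hood, outside the web UI. The local path, memory split, and prompt are illustrative assumptions, not values taken from the webui source; the keyword arguments are the standard transformers + accelerate + bitsandbytes loading path that the webui wraps.

```python
# Minimal repro sketch (assumptions: transformers, accelerate, and
# bitsandbytes installed; checkpoint present at models/alpaca-native).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "models/alpaca-native"  # illustrative local path
tokenizer = AutoTokenizer.from_pretrained(model_dir)

# load_in_8bit=True routes the linear layers through bitsandbytes' int8
# kernels; device_map="auto" lets accelerate place layers, spilling to CPU
# RAM once the GPU budget (cf. "--gpu-memory 5" in the log) is exhausted.
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    load_in_8bit=True,
    device_map="auto",
    max_memory={0: "5GiB", "cpu": "10GiB"},  # assumed split for a 6 GB card
)

inputs = tokenizer("Hello", return_tensors="pt").to(0)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)  # the issue hits here
print(tokenizer.decode(out[0], skip_special_tokens=True))
```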

Screenshot

No response

Logs

(gpt) ┌──(me㉿x86_64-conda-linux-gnu)-[~/GPT/text-generation-webui]
└─$ ls models/alpaca-native     
added_tokens.json  generation_config.json            pytorch_model-00002-of-00003.bin  pytorch_model.bin.index.json  special_tokens_map.json  tokenizer.model     training_args.bin
config.json        pytorch_model-00001-of-00003.bin  pytorch_model-00003-of-00003.bin  README.md                     tokenizer_config.json    trainer_state.json
                                                                                                                                                                                                                                              
(gpt) ┌──(me㉿x86_64-conda-linux-gnu)-[~/GPT/text-generation-webui]
└─$ python server.py --model alpaca-native --load-in-8bit --auto-devices --cai-chat                
Loading alpaca-native...
Auto-assiging --gpu-memory 5 for your GPU to try to prevent out-of-memory errors.
You can manually set other values.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /home/me/anaconda3/envs/gpt/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [04:49<00:00, 96.60s/it]
Loaded the model in 297.22 seconds.
Loading the extension "gallery"... Ok.
/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/gradio/deprecation.py:40: UserWarning: The 'type' parameter has been deprecated. Use the Number component instead.
  warnings.warn(value)
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/generation/utils.py:1211: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
cuBLAS API failed with status 15
A: torch.Size([383, 4096]), B: torch.Size([4096, 4096]), C: (383, 4096); (lda, ldb, ldc): (c_int(12256), c_int(131072), c_int(12256)); (m, n, k): (c_int(383), c_int(4096), c_int(4096))
Exception in thread Thread-4:
error detected
Traceback (most recent call last):
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/home/me/GPT/text-generation-webui/modules/callbacks.py", line 65, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/home/me/GPT/text-generation-webui/modules/text_generation.py", line 215, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/generation/utils.py", line 1462, in generate
    return self.sample(
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/generation/utils.py", line 2478, in sample
    outputs = self(
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 765, in forward
    outputs = self.model(
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 614, in forward
    layer_outputs = decoder_layer(
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 309, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 209, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 377, in forward
    out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/functional.py", line 1410, in igemmlt
    raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!

System Info

Linux, RTX 3060 - 6GB VRAM, i7, 16GB RAM
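
For context, cuBLAS status 15 is CUBLAS_STATUS_NOT_SUPPORTED. One plausible trigger, an assumption not confirmed in this thread, is that --auto-devices lets accelerate offload part of the model to CPU, while bitsandbytes' int8 igemmlt kernel only runs on the GPU. A minimal diagnostic sketch, assuming the `model` object from the repro snippet above, lists any offloaded modules via the `hf_device_map` attribute that accelerate attaches:

```python
# Diagnostic sketch (assumption: `model` was loaded with device_map="auto",
# as in the snippet above). accelerate attaches an `hf_device_map` dict
# mapping module names to their assigned device.
offloaded = {
    name: dev
    for name, dev in model.hf_device_map.items()
    if dev in ("cpu", "disk")  # anything not placed on a CUDA device
}
if offloaded:
    # int8 (bitsandbytes) matmuls cannot execute on these modules
    for name, dev in sorted(offloaded.items()):
        print(f"{name} -> {dev}")
else:
    print("all modules placed on GPU")
```

If any module shows up as cpu or disk here, the first things to try would be raising --gpu-memory or shortening the context so the whole model fits on the GPU.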
@tarunchand added the bug (Something isn't working) label on Mar 25, 2023
@nsteve-one

I am having the same issue. WSL 2, RTX 3060, i9, 32GB RAM


This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
