Describe the bug

```
(gpt) ┌──(me㉿x86_64-conda-linux-gnu)-[~/GPT/text-generation-webui]
└─$ python server.py --model alpaca-native --load-in-8bit --auto-devices --cai-chat
Loading alpaca-native...
Auto-assiging --gpu-memory 5 for your GPU to try to prevent out-of-memory errors.
You can manually set other values.
CUDA SETUP: CUDA runtime path found: /home/me/anaconda3/envs/gpt/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [04:49<00:00, 96.60s/it]
Loaded the model in 297.22 seconds.
Loading the extension "gallery"... Ok.
/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/gradio/deprecation.py:40: UserWarning: The 'type' parameter has been deprecated. Use the Number component instead.
warnings.warn(value)
Running on local URL: http://127.0.0.1:7860
To create a public link, set share=True in launch().
/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/generation/utils.py:1211: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
cuBLAS API failed with status 15
A: torch.Size([383, 4096]), B: torch.Size([4096, 4096]), C: (383, 4096); (lda, ldb, ldc): (c_int(12256), c_int(131072), c_int(12256)); (m, n, k): (c_int(383), c_int(4096), c_int(4096))
Exception in thread Thread-4:
error detected
Traceback (most recent call last):
File "/home/me/anaconda3/envs/gpt/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/me/anaconda3/envs/gpt/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/home/me/GPT/text-generation-webui/modules/callbacks.py", line 65, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "/home/me/GPT/text-generation-webui/modules/text_generation.py", line 215, in generate_with_callback
shared.model.generate(**kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/generation/utils.py", line 1462, in generate
return self.sample(
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/generation/utils.py", line 2478, in sample
outputs = self(
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 765, in forward
outputs = self.model(
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 614, in forward
layer_outputs = decoder_layer(
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 309, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 209, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 377, in forward
out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/functional.py", line 1410, in igemmlt
raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!
```
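For context, the failure happens inside bitsandbytes' int8 linear layer: `Linear8bitLt.forward` calls `bnb.matmul`, which dispatches to `F.igemmlt` (the cuBLASLt int8 GEMM). Below is a minimal, hypothetical sketch of that same path, assuming only bitsandbytes' public `Linear8bitLt` API, with the shapes copied from the log (A: [383, 4096], B: [4096, 4096]); on a healthy setup it succeeds, so a failure here points at the environment rather than the op itself.

```python
# Minimal sketch of the 8-bit matmul path from the traceback above.
# Assumptions: bitsandbytes' public Linear8bitLt API; shapes copied
# from the log (A: [383, 4096], B: [4096, 4096]).
import torch
import bitsandbytes as bnb

layer = bnb.nn.Linear8bitLt(4096, 4096, bias=False, has_fp16_weights=False)
layer = layer.to("cuda")  # moving to CUDA quantizes the weight to int8

x = torch.randn(1, 383, 4096, dtype=torch.float16, device="cuda")
out = layer(x)            # bnb.matmul -> MatMul8bitLt.apply -> F.igemmlt
print(out.shape)          # torch.Size([1, 383, 4096])
```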
Is there an existing issue for this?
I have searched the existing issues
Reproduction
My laptop specs: RTX 3060 (6 GB VRAM), 16 GB RAM.
I tried loading the alpaca-native 7B model in 8-bit mode with the --auto-devices option, since the model needs at least 8 GB of VRAM to load entirely on the GPU.
The model loaded successfully.
I got this error after opening the web UI, as soon as I typed any message in the chat box; see the sketch below for what those flags amount to.
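For reference, the two flags roughly correspond to the following Hugging Face call. This is a sketch under assumptions: the standard `from_pretrained` kwargs, not the exact code path text-generation-webui takes, and the `max_memory` values are illustrative, mirroring the auto-assigned `--gpu-memory 5` in the log above.

```python
# Rough equivalent of --load-in-8bit --auto-devices (assumption: the
# standard transformers/accelerate kwargs; the memory budget mirrors
# the auto-assigned "--gpu-memory 5" from the log).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "models/alpaca-native",
    load_in_8bit=True,                       # bitsandbytes LLM.int8() weights
    device_map="auto",                       # accelerate places layers per device
    max_memory={0: "5GiB", "cpu": "11GiB"},  # illustrative split for 6 GB VRAM
)
```

With only ~5 GiB budgeted on a 6 GB card, `device_map="auto"` will offload part of a 7B int8 model (roughly 7 GB of weights) to the CPU, and bitsandbytes' int8 matmul only runs on CUDA tensors, so layers landing on the CPU are one plausible source of the `cublasLt ran into an error!` failure above.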
Screenshot
No response
Logs
```
(gpt) ┌──(me㉿x86_64-conda-linux-gnu)-[~/GPT/text-generation-webui]
└─$ ls models/alpaca-native
added_tokens.json generation_config.json pytorch_model-00002-of-00003.bin pytorch_model.bin.index.json special_tokens_map.json tokenizer.model training_args.bin
config.json pytorch_model-00001-of-00003.bin pytorch_model-00003-of-00003.bin README.md tokenizer_config.json trainer_state.json
(gpt) ┌──(me㉿x86_64-conda-linux-gnu)-[~/GPT/text-generation-webui]
└─$ python server.py --model alpaca-native --load-in-8bit --auto-devices --cai-chat
Loading alpaca-native...
Auto-assiging --gpu-memory 5 for your GPU to try to prevent out-of-memory errors.
You can manually set other values.
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /home/me/anaconda3/envs/gpt/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [04:49<00:00, 96.60s/it]
Loaded the model in 297.22 seconds.
Loading the extension "gallery"... Ok.
/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/gradio/deprecation.py:40: UserWarning: The 'type' parameter has been deprecated. Use the Number component instead.
warnings.warn(value)
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/generation/utils.py:1211: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
cuBLAS API failed with status 15
A: torch.Size([383, 4096]), B: torch.Size([4096, 4096]), C: (383, 4096); (lda, ldb, ldc): (c_int(12256), c_int(131072), c_int(12256)); (m, n, k): (c_int(383), c_int(4096), c_int(4096))
Exception in thread Thread-4:
error detected
Traceback (most recent call last):
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/home/me/GPT/text-generation-webui/modules/callbacks.py", line 65, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/home/me/GPT/text-generation-webui/modules/text_generation.py", line 215, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/generation/utils.py", line 1462, in generate
    return self.sample(
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/generation/utils.py", line 2478, in sample
    outputs = self(
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 765, in forward
    outputs = self.model(
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 614, in forward
    layer_outputs = decoder_layer(
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 309, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 209, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 377, in forward
    out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
  File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/bitsandbytes/functional.py", line 1410, in igemmlt
    raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!
```
System Info
Linux, RTX 3060 (6 GB VRAM), Intel i7, 16 GB RAM
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.