cuBLAS API failed with status 15 - Error #174
Comments
I ran into this issue as well with torch==2.0. When I uninstalled it and reinstalled torch==1.13.1, that seemed to fix the issue.
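A sketch of that downgrade, for reference. The `+cu117` build tag and wheel index URL are assumptions; pick the tag that matches your installed CUDA toolkit.

```shell
# Remove the torch 2.0 build, then pin 1.13.1 built against CUDA 11.7.
pip uninstall -y torch
pip install torch==1.13.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
```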
Thanks! This version fixed it.
The error went away for me on GPU.
May I know what CUDA version / NVIDIA driver version you are using, and the versions of your accelerate pip packages (if not the latest)? Thanks!
CUDA 11.7. Also, I used conda to install PyTorch with CUDA.
CUDA 12 is not compatible with PyTorch 2.0; see the Release Compatibility Matrix for PyTorch releases: https://github.com/pytorch/pytorch/blob/master/RELEASE.md#release-compatibility-matrix
Also, Python 3.11 is not compatible either; the max version is 3.10.
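The constraints above can be captured in a small check. The table values here are taken from the comments in this thread (torch 2.0 with CUDA 11.7/11.8, torch 1.13 with CUDA 11.6/11.7, Python at most 3.10), not from an authoritative matrix:

```python
# Compatibility table as reported in this thread (an assumption, not the
# official matrix): torch major.minor -> (supported CUDA versions, max Python minor).
COMPAT = {
    "2.0": ({"11.7", "11.8"}, 10),
    "1.13": ({"11.6", "11.7"}, 10),
}

def is_compatible(torch_ver: str, cuda_ver: str, py_minor: int) -> bool:
    """Return True if the torch build matches the CUDA and Python versions."""
    major_minor = ".".join(torch_ver.split(".")[:2])
    cudas, max_py = COMPAT.get(major_minor, (set(), 0))
    return cuda_ver in cudas and py_minor <= max_py

print(is_compatible("2.0.1", "12.0", 10))   # CUDA 12: not supported per this thread
print(is_compatible("1.13.1", "11.7", 10))  # the combination that worked above
```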
Getting the same issue here trying to run inference on the Google t5-xl model. Error:
I've tried all the fixes proposed here, but no luck. Environment packages:
@mudomau I'm encountering another error now, but the last Dockerfile install uploaded 3 days ago fixed that cuBLAS error for me.
Same problem here.
I am running into the same issue as well on an H100: torch 1.13.1, bitsandbytes==0.38.1, CUDA 11.8, Python 3.10, cuBLAS 11.11.3.6.
The same issue occurs for me when finetuning the 30B and 65B models, even on different clouds. For the 65B model it occurs randomly, with a probability of about 70%; for the 30B model it occurs every time.
@arvindsun Have you fixed this? I'm also running into this issue when using an H100 on Lambda Labs.
Getting the same error on an H100 on Lambda Labs.
Getting the same error on an H100 on Lambda Labs too.
Try running it without 8-bit mode, since you are on an H100.
I tried it. Lambda's H100 instances have CUDA 11.8, but PyTorch 2.0.1 is compiled for 11.7, which is not compatible. The bitsandbytes version also has a problem, and you need to rename the CUDA version you are using. I also tried installing CUDA 12 to use the latest version of torch, but strangely the installation aborts every time, so I gave up on testing it on the H100 after spending 3 hours trying to configure it. I'll try another RunPod instance instead; locally I could successfully train with 3 epochs, but I needed more compute to train with 10, and my RTX 4090 would take weeks.
Facing the same error on a Lambda Labs H100 instance trying to load Falcon-40B in 8-bit. What's the solution?
Export these variables:
Install a compatible CUDA (11.7 has no support for the H100):
Remove the old CUDA:
Install the compatible PyTorch:
If you will use DeepSpeed for CPU offload (it makes training faster), you need to:
Edit these files (using vim, nano, or SFTP), changing the import for inf from
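The actual commands for those steps were lost in this copy of the thread. A hedged sketch of what they likely looked like, assuming CUDA 11.8 installed under `/usr/local/cuda-11.8` and DeepSpeed code that still does `from torch._six import inf` (a known breakage with PyTorch 2.0, where `torch._six` was removed); all paths are assumptions:

```shell
# Point the toolchain at CUDA 11.8 (paths are assumptions; adjust to your install).
export CUDA_HOME=/usr/local/cuda-11.8
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"

# PyTorch 2.0 built against CUDA 11.8, from the standard PyTorch wheel index.
pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118

# DeepSpeed's import of inf: torch._six was removed in PyTorch 2.0, so
# rewrite it to import from torch directly. The file path here is
# hypothetical; apply it to whichever files the traceback points at.
sed -i 's/from torch\._six import inf/from torch import inf/' path/to/deepspeed/runtime/utils.py
```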
Ended up moving back to an A100 😅 |
Has anyone else tried and confirmed the efficacy of @jonataslaw's solution two comments above? Will test myself over the weekend. |
I was able to solve this error with the conda install approach found here: bitsandbytes-foundation/bitsandbytes#85
I met this issue on an H100 GPU, and fixed it by changing
Sadly it gave me the error below:
Got this issue on an H100 on RunPod.
Same, got this on an H100 with 8-bit; the H100 works with 16-bit.
Got this error on an H100 using 8-bit Llama. Has anyone made it work on an H100?
You can avoid using 8-bit; 4-bit and 16-bit are fine.
Hi,
During the finetune.py command launch I'm encountering the error titled above.
I'm using Fedora 36 with CUDA 12, Python 3.10.10; initialization seems to begin like so:
and then later, after loading some files:
Am I using some wrong lib versions?
Thanks for your help