CUDA Error 222 - provided PTX was compiled with an unsupported toolchain #401
Comments
This looks like a …
Closing. Please reopen if the problem is reproducible with the latest version.
The issue still repros on the latest version. For anyone else blocked by this, a hacky workaround is to manually compile llama.cpp as a library, then copy the resulting file into llama-cpp-python:

cd ~
git clone --recursive https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make LLAMA_CUBLAS=1 -j libllama.so
# HACK: Use custom compiled libllama.so
cp ~/llama.cpp/libllama.so /opt/conda/lib/python3.10/site-packages/llama_cpp/libllama.so
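To verify the swapped-in library, loading a model with GPU offload should no longer trip the PTX error. A minimal check (the model path and layer count below are placeholders, not taken from the original report):

import llama_cpp
from llama_cpp import Llama

print(llama_cpp.__version__)  # the Python binding still comes from pip; only libllama.so was replaced
llm = Llama(model_path="/path/to/model.bin", n_gpu_layers=35)  # previously failed with CUDA error 222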
I'm having this issue, and the @randombk workaround doesn't work for me; it just gives a new error. The output of …
Any suggestions would be greatly appreciated!
@randombk Thank you so much!
FYI, I was running into this same issue, but it went away once I installed the actual CUDA toolkit version matching the version indicated at the top right of the nvidia-smi output.
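For reference, a quick way to compare the driver-reported CUDA version with the installed toolkit (standard NVIDIA tools; exact output varies by system):

nvidia-smi | head -n 4         # the "CUDA Version: X.Y" in the header is the newest version the driver supports
nvcc --version | grep release  # the toolkit that actually compiles the PTX
# If nvcc's release is newer than the driver's CUDA version, locally built PTX can trigger error 222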
FYI: ➜ ~ nvidia-smi
Fri Aug 11 12:53:05 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2080 Ti Off | 00000000:21:00.0 Off | N/A |
| 0% 46C P8 17W / 300W | 8MiB / 11264MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 Off | 00000000:49:00.0 On | N/A |
| 0% 53C P8 28W / 350W | 1635MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 6244 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 6244 G /usr/lib/xorg/Xorg 380MiB |
| 1 N/A N/A 6497 G /usr/bin/gnome-shell 88MiB |
| 1 N/A N/A 10763 G ...6044373,14595055559140153217,262144 199MiB |
| 1 N/A N/A 1302513 C python 884MiB |
| 1 N/A N/A 2873402 G ...sion,SpareRendererForSitePerProcess 61MiB |
+---------------------------------------------------------------------------------------+
➜ ~ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0
It wasn't just nvcc, though. I used …
I personally don't use conda any more... |
Building libllama.so works for me. |
This works for me in Kaggle:

!git clone --recursive https://github.com/ggerganov/llama.cpp.git
# instead of: !LLAMA_CUBLAS=1 CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
import os
os.chdir('llama.cpp')
!make LLAMA_CUBLAS=1 -j libllama.so
# HACK: Use custom compiled libllama.so
!cp libllama.so /opt/conda/lib/python3.10/site-packages/llama_cpp/libllama.so
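If the site-packages path differs from the one hard-coded above, the install location can be looked up programmatically; a small sketch, nothing Kaggle-specific:

import os
import llama_cpp

# Directory containing the bundled libllama.so that the cp above overwrites
print(os.path.dirname(llama_cpp.__file__))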
same issue here |
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
When using a Kaggle notebook with a 2x T4 GPU, llama-cpp-python should work as expected.
Current Behavior
llama-cpp-python fails with CUDA error 222 (provided PTX was compiled with an unsupported toolchain). Running llama.cpp directly works as expected.
Environment and Context
Free Kaggle notebook running with the 'T4 x2' GPU accelerator.
Failure Information (for bugs)
Steps to Reproduce
I published a repro at https://www.kaggle.com/randombk/bug-llama-cpp-python-cuda-222-repro
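For context, a minimal sequence that exercises the failure path in such a notebook (the model path and layer count are illustrative, not taken from the linked repro):

!LLAMA_CUBLAS=1 CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
from llama_cpp import Llama
# Raises "CUDA error 222: provided PTX was compiled with an unsupported toolchain"
# when the CUDA code was built with a newer toolkit than the installed driver supports
llm = Llama(model_path="/kaggle/input/some-model/model.bin", n_gpu_layers=20)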