Illegal instruction (core dumped) error #3731
Well, I got this today after getting all excited about GGUF 🤢 The crash occurs because the GGML backwards-compatibility PR #3697 defaults to an AVX2 build. The fix is to edit the requirements so a build matching your CPU gets installed. IMO the best way to deal with something like this is to have the lib compiled on install (with the desired env vars), like how it's done with the main llama-cpp-python package on Linux.
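For anyone who wants that behaviour today, here is a minimal sketch of such a source build (assuming the `LLAMA_*` CMake option names llama.cpp used at the time, and llama-cpp-python's `FORCE_CMAKE`/`CMAKE_ARGS` env vars):

```bash
# Build llama-cpp-python from source with AVX2/FMA disabled,
# so the resulting binary matches what the local CPU supports.
CMAKE_ARGS="-DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" FORCE_CMAKE=1 \
  pip install --no-cache-dir --force-reinstall llama-cpp-python
```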
Windows uses pre-compiled wheels because installing a compiler is a pain and often beyond what non-technical users are willing to do. Despite Linux having a much easier compiler installation process, we still get plenty of reports from Linux users who hit errors on install because the main llama-cpp-python package is built from source on Linux and they don't have a compiler installed and don't know how to install one. We get enough of them that I've often considered making a PR to install pre-compiled wheels for all systems.

As for the GGML package, the main reason the AVX wheels were built with FMA support is because that is how …
So I ended up doing a quick thought experiment on the most optimal prebuilds. At a minimum you need one build per SIMD level:

1. no SIMD
2. AVX
3. AVX2
4. AVX512

For 2, 3, and 4 you would also need variants with the optional extensions (FMA, F16C, or both).

Aaand that's 13 builds already for a single OS. If we add GGML support on top, we double that. Many developers just compile with SSE and call it a day, but llama.cpp is extremely performance-dependent, and we get huge improvements from SIMD intrinsics.
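To give a concrete idea of what each variant means in practice, here is a rough sketch of the corresponding build commands (assuming the `LLAMA_*` CMake option names llama.cpp used at the time, and `pip wheel` as the build step):

```bash
# Variant 2: AVX only (no AVX2/FMA/F16C)
CMAKE_ARGS="-DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF" FORCE_CMAKE=1 pip wheel llama-cpp-python

# Variant 3: AVX2, optionally with the FMA/F16C add-ons
CMAKE_ARGS="-DLLAMA_FMA=ON -DLLAMA_F16C=ON" FORCE_CMAKE=1 pip wheel llama-cpp-python

# Variant 4: AVX-512
CMAKE_ARGS="-DLLAMA_AVX512=ON" FORCE_CMAKE=1 pip wheel llama-cpp-python
```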
For cuBLAS builds, I've seen multiple benchmarks comparing the various configurations and the performance between them. For CPU builds, I don't remember whether the benchmarks I saw compared them or not. The configurations I am currently building are largely for compatibility. There are just too many possible configurations, which makes building optimal wheels for everyone unfeasible. Most people using the webui care more about ease of use than about maximizing performance. It is better if people who want the absolute best performance build llama-cpp-python themselves according to what their specific system supports.

Keep in mind that I am not just making llama-cpp-python wheels for the webui; I am making wheels for anyone who wants them. My current workflow builds 404 wheels for a single release. It takes around 6 hours to build and upload all of them every time a new version of llama-cpp-python is released. Due to API limits, many of those wheels need to be uploaded manually, though I am working on a script to automate that. I will disable FMA in the AVX wheels going forward.

While I can build CLBlast wheels for Linux, there is an unresolved issue in how llama-cpp-python loads libs on Windows that makes pre-built wheels for it non-functional without additional manual setup. The fix is easy, but it hasn't been done yet, and I would rather wait for that before building CLBlast wheels. There is another issue with CLBlast: it only works well on hardware it has been tuned for, and everything else gets worse performance than without it. Tuning requires multiple hours of building the libs yourself.
This is an issue on Linux as well. I don't see the OS specified in the original post, but Windows is mentioned multiple times in reference to the pre-built wheels, so I just wanted it noted that this is (and continues to be) a problem on Linux too.
The default configuration for llama-cpp-python is to build with AVX2. For pre-built wheels, you can use an install command that points at a different wheel variant; change both instances of the variant name in it to match what your CPU supports.
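If you're unsure which variant your CPU actually supports, one way to check on Linux (a quick sketch; the names are the flags as they appear in /proc/cpuinfo) is:

```bash
# List the relevant SIMD extensions advertised by the CPU
grep -o -w -E 'avx512f|avx2|avx|fma|f16c' /proc/cpuinfo | sort -u
```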
Yeah, the number of tuned cards is... not great. A lot more people are submitting tunes now, possibly thanks to llama.cpp? FYI, CLBlast tuning only affects the prompt-processing stage, as that's where the matrix-multiplication lib is used. Custom OpenCL kernels are used for inference when we offload layers to the GPU, and those don't require tuning. So even if prompt processing is slower due to the lack of tuning, you still get faster inference and the ability to keep your large model in graphics memory.
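For reference, building with CLBlast and then offloading layers looked roughly like this at the time (a sketch; `LLAMA_CLBLAST` and the webui's `--n-gpu-layers` flag are assumptions based on the 2023-era versions):

```bash
# Build llama-cpp-python against CLBlast, then offload layers when loading the model
CMAKE_ARGS="-DLLAMA_CLBLAST=ON" FORCE_CMAKE=1 pip install --force-reinstall llama-cpp-python
python server.py --loader llama.cpp --model CodeLlama-34B-GGUF --n-gpu-layers 35
```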
Yes, we need more tuning results for CLBlast, so any new card is very welcome. Tuning should take less than an hour, or even half an hour (a 4060 Ti 16GB, for example). Windows users don't need to build anything; see CNugteren/CLBlast#1 (comment) for more information on running the tuner. For Linux users, the official tuning page may be alien to newcomers, and it takes some time to get all the build essentials set up.
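For Linux users, the tuning workflow is roughly the following (a sketch based on the CLBlast build/tuning docs; exact targets and script paths may differ between versions):

```bash
# Build CLBlast with the tuners enabled, run them on your GPU,
# then merge the results back into the library and rebuild.
git clone https://github.com/CNugteren/CLBlast
cd CLBlast && mkdir build && cd build
cmake .. -DTUNERS=ON
make -j
make alltuners                                # runs every kernel tuner on your GPU
python ../scripts/database/database.py . ..   # collects the JSON tuning results
make -j                                       # rebuild with the updated defaults
```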
Thank you very much! I'm running a P40 strapped to a potato (a 10th-gen Celeron), so the "basic" variation is what did the trick.
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Describe the bug
Upgraded to the current version, and now after loading any model I get an Illegal instruction (core dumped) error. I did a clean one-click install on Linux and a reboot to see if that corrected the issue, but it did not.
Is there an existing issue for this?
Reproduction
Use the one-click installer for Linux and select D for the CPU/RAM-only option. Load a model using llama.cpp on the model tab and observe the Illegal instruction (core dumped) error. Previously this was working with my GGML models; I noticed that GGML models are now deprecated in llama.cpp, so I tried a GGUF model instead, but with the same result.
Screenshot
XXX:~/oobabooga_linux$ ./start_linux.sh
The following flags have been taken from the environment variable 'OOBABOOGA_FLAGS':
--listen
To use the CMD_FLAGS Inside webui.py, unset 'OOBABOOGA_FLAGS'.
/home/XXX/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/XXX/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
2023-08-28 10:54:17 INFO:Loading the extension "gallery"...
Running on local URL: http://0.0.0.0:7860/
To create a public link, set share=True in launch()
.2023-08-28 10:54:33 INFO:Loading CodeLlama-34B-GGUF...
Illegal instruction (core dumped)
Logs
System Info