
Illegal instruction (core dumped) error #3731

Closed
plaidpants opened this issue Aug 28, 2023 · 11 comments
Labels: bug (Something isn't working), stale

Comments

@plaidpants

Describe the bug

Upgraded to the current version and now, after loading any model, I get an Illegal instruction (core dumped) error. I did a clean one-click install on Linux and a reboot to see if that corrected the issue, but it did not.

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Use the one-click installer for Linux and select D for the CPU/RAM-only option. Load a model with llama.cpp on the model tab and observe the Illegal instruction (core dumped) error. This previously worked with my GGML models; I noticed GGML models are now deprecated in llama.cpp, so I tried a GGUF model instead, but got the same result.

Screenshot

XXX:~/oobabooga_linux$ ./start_linux.sh

The following flags have been taken from the environment variable 'OOBABOOGA_FLAGS':

--listen

To use the CMD_FLAGS Inside webui.py, unset 'OOBABOOGA_FLAGS'.

/home/XXX/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.

warn("The installed version of bitsandbytes was compiled without GPU support. "

/home/XXX/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32

2023-08-28 10:54:17 INFO:Loading the extension "gallery"...

Running on local URL: http://0.0.0.0:7860/

To create a public link, set share=True in launch().

2023-08-28 10:54:33 INFO:Loading CodeLlama-34B-GGUF...

Illegal instruction (core dumped)

Logs

See above.

System Info

No GPU; CPU only, with lots of RAM.
@plaidpants plaidpants added the bug Something isn't working label Aug 28, 2023
@netrunnereve
Contributor

netrunnereve commented Aug 29, 2023

Well I got this today after getting all excited about GGUF 🤢

The crash occurs because the GGML backwards-compatibility PR #3697 defaults to an AVX2 llama_cpp_ggml package in requirements.txt. When we load llama_cpp_ggml, the program exits because our older CPUs don't understand those instructions.

The fix is to edit requirements.txt to use the non-AVX2 basic packages (find and replace "avx2" with "basic"). The releases have an AVX option as well but that strangely includes FMA so Sandy Bridge/Ivy Bridge CPUs can't run it. Haswell supports both FMA and AVX2 so I think there's a mistake with the AVX release (pinging @jllllll).
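
For example, something like this should do it (assuming a stock requirements.txt in the webui root and that the wheel URLs spell the variant in lowercase; adjust the pattern and case if yours differ):

sed -i 's/avx2/basic/g' requirements.txt
pip install -r requirements.txt --force-reinstall --no-deps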

IMO the best way to deal with something like this is to have the lib compiled on install (with the desired envvars) like how it's done with llama-cpp-python. I think Windows has issues that force it to use prebuilt wheels but it works fine for us penguins.
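
As a rough sketch of what that on-install build looks like on Linux (CMAKE_ARGS and FORCE_CMAKE are the variables llama-cpp-python's build already honors; the LLAMA_* option names are llama.cpp's CMake flags as of mid-2023, so treat them as assumptions and check the current CMakeLists):

# Build llama-cpp-python from source without AVX2/FMA/F16C so older CPUs don't trap
CMAKE_ARGS="-DLLAMA_AVX2=off -DLLAMA_FMA=off -DLLAMA_F16C=off" FORCE_CMAKE=1 \
  pip install llama-cpp-python --force-reinstall --no-cache-dir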

@jllllll
Contributor

jllllll commented Aug 29, 2023

Windows uses pre-compiled wheels because installing a compiler is a pain and often beyond what non-technical users are willing to do. Despite Linux having a much easier installation process for compilers, we still get plenty of reports from Linux users who encounter errors on install because the main llama-cpp-python package is built from source on Linux and they don't have a compiler installed and don't know how to install one. We get enough that I've often considered making a PR to install pre-compiled wheels for all systems.

As for the GGML package, the main reason that the AVX wheels were built with FMA support is because that is how ctransformers builds their AVX libs and I assumed that would be fine. Regardless, I'll rebuild the AVX wheels without FMA when I have the time.

@netrunnereve
Contributor

So I ended up doing a quick thought experiment on an optimal set of prebuilds...

  1. SSE only for Core 2 and newer processors.
  2. AVX for Sandy Bridge and newer. F16C should be turned off as it only works on Ivy Bridge and later.
  3. AVX2 and FMA for Haswell and newer.
  4. AVX512 for the modern Xeons that support it.

For 2, 3, and 4 you would also need:

  1. No BLAS
  2. CLBlast
  3. cuBLAS
  4. ROCm

Aaand that's 13 builds already for a single OS. If we add GGML support on top we double that. Now many developers just compile with SSE and call it a day, but llama.cpp is extremely performance dependent and we get huge improvements with SIMD intrinsics.
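
To see which tier a given machine actually falls into, a quick check on Linux is to look at the feature flags the kernel reports (flag names as they appear in /proc/cpuinfo):

# List the SIMD-related flags this CPU advertises
grep -o -w -E 'sse4_2|avx|avx2|fma|f16c|avx512f' /proc/cpuinfo | sort -u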

@jllllll
Contributor

jllllll commented Aug 29, 2023

For cuBLAS builds, I've seen multiple benchmarks comparing the various configurations. The performance difference between them, especially between AVX2 and AVX512, is not significant enough to warrant the excessive build times for so many wheels. Those benchmarks are the primary reason the webui has pre-built cuBLAS wheels: I made the PR for using them because I was confident it wasn't costing people a significant amount of performance, if any.

For CPU builds, I don't remember if the benchmarks I saw compared them or not.

The configurations I am currently building are largely for compatibility. There are just too many possible configurations, making building optimal wheels for everyone unfeasible. Most people using the webui care more about ease of use than maximizing performance. It is better if people who want the absolute best performance build llama-cpp-python themselves according to what their specific system supports.

Keep in mind that I am not just making llama-cpp-python wheels for the webui. I am making wheels for anyone who wants them. My current workflow builds 404 wheels for a single release. It takes around 6 hours to build and upload all of them every time a new version of llama-cpp-python is released. Due to API limits, many of those wheels need to be manually uploaded. Although, I am working on a script to automate that.

I will disable F16C and FMA for AVX builds, though I can't disable F16C for Windows as it is implied by AVX in the compiler.

While I can build CLBlast wheels for Linux, there is an unresolved issue in how llama-cpp-python loads libs on Windows that makes pre-built wheels for it non-functional without additional manual setup. The fix is easy, but it hasn't been done yet. I would rather wait for that to be fixed before building CLBlast wheels. There is another issue with CLBlast in which it will only work well on the hardware that it was tuned for. Everything else will get performance worse than without it. Tuning requires multiple hours of building the libs yourself.

@jllllll
Contributor

jllllll commented Aug 29, 2023

The AVX wheels have been re-built and uploaded.

@wagesj45

This is also an issue on Linux. I don't see the OS specified in the original post, but Windows is mentioned multiple times in reference to the pre-built wheels. Just wanted it noted that this is (and continues to be) a problem on Linux as well.

@jllllll
Contributor

jllllll commented Aug 30, 2023

Default configuration for llama-cpp-python is to use AVX2, FMA and F16C instructions. If your CPU does not support any one of those, then you need to rebuild llama-cpp-python accordingly.

For pre-built wheels, you can use this command:

python -m pip install llama-cpp-python llama-cpp-python-cuda llama-cpp-python-ggml llama-cpp-python-ggml-cuda --force-reinstall --no-deps --index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX/cpu --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/textgen/AVX/cu117

Change both instances of AVX in the URLs near the end of the command to basic if your CPU does not support AVX instructions.
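
A quick smoke test after reinstalling (just a sketch: importing the module loads the compiled library, but the definitive check is still loading a GGUF model as in the original report):

python -c "import llama_cpp; print(llama_cpp.__version__)"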

@netrunnereve
Contributor

There is another issue with CLBlast in which it will only work well on the hardware that it was tuned for. Everything else will get performance worse than without it. Tuning requires multiple hours of building the libs yourself.

Yeah the number of tuned cards is... not great. A lot more people are submitting tunes now, possibly thanks to llama.cpp?

FYI CLBlast tuning is solely for the prompt processing stage as that's where the matrix multiplication lib is being used. Custom OpenCL kernels are used for inference when we offload layers to the GPU and those don't require tuning. Even if prompt processing is slower due to lack of tuning you get the advantage of faster inference and the use of graphics memory to store your large model.
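
For anyone trying this from the webui side, the offload is just the n-gpu-layers setting on the model tab, or the equivalent server flag (the model name below is a placeholder; double-check flag names against python server.py --help for your version):

# Example: keep 32 layers on the GPU via the OpenCL/CLBlast build, the rest on the CPU
python server.py --model your-model.gguf --n-gpu-layers 32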

@tangjinchuan

Yes, we need more tuning results for CLBlast, so any new card is very welcome. Tuning should take less than an hour, or even half an hour (taking a 4060 Ti 16G as an example). For Windows users, there is no need to build anything; just see CNugteren/CLBlast#1 (comment) for more information on running it. For Linux users, the official tuning page may be alien to newcomers, and it takes some time to get all the build essentials in place.
I opened a discussion on llama.cpp asking users to submit new results to CLBlast: ggerganov/llama.cpp#1688. In the meantime, I have a project where I plan to bring CLBlast to Octave, which is another reason: https://sourceforge.net/projects/octave-ocl-extra/.
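
For the Linux route, the rough shape of a tuning run looks something like this (target names are from memory of the CLBlast docs, so treat them as approximate and follow the official tuning page for the real steps):

git clone https://github.com/CNugteren/CLBlast && cd CLBlast
mkdir build && cd build
cmake -DTUNERS=ON ..
make -j"$(nproc)"
make alltuners    # runs every kernel tuner; results are written as JSON files to submit upstream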

@MarbleMunkey

Default configuration for llama-cpp-python is to use AVX2, FMA and F16C instructions. If your CPU does not support any one of those, then you need to rebuild llama-cpp-python accordingly.

For pre-built wheels, you can use this command:

python -m pip install llama-cpp-python llama-cpp-python-cuda llama-cpp-python-ggml llama-cpp-python-ggml-cuda --force-reinstall --no-deps --index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX/cpu --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/textgen/AVX/cu117

Change both instances of AVX in the URLs near the end of the command to basic if your CPU does not support AVX instructions.

Thank you very much! I'm running a P40 strapped to a potato (a 10th-gen Celeron), so the "basic" variation is what did the trick.

@github-actions github-actions bot added the stale label Nov 10, 2023

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
