
Illegal instruction (core dumped) error #3731

Closed
plaidpants opened this issue Aug 28, 2023 · 11 comments
Labels: bug (Something isn't working), stale

Comments

@plaidpants

Describe the bug

Upgraded to the current version and now, after loading any model, I get an Illegal instruction (core dumped) error. I did a clean one-click install on Linux and a reboot to see if that corrected the issue, but it did not.

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Use the one-click installer for Linux and select D for the CPU/RAM-only option. Load a model with llama.cpp on the model tab and observe the Illegal instruction (core dumped) error. This previously worked with my GGML models; I noticed GGML models are now deprecated in llama.cpp, so I tried a GGUF model instead, but got the same result.

Screenshot

XXX:~/oobabooga_linux$ ./start_linux.sh

The following flags have been taken from the environment variable 'OOBABOOGA_FLAGS':

--listen

To use the CMD_FLAGS Inside webui.py, unset 'OOBABOOGA_FLAGS'.

/home/XXX/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.

warn("The installed version of bitsandbytes was compiled without GPU support. "

/home/XXX/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32

2023-08-28 10:54:17 INFO:Loading the extension "gallery"...

Running on local URL: http://0.0.0.0:7860/

To create a public link, set share=True in launch().

2023-08-28 10:54:33 INFO:Loading CodeLlama-34B-GGUF...

Illegal instruction (core dumped)

Logs

See above.

System Info

No GPU; CPU only, with lots of RAM.
@plaidpants plaidpants added the bug Something isn't working label Aug 28, 2023
@netrunnereve
Contributor

netrunnereve commented Aug 29, 2023

Well I got this today after getting all excited about GGUF 🤢

The crash occurs because the GGML backwards-compatibility PR #3697 defaults to an AVX2 llama_cpp_ggml package in requirements.txt. When we load llama_cpp_ggml, the program exits because our older CPUs don't understand those instructions.

The fix is to edit requirements.txt to use the non-AVX2 basic packages (find and replace "avx2" with "basic"). The releases have an AVX option as well but that strangely includes FMA so Sandy Bridge/Ivy Bridge CPUs can't run it. Haswell supports both FMA and AVX2 so I think there's a mistake with the AVX release (pinging @jllllll).
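
For example, something like this should do it (assuming a stock requirements.txt in the webui root and that the wheel URLs spell the variant in lowercase; adjust the pattern and case if yours differ):

sed -i 's/avx2/basic/g' requirements.txt
pip install -r requirements.txt --force-reinstall --no-deps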

IMO the best way to deal with something like this is to have the lib compiled on install (with the desired envvars) like how it's done with llama-cpp-python. I think Windows has issues that force it to use prebuilt wheels but it works fine for us penguins.
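
As a rough sketch of what that on-install build looks like on Linux (CMAKE_ARGS and FORCE_CMAKE are the variables llama-cpp-python's build already honors; the LLAMA_* option names are llama.cpp's CMake flags as of mid-2023, so treat them as assumptions and check the current CMakeLists):

# Build llama-cpp-python from source without AVX2/FMA/F16C so older CPUs don't trap
CMAKE_ARGS="-DLLAMA_AVX2=off -DLLAMA_FMA=off -DLLAMA_F16C=off" FORCE_CMAKE=1 \
  pip install llama-cpp-python --force-reinstall --no-cache-dir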

@jllllll
Contributor

jllllll commented Aug 29, 2023

Windows uses pre-compiled wheels because installing a compiler is a pain and often beyond what non-technical users are willing to do. Despite Linux having a much easier installation process for compilers, we still get plenty of reports from Linux users who encounter errors on install because the main llama-cpp-python package is built from source on Linux and they don't have a compiler installed and don't know how to install one. We get enough that I've often considered making a PR to install pre-compiled wheels for all systems.

As for the GGML package, the main reason that the AVX wheels were built with FMA support is because that is how ctransformers builds their AVX libs and I assumed that would be fine. Regardless, I'll rebuild the AVX wheels without FMA when I have the time.

@netrunnereve
Contributor

So I ended up doing a quick thought experiment on an optimal set of prebuilds...

  1. SSE only for Core 2 and newer processors.
  2. AVX for Sandy Bridge and newer. F16C should be turned off as it only works on Ivy Bridge and later.
  3. AVX2 and FMA for Haswell and newer.
  4. AVX512 for the modern Xeons that support it.

For 2, 3, and 4 you would also need:

  1. No BLAS
  2. CLBlast
  3. cuBLAS
  4. ROCm

Aaand that's 13 builds already for a single OS. If we add GGML support on top we double that. Now many developers just compile with SSE and call it a day, but llama.cpp is extremely performance dependent and we get huge improvements with SIMD intrinsics.
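
To see which tier a given machine actually falls into, a quick check on Linux is to look at the feature flags the kernel reports (flag names as they appear in /proc/cpuinfo):

# List the SIMD-related flags this CPU advertises
grep -o -w -E 'sse4_2|avx|avx2|fma|f16c|avx512f' /proc/cpuinfo | sort -u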

@jllllll
Contributor

jllllll commented Aug 29, 2023

For cuBLAS builds, I've seen multiple benchmarks comparing the various configurations. The performance difference between them, especially between AVX2 and AVX512, is not significant enough to warrant the excessive build times for so many wheels. Those benchmarks are the primary reason the webui has pre-built cuBLAS wheels: I made the PR for using them because I was confident it wasn't costing people a significant amount of performance, if any.

For CPU builds, I don't remember if the benchmarks I saw compared them or not.

The configurations I am currently building are largely for compatibility. There are just too many possible configurations, making building optimal wheels for everyone unfeasible. Most people using the webui care more about ease of use than maximizing performance. It is better if people who want the absolute best performance build llama-cpp-python themselves according to what their specific system supports.

Keep in mind that I am not just making llama-cpp-python wheels for the webui. I am making wheels for anyone who wants them. My current workflow builds 404 wheels for a single release. It takes around 6 hours to build and upload all of them every time a new version of llama-cpp-python is released. Due to API limits, many of those wheels need to be manually uploaded. Although, I am working on a script to automate that.

I will disable F16C and FMA for AVX builds, though I can't disable F16C for Windows as it is implied by AVX in the compiler.

While I can build CLBlast wheels for Linux, there is an unresolved issue in how llama-cpp-python loads libs on Windows that makes pre-built wheels for it non-functional without additional manual setup. The fix is easy, but it hasn't been done yet. I would rather wait for that to be fixed before building CLBlast wheels. There is another issue with CLBlast in which it will only work well on the hardware that it was tuned for. Everything else will get performance worse than without it. Tuning requires multiple hours of building the libs yourself.

@jllllll
Contributor

jllllll commented Aug 29, 2023

The AVX wheels have been re-built and uploaded.

@wagesj45

This is also an issue on Linux. I don't see the OS specified in the original post, but Windows is mentioned multiple times in reference to the pre-built wheels. Just wanted it noted that this is (and continues to be) a problem on Linux as well.

@jllllll
Contributor

jllllll commented Aug 30, 2023

Default configuration for llama-cpp-python is to use AVX2, FMA and F16C instructions. If your CPU does not support any one of those, then you need to rebuild llama-cpp-python accordingly.

For pre-built wheels, you can use this command:

python -m pip install llama-cpp-python llama-cpp-python-cuda llama-cpp-python-ggml llama-cpp-python-ggml-cuda --force-reinstall --no-deps --index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX/cpu --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/textgen/AVX/cu117

Change both instances of AVX in the URLs near the end of the command to basic if your CPU does not support AVX instructions.
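
A quick smoke test after reinstalling (just a sketch: importing the module loads the compiled library, but the definitive check is still loading a GGUF model as in the original report):

python -c "import llama_cpp; print(llama_cpp.__version__)"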

@netrunnereve
Contributor

There is another issue with CLBlast in which it will only work well on the hardware that it was tuned for. Everything else will get performance worse than without it. Tuning requires multiple hours of building the libs yourself.

Yeah the number of tuned cards is... not great. A lot more people are submitting tunes now, possibly thanks to llama.cpp?

FYI CLBlast tuning is solely for the prompt processing stage as that's where the matrix multiplication lib is being used. Custom OpenCL kernels are used for inference when we offload layers to the GPU and those don't require tuning. Even if prompt processing is slower due to lack of tuning you get the advantage of faster inference and the use of graphics memory to store your large model.
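
For anyone trying this from the webui side, the offload is just the n-gpu-layers setting on the model tab, or the equivalent server flag (the model name below is a placeholder; double-check flag names against python server.py --help for your version):

# Example: keep 32 layers on the GPU via the OpenCL/CLBlast build, the rest on the CPU
python server.py --model your-model.gguf --n-gpu-layers 32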

@tangjinchuan

Yes, we need more tuning results for CLBlast, so any new card is very welcome. Tuning should take less than an hour, or even half an hour (taking a 4060 Ti 16G as an example). For Windows users, there is no need to build anything; just see CNugteren/CLBlast#1 (comment) for more information on running it. For Linux users, the official tuning page may be alien to newcomers, and it takes some time to get all the build essentials in place.
I opened a discussion on llama.cpp asking users to submit new results to CLBlast: ggerganov/llama.cpp#1688. In the meantime, I have a project where I plan to bring CLBlast to Octave, which is another reason: https://sourceforge.net/projects/octave-ocl-extra/.
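
For the Linux route, the rough shape of a tuning run looks something like this (target names are from memory of the CLBlast docs, so treat them as approximate and follow the official tuning page for the real steps):

git clone https://github.com/CNugteren/CLBlast && cd CLBlast
mkdir build && cd build
cmake -DTUNERS=ON ..
make -j"$(nproc)"
make alltuners    # runs every kernel tuner; results are written as JSON files to submit upstream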

@MarbleMunkey

Default configuration for llama-cpp-python is to use AVX2, FMA and F16C instructions. If your CPU does not support any one of those, then you need to rebuild llama-cpp-python accordingly.

For pre-built wheels, you can use this command:

python -m pip install llama-cpp-python llama-cpp-python-cuda llama-cpp-python-ggml llama-cpp-python-ggml-cuda --force-reinstall --no-deps --index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX/cpu --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/textgen/AVX/cu117

Change both instances of AVX in the URLs near the end of the command to basic if your CPU does not support AVX instructions.

Thank you very much! I'm running a P40 strapped to a potato (a 10th-gen Celeron), so the "basic" variation is what did the trick.

@github-actions github-actions bot added the stale label Nov 10, 2023

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
