minor bug: ggml/llama.cpp's new Q4_0_4_8 quantized files don't import into ollama #6125
Comments
Ollama lags llama.cpp. Give it a little time and the recent changes to llama.cpp will be incorporated into Ollama.
Thanks! I just want to raise the point that this is not just a usual import/build issue; it probably requires a small code change in ggml.go, which might not be on the radar otherwise.
Import issue has been fixed.
Import issue still here. "Error: invalid file magic" on all Q4_0_4_8 models (
Which model?
What is the issue?
I built Ollama on Ubuntu 24.04, running in Windows 11's WSL2 on my Surface Pro 11, to test Ollama with llama.cpp's Q4_0_4_8 acceleration.
Ollama+llama.cpp builds, imports my local llama-2 Q4_0, and runs it.
But when I try to import a local llama-2 Q4_0_4_8 model (which runs fine with llama.cpp), it gives an "Error: invalid file magic", apparently from its ggml.go module (around line 311?), which does not seem to understand the new Q4_0_4_4 and Q4_0_4_8 formats.
llama.cpp recently introduced these formats to accelerate modern arm64 CPUs like the Snapdragon X. They also work on other newer ARM CPUs and bring up to a 2-3x speed improvement. For details see llama.cpp PR #5780; there also seems to be work in progress for x64.
P.S.: I tried this on Linux (Windows WSL2), since building llama.cpp for Windows on ARM / Snapdragon X requires special build instructions (using clang instead of MSVC; see the llama.cpp build instructions), and I'm not sure whether Ollama already follows these.
@SebastianGode independently also had this issue.
OS
Linux / Ubuntu 24.04 on WSL2, Windows on ARM
GPU
None
CPU
arm64 / Snapdragon X Plus
Ollama version
0.3.2, 3e61426