
minor bug: ggml/llama.cpp's new Q4_0_4_8 quantized files don't import into ollama #6125

Closed
AndreasKunar opened this issue Aug 1, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@AndreasKunar

AndreasKunar commented Aug 1, 2024

What is the issue?

I built ollama on Ubuntu 24.04, running in Windows 11's WSL2 on my Surface Pro 11, to try to test Ollama with llama.cpp's Q4_0_4_8 acceleration.

Ollama+llama.cpp builds, imports my local llama-2 Q4_0, and runs it.

But when I try to import a local llama-2 Q4_0_4_8 model (which runs fine with llama.cpp), it fails with "Error: invalid file magic", apparently from its ggml.go module (around line 311?), which does not seem to understand the new Q4_0_4_4 and Q4_0_4_8 formats.

llama.cpp recently introduced these formats to accelerate modern arm64 CPUs like the Snapdragon X. They also work on other newer ARM CPUs and bring up to a 2-3x speed improvement. For details see llama.cpp PR #5780; there also seems to be work underway for x64.

P.S.: I tried this on Linux (Windows' WSL2), since building llama.cpp for Windows on ARM / Snapdragon X requires special build instructions (using clang instead of MSVC; details in the llama.cpp build instructions), and I'm not sure whether ollama already follows them.

@SebastianGode independently also had this issue.

OS

Linux / Ubuntu 24.04 on WSL2, Windows on ARM

GPU

None

CPU

arm64 / Snapdragon X Plus

Ollama version

0.3.2, 3e61426

@AndreasKunar AndreasKunar added the bug Something isn't working label Aug 1, 2024
@rick-github
Collaborator

ollama lags llama.cpp. give it a little time and the recent changes to llama.cpp will be incorporated into ollama.

@AndreasKunar
Author

> ollama lags llama.cpp. give it a little time and the recent changes to llama.cpp will be incorporated into ollama.

Thanks! I just want to flag that this is not a usual import/build issue; it probably requires a small code change in ggml.go, which might not be on the radar otherwise.

@AndreasKunar
Author

Import issue has been fixed.

@antonovkz

The import issue is still here: "Error: invalid file magic" on all Q4_0_4_8 models.

@rick-github
Collaborator

Which model?

@antonovkz

antonovkz commented Oct 20, 2024

Literally every model that I try to add; for example:
[screenshot: Screenshot 2024-10-20 130707]
They also work well in LM Studio.

UPD: Windows ARM64
