
minor bug: ggml/llama.cpp's new Q4_0_4_8 quantized files don't import into ollama #6125

Closed
AndreasKunar opened this issue Aug 1, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@AndreasKunar

AndreasKunar commented Aug 1, 2024

What is the issue?

I built ollama on Ubuntu 24.04, running in Windows 11's WSL2 on my Surface Pro 11, to try to test Ollama with llama.cpp's Q4_0_4_8 acceleration.

Ollama+llama.cpp builds, imports my local llama-2 Q4_0, and runs it.

But when I try to import a local llama-2 Q4_0_4_8 model (which runs fine with llama.cpp), it fails with "Error: invalid file magic", apparently from its ggml.go module (around line 311?), which does not seem to understand the new Q4_0_4_4 and Q4_0_4_8 formats.

llama.cpp recently introduced these formats to accelerate modern arm64 CPUs like the Snapdragon X. They also work on other newer ARM CPUs and bring up to a 2-3x speed improvement. For details see llama.cpp PR #5780; there also seems to be work underway for x64.

P.S.: I tried this on Linux (Windows' WSL2), since building llama.cpp for Windows on ARM / Snapdragon X requires special build instructions (using clang instead of MSVC; details in the llama.cpp build instructions), and I'm not sure whether ollama already follows them.

@SebastianGode independently also had this issue.

OS

Linux / Ubuntu 24.04 on WSL2, Windows on ARM

GPU

None

CPU

arm64 / Snapdragon X Plus

Ollama version

0.3.2, 3e61426

@AndreasKunar AndreasKunar added the bug Something isn't working label Aug 1, 2024
@rick-github
Collaborator

ollama lags llama.cpp. give it a little time and the recent changes to llama.cpp will be incorporated into ollama.

@AndreasKunar
Author

> ollama lags llama.cpp. give it a little time and the recent changes to llama.cpp will be incorporated into ollama.

Thanks! I just want to flag that this is not a usual import/build issue; it probably requires a small code change in ggml.go, which might not be on the radar otherwise.

@AndreasKunar
Author

Import issue has been fixed.

@antonovkz

The import issue is still here: "Error: invalid file magic" on all Q4_0_4_8 models.

@rick-github
Collaborator

Which model?

@antonovkz

antonovkz commented Oct 20, 2024

Literally every model that I try to add; for example:
[screenshot: Screenshot 2024-10-20 130707]
They also work well in LM Studio.

UPD: Windows ARM64
