Add Support for the new GGUF format which replaces GGML #3676
abetlen/llama-cpp-python#628
You can probably use abetlen/llama-cpp-python#633 if you merge it yourself. I would back up the old llama-cpp-python; llama.cpp does not care about breaking changes.
It looks like this is now done.
Still can't download them. Hope it will be supported soon :)
It looks like we're now waiting on https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels to be updated. I've added a request issue: jllllll/llama-cpp-python-cuBLAS-wheels#3
Wheels have been uploaded.
Keep in mind that this version of llama-cpp-python does not support GGML models. Only GGUF models. ctransformers has already been updated in the webui to support GGUF, if all you want is to try it out. Personally, I would prefer to wait until more GGML models are converted to GGUF before updating llama-cpp-python. If ooba wants, I can implement the previous version of llama-cpp-python as separate packages in order to maintain GGML support, but that is a pretty messy solution even if it is temporary. Alternatively, ctransformers can be used for GGML support, as it supports both formats. This isn't a great solution either, as ctransformers is noticeably slower than llama-cpp-python, for whatever reason.
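For anyone who just wants to try GGUF through the ctransformers loader in the meantime, here is a minimal sketch; the model file and settings below are placeholders, not anything from this thread:

```python
# Minimal sketch of loading a GGUF model via ctransformers instead of
# llama-cpp-python. The model path and generation settings are illustrative
# placeholders.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "models/codellama-7b.Q4_K_M.gguf",  # hypothetical local GGUF file
    model_type="llama",                 # tells ctransformers which architecture to use
    gpu_layers=35,                      # layers to offload to GPU, if built with CUDA
)

print(llm("def fibonacci(n):", max_new_tokens=64))
```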
Isn't there a conversion script? If so, why wait?
The conversion script is not guaranteed to work, and its usage can be somewhat involved in order to perform a proper conversion. As I mentioned, there aren't many GGUF models available right now, and the ctransformers loader already supports them.
Yeah, I just deleted all my GGML files because GGUF came out. Guess I'll stick with ctransformers for now, but I think people want the ease of not having to set any parameters that comes with GGUF.
Yes, I do think this would be the ideal approach. Otherwise, many people are going to complain about it suddenly not working. If a GGML file is detected, it just uses the older commit, while GGUF would use the new one.
What about just having the script convert them for people, so no duplicate llama-cpp-python is needed? Maybe a check and a "do you want to convert this to GGUF?" UI element.
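A rough sketch of what such a check could look like. The magic constants are the little-endian uint32 values llama.cpp uses for the legacy formats ('ggml', 'ggmf', 'ggjt') and for GGUF; the conversion script name and its -i/-o flags are assumptions based on the llama.cpp repo around this time and may differ between revisions:

```python
# Sketch of a "detect GGML and offer to convert" check. Magic values are the
# little-endian uint32 constants from the llama.cpp sources; the conversion
# script name and flags are assumptions, adjust for your checkout.
import struct
import subprocess
from pathlib import Path

GGUF_MAGIC = 0x46554747                              # reads as b"GGUF" on disk
GGML_MAGICS = {0x67676D6C, 0x67676D66, 0x67676A74}   # 'ggml', 'ggmf', 'ggjt'

def detect_format(path: Path) -> str:
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    if magic == GGUF_MAGIC:
        return "gguf"
    if magic in GGML_MAGICS:
        return "ggml"
    return "unknown"

def maybe_convert(path: Path) -> Path:
    if detect_format(path) != "ggml":
        return path
    answer = input(f"{path.name} is a GGML file. Convert it to GGUF? [y/N] ")
    if answer.strip().lower() != "y":
        return path
    out = path.with_suffix(".gguf")
    # Script name/flags per llama.cpp circa Aug 2023; may differ in your revision.
    subprocess.run(
        ["python", "convert-llama-ggmlv3-to-gguf.py", "-i", str(path), "-o", str(out)],
        check=True,
    )
    return out
```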
I have written the code needed to support this here: jllllll@4a999e3 |
The conversion script is not guaranteed to work with every model. Edit: The issue with converting |
Hope it works for those 70Bs I downloaded over the last week. I think this is the 3rd or 4th time they've deprecated a format, and it's always all or nothing.
Fortunately, GGUF is designed to be expandable. So, this should be the last format deprecation. |
I was just starting to experiment with installing local LLMs (I've wanted to try them for ages but been too busy), but it seems I've picked a tumultuous time to start, so I'm eagerly waiting for this migration to be complete so I can download and start playing with models that won't be obsolete in a week! Edit: It seems to now be supported! I'm successfully running TheBloke's GGUF CodeLlama release!
Works, but currently GGUF speaks like Yoda.
What is this I hear about GGUFv2, and header compatibility being removed by October?
@Ph0rk0z yes, GGUFv2 was released, and it is backward compatible :) They decided it was better to make the change now, because GGUFv1 is not widespread yet.
Right.. but they're deprecating the format again? Why not deprecate it now, or keep it? Waiting till October just means people upload already-obsolete models?
The GGUFv2 implementation is still compatible with v1. I don't think it will cause any issues. |
@Ph0rk0z well, someone says that backward compatibility seems to work fine: ggerganov/llama.cpp#2821
Right.. but I read in the code that support for reading GGUFv1 files is slated to be removed.
Why on earth would they do that? If they keep doing that, people just won't bother with GGUF. They could get away with frequent format deprecations in the past because there weren't all that many llama-based models back then. Now there are too many models using GGMLv3 for anyone to want to deal with constantly redownloading and reconverting models.
Yea.. so people will upload GGUFv1 models for the next 2 months, for what? It has only been a couple of days. Why not convert them now, while there are few? But instead there's no mention of this and we have to go looking for it. I have low bandwidth, so I can't just redownload every 70B. We had this situation before, when GPU offloading was first created: all the GGML models I had downloaded had to be requantized from scratch or redownloaded. You couldn't even use a script to convert them. And I can't just skip GGUF, because some good PRs got merged after the switch.
They are saying in this discussion that it's a simple command-line conversion from GGUF v1 to GGUF v2. The v1 deprecation is for simplicity rather than any technical reason to stop supporting it.
Right! Very simple! Little chance of failure.
'spose I'll see how it goes converting the current quants to GGUF.. but will they be GGUFv1 or v2 when I use the script now?
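For what it's worth, you can check a file yourself: per the GGUF spec, a GGUF file starts with the 4-byte magic "GGUF" followed by a little-endian uint32 version field. A minimal sketch (the path is a placeholder):

```python
# Sketch: report the GGUF version of a file. The header layout (4-byte "GGUF"
# magic followed by a little-endian uint32 version) is per the GGUF spec.
import struct

def gguf_version(path: str) -> int:
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path} is not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))
        return version

print(gguf_version("models/llama-2-70b.Q4_K_M.gguf"))  # 1 for GGUFv1, 2 for GGUFv2
```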
After days of troubleshooting and literally having brain freeze over the "can't load the model" error, I came across this thread just now!! (RIP me.) I burned about 30 GB downloading all kinds of models and none of them are GGUF (talk about being unlucky). Will at least the GGUF version stay for a while now? I don't want to end up with unsupported versions and then break my head over the same issue again in a week's time. Also, which versions does langchain work with? Does the GPTQ version work, or is it only GGUF?
Convert the GGML to GGUF; if you downloaded them recently, it will work.
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment. |
Llama.cpp has dropped support for the GGML format and now only supports GGUF.
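For reference, loading a GGUF model with the updated llama-cpp-python looks roughly like this (a sketch; the path and parameter values are placeholders):

```python
# Sketch of loading a GGUF model with a GGUF-era llama-cpp-python build,
# which no longer accepts GGML files. Path and parameters are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-7b.Q4_K_M.gguf",
    n_ctx=4096,        # context window to allocate
    n_gpu_layers=35,   # requires a cuBLAS/Metal build for GPU offloading
)

out = llm("Q: What replaced GGML? A:", max_tokens=32, stop=["\n"])
print(out["choices"][0]["text"])
```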