
GGUF #3695
Merged · oobabooga merged 8 commits into main from gguf2 on Aug 27, 2023

Conversation

oobabooga (Owner) commented Aug 26, 2023

Updates llama-cpp-python and deprecates GGML in favor of the new GGUF format.

The conversion from old ggml to gguf through convert-llama-ggmlv3-to-gguf.py is not automatic (some command-line flags have to be set manually), so this will be quite messy.

Adds GGUF support while keeping GGML support thanks to @jllllll.

jllllll (Contributor) commented Aug 26, 2023

The conversion script is, unfortunately, not guaranteed to work with every model.
One such model was just discovered in the Discord server: MythoMax-L2-13B
(Update: the issue with models like that one has since been fixed.)

I have built renamed packages of llama-cpp-python 0.1.78 and written code to load GGML models through them: jllllll@4a999e3
This solution is pretty messy, so I wouldn't blame you if you didn't want to use it.
If you do, I can make a PR for it.
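
For illustration only, a minimal sketch of what that extension-based dispatch could look like. The `llama_cpp_ggml` import name is a placeholder for the renamed 0.1.78 package; the actual name in the linked commit may differ.

```python
# Hypothetical sketch: pick a backend based on the model file extension.
# "llama_cpp_ggml" is a placeholder for the renamed llama-cpp-python 0.1.78
# package; the real import name in jllllll@4a999e3 may be different.
from pathlib import Path

def load_llama(model_path: str, **kwargs):
    if Path(model_path).suffix.lower() == ".gguf":
        from llama_cpp import Llama          # current llama-cpp-python (GGUF)
    else:
        from llama_cpp_ggml import Llama     # renamed 0.1.78 build (GGML) -- placeholder name
    return Llama(model_path=model_path, **kwargs)

# model = load_llama("models/mythomax-l2-13b.ggmlv3.q4_K_M.bin", n_ctx=4096)
```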

ctransformers can still load GGML models, but it is slower than llama-cpp-python and has other issues like not being able to unload models. This can serve as an alternative solution, but is far from ideal.

Not sure what TheBloke's plans are as far as converting previous models to GGUF. His latest uploads have been in both GGML and GGUF, so it seems that he still intends to support GGML.

oobabooga (Owner, Author) commented Aug 26, 2023

That's a valid solution. Honestly, the situation is not good at the moment:

  1. There are no GGUF versions of base LLaMA available on Hugging Face: https://huggingface.co/models?search=llama%20gguf.
  2. The llama.cpp README and documentation do not contain any instructions on how to convert existing GGMLs to GGUF.
  3. The --help of convert-llama-ggmlv3-to-gguf.py claims that --eps should be set to 1e-5 for LLaMA-2, when it has been demonstrated that 5e-6 leads to lower perplexities.
  4. It is not clear to me how the --eps hardcoded into a GGUF file gets used, or whether it can still be changed manually.
  5. The same applies to --context-length.
  6. No idea what --model-metadata-dir, --vocab-dir, and --vocabtype do.

Ideally, I would like to simply convert the 20 GGML models that I have to GGUF and move on, but that may not be possible.

jllllll (Contributor) commented Aug 26, 2023

It seems that n_gqa and rms_norm_eps were removed from llama-cpp-python. So, it is probably safe to assume that they can't be changed and the hardcoded values must be used. I have added some code to account for this in the PR I linked below.

--model-metadata-dir is intended to point to the original HF model dir.
GGUF supports additional features like special tokens. This info needs to be retrieved from the HF model for a full conversion since GGML doesn't have it. That option can be omitted, but a full conversion would be better.
I haven't tried it, but it may be possible to download the HF model with --text-only and not need the .bin files.

--vocab-dir and --vocabtype are used in combination with --model-metadata-dir.
--vocab-dir points to the directory containing tokenizer.model and is used if the tokenizer isn't in the HF model dir.
--vocabtype is set to the type of the tokenizer being loaded (or being created, I'm not sure). It can be either spm or bpe, with spm being the default. Not sure what the difference between the two is or which models use each. Presumably, spm is what LLaMA uses.
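
Putting those flags together, a rough sketch of batch-converting the local GGML models mentioned above might look like the following. This is not the project's code: the paths and the --input/--output flag names are assumptions (check the script's --help for the exact names), and the --eps and --context-length values are placeholders.

```python
# Hedged sketch: drive convert-llama-ggmlv3-to-gguf.py from Python to
# batch-convert local GGML models. Paths and the --input/--output flag
# names are assumptions; verify against the script's --help.
import subprocess
from pathlib import Path

LLAMA_CPP = Path("llama.cpp")   # assumed location of the llama.cpp checkout
MODELS = Path("models")         # assumed location of the GGML .bin files

for ggml in sorted(MODELS.glob("*ggml*.bin")):
    gguf = ggml.with_name(ggml.stem + ".gguf")
    cmd = [
        "python", str(LLAMA_CPP / "convert-llama-ggmlv3-to-gguf.py"),
        "--input", str(ggml),           # assumed flag name
        "--output", str(gguf),          # assumed flag name
        "--eps", "5e-6",                # value debated above; the script's --help suggests 1e-5 for LLaMA-2
        "--context-length", "4096",     # placeholder; use the model's real training context
        # For a "full" conversion with special-token metadata, also point at the
        # original HF model dir (and tokenizer dir, if separate):
        # "--model-metadata-dir", "path/to/original-hf-model",
        # "--vocab-dir", "path/to/tokenizer-dir",
        # "--vocabtype", "spm",
    ]
    subprocess.run(cmd, check=True)
```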


I have made a PR with the commit linked before to be merged into the gguf2 branch here: #3697

berkut1 (Contributor) commented Aug 27, 2023

Yes, when llama.cpp merged GGUF they removed n_gqa and rms_norm_eps. Maybe some of the information we currently add to config.yaml manually can now be read from the GGUF metadata (?)

There are more GGUF models now: https://huggingface.co/models?pipeline_tag=text-generation&sort=trending&search=gguf

oobabooga (Owner, Author) commented

> So, it is probably safe to assume that they can't be changed and the hardcoded values must be used.

Yes, I have tested this and confirmed that the hardcoded values are used no matter what value you supply.

I think that your idea of moving llama-cpp-python==0.1.78 into a different namespace and using it for GGML models is a great one that will prevent a chaotic situation. Eventually I want to move to GGUF exclusively, but I speculate that will only be reasonable in 2-3 months.

@berkut1 thank you. The next step will be to parse the GGUF metadata and use that information in the UI (I haven't found how to read a GGUF in Python yet).
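
On reading a GGUF in Python: the header and the key/value metadata section can be parsed with nothing but the standard library. The sketch below is not the webui's or llama.cpp's code; it assumes the GGUF v2 little-endian layout (magic "GGUF", 32-bit version, 64-bit tensor and KV counts) and may need adjusting for other versions.

```python
# Hedged sketch: parse GGUF metadata with only the standard library.
# Assumes the GGUF v2 little-endian on-disk layout.
import struct
import sys

# scalar value types -> (struct format, size in bytes)
_SCALARS = {
    0: ("<B", 1),   # UINT8
    1: ("<b", 1),   # INT8
    2: ("<H", 2),   # UINT16
    3: ("<h", 2),   # INT16
    4: ("<I", 4),   # UINT32
    5: ("<i", 4),   # INT32
    6: ("<f", 4),   # FLOAT32
    7: ("<?", 1),   # BOOL
    10: ("<Q", 8),  # UINT64
    11: ("<q", 8),  # INT64
    12: ("<d", 8),  # FLOAT64
}

def _read(f, fmt, size):
    return struct.unpack(fmt, f.read(size))[0]

def _read_string(f):
    length = _read(f, "<Q", 8)           # v2 strings use 64-bit lengths
    return f.read(length).decode("utf-8", errors="replace")

def _read_value(f, vtype):
    if vtype in _SCALARS:
        return _read(f, *_SCALARS[vtype])
    if vtype == 8:                       # STRING
        return _read_string(f)
    if vtype == 9:                       # ARRAY: element type, count, elements
        elem_type = _read(f, "<I", 4)
        count = _read(f, "<Q", 8)
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")

def read_gguf_metadata(path):
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version = _read(f, "<I", 4)
        _read(f, "<Q", 8)                # tensor count (not needed for metadata)
        kv_count = _read(f, "<Q", 8)
        meta = {}
        for _ in range(kv_count):
            key = _read_string(f)
            vtype = _read(f, "<I", 4)
            meta[key] = _read_value(f, vtype)
        return version, meta

if __name__ == "__main__":
    version, meta = read_gguf_metadata(sys.argv[1])
    print("GGUF version:", version)
    for key, value in meta.items():
        shown = f"[{len(value)} items]" if isinstance(value, list) else value
        print(key, "=", shown)
```

Keys such as llama.context_length and llama.attention.layer_norm_rms_epsilon, if present, are where the hardcoded context length and eps discussed above end up.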

oobabooga merged commit e6eda5c into main on Aug 27, 2023
berkut1 (Contributor) commented Aug 27, 2023

@oobabooga looks like llama-cpp-python just added GGUF support.
There are places where deprecated lines still exist, for example: https://github.com/abetlen/llama-cpp-python/blob/9ab49bc1d47a3da011107d0277184ab1948f1364/llama_cpp/llama.py#L229

Maybe as a temporary solution, we can just read the console output lines that start with llm_load_print_meta.
I searched llama-cpp-python but didn't find a way to get this information via the API.

For example, llama.cpp exposes some methods: https://github.com/ggerganov/llama.cpp/blob/789c8c945a2814e1487e18e68823d9926e3b1454/ggml.h#L1856
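
As a rough illustration of that workaround (not the webui's code): llama.cpp prints the llm_load_print_meta lines to stderr from native code, so they can be captured by temporarily redirecting file descriptor 2 around the model load. Llama(model_path=..., verbose=True) is standard llama-cpp-python; the rest of the helper below is an assumption.

```python
# Hedged sketch of the "read the console output" workaround: capture the
# native stderr stream around model load and keep the llm_load_print_meta lines.
import os
import tempfile
from llama_cpp import Llama   # assumes a GGUF-capable llama-cpp-python build

def load_and_capture_meta(model_path, **kwargs):
    with tempfile.TemporaryFile() as tmp:
        saved_fd = os.dup(2)                # keep the real stderr
        try:
            os.dup2(tmp.fileno(), 2)        # route native stderr into tmp
            model = Llama(model_path=model_path, verbose=True, **kwargs)
        finally:
            os.dup2(saved_fd, 2)            # restore stderr
            os.close(saved_fd)
        tmp.seek(0)
        log = tmp.read().decode("utf-8", errors="replace")
    meta_lines = [l for l in log.splitlines() if l.startswith("llm_load_print_meta")]
    return model, meta_lines

# model, meta = load_and_capture_meta("models/llama-2-13b.Q4_K_M.gguf")
# print("\n".join(meta))
```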

oobabooga deleted the gguf2 branch on August 30, 2023