gguf : add 64-bit support (GGUF v2) #2821

ggerganov · 2023-08-26T18:53:48Z

Adding 64-bit support as discussed: ggerganov/ggml#302 (comment)

Help with testing is appreciated. This should be backward compatible with v1.

klosax · 2023-08-26T19:03:37Z

We should add types uint64_t, int64_t and double.
And we should change to uint64_t on all lengths / sizes / counts just to be safe and future-proof, not only for tensor dimensions.
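
To illustrate the concern about 32-bit fields, here is a minimal Python sketch using the standard `struct` module. It is not code from this PR. The helper name `fits` and the example count are made up for illustration; the format codes `<I`, `<Q`, `<q` and `<d` are the little-endian `struct` equivalents of `uint32_t`, `uint64_t`, `int64_t` and `double`.

```python
import struct


def fits(fmt: str, value) -> bool:
    """Return True if `value` can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


# Hypothetical count just past the 32-bit range.
count = 2**32

print(fits("<I", count))  # uint32_t: False
print(fits("<Q", count))  # uint64_t: True

# Each of the proposed types occupies 8 bytes on disk.
for name, fmt in [("uint64_t", "<Q"), ("int64_t", "<q"), ("double", "<d")]:
    print(name, struct.calcsize(fmt))
```

Any length, size or count stored as `uint32_t` tops out at 4294967295, which is the case the move to `uint64_t` is meant to rule out.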

ggerganov · 2023-08-26T19:06:33Z

Need some help with the Python code.

In the meantime, I will add v1 backward compatibility to the reading code in ggml.c.
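
A reader that stays backward compatible can branch on the version field before reading the counts. The sketch below is in Python rather than the C of ggml.c and is not the PR's code. It assumes a little-endian header made of the 4-byte magic `GGUF`, a `uint32_t` version, then the tensor count and the metadata key-value count, with both counts being `uint32_t` in v1 and `uint64_t` in v2. The function names `write_header` and `read_header` are hypothetical.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"


def write_header(version: int, n_tensors: int, n_kv: int) -> bytes:
    """Build a header for experiments: 32-bit counts in v1, 64-bit otherwise."""
    counts_fmt = "<II" if version == 1 else "<QQ"
    return (
        GGUF_MAGIC
        + struct.pack("<I", version)
        + struct.pack(counts_fmt, n_tensors, n_kv)
    )


def read_header(stream) -> tuple:
    """Return (version, n_tensors, n_kv) from a v1 or v2 header."""
    if stream.read(4) != GGUF_MAGIC:
        raise ValueError("bad magic, not a GGUF file")
    (version,) = struct.unpack("<I", stream.read(4))
    if version == 1:
        # Legacy layout: both counts are 32-bit.
        n_tensors, n_kv = struct.unpack("<II", stream.read(8))
    elif version == 2:
        # v2 layout: both counts are 64-bit.
        n_tensors, n_kv = struct.unpack("<QQ", stream.read(16))
    else:
        raise ValueError(f"unsupported GGUF version {version}")
    return version, n_tensors, n_kv


for v in (1, 2):
    header = write_header(v, 291, 19)
    print(len(header), read_header(io.BytesIO(header)))
```

Under these assumptions the v1 header is 16 bytes and the v2 header is 24 bytes, and the same reader returns identical counts for both.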

klosax · 2023-08-26T19:18:21Z

We should change to uint64_t on all lengths / sizes / counts just to be safe and future-proof, not only change tensor dimensions.


KerfuffleV2 · 2023-08-26T20:48:36Z

I tested loading a couple GGUF v1 models, the backward compatibility seems to work fine.

ghost · 2023-08-26T21:16:15Z

Similarly, no issues loading various v1 models.

klosax

Both versions work well.

klosax · 2023-08-26T22:12:17Z

We can actually use quantize to losslessly convert gguf v1 to v2, if the same format is chosen.

philpax · 2023-08-26T22:22:24Z

Looks good, is the plan to update the metadata values for the lengths/etc before merge?

ghost · 2023-08-26T22:37:14Z

We can actually use quantize to losslessly convert gguf v1 to v2, if the same format is chosen.

@klosax Ah, that's useful. For a 7b q4_0 model, I use ./quantize ~/wizardLM.gguf 2 3

I don't need --allow-requantize or --leave-output-tensor, right?

klosax · 2023-08-26T22:43:00Z

I don't need --allow-requantize or --leave-output-tensor, right?

I don't think those parameters are needed. Maybe we should have a new parameter --copy-all-tensors instead, so the quant format won't matter.

KerfuffleV2 · 2023-08-26T22:55:48Z

I don't think those parameters are needed.

llama.cpp/llama.cpp

Lines 4743 to 4746 in 730d9c6

    
           // quantize only 2D tensors 
        
           quantize &= (tensor->n_dims == 2); 
        
           quantize &= params->quantize_output_tensor || name != "output.weight"; 
        
           quantize &= quantized_type != tensor->type;

That logic is actually kind of wrong because the k-quants stuff can choose a different type than quantized_type. There's also no check after that part to see if the special k-quants type is the same as the current tensor type, it just tries to quantize (or fails if --allow-requantize isn't set).

It probably will work for the non-k-quants types but pretty sure k-quants won't work. (There were also some changes to the decisions k-quants makes for LLaMA2 70B models so in that particular case it wouldn't pass through all the tensors even if the other issues were dealt with.)
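
The pass-through behaviour described in this comment can be modelled in a few lines. This is a Python paraphrase of the quoted C++ check, not the actual llama.cpp code. The `Tensor` class, the `should_quantize` name and the type strings are illustrative, and the k-quants case is a simplified stand-in for the per-tensor type selection that happens after the check.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    name: str
    n_dims: int
    type: str  # current storage type, e.g. "q4_0"


def should_quantize(tensor: Tensor, quantized_type: str,
                    quantize_output_tensor: bool = True) -> bool:
    """Mirror of the quoted check: False means the tensor is copied as-is."""
    quantize = tensor.n_dims == 2
    quantize &= quantize_output_tensor or tensor.name != "output.weight"
    quantize &= quantized_type != tensor.type
    return quantize


# Same non-k-quants format in and out: the tensor is passed through untouched.
w = Tensor("layers.0.attention.wq.weight", 2, "q4_0")
print(should_quantize(w, "q4_0"))  # False

# Simplified k-quants case: assume an earlier k-quants run stored this tensor
# as "q6_K". The check compares against the requested type, not the per-tensor
# choice, so it still asks for quantization instead of a plain copy.
out = Tensor("output.weight", 2, "q6_K")
print(should_quantize(out, "q4_K"))  # True
```

In this model a v1 file converts losslessly only when every 2D tensor already has exactly the requested type, which matches the observation that plain formats such as q4_0 work while k-quants do not.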

ghost · 2023-08-26T23:09:04Z

Thanks. I ran quantize with q4_0 on WizardLM and Llama 2 models. They load as GGUF V2 and appear to be working. I'll be wary of using quantize with k-quants.

ggerganov · 2023-08-27T10:53:30Z

Thanks everyone for testing. We should merge this. Is there anything else we want to try before that?

* gguf : bump version to 2
* gguf : add support for 64-bit (no backwards comp yet)
* gguf : v1 backwards comp
* gguf.py : bump GGUF version
* gguf.py : uint64_t on all lengths, sizes and counts, enums still uint32_t
* gguf.py : string lengths uint32_t
* gguf : update all counts to 64-bit
* gguf.py : string len uint64_t and n_dims uint32_t
* gguf : fix typo
* llama.cpp : print gguf version

---------

Co-authored-by: klosax <[email protected]>

pudepiedj · 2023-09-04T09:17:22Z

I am a long-term enthusiast for whisper.cpp which I use by default nowadays to transcribe my podcast Unmaking Sense.
I am new to Llama, so I apologise if this isn't useful, but here are a few comments:

A big thank you for all the work on both these projects, which are exemplary.
I've quantized successfully this morning from the original Meta AI download llama-2-13B-chat through F16 to q8_0 GGUF and it runs straight away on a MacBook Pro M2 Max with 32GB RAM. q4_0 also runs, of course, but that isn't new.
It says "Ctrl-C" allows interaction, but mine just aborts when running in terminal, rather as I would expect. I am obviously missing something.
I am not sufficiently technically competent to offer much by way of coding collaboration but I'd happily write some user documentation if it would help coming from someone starting out on this journey who asks daft questions based on impressive levels of ignorance.
Question: is there a way to prevent Llama-2-13B from producing random responses of indeterminate length and almost no relevance? In the screenshots in the repo you seem to have managed to force it to do the "meaning of life" question repeatedly, but I have no idea how you make it do that or control what kind of content it produces.
Sorry to waste your time if this isn't helpful or breaches github protocols in some way.

Green-Sky · 2023-09-04T09:23:05Z

It says "Ctrl-C" allows interaction, but mine just aborts when running in terminal, rather as I would expect. I am obviously missing something.

did you press it more than once? It queues a stop and gives you the control, and then if pressed again, exits the program. try to play with it a bit more :)

Question: is there a way to prevent Llama-2-13B from producing random responses of indeterminate length and almost no relevance? In the screenshots in the repo you seem to have managed to force it to do the "meaning of life" question repeatedly, but I have no idea how you make it do that or control what kind of content it produces.

did you use the prompt template?

pudepiedj · 2023-09-04T10:38:18Z

It says "Ctrl-C" allows interaction, but mine just aborts when running in terminal, rather as I would expect. I am obviously missing something.

did you press it more than once? It queues a stop and gives you the control, and then if pressed again, exits the program. try to play with it a bit more :)

It seems that if you use Ctrl-C while the assistant is printing a reply, it behaves as expected and described, but if you press it afterwards, it aborts. Thanks for the hint.

Question: is there a way to prevent Llama-2-13B from producing random responses of indeterminate length and almost no relevance? In the screenshots in the repo you seem to have managed to force it to do the "meaning of life" question repeatedly, but I have no idea how you make it do that or control what kind of content it produces.

did you use the prompt template?

I hadn't, but now I have. Thank you, again. Unfortunately it seems to lead to a collapse of the quality of the response to a point where it is worthless, but I therefore obviously need to investigate the process more.

KerfuffleV2 · 2023-09-04T12:56:40Z

If you need to follow up, I'd suggest opening an issue specifically to discuss your problem. This is a pull request that doesn't seem directly related.

gguf : bump version to 2

5f1fffd

gguf : add support for 64-bit (no backwards comp yet)

4f0547e

gguf : v1 backwards comp

3656b3c

ggerganov marked this pull request as ready for review August 26, 2023 19:12

ggerganov changed the title ~~gguf : add 64-bit support~~ gguf : add 64-bit support (GGUF v2) Aug 26, 2023

gguf.py : bump GGUF version

ba335ff

ggerganov mentioned this pull request Aug 26, 2023

GGUF file format specification ggerganov/ggml#302

Merged

klosax and others added 4 commits August 26, 2023 21:23

gguf.py : uint64_t on all lengths, sizes and counts, enums still uint32_t

be726c5

gguf.py : string lengths uint32_t

bc3eaf2

gguf : update all counts to 64-bit

6d369a1

gguf.py : string len uint64_t and n_dims uint32_t

09b6da7

cebtenzzre reviewed Aug 26, 2023


gguf : fix typo

b61b170

llama.cpp : print gguf version

33a5517

klosax approved these changes Aug 26, 2023


ggerganov merged commit d0cee0d into master Aug 27, 2023
25 checks passed

berkut1 mentioned this pull request Aug 27, 2023

Add Support for the new GGUF format which replaces GGML oobabooga/text-generation-webui#3676

Closed

pseudotensor mentioned this pull request Aug 28, 2023

Support transformers/TheBloke new AutoGPT integration and llama.cpp GGUFv2 quantization h2oai/h2ogpt#773

Closed

ghost mentioned this pull request Aug 31, 2023

Allow quantize to only copy tensors, other improvements #2931

Merged
