
quantize: add imatrix and dataset metadata in GGUF #6656

Closed
phymbert opened this issue Apr 13, 2024 · 3 comments · Fixed by #6658
Labels
enhancement (New feature or request) · generation quality (Quality of model output) · model (Model specific) · need feedback (Testing and feedback with results are needed)

Comments

@phymbert (Collaborator) commented Apr 13, 2024

Motivation

Thanks to @julien-c, I was reading this reddit post from @he29-net 👍

You can't easily tell from the name whether a model was quantized with the help of an importance matrix. I first found this annoying, because it was not clear if and how the calibration dataset affects the performance of the model in ways other than just positive ones. But recent tests in llama.cpp discussion #5263 show that while the data used to prepare the imatrix slightly affects how it performs in (un)related languages or specializations, any dataset will perform better than a "vanilla" quantization with no imatrix. So now I instead find it annoying because sometimes the only way to be sure I'm using the better imatrix version is to re-quantize the model myself.

Proposal

  • Add, at the end of the imatrix binary file, the name of the dataset on which the imatrix was computed

  • Add the following KV pairs in quantize:

    • `quantize.imatrix.file`: filename of the imatrix provided during quantization
    • `quantize.imatrix.entries_count`: number of entries in the imatrix
    • `quantize.imatrix.dataset`: dataset the imatrix was computed from
    • `quantize.imatrix.chunks_count`: number of chunks the imatrix was computed with

Ideally I would also add the hashes of both the imatrix and dataset files in the metadata, but I am not sure this is supported or appropriate.

@phymbert phymbert added enhancement New feature or request model Model specific generation quality Quality of model output Less than 4 bits Efforts related to viable quantized models using <4 bits labels Apr 13, 2024
@phymbert (Collaborator, Author) commented

@ggerganov @ikawrakow Thoughts? Can I give it a try?

phymbert added a commit that referenced this issue Apr 13, 2024
quantize: factorize KV Overrides parsing between common
#6656
@sorasoras commented

What happens if you combine different imatrices? If you combine a lot of .dat files, how would the metadata work?

@phymbert (Collaborator, Author) commented

What happens if you combine different imatrices? If you combine a lot of .dat files, how would the metadata work?

As far as I understand, it will take the metadata from the final imatrix, but you are welcome to test.

@phymbert phymbert added need feedback Testing and feedback with results are needed and removed Less than 4 bits Efforts related to viable quantized models using <4 bits labels Apr 13, 2024
phymbert added a commit that referenced this issue Apr 26, 2024
* imatrix: save the dataset file used in the output file

* llama: support kv overrides type string string

* common: factorize KV Overrides parsing between common and server

* quantize: add imatrix n entries and dataset KV metadata
quantize: factorize KV Overrides parsing between common
#6656

* llama: remove kv override str_value initialization as it does not compile on some toolchain

* quantize: add imatrix m_last_call as `quantize.imatrix.chunks_count`

* quantize: add imatrix filename in KV

* llama: add llama_model_kv_override_free

* common: add llama_model_kv_override_free
common: free kv override if used after model loading

* llama: finally move the string KV override value to the stack

* llama : minor

* no need to add a NUL to the std::vector, std::string can be initialized from a pair of iterators.

Co-authored-by: slaren <[email protected]>

* kv override: ensure string termination

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: slaren <[email protected]>
nopperl pushed a commit to nopperl/llama.cpp that referenced this issue May 5, 2024