
quantize: add imatrix and dataset metadata in GGUF #6656

Closed
phymbert opened this issue Apr 13, 2024 · 3 comments · Fixed by #6658
Labels
enhancement (New feature or request) · generation quality (Quality of model output) · model (Model specific) · need feedback (Testing and feedback with results are needed)

Comments

@phymbert (Collaborator) commented Apr 13, 2024

Motivation

Thanks to @julien-c, I was reading this reddit post from @he29-net 👍

You can't easily tell from the name whether a model was quantized with the help of an importance matrix. I first found this annoying, because it was not clear if and how the calibration dataset affects the performance of the model in ways other than just positive ones. But recent tests in llama.cpp discussion #5263 show that while the data used to prepare the imatrix slightly affects how it performs in (un)related languages or specializations, any dataset will perform better than a "vanilla" quantization with no imatrix. So now I instead find it annoying because sometimes the only way to be sure I'm using the better imatrix version is to re-quantize the model myself.

Proposal

  • Add, at the end of the imatrix binary file, the name of the dataset on which the imatrix was computed

  • Add the following KV pairs in quantize:

    • `quantize.imatrix.file`: filename of the imatrix provided during quantization
    • `quantize.imatrix.entries_count`: number of entries in the imatrix
    • `quantize.imatrix.dataset`: dataset the imatrix was computed from
    • `quantize.imatrix.chunks_count`: number of chunks the imatrix was computed with

Ideally I would also add the hashes of both the imatrix and dataset files in the metadata, but I am not sure this is supported or appropriate.

@phymbert phymbert added enhancement New feature or request model Model specific generation quality Quality of model output Less than 4 bits Efforts related to viable quantized models using <4 bits labels Apr 13, 2024
@phymbert (Collaborator, Author) commented

@ggerganov @ikawrakow Thoughts? Can I give it a try?

phymbert added a commit that referenced this issue Apr 13, 2024
quantize: factorize KV Overrides parsing between common
#6656
@sorasoras commented

What happens if you combine different imatrices? If you combine a lot of .dat files, how would the metadata work?

@phymbert (Collaborator, Author) commented

What happens if you combine different imatrices? If you combine a lot of .dat files, how would the metadata work?

As far as I understand, it will take the metadata from the final imatrix, but you are welcome to test.

@phymbert phymbert added need feedback Testing and feedback with results are needed and removed Less than 4 bits Efforts related to viable quantized models using <4 bits labels Apr 13, 2024
phymbert added a commit that referenced this issue Apr 26, 2024
* imatrix: save the dataset file used in the output file

* llama: support kv overrides type string string

* common: factorize KV Overrides parsing between common and server

* quantize: add imatrix n entries and dataset KV metadata
quantize: factorize KV Overrides parsing between common
#6656

* llama: remove kv override str_value initialization as it does not compile on some toolchain

* quantize: add imatrix m_last_call as `quantize.imatrix.chunks_count`

* quantize: add imatrix filename in KV

* llama: add llama_model_kv_override_free

* common: add llama_model_kv_override_free
common: free kv override if used after model loading

* llama: finally move the string KV override value to the stack

* llama : minor

* no need to add a NUL to the std::vector, std::string can be initialized from a pair of iterators.

Co-authored-by: slaren <[email protected]>

* kv override: ensure string termination

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: slaren <[email protected]>
nopperl pushed a commit to nopperl/llama.cpp that referenced this issue May 5, 2024