quantize: add imatrix and dataset metadata in GGUF #6656
Labels
enhancement: New feature or request
generation quality: Quality of model output
model: Model specific
need feedback: Testing and feedback with results are needed
Comments
phymbert added the enhancement (New feature or request), model (Model specific), generation quality (Quality of model output) and Less than 4 bits (Efforts related to viable quantized models using <4 bits) labels Apr 13, 2024
@ggerganov @ikawrakow Thoughts? Can I give it a try?
phymbert added a commit that referenced this issue Apr 13, 2024
quantize: factorize KV Overrides parsing between common #6656
What happens if you combine different imatrix files? If you combine a lot of .dat files, how would the metadata work?
As far as I understand, it will add the metadata from the final imatrix, but you are welcome to test.
phymbert added the need feedback (Testing and feedback with results are needed) label and removed the Less than 4 bits (Efforts related to viable quantized models using <4 bits) label Apr 13, 2024
phymbert added a commit that referenced this issue Apr 26, 2024
* imatrix: save the dataset file used in the output file
* llama: support kv overrides type string string
* common: factorize KV Overrides parsing between common and server
* quantize: add imatrix n entries and dataset KV metadata
  quantize: factorize KV Overrides parsing between common #6656
* llama: remove kv override str_value initialization as it does not compile on some toolchain
* quantize: add imatrix m_last_call as `quantize.imatrix.chunks_count`
* quantize: add imatrix filename in KV
* llama: add llama_model_kv_override_free
* common: add llama_model_kv_override_free
  common: free kv override if used after model loading
* llama: finally move the string KV override value to the stack
* llama : minor
* no need to add a NUL to the std::vector, std::string can be initialized from a pair of iterators.
  Co-authored-by: slaren
* kv override: ensure string termination
---------
Co-authored-by: Georgi Gerganov
Co-authored-by: slaren
nopperl pushed a commit to nopperl/llama.cpp that referenced this issue May 5, 2024
Motivation
Thanks to @julien-c, I was reading this reddit post from @he29-net 👍
Proposal
- Add at the end of the `imatrix` binary file the name of the dataset on which the imatrix was computed
- Add the following KV in `quantize`:
  - `quantize.imatrix.file`: Filename of the provided imatrix during quantization
  - `quantize.imatrix.entries_count`: Number of entries in the imatrix
  - `quantize.imatrix.dataset`: Dataset from the imatrix
  - `quantize.imatrix.chunks_count`: Number of chunks the imatrix was computed with

Ideally I would also add the hashes of both the imatrix and dataset files in the metadata, but I am not sure this is supported and appropriate.
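
To make the proposal concrete, here is a rough sketch of the idea in Python. It is only an illustration: the trailer layout (an int32 chunk count, then an int32 length, then the dataset name) is an assumption of this sketch and not necessarily the real imatrix file format, and the helper names as well as `imatrix.dat` and `wiki.train.raw` are hypothetical placeholders.

```python
import struct


def append_trailer(imatrix_bytes: bytes, chunks_count: int, dataset: str) -> bytes:
    """Append an assumed trailer: int32 chunk count, int32 name length, UTF-8 name."""
    name = dataset.encode("utf-8")
    return imatrix_bytes + struct.pack("<ii", chunks_count, len(name)) + name


def read_trailer(buf: bytes, payload_len: int):
    """Read the assumed trailer that starts right after the existing payload."""
    chunks_count, name_len = struct.unpack_from("<ii", buf, payload_len)
    start = payload_len + 8  # skip the two int32 fields
    dataset = buf[start:start + name_len].decode("utf-8")
    return chunks_count, dataset


def quantize_kv(imatrix_file: str, entries_count: int, chunks_count: int, dataset: str) -> dict:
    """Build the four proposed KV entries as a plain dict (not a real GGUF writer)."""
    return {
        "quantize.imatrix.file": imatrix_file,
        "quantize.imatrix.dataset": dataset,
        "quantize.imatrix.entries_count": entries_count,
        "quantize.imatrix.chunks_count": chunks_count,
    }


# Stand-in for existing imatrix content: just an int32 entry count of zero.
payload = struct.pack("<i", 0)
blob = append_trailer(payload, 100, "wiki.train.raw")

chunks, dataset = read_trailer(blob, len(payload))
kv = quantize_kv("imatrix.dat", 0, chunks, dataset)
```

Appending the dataset name after the existing content means a reader that stops after the entries would not need to change, which is presumably why the end of the file is the proposed location.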