quantize: add imatrix and dataset metadata in GGUF #6658

Merged
merged 16 commits into from
Apr 26, 2024

@phymbert phymbert commented Apr 13, 2024

Context

Add imatrix-related metadata to quantized models, so that a GGUF file records which importance matrix and dataset it was quantized with.

Changes

  • quantize: add the imatrix entry count, chunk count, and dataset as KV metadata
  • common: factor out KV override parsing so that common, server, and quantize share it
  • quantize is now linked against common
  • llama: support KV overrides of type string
  • imatrix: save the name of the dataset file in the output file
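
To illustrate the shared parsing, here is a minimal sketch of how a `KEY=TYPE:VALUE` argument such as `my_metadata=str:best-quantum-model-ever` could be split into a typed entry. The names `OverrideType`, `KvOverride`, and `parse_kv_override` are hypothetical and exist only for this example, and the accepted type tags (`int`, `float`, `bool`, `str`) are an assumption; this is not the code from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <optional>
#include <string>

// Hypothetical types for illustration only; not the structs used in llama.cpp.
enum class OverrideType { Int, Float, Bool, Str };

struct KvOverride {
    std::string  key;
    OverrideType type        = OverrideType::Str;
    std::int64_t int_value   = 0;
    double       float_value = 0.0;
    bool         bool_value  = false;
    std::string  str_value;
};

// Parse "KEY=TYPE:VALUE". Returns std::nullopt if the argument is malformed.
std::optional<KvOverride> parse_kv_override(const std::string & arg) {
    const std::size_t eq = arg.find('=');
    if (eq == std::string::npos || eq == 0) {
        return std::nullopt; // no '=' or empty key
    }
    const std::size_t colon = arg.find(':', eq + 1);
    if (colon == std::string::npos) {
        return std::nullopt; // no type tag
    }

    KvOverride kv;
    kv.key = arg.substr(0, eq);
    const std::string type  = arg.substr(eq + 1, colon - eq - 1);
    const std::string value = arg.substr(colon + 1); // may itself contain ':'

    try {
        std::size_t used = 0;
        if (type == "int") {
            kv.type      = OverrideType::Int;
            kv.int_value = std::stoll(value, &used);
            if (used != value.size()) return std::nullopt; // trailing garbage
        } else if (type == "float") {
            kv.type        = OverrideType::Float;
            kv.float_value = std::stod(value, &used);
            if (used != value.size()) return std::nullopt;
        } else if (type == "bool") {
            if (value != "true" && value != "false") return std::nullopt;
            kv.type       = OverrideType::Bool;
            kv.bool_value = (value == "true");
        } else if (type == "str") {
            kv.type      = OverrideType::Str;
            kv.str_value = value;
        } else {
            return std::nullopt; // unknown type tag
        }
    } catch (const std::exception &) {
        return std::nullopt; // std::stoll / std::stod rejected the value
    }
    return kv;
}
```

Keeping one parser in a shared place means that every tool accepting `--override-kv` rejects malformed arguments in the same way.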

Tests

  1. Convert to GGUF (PHI-2)
pip install huggingface_hub
nohup python -c 'from huggingface_hub import snapshot_download; snapshot_download(repo_id="microsoft/phi-2", local_dir="models/phi-2")' > phi-2_download.log &
tail -f phi-2_download.log

./convert-hf-to-gguf.py models/phi-2 --outfile models/phi-2-f16.gguf --outtype f16
  2. Compute the importance matrix
./scripts/get-wikitext-2.sh
./build/bin/imatrix \
  --model models/phi-2-f16.gguf \
  -f wikitext-2-raw/wiki.train.raw \
  -o phi-2-f16.imatrix \
  -ngl 33 \
  --seed 42 \
  --chatml \
  --chunks 20 
  3. Quantize with the imatrix
./build/bin/quantize \
  --imatrix phi-2-f16.imatrix \
  models/phi-2-f16.gguf \
  models/phi-2-q4_k_m.gguf \
  q4_k_m \
  --override-kv my_metadata=str:best-quantum-model-ever
  4. See the new metadata
./gguf-py/scripts/gguf-dump.py models/phi-2-q4_k_m.gguf
     23: UINT32     |        1 | general.quantization_version = 2
     24: STRING     |        1 | my_metadata = 'best-quantum-model-ever'
     25: STRING     |        1 | quantize.imatrix.file = 'imatrix-f16.imatrix'
     26: STRING     |        1 | quantize.imatrix.dataset = 'wikitext-2-raw/wiki.train.raw'
     27: INT32      |        1 | quantize.imatrix.entries_count = 192
     28: INT32      |        1 | quantize.imatrix.chunks_count = 20

./build/bin/main \
  --model models/phi-2-q4_k_m.gguf \
  -ngl 33 \
  --random-prompt \
  --override-kv my_metadata_2=str:best-quantum-model-ever-2

llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
...
llama_model_loader: - kv  19:               general.quantization_version u32              = 2
llama_model_loader: - kv  20:                                my_metadata str              = best-quantum-model-ever
llama_model_loader: - kv  21:                      quantize.imatrix.file str              = imatrix-f16.imatrix
llama_model_loader: - kv  22:                   quantize.imatrix.dataset str              = wikitext-2-raw/wiki.train.raw
llama_model_loader: - kv  23:             quantize.imatrix.entries_count i32              = 192
llama_model_loader: - kv  24:              quantize.imatrix.chunks_count i32              = 20
  5. Test no regression on the server
./build/bin/server \
  --model ../llama.cpp/models/phi-2-q4_k_m.gguf \
  -ngl 33 \
  --override-kv my_metadata_2=str:best-quantum-model-ever-2
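
As a rough illustration of the metadata shown in the dump above, the sketch below assembles the four `quantize.imatrix.*` entries from the values known at quantization time. The key names are taken from the dump output; the types `MetaEntry` and `ImatrixInfo` and the function `imatrix_metadata` are hypothetical and only stand in for whatever the quantize tool does internally.

```cpp
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical container for one metadata entry: either an INT32 or a STRING value.
struct MetaEntry {
    std::string key;
    std::variant<std::int32_t, std::string> value;
};

// Hypothetical summary of what is known about the imatrix at quantization time.
struct ImatrixInfo {
    std::string  file;          // imatrix file passed via --imatrix
    std::string  dataset;       // dataset file recorded inside the imatrix file
    std::int32_t entries_count; // number of entries in the imatrix
    std::int32_t chunks_count;  // number of chunks the imatrix was computed with
};

// Build the metadata entries; returns an empty list when no imatrix was used.
std::vector<MetaEntry> imatrix_metadata(const ImatrixInfo & info) {
    std::vector<MetaEntry> out;
    if (info.file.empty()) {
        return out;
    }
    out.push_back({"quantize.imatrix.file", info.file});
    if (!info.dataset.empty()) {
        // Older imatrix files may not record a dataset (assumption).
        out.push_back({"quantize.imatrix.dataset", info.dataset});
    }
    out.push_back({"quantize.imatrix.entries_count", info.entries_count});
    out.push_back({"quantize.imatrix.chunks_count", info.chunks_count});
    return out;
}
```

Calling `imatrix_metadata` with the values from the test run (192 entries, 20 chunks) yields four entries in the same order as the dump.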

Closes #6656

@phymbert phymbert added the "generation quality" (Quality of model output) and "Less than 4 bits" (Efforts related to viable quantized models using <4 bits) labels Apr 13, 2024
@phymbert phymbert requested a review from ggerganov April 13, 2024 13:00
@phymbert phymbert requested a review from slaren April 13, 2024 13:12
@phymbert

We might also add the number of chunks the imatrix was computed with.

@phymbert phymbert added the "need feedback" (Testing and feedback with results are needed) and "model" (Model specific) labels and removed the "Less than 4 bits" (Efforts related to viable quantized models using <4 bits) label Apr 13, 2024
@phymbert phymbert mentioned this pull request Apr 13, 2024
@phymbert

@ggerganov, does this general approach look right to you?

@phymbert phymbert marked this pull request as draft April 19, 2024 20:08
common: free kv override if used after model loading
@phymbert phymbert marked this pull request as ready for review April 20, 2024 08:17
@phymbert phymbert requested a review from ggerganov April 20, 2024 08:17
@phymbert phymbert requested a review from Jeximo April 20, 2024 08:21

@phymbert phymbert marked this pull request as draft April 20, 2024 11:59
@phymbert phymbert marked this pull request as ready for review April 21, 2024 18:24
@phymbert

@slaren, could you please take a second look and merge it if you approve?


@slaren slaren left a comment

I also realized that llama_model_quantize_params::kv_overrides is a pointer to a std::vector for no reason whatsoever. It would be great if that could be fixed as well.

@phymbert phymbert marked this pull request as draft April 21, 2024 19:00
@phymbert phymbert marked this pull request as ready for review April 26, 2024 10:11
@phymbert phymbert requested a review from slaren April 26, 2024 10:12

@slaren slaren left a comment

We still need to change llama_model_quantize_params::kv_overrides to be a pointer to llama_model_kv_override rather than to a std::vector, but that can be done in another PR.

@phymbert phymbert merged commit 0c4d489 into master Apr 26, 2024
43 of 55 checks passed
@phymbert phymbert deleted the hp/quantize/imatrix-metadata branch April 26, 2024 18:06
@schmorp
schmorp commented Apr 27, 2024

While I appreciate adding this metadata, I think there is a privacy concern here. How about storing only the filename and not the complete path, which might leak sensitive data such as the username?

@phymbert

Good point. Meanwhile, you can use kv overrides.
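
For reference, storing only the filename could look roughly like the sketch below. The helper name `basename_only` is hypothetical, and this is not a change that was made in this PR; it only shows that `std::filesystem::path::filename()` (C++17) drops the directory part.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: keep only the last path component so that directory
// names (which may contain a username) are not written into the metadata.
std::string basename_only(const std::string & path) {
    return std::filesystem::path(path).filename().string();
}
```

With this helper, a path like `/home/alice/wikitext-2-raw/wiki.train.raw` would be recorded as `wiki.train.raw`. Note that separators are platform-specific, so a Windows-style path is not split when the code runs on a POSIX system.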

nopperl pushed a commit to nopperl/llama.cpp that referenced this pull request May 5, 2024
* imatrix: save the dataset file used in the output file

* llama: support kv overrides type string string

* common: factorize KV Overrides parsing between common and server

* quantize: add imatrix n entries and dataset KV metadata
quantize: factorize KV Overrides parsing between common
ggerganov#6656

* llama: remove kv override str_value initialization as it does not compile on some toolchain

* quantize: add imatrix m_last_call as `quantize.imatrix.chunks_count`

* quantize: add imatrix filename in KV

* llama: add llama_model_kv_override_free

* common: add llama_model_kv_override_free
common: free kv override if used after model loading

* llama: finally move the string KV override value to the stack

* llama : minor

* no need to add a NUL to the std::vector, std::string can be initialized from a pair of iterators.

Co-authored-by: slaren

* kv override: ensure string termination

---------

Co-authored-by: Georgi Gerganov
Co-authored-by: slaren
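
Two of the commit notes above concern string handling: building a std::string from a character buffer without appending a NUL, and making sure a fixed-size string value is always terminated. The sketch below shows both ideas in isolation; `buffer_to_string` and `copy_terminated` are hypothetical names, not functions from the PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// A std::string can be built directly from a pair of iterators, so the
// buffer does not need a trailing NUL.
std::string buffer_to_string(const std::vector<char> & buf) {
    return std::string(buf.begin(), buf.end());
}

// Copy into a fixed-size char array, truncating if needed and always
// writing a terminating NUL.
template <std::size_t N>
void copy_terminated(char (&dst)[N], const std::string & src) {
    static_assert(N > 0, "destination must hold at least the terminator");
    const std::size_t n = std::min(src.size(), N - 1);
    std::memcpy(dst, src.data(), n);
    dst[n] = '\0';
}
```

Taking the destination as a reference to an array lets the compiler deduce its size, so the caller cannot pass a mismatched length.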