
[Misc] Add support for new autogptq checkpoint_format #3689

Merged

Conversation

@Qubitium (Contributor) commented Mar 28, 2024

Reason for PR: ensure that loading of GPTQ Marlin quants remains compatible with the latest autogptq.

Due to the autogptq update AutoGPTQ/AutoGPTQ#603, quants made with autogptq >= 0.8.0-dev use a new checkpoint_format property to hold marlin and future formats.

Compatibility tested with both marlin and non-marlin quants, using both the old is_marlin_format and the new checkpoint_format properties.
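For reference, the two serializations differ in a single field of quantize_config.json. A minimal sketch, written as Python dicts, of what each config might contain — bits is an illustrative example value not taken from this thread, and the "gptq" value for non-marlin quants is an assumption; only is_marlin_format and checkpoint_format come from this PR:

# Old serialization (autogptq < 0.8.0): a boolean flag marks marlin quants.
old_quantize_config = {
    "bits": 4,                      # illustrative example value
    "is_marlin_format": True,       # legacy flag, still supported by this PR
}

# New serialization (autogptq >= 0.8.0-dev): an extensible string property.
new_quantize_config = {
    "bits": 4,                      # illustrative example value
    "checkpoint_format": "marlin",  # assumed to be "gptq" for non-marlin quants
}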

@Qubitium changed the title from [MISC] Add support for new autogptq checkpoint_format to [Misc] Add support for new autogptq checkpoint_format on Mar 28, 2024
@Qubitium marked this pull request as ready for review on Mar 28, 2024, 15:23
@robertgshaw2-neuralmagic (Collaborator)

Thanks, will review this

@robertgshaw2-neuralmagic (Collaborator)

Do you have example models in each format that I can try?

@Qubitium (Contributor, Author)

@robertgshaw2-neuralmagic

  1. Marlin: https://huggingface.co/LnL-AI/TinyLlama-1.1B-Chat-v1.0-GPTQ-Marlin-4bit
  2. Non-Marlin: https://huggingface.co/LnL-AI/TinyLlama-1.1B-Chat-v1.0-GPTQ-4bit

To test marlin compat, just change the quantize_config param:

"checkpoint_format": "marlin" => "is_marlin_format": true

@robertgshaw2-neuralmagic (Collaborator)

Each of the following works:

  • new serialization config:

from vllm import LLM
model = LLM("LnL-AI/TinyLlama-1.1B-Chat-v1.0-GPTQ-Marlin-4bit")
model.generate("hello my name is")

  • old serialization config:

from vllm import LLM
model = LLM("neuralmagic/TinyLlama-1.1B-Chat-v1.0-marlin")
model.generate("hello my name is")

@robertgshaw2-neuralmagic (Collaborator)

@Qubitium Can you merge the PR I posted to your branch, which adds tests for this case?

Qubitium#1
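(A minimal sketch of the kind of compatibility test being added — the actual test lives in Qubitium#1, so everything below is illustrative rather than the merged code; only the two model IDs are taken from this thread:)

import pytest
from vllm import LLM

# Both serializations of the same quantized model should load and generate.
@pytest.mark.parametrize("model_id", [
    "LnL-AI/TinyLlama-1.1B-Chat-v1.0-GPTQ-Marlin-4bit",  # new checkpoint_format
    "neuralmagic/TinyLlama-1.1B-Chat-v1.0-marlin",       # old is_marlin_format
])
def test_gptq_marlin_checkpoint_formats(model_id):
    llm = LLM(model_id)
    outputs = llm.generate("hello my name is")
    assert outputs and outputs[0].outputs[0].text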

@Qubitium (Contributor, Author)

@robertgshaw2-neuralmagic merged

@robertgshaw2-neuralmagic (Collaborator)

@Qubitium can you rerun ./format.sh? Will merge after

@robertgshaw2-neuralmagic (Collaborator) left a comment:

nvm, saw your commit. Letting CI run, then will merge.

@robertgshaw2-neuralmagic merged commit 7d4e1b8 into vllm-project:main on Apr 1, 2024
33 checks passed
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024