[Misc] Add support for new autogptq checkpoint_format #3689
Conversation
Thanks, will review this.
Do you have example models in each format I can try?
To test marlin compat just change the
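For reference, a rough sketch of the quantize_config.json difference under discussion; the is_marlin_format and checkpoint_format field names come from the PR description below, while the other values (bits, group_size) are illustrative:

# Old-style config (autogptq before 0.8.0-dev): marlin flagged via a boolean.
old_quantize_config = {
    "bits": 4,
    "group_size": 128,
    "is_marlin_format": True,
}

# New-style config (autogptq >= 0.8.0-dev): format carried in checkpoint_format.
new_quantize_config = {
    "bits": 4,
    "group_size": 128,
    "checkpoint_format": "marlin",
}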
Each of the following works:

from vllm import LLM
model = LLM("LnL-AI/TinyLlama-1.1B-Chat-v1.0-GPTQ-Marlin-4bit")
model.generate("hello my name is")

from vllm import LLM
model = LLM("neuralmagic/TinyLlama-1.1B-Chat-v1.0-marlin")
model.generate("hello my name is")
@Qubitium Can you merge the PR I posted to your branch, which adds tests for this case?
added quantization tests
@robertgshaw2-neuralmagic merged
@Qubitium can you rerun |
nvm, saw your commit. Letting CI run, then will merge.
Co-authored-by: Robert Shaw <[email protected]>
Reason for PR: make sure loading of gptq marlin quants is compatible with latest autogptq.

Due to autogptq update AutoGPTQ/AutoGPTQ#603, quants made with autogptq >= 0.8.0-dev will use a new checkpoint_format property to hold marlin and future formats. Compat tested with both marlin and non-marlin models in both the old is_marlin_format and the new checkpoint_format styles.
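A minimal sketch of the compat logic this description implies, using a hypothetical resolve_checkpoint_format helper (not the actual vLLM function; the real loader may differ):

def resolve_checkpoint_format(quantize_config: dict) -> str:
    """Normalize old- and new-style autogptq configs to one format string.

    Hypothetical helper: prefers the new checkpoint_format key and falls
    back to the legacy is_marlin_format boolean.
    """
    fmt = quantize_config.get("checkpoint_format")
    if fmt is not None:
        return fmt  # e.g. "marlin" for autogptq >= 0.8.0-dev quants
    if quantize_config.get("is_marlin_format", False):
        return "marlin"  # legacy marlin flag
    return "gptq"  # plain (non-marlin) GPTQ checkpoint

# Both config styles resolve to the same format:
assert resolve_checkpoint_format({"checkpoint_format": "marlin"}) == "marlin"
assert resolve_checkpoint_format({"is_marlin_format": True}) == "marlin"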