FEAT / Bitsandbytes: Add `dequantize` API for bitsandbytes quantized models #30806
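For context, a rough sketch of how the new API is meant to be used. The checkpoint name and config values here are illustrative, not taken from the PR; only the `dequantize` entry point itself comes from the PR title.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load a 4-bit bitsandbytes-quantized model (illustrative checkpoint).
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# The API added by this PR: restore dense (dequantized) weights.
model = model.dequantize()
```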
The user might want to know in which precision the model was dequantized, since they have no way to control it. I think it would be great to surface that information, especially since there is no default value (as opposed to `from_pretrained`, which loads the model in fp32). One way to see where the dtype comes from is bitsandbytes' `dequantize_4bit`: in that method, the output dtype is taken from `weight.quant_state.dtype`. We can potentially add a `torch_dtype` attribute in the future if it makes sense.
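For readers following along, a minimal sketch of how one could inspect that dtype on a loaded 4-bit model. The helper name is hypothetical; only `weight.quant_state.dtype` comes from bitsandbytes itself.

```python
import bitsandbytes as bnb

def collect_dequantize_dtypes(model):
    """Hypothetical helper: gather the dtype each 4-bit weight
    will be restored to, as recorded in its quant_state."""
    dtypes = set()
    for _, module in model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            # quant_state.dtype is the original (pre-quantization) dtype
            # that dequantize_4bit uses for its output.
            dtypes.add(module.weight.quant_state.dtype)
    return dtypes
```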
+1
Nice catch! The output dtype should be correctly inferred here: https://github.com/TimDettmers/bitsandbytes/blob/b891f80ba514833f41f0e9226983b02a9fb5c44b/bitsandbytes/functional.py#L1349 through the compute_dtype, so it should be accurate. I added a `warning_once` statement to inform users of the dequantized dtype: 1a4a906
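For illustration, a hedged sketch of what such a `warning_once` could look like; the message text and variable names are assumptions, not the commit's literal code.

```python
import torch
from transformers.utils import logging

logger = logging.get_logger(__name__)

# Assume target_dtype was inferred from the weights' quant_state,
# e.g. torch.float16 for a model originally quantized from fp16.
target_dtype = torch.float16

logger.warning_once(
    f"Model was dequantized to {target_dtype}. Cast it afterwards "
    f"(e.g. model.to(torch.bfloat16)) if you need a different dtype."
)
```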
This is already imported at the top of the module
Nice catch! Should be fixed now.
One general comment: if you instead had a private method `_dequantize_and_replace` which handles the recursion, you wouldn't need to return `has_been_replaced` here. When someone calls `dequantize_and_replace`, I don't think `has_been_replaced` is ever used, and it could be confusing, e.g.:
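(The example that originally followed "e.g.:" did not survive extraction; below is a minimal sketch of the suggested wrapper pattern. Only `dequantize_and_replace` and `has_been_replaced` come from the discussion; all other names and details are assumptions.)

```python
from transformers.utils import logging

logger = logging.get_logger(__name__)

def _dequantize_and_replace(model, modules_to_not_convert=None, has_been_replaced=False):
    """Private helper that carries the recursion bookkeeping."""
    for name, module in model.named_children():
        if modules_to_not_convert and name in modules_to_not_convert:
            continue
        # ... replace `module` with its dequantized counterpart here,
        # setting has_been_replaced = True when a swap happens ...
        if list(module.children()):
            _, has_been_replaced = _dequantize_and_replace(
                module, modules_to_not_convert, has_been_replaced
            )
    return model, has_been_replaced

def dequantize_and_replace(model, modules_to_not_convert=None):
    """Public entry point: callers never see has_been_replaced."""
    model, has_been_replaced = _dequantize_and_replace(model, modules_to_not_convert)
    if not has_been_replaced:
        logger.warning("No quantized linear modules were found in the model.")
    return model
```

This keeps the recursion detail out of the public signature, so callers get back only the model.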
Makes sense! Will do!
Done in 8b904f7!