[PaddleInference] support ptq and cachekv_quant in BlockMultiHeadAttention op #59951
Conversation
Your PR was submitted successfully. Thank you for your contribution to this open-source project!
Force-pushed from e16a233 to 121733e
Force-pushed from 3b56fd5 to 2e69d09
"be less than [%d] and greater than or equal to 0, but received [%d]",
vocab_size,
id);
// PADDLE_ENFORCE(
Why is this commented out?
Multi-turn runs currently fail at this check, and the error does not match expectations. We are still debugging it with the author.
"be less than [%d] and greater than or equal to 0, but received [%d]",
vocab_size,
id);
// PADDLE_ENFORCE(
This check is still needed.
Multi-turn runs currently fail at this check, and the error does not match expectations. We are still debugging it with the author.
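For context, a sketch of the bounds check that the disabled PADDLE_ENFORCE would perform. This is an illustrative Python rendering only (the function name `check_token_id` is hypothetical, not Paddle's implementation); it mirrors the error message in the diff above:

```python
def check_token_id(token_id, vocab_size):
    """Validate that a token id lies in the half-open range [0, vocab_size).

    Illustrative stand-in for the commented-out PADDLE_ENFORCE: ids
    outside this range indicate an upstream bug (e.g. a corrupted cache
    after multi-turn decoding), which is why reviewers want the check kept.
    """
    if not (0 <= token_id < vocab_size):
        raise ValueError(
            f"Token id must be less than [{vocab_size}] and greater than "
            f"or equal to 0, but received [{token_id}]"
        )
    return token_id
```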
LGTM for API change
cache_k_quant_scales=None,
cache_v_quant_scales=None,
cache_k_dequant_scales=None,
cache_v_dequant_scales=None,
qkv_out_scale=None,
qkv_bias=None,
out_shift=None,
out_smooth=None,
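To illustrate what the quant/dequant scale pairs above are for, here is a minimal sketch of symmetric int8 cache-KV quantization. This is an assumption about the general technique, not Paddle's exact kernel formula; the function names are hypothetical:

```python
def quant_cache_value(x, quant_scale, qmax=127):
    """Quantize one cache-K/V float value into int8 range.

    Assumes the common symmetric convention (not necessarily Paddle's
    exact formula): q = clip(round(x * quant_scale), -qmax - 1, qmax),
    where quant_scale is typically qmax / max_abs(channel).
    """
    q = int(round(x * quant_scale))
    return max(-qmax - 1, min(qmax, q))


def dequant_cache_value(q, dequant_scale):
    """Recover an approximate float value: x ~ q * dequant_scale.

    dequant_scale is typically the reciprocal of the quant scale, which
    is why the op takes separate *_quant_scales and *_dequant_scales.
    """
    return q * dequant_scale
```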
The newly added parameters, such as cache_k_quant_scales, need to be described in the Args section of the docstring below.
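A sketch of what such Args entries might look like. The wording and type annotations here are assumptions for illustration, not the actual Paddle documentation:

```python
# Illustrative docstring fragment only; wording and types are assumed.
ARGS_DOCSTRING_SKETCH = """\
Args:
    cache_k_quant_scales (Tensor, optional): Scales used to quantize the
        key cache. Default: None.
    cache_v_quant_scales (Tensor, optional): Scales used to quantize the
        value cache. Default: None.
    cache_k_dequant_scales (Tensor, optional): Scales used to dequantize
        the key cache. Default: None.
    cache_v_dequant_scales (Tensor, optional): Scales used to dequantize
        the value cache. Default: None.
"""
```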
Normally, newly added parameters should be appended at the end of the signature to preserve backward compatibility.
This will be added in the next PR.
LGTM for docs (the Chinese documentation still needs to be added).
…ntion op (PaddlePaddle#59951) * support cachekv_quant in blha --------- Co-authored-by: Wanglongzhi2001 <[email protected]>
…ntion op (#59951) (#60073) * support cachekv_quant in blha --------- Co-authored-by: Wanglongzhi2001 <[email protected]>
PR types: Function optimization
PR changes: OPs
Description: For block attention
Pcard-71502