[PaddleInference] support ptq and cachekv_quant in BlockMultiHeadAttention op #59951

Merged: 10 commits into PaddlePaddle:develop on Dec 15, 2023

Conversation

@RichardWooSJTU (Contributor) commented Dec 12, 2023

PR types

Function optimization

PR changes

OPs

Description

For block attention (the BlockMultiHeadAttention op), this PR:

  1. supports dynamic cache-KV quantization
  2. supports static cache-KV quantization
  3. supports PTQ fusion

(A brief sketch of the cache-KV quantization idea follows the description.)

Pcard-71502
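As a rough illustration of items 1–3, the numpy sketch below shows the generic cache-KV quantization idea: dynamic quantization derives the int8 scale from the current key/value cache at runtime, static (PTQ) quantization reuses a scale calibrated offline, and the cache is dequantized with a matching scale when attention consumes it. The function names and the scale convention are illustrative assumptions, not the kernel code added in this PR; the op's actual scale inputs are cache_k/v_quant_scales and cache_k/v_dequant_scales.

```python
import numpy as np

# Illustrative convention only: scale = absmax / 127, quantize by dividing,
# dequantize by multiplying. The real op may define its scales differently
# (e.g. as reciprocals), so treat this purely as a sketch of the idea.

def dynamic_quant(cache, num_bits=8):
    # Dynamic cache-KV quant: derive the scale from the tensor at runtime.
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = np.abs(cache).max() / qmax
    q = np.clip(np.round(cache / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def static_quant(cache, scale, num_bits=8):
    # Static (PTQ) cache-KV quant: the scale was calibrated offline and is
    # passed in, analogous to the cache_k/v_quant_scales inputs.
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(cache / scale), -qmax - 1, qmax).astype(np.int8)

def dequant(q_cache, scale):
    # Dequantize the int8 cache back to float when attention reads it,
    # analogous to the cache_k/v_dequant_scales inputs.
    return q_cache.astype(np.float32) * scale

# Toy usage: quantize a key cache dynamically, then recover an approximation.
k_cache = np.random.randn(2, 8, 64).astype(np.float32)
q_k, k_scale = dynamic_quant(k_cache)
k_approx = dequant(q_k, k_scale)
```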

paddle-bot bot commented Dec 12, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@RichardWooSJTU changed the title from "[PaddleInference] support cachekv_quant in BlockMultiHeadAttention op" to "[PaddleInference] support ptq and cachekv_quant in BlockMultiHeadAttention op" on Dec 13, 2023
"be less than [%d] and greater than or equal to 0, but received [%d]",
vocab_size,
id);
// PADDLE_ENFORCE(
Contributor

Why was this commented out?

Contributor Author

Multi-turn runs currently hit an error here; the error is not as expected, and we are still debugging it with the original author.

"be less than [%d] and greater than or equal to 0, but received [%d]",
vocab_size,
id);
// PADDLE_ENFORCE(
Contributor

This check is still needed.

Contributor Author

Multi-turn runs currently hit an error here; the error is not as expected, and we are still debugging it with the original author.

@vivienfanghuagood (Contributor) left a comment:
LGTM for API change

Comment on lines +33 to +40
cache_k_quant_scales=None,
cache_v_quant_scales=None,
cache_k_dequant_scales=None,
cache_v_dequant_scales=None,
qkv_out_scale=None,
qkv_bias=None,
out_shift=None,
out_smooth=None,
Contributor

The newly added parameters, such as cache_k_quant_scales, need to be described in the Args section of the docstring below.

Contributor

Normally, newly added parameters should be appended at the end of the signature to preserve backward compatibility (a short sketch of this point follows this thread).

Contributor Author

This will be added in the next PR.
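To make the compatibility concern above concrete, here is a hypothetical, simplified signature (block_mha_old / block_mha_new are made-up names, not the real block_multihead_attention interface): appending new keyword parameters with defaults keeps existing positional call sites valid, whereas inserting them earlier would silently shift positional arguments.

```python
# Hypothetical, simplified signatures for illustration only.

def block_mha_old(qkv, key_cache, value_cache, max_seq_len):
    return "attention output"

# New optional parameters appended at the end: an old positional call such as
# block_mha_new(qkv, kc, vc, 1024) keeps exactly the same meaning.
def block_mha_new(qkv, key_cache, value_cache, max_seq_len,
                  cache_k_quant_scales=None, cache_v_quant_scales=None):
    return "attention output"

out = block_mha_new("qkv", "k_cache", "v_cache", 1024)  # still valid

# Had the new parameters been inserted before max_seq_len instead, this call
# would silently bind 1024 to a quant-scale argument and break the caller.
```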

@Ligoml (Contributor) left a comment:

LGTM for docs (the Chinese documentation still needs to be added).

@heavengate merged commit 036a314 into PaddlePaddle:develop on Dec 15, 2023
28 of 29 checks passed
RichardWooSJTU added a commit to RichardWooSJTU/Paddle that referenced this pull request Dec 15, 2023
…ntion op (PaddlePaddle#59951)

* support cachekv_quant in blha

---------

Co-authored-by: Wanglongzhi2001 <[email protected]>
RichardWooSJTU added a commit to RichardWooSJTU/Paddle that referenced this pull request Dec 15, 2023
…ntion op (PaddlePaddle#59951)

* support cachekv_quant in blha

---------

Co-authored-by: Wanglongzhi2001 <[email protected]>
raindrops2sea pushed a commit that referenced this pull request Dec 18, 2023
…ntion op (#59951) (#60073)

* support cachekv_quant in blha

---------

Co-authored-by: Wanglongzhi2001 <[email protected]>