[Kernel] Optimize p_norm gpu #69660

Merged

HydrogenSulfate
Contributor

PR Category

Performance Optimization

PR Types

Improvements

Description

Pcard-75624

Optimize the p_norm GPU functor for the cases p=1/2/3.
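
The page does not show the functor code itself, so the following is only a host-side sketch of the general idea, under the assumption that the speedup comes from replacing a generic power call with plain multiplications for small integer orders. `SumAbsPow` is a hypothetical helper, not a Paddle API; on the GPU the same branching would pick a per-element functor before the reduction.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not part of Paddle): accumulates |x|^p over a vector.
// For p = 1, 2, 3 it uses abs and multiplications instead of std::pow.
template <typename T>
T SumAbsPow(const std::vector<T>& xs, T p) {
  T sum = T(0);
  if (p == T(1)) {
    for (T x : xs) sum += std::abs(x);
  } else if (p == T(2)) {
    for (T x : xs) sum += x * x;  // |x|^2 equals x^2, no abs needed
  } else if (p == T(3)) {
    for (T x : xs) {
      T a = std::abs(x);
      sum += a * a * a;
    }
  } else {
    // General fallback: one pow call per element.
    for (T x : xs) sum += std::pow(std::abs(x), p);
  }
  return sum;
}
```

For the input {3, -4}, this yields 7 for p=1, 25 for p=2, and 91 for p=3; the p-norm is then the 1/p power of that sum.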

Before optimization

| | N=10 | N=100 | N=1000 | N=10000 |
|---|---|---|---|---|
| p=1 | 3.09E-02 | 4.19E-02 | 4.99E-02 | 1.21E+00 |
| p=2 | 3.11E-02 | 4.18E-02 | 4.96E-02 | 1.24E+00 |
| p=3 | 3.08E-02 | 4.16E-02 | 4.94E-02 | 1.21E+00 |

After optimization

| | N=10 | N=100 | N=1000 | N=10000 |
|---|---|---|---|---|
| p=1 | 2.62E-02 | 3.24E-02 | 3.29E-02 | 5.12E-01 |
| p=2 | 3.11E-02 | 3.70E-02 | 3.73E-02 | 4.94E-01 |
| p=3 | 3.08E-02 | 3.64E-02 | 3.71E-02 | 4.95E-01 |

Reduction in elapsed time

| | N=10 | N=100 | N=1000 | N=10000 |
|---|---|---|---|---|
| p=1 | 15% | 23% | 34% | 58% |
| p=2 | 0% | 11% | 25% | 60% |
| p=3 | 0% | 13% | 25% | 59% |


paddle-bot bot commented Nov 24, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

phi::funcs::ElementwiseKernel<T>(
dev_ctx, ins, &outs, UnsignedPowFunctor<T>(1. / porder));
if (porder != 1.0) {
// save computation when porder is 1.0
Contributor

The comment "porder is 1.0" does not quite match the condition `porder != 1.0`.

Contributor Author

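> The comment "porder is 1.0" does not quite match the condition `porder != 1.0`.

The code itself should be fine: when porder is 1.0, the power operation below does not need to run. I will move the comment above the condition in a follow-up.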
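
To illustrate the point of this exchange, here is a minimal scalar sketch of the final root step with the comment placed above the condition, as the reply suggests. `ApplyRoot` is a hypothetical stand-in; the merged code applies `UnsignedPowFunctor<T>(1. / porder)` through `phi::funcs::ElementwiseKernel<T>` on tensors, which is not reproduced here.

```cpp
#include <cmath>

// Hypothetical scalar stand-in for the final step of a p-norm:
// takes the already reduced sum of |x|^porder and applies the 1/porder power.
double ApplyRoot(double sum_abs_pow, double porder) {
  // Save computation when porder is 1.0: the 1/porder power is then the
  // identity, so the pow call is only made for other orders.
  if (porder != 1.0) {
    sum_abs_pow = std::pow(sum_abs_pow, 1.0 / porder);
  }
  return sum_abs_pow;
}
```

With this placement the comment describes why the branch exists, while the condition describes when the work is actually done.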

@HydrogenSulfate HydrogenSulfate merged commit a291887 into PaddlePaddle:develop Nov 25, 2024
27 of 28 checks passed
@HydrogenSulfate HydrogenSulfate deleted the optimize_p_norm_gpu branch November 25, 2024 03:06