Support different dtypes of inputs for broadcast for dropout optimization #52093

Merged
Xreki merged 9 commits into PaddlePaddle:develop on Apr 27, 2023

Conversation

Contributor

@zhangbopd zhangbopd commented Mar 24, 2023

PR types

Performance optimization

PR changes

OPs

Description

This PR adds support for inputs of different dtypes in Broadcast, so that the dropoutGrad, dropout_nd, and dropout_ndGrad kernels can be optimized with it (dropout itself was already optimized with the same method in an earlier PR).

  • The template parameter InT is no longer needed
  • Keep the BroadcastDataLoader template partial specialization and BroadcastLoadType for high performance in some cases
  • Standardize the XPU/GPU broadcast data loader interface
  • Optimize BroadcastWithInt64Index
  • Move the condition check earlier
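
To illustrate the first bullet, here is a minimal host-side sketch of how a broadcast helper can take inputs of different dtypes without an explicit InT: the per-input types are recovered from the functor's call operator, and each type-erased buffer is cast to its own type when loaded. Everything named below (`DropoutGradFunctor`, `FunctorTraits`, `Input`, `LoadArgs`, `BroadcastApply`) is hypothetical and chosen for illustration; it is not the code in this PR. It is a plain CPU loop rather than a vectorized GPU/XPU kernel, and it only handles same-size or single-element inputs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical functor: dropout backward for a single element.
// The two inputs have different dtypes (float gradient, uint8_t mask).
struct DropoutGradFunctor {
  float scale;  // assumed to be 1 / (1 - p), as in an upscale-in-train mode
  float operator()(float dout, std::uint8_t mask) const {
    return dout * static_cast<float>(mask) * scale;
  }
};

// Recover the argument types from the functor's call operator, so the
// caller does not have to name a single input type.
template <typename F>
struct FunctorTraits : FunctorTraits<decltype(&F::operator())> {};

template <typename C, typename R, typename... Args>
struct FunctorTraits<R (C::*)(Args...) const> {
  using ArgsTuple = std::tuple<std::decay_t<Args>...>;
  static constexpr std::size_t kArity = sizeof...(Args);
};

// Type-erased input buffer. numel == 1 means "broadcast this one value".
struct Input {
  const void* data;
  std::size_t numel;
};

// Load element i of every input, casting each buffer to the type of the
// matching functor argument.
template <typename ArgsTuple, std::size_t N, std::size_t... Is>
ArgsTuple LoadArgs(const std::array<Input, N>& ins, std::size_t i,
                   std::index_sequence<Is...>) {
  return ArgsTuple{
      static_cast<const std::tuple_element_t<Is, ArgsTuple>*>(
          ins[Is].data)[ins[Is].numel == 1 ? 0 : i]...};
}

// Apply func elementwise over inputs that may have different dtypes.
template <typename OutT, typename Functor, std::size_t N>
std::vector<OutT> BroadcastApply(const std::array<Input, N>& ins,
                                 std::size_t out_numel, Functor func) {
  using Traits = FunctorTraits<Functor>;
  static_assert(Traits::kArity == N, "one input per functor argument");
  for (const Input& in : ins) {
    assert(in.numel == 1 || in.numel == out_numel);
  }
  std::vector<OutT> out(out_numel);
  for (std::size_t i = 0; i < out_numel; ++i) {
    auto args = LoadArgs<typename Traits::ArgsTuple>(
        ins, i, std::make_index_sequence<N>{});
    out[i] = std::apply(func, args);
  }
  return out;
}
```

Calling `BroadcastApply<float>(ins, n, DropoutGradFunctor{scale})` with a float buffer and a uint8_t buffer in `ins` computes the gradient in one pass, with no intermediate cast of the mask to float.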

@paddle-bot

paddle-bot bot commented Mar 24, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@zhangbopd zhangbopd changed the title from "DropoutGrad optimization & Support different dtypes of inputs for Broadcast" to "Support different dtypes of inputs for broadcast for dropout optimization" on Apr 12, 2023
jiweibo previously approved these changes Apr 12, 2023
Contributor

@jiweibo jiweibo left a comment

LGTM for size.

Shixiaowei02 previously approved these changes Apr 13, 2023
@JamesLim-sy
Contributor

BTW, please add performance data for Dropout_grad before and after the optimization.

@zhangbopd
Contributor Author

zhangbopd commented Apr 13, 2023

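BTW, please add performance data for Dropout_grad before and after the optimization.

OK. This work will be completed in #52969, when ElementwiseType and InT are cleaned up, and the performance data will be included in that PR's description.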

ZzSean previously approved these changes Apr 14, 2023
Contributor

@ZzSean ZzSean left a comment

LGTM for CI-OP-Benchmark

@paddle-ci-bot

paddle-ci-bot bot commented Apr 23, 2023

Sorry to inform you that the CIs for a6119cc passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

Contributor

@JamesLim-sy JamesLim-sy left a comment

So far I have only reviewed BroadcastFunction.h.

Contributor

@shaojiewang shaojiewang left a comment

Almost good, just one comment.

paddle/phi/kernels/funcs/dropout_impl.cu.h (outdated, resolved)
@zhangbopd zhangbopd dismissed stale reviews from ZzSean, Shixiaowei02, and jiweibo via a5c47c5 April 27, 2023 07:43
Contributor

@ZzSean ZzSean left a comment

LGTM for CI-OP-Benchmark

Contributor

@shaojiewang shaojiewang left a comment

lgtm

@Xreki Xreki merged commit 3474e09 into PaddlePaddle:develop Apr 27, 2023
@zhangbopd zhangbopd removed the request for review from JamesLim-sy May 4, 2023 11:50
zhangbopd added a commit to zhangbopd/Paddle that referenced this pull request May 9, 2023
…tion (PaddlePaddle#52093)

* change judgement for DropoutGradGPUKernelDriver

* add UnrollerWithoutVecSize and after this Loaddata to be refined

* pass unittest

* use same unroller with XPU

* BroadcastWithInt64Index

* BroadcastDataLoader template partial specialization

* fix compile errs in ROCms

* PR comment
XiaoguangHu01 pushed a commit that referenced this pull request May 10, 2023
…to Release/2.5 (#53623)

* Support different dtypes of inputs for broadcast for dropout optimization  (#52093)

* change judgement for DropoutGradGPUKernelDriver

* add UnrollerWithoutVecSize and after this Loaddata to be refined

* pass unittest

* use same unroller with XPU

* BroadcastWithInt64Index

* BroadcastDataLoader template partial specialization

* fix compile errs in ROCms

* PR comment

* dropout_nd_optimization (#51479)

* with printf

* add DropOutNdForwardKernel

* PR comment

* Dropout optimize & clean broadcast inT and ElementwiseType (#52969)

* change judgement for DropoutGradGPUKernelDriver

* add UnrollerWithoutVecSize and after this Loaddata to be refined

* pass unittest

* use same unroller with XPU

* BroadcastWithInt64Index

* BroadcastDataLoader template partial specialization

* fix compile errs in ROCms

* clean ElementwiseT and InT for BroadcastKernel

* default axis and clean inT

* remove redundant fast divmod computation

* optimize drop_nd & drop_nd_grad

* optimize BroadcastDataLoader bf16 fp16

* rm InT etc. after merge develop

* delete constexpr for windows ci

* fix conflict

* fix conflic with develop

* fix conflic

* new clean

* clean

* Fix xpu2 kp compile error (#53548)

* fix conflict

* conflict
@zhangbopd zhangbopd removed the request for review from Shixiaowei02 September 19, 2023 04:12

8 participants