-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support different dtypes of inputs for broadcast for dropout optimization #52093
Support different dtypes of inputs for broadcast for dropout optimization #52093
Conversation
你的PR提交成功,感谢你对开源项目的贡献! |
833dcee
to
9971126
Compare
9b7e5bd
to
df71250
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for size.
BTW, 请补充针对Dropout_grad 优化前后的性能数据 |
OK,这部分工作在 #52969 中清理 ElementwiseType 和 InT 时完成,性能数据放在对应PR描述中 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for CI-OP-Benchmark
Sorry to inform you that a6119cc's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
目前先review了BroadcastFunction.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
almost good, just one comment
a5c47c5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for CI-OP-Benchmark
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
…tion (PaddlePaddle#52093) * change judgement for DropoutGradGPUKernelDriver * add UnrollerWithoutVecSize and after this Loaddata to be refined * pass unittest * use same unroller with XPU * BroadcastWithInt64Index * BroadcastDataLoader template partial specialization * fix compile errs in ROCms * PR comment
…to Release/2.5 (#53623) * Support different dtypes of inputs for broadcast for dropout optimization (#52093) * change judgement for DropoutGradGPUKernelDriver * add UnrollerWithoutVecSize and after this Loaddata to be refined * pass unittest * use same unroller with XPU * BroadcastWithInt64Index * BroadcastDataLoader template partial specialization * fix compile errs in ROCms * PR comment * dropout_nd_optimization (#51479) * with printf * add DropOutNdForwardKernel * PR comment * Dropout optimize & clean broadcast inT and ElementwiseType (#52969) * change judgement for DropoutGradGPUKernelDriver * add UnrollerWithoutVecSize and after this Loaddata to be refined * pass unittest * use same unroller with XPU * BroadcastWithInt64Index * BroadcastDataLoader template partial specialization * fix compile errs in ROCms * clean ElementwiseT and InT for BroadcastKernel * default axis and clean inT * remove redundant fast divmod computation * optimize drop_nd & drop_nd_grad * optimize BroadcastDataLoader bf16 fp16 * rm InT etc. after merge develop * delete constexpr for windows ci * fix conflict * fix conflic with develop * fix conflic * new clean * clean * Fix xpu2 kp compile error (#53548) * fix conflict * conflict
PR types
Performance optimization
PR changes
OPs
Description
This PR aims to support different dtypes of inputs for Broadcast so as to do
dropoutGrad
dropout_nd
anddropout_ndGrad
optimization (dropout optimization with the same method was done previously)InT
BroadcastDataLoader
template partial specialization andBroadcastLoadType
for high performance in some casesBroadcastWithInt64Index