[PaddlePaddle Hackathon 3 No.37] Optimize the GPU performance of the argmin_argmax op for Paddle #46655

Closed
wants to merge 4 commits into from

thunder95

@thunder95 thunder95 commented Sep 29, 2022

PR types

Performance optimization

PR changes

OPs

Describe

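Paddle's GPU implementation of the argmax/argmin operators is currently built on the Cub library. It can be replaced with Reduce, although the performance of the Reduce module itself still needs further improvement.
Design document: PaddlePaddle/community#256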

  • Development environment:
  1. Device: RTX 2070s
  2. Environment: CUDA 10.2, cuDNN 7
  • Optimization method
    Rewrite kps::Reduce so that it supports returning indices
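
The idea of a reduction that returns an index rather than a value can be sketched on the CPU. The block below is only an illustration under stated assumptions: it is plain C++, not the kernel in this PR and not the kps::Reduce API. The names ValueIndex, ArgMinCombine, ArgMaxCombine and ArgReduce are hypothetical. It carries a (value, index) pair through a pairwise, tree-style reduction and breaks ties toward the lower index, so the result does not depend on the order in which partial results are merged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type for illustration; not part of Paddle or kps.
template <typename T>
struct ValueIndex {
  T value;
  std::int64_t index;
};

// Combine step for argmin: keep the smaller value; on a tie keep the lower index.
template <typename T>
struct ArgMinCombine {
  ValueIndex<T> operator()(const ValueIndex<T>& a, const ValueIndex<T>& b) const {
    if (b.value < a.value) return b;
    if (a.value < b.value) return a;
    return (b.index < a.index) ? b : a;
  }
};

// Combine step for argmax: keep the larger value; on a tie keep the lower index.
template <typename T>
struct ArgMaxCombine {
  ValueIndex<T> operator()(const ValueIndex<T>& a, const ValueIndex<T>& b) const {
    if (a.value < b.value) return b;
    if (b.value < a.value) return a;
    return (b.index < a.index) ? b : a;
  }
};

// Pairwise reduction over a non-empty range, loosely mimicking how the threads
// of a GPU block merge partial results in halving steps.
template <typename T, typename CombineOp>
ValueIndex<T> ArgReduce(const std::vector<T>& data, CombineOp combine) {
  assert(!data.empty());
  std::vector<ValueIndex<T>> partial;
  partial.reserve(data.size());
  for (std::size_t i = 0; i < data.size(); ++i) {
    partial.push_back({data[i], static_cast<std::int64_t>(i)});
  }
  // Each step folds the upper half of the active range onto the lower half.
  for (std::size_t n = partial.size(); n > 1;) {
    const std::size_t half = (n + 1) / 2;
    for (std::size_t i = 0; i + half < n; ++i) {
      partial[i] = combine(partial[i], partial[i + half]);
    }
    n = half;
  }
  return partial[0];
}
```

Calling `ArgReduce(values, ArgMinCombine<float>())` on a `std::vector<float>` returns the minimum together with its position. A GPU version would apply the same combine functor inside the per-thread, per-warp and per-block stages of the reduction.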

Performance of Paddle after the optimization compared with Paddle before the optimization:

| Case No. | input_shape | dtype | axis | Paddle Perf (ms) argmin | Old Paddle Perf (ms) argmin | diff |
|---|---|---|---|---|---|---|
| 0 | [-1L, 513L, 513L, 19L] | float32 | 3 | 15.0504 | 15.0504 | 15.0504 |
| 1 | [-1L, 513L, 513L, 19L] | float32 | 1 | 20.0625 | 20.0625 | 15.0504 |
| 2 | [1000L, 1000L] | float32 | -1 | 0.16095 | 0.16095 | 15.0504 |
| 3 | [1000L, 1000L] | float32 | 0 | 0.7225 | 0.7225 | 15.0504 |

Performance of Paddle after the optimization compared with PyTorch:

| Case No. | input_shape | dtype | axis | Paddle Perf (ms) argmin | PyTorch Perf (ms) argmin | diff |
|---|---|---|---|---|---|---|
| 0 | [-1L, 513L, 513L, 19L] | float32 | 3 | 10.426 | 10.426 | 15.0504 |
| 1 | [-1L, 513L, 513L, 19L] | float32 | 1 | 2.4442 | 2.4442 | 15.0504 |
| 2 | [1000L, 1000L] | float32 | -1 | 0.03902 | 0.03902 | 15.0504 |
| 3 | [1000L, 1000L] | float32 | 0 | 0.04725 | 0.04725 | 15.0504 |

@thunder95 thunder95 changed the title [PaddlePaddle Hackathon 3 No.31] Optimize the GPU performance of the argmin_argmax op for Paddle to [PaddlePaddle Hackathon 3 No.37] Optimize the GPU performance of the argmin_argmax op for Paddle Sep 29, 2022
@paddle-bot-old paddle-bot-old bot added the contributor External developers label Sep 29, 2022
@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Oct 13, 2022
@PaddlePaddle PaddlePaddle unlocked this conversation Oct 13, 2022
@@ -14,212 +14,295 @@

#include "paddle/phi/kernels/arg_min_max_kernel.h"

#include "paddle/fluid/platform/device/gpu/gpu_launch_config.h"

Use the files under the phi directory instead; these headers should all be available there.

};

template <>
struct SharedMemory<float> {

Is this struct necessary? Why not use extern __shared__ T?

}

template <typename Context, typename T, typename IndType, typename CompOp>
void ArgCUDAImpl(const Context& dev_ctx,

The function name should make clear what the function does.

@ZzSean

ZzSean commented Oct 13, 2022

Apart from the above, the CI issues need to be resolved first.

@luotao1 luotao1 mentioned this pull request Aug 10, 2023
@paddle-bot paddle-bot bot closed this Oct 17, 2023
@paddle-bot

paddle-bot bot commented Oct 17, 2023

Since you haven't replied for more than a year, we have closed this issue/pr.
If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up.
