[PaddlePaddle Hackathon 3 No.37] Optimize the GPU compute performance of the argmin_argmax op for Paddle #46655
@@ -14,212 +14,295 @@

#include "paddle/phi/kernels/arg_min_max_kernel.h"

#include "paddle/fluid/platform/device/gpu/gpu_launch_config.h"
Use the files under the phi directory instead; the corresponding headers should all exist there.
};

template <>
struct SharedMemory<float> {
Is this struct necessary? Why not use `extern __shared__ T` directly?
}

template <typename Context, typename T, typename IndType, typename CompOp>
void ArgCUDAImpl(const Context& dev_ctx,
The function name should clearly convey what the function does.
Beyond that, the CI issues need to be resolved first.
Since you haven't replied for more than a year, we have closed this issue/pr.
PR types
Performance optimization
PR changes
OPs
Describe
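The GPU implementation of the argmax/argmin operators in Paddle is currently built on the Cub library. It can be replaced with Reduce, and the performance of the Reduce module itself needs further improvement.
Design document: PaddlePaddle/community#256
This PR rewrites kps::Reduce so that it can also return indices.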
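To illustrate what "returning indices" from a reduction means, here is a minimal CPU-side sketch in C++. It is not the PR's code and does not use the kps API: `ValueIndex`, `ArgMaxOp`, and `ArgReduce` are hypothetical names chosen for this example. The idea is to reduce over (value, index) pairs instead of bare values, with a combiner that breaks ties toward the smaller index so the result does not depend on merge order. NaN inputs are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper types for illustration only; not part of Paddle.
template <typename T, typename IndType>
struct ValueIndex {
  T value;
  IndType index;
};

// Combiner for argmax: keep the larger value; on ties keep the smaller
// index, so the answer is independent of the order of pairwise merges.
template <typename T, typename IndType>
struct ArgMaxOp {
  ValueIndex<T, IndType> operator()(const ValueIndex<T, IndType>& a,
                                    const ValueIndex<T, IndType>& b) const {
    if (a.value > b.value) return a;
    if (b.value > a.value) return b;
    return a.index <= b.index ? a : b;
  }
};

// Tree-style reduction over (value, index) pairs. Each pass roughly halves
// the number of live partial results, similar in shape to a block-level
// reduction on a GPU, but executed sequentially here.
template <typename T, typename IndType, typename CompOp>
IndType ArgReduce(const std::vector<T>& data, CompOp op) {
  assert(!data.empty());
  std::vector<ValueIndex<T, IndType>> partial(data.size());
  for (std::size_t i = 0; i < data.size(); ++i) {
    partial[i] = {data[i], static_cast<IndType>(i)};
  }
  for (std::size_t n = partial.size(); n > 1; n = (n + 1) / 2) {
    const std::size_t half = (n + 1) / 2;
    for (std::size_t i = 0; i + half < n; ++i) {
      partial[i] = op(partial[i], partial[i + half]);
    }
  }
  return partial[0].index;
}
```

Calling `ArgReduce<float, std::int64_t>(values, ArgMaxOp<float, std::int64_t>{})` returns the position of the first maximum in `values`. An argmin variant would only need a combiner with the value comparisons reversed.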
Performance of Paddle after the optimization compared with Paddle before the optimization:
Performance of Paddle after the optimization compared with PyTorch: