
[Kernel] Optimize isfinite kernel #69596

Merged

Conversation

HydrogenSulfate
Contributor

@HydrogenSulfate HydrogenSulfate commented Nov 21, 2024

PR Category

Performance Optimization

PR Types

Improvements

Description

Pcard-75624

The p_norm_grad composite operator uses the isfinite primitive operator:

auto _zero_tensor =
full<T>(common::vectorize(x.dims()), 0.0, x.dtype(), x.place());
auto finite_mask = isfinite<T>(x_grad_tmp);
x_grad_tmp = where<T>(finite_mask, x_grad_tmp, _zero_tensor);
x_grad_tmp = expand_out_grad * (x_grad_tmp);
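As a minimal host-side sketch of what the composite-op snippet above computes (the function name and plain-vector interface are illustrative, not Paddle's actual API): non-finite entries of the local gradient are replaced with zero, and the result is scaled by the upstream gradient.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical mirror of: where(isfinite(g), g, 0) * expand_out_grad.
// Inf/NaN entries in the local gradient are masked to 0 before scaling.
std::vector<double> MaskNonFinite(const std::vector<double>& x_grad_tmp,
                                  const std::vector<double>& expand_out_grad) {
  std::vector<double> out(x_grad_tmp.size());
  for (std::size_t i = 0; i < x_grad_tmp.size(); ++i) {
    double g = std::isfinite(x_grad_tmp[i]) ? x_grad_tmp[i] : 0.0;
    out[i] = expand_out_grad[i] * g;
  }
  return out;
}
```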

The isfinite kernel implementation used the thrust library, whose API calls trigger cudaStreamSynchronize and end up adding overhead to every training step (red circles in the figure below). The isfinite kernel was therefore refactored following the approach of the isclose kernel.

Important

The refactored code consists of the following four parts (using isfinite as the example):

  1. A generic template declaration.
  2. A partial specialization for integer types: since integers can never be inf or NaN, no check is needed and the result is assigned true (or false) directly.
  3. A partial specialization for standard floating-point types, which calls the classification function provided by CUDA or std, depending on the device type.
  4. Specializations for the remaining custom floating-point types, which call the classification function provided by CUDA or phi, depending on the device type.
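The four parts above could be sketched roughly as follows. This is a hypothetical, host-only illustration of the specialization pattern; the functor name and SFINAE layout are assumptions, not Paddle's actual phi kernel code, and the device-side (CUDA) and custom-float branches are omitted.

```cpp
#include <cmath>
#include <type_traits>

// 1. Generic template declaration.
template <typename T, typename Enable = void>
struct IsFiniteFunctor;

// 2. Partial specialization for integral types: integers can never be
//    inf or NaN, so the result is unconditionally true.
template <typename T>
struct IsFiniteFunctor<T, std::enable_if_t<std::is_integral_v<T>>> {
  bool operator()(T) const { return true; }
};

// 3. Partial specialization for standard floating-point types: delegate
//    to std::isfinite on the host (a device build would call the CUDA
//    intrinsic instead).
template <typename T>
struct IsFiniteFunctor<T, std::enable_if_t<std::is_floating_point_v<T>>> {
  bool operator()(T v) const { return std::isfinite(v); }
};

// 4. Custom floating-point types (e.g. float16/bfloat16) would get their
//    own specializations calling CUDA- or phi-provided helpers; omitted here.
```

Dispatching on the type at compile time keeps the check branch-free per element, so the kernel can run as a plain elementwise launch with no host-device synchronization.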

[images: profiler timeline before the fix, with the synchronization stalls circled in red]

After the fix, the average time dropped from 659,408.8 ns to 34,268.3 ns, making the cost negligible, and the green blocks no longer appear in the timeline.

[images: profiler timeline after the fix]


paddle-bot bot commented Nov 21, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@HydrogenSulfate HydrogenSulfate merged commit dc6bba9 into PaddlePaddle:develop Nov 25, 2024
27 of 28 checks passed
@HydrogenSulfate HydrogenSulfate deleted the optimize_isfinite branch November 25, 2024 05:12