[Kernel] Optimize isfinite kernel #69596
Merged
PR Category
Performance Optimization
PR Types
Improvements
Description
Pcard-75624
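The composite operator for p_norm_grad uses the isfinite basic operator:
Paddle/paddle/fluid/prim/api/composite_backward/composite_backward_api.h
Lines 2077 to 2082 in 7ca7f2c
However, the isfinite kernel is implemented with the Thrust library, and its API calls trigger `cudaStreamSynchronize`. This adds time to every step (circled in red in the figure below). The isfinite kernel was therefore rewritten to follow the structure of the isclose kernel.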
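The sketch below shows roughly what an isclose-style elementwise kernel looks like. It is written as host-side C++ so that it runs without a GPU. It assumes the new kernel uses a grid-stride loop, which is a common CUDA pattern, but the exact form of the merged code is not shown here. The names `IsFiniteFunctor`, `SimulateGridStrideLaunch`, and `IsFinite` are hypothetical and are not Paddle's API. The two outer loops stand in for the CUDA `blockIdx`/`threadIdx` indices, and the inner loop is the body each GPU thread would execute.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <memory>
#include <vector>

// Hypothetical elementwise predicate: true when x is neither NaN nor +/-inf.
// A device kernel would evaluate the same expression once per element.
template <typename T>
struct IsFiniteFunctor {
  bool operator()(T x) const { return std::isfinite(x); }
};

// Host-side emulation of a CUDA grid-stride loop (illustration only).
// The outer loops enumerate (block, thread) pairs the way a launch of
// grid_dim blocks x block_dim threads would; the inner loop is what each
// GPU thread runs. Starting at the thread's global index and striding by the
// total thread count writes every output element exactly once.
template <typename T, typename Functor>
void SimulateGridStrideLaunch(const T* in, bool* out, std::size_t n,
                              unsigned grid_dim, unsigned block_dim,
                              Functor func) {
  assert(grid_dim > 0 && block_dim > 0);
  const std::size_t stride = static_cast<std::size_t>(grid_dim) * block_dim;
  for (unsigned block = 0; block < grid_dim; ++block) {
    for (unsigned thread = 0; thread < block_dim; ++thread) {
      // Equivalent of blockIdx.x * blockDim.x + threadIdx.x.
      for (std::size_t i = static_cast<std::size_t>(block) * block_dim + thread;
           i < n; i += stride) {
        out[i] = func(in[i]);
      }
    }
  }
}

// Hypothetical wrapper: returns one bool flag per input element.
inline std::vector<bool> IsFinite(const std::vector<float>& x,
                                  unsigned grid_dim, unsigned block_dim) {
  std::unique_ptr<bool[]> out(new bool[x.size()]());
  SimulateGridStrideLaunch(x.data(), out.get(), x.size(), grid_dim, block_dim,
                           IsFiniteFunctor<float>{});
  return std::vector<bool>(out.get(), out.get() + x.size());
}
```

On a real device the launch replaces the two outer loops, and each thread runs only the inner loop, starting at `blockIdx.x * blockDim.x + threadIdx.x`. A kernel launch is asynchronous with respect to the host. Unlike the Thrust call described above, this pattern therefore does not on its own block the host on the stream.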
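The refactored code is organized into the following four parts (using isfinite as an example):
After the fix, the average time dropped from 659408.8 ns to 34268.3 ns, so its cost is now negligible, and the green blocks no longer appear in the timeline.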
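For scale, the two reported averages (659408.8 ns before the fix, 34268.3 ns after) correspond to roughly a 19x speedup, or about 94.8% less time. These figures are computed only from those two numbers:

```math
\text{speedup} = \frac{659408.8\ \text{ns}}{34268.3\ \text{ns}} \approx 19.2,
\qquad
1 - \frac{34268.3\ \text{ns}}{659408.8\ \text{ns}} \approx 0.948
```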