
Improve the performance of divide_double_grad #62533

Merged
33 changes: 11 additions & 22 deletions paddle/phi/kernels/impl/elementwise_grad_kernel_impl.h
@@ -177,6 +177,17 @@ void DivideDoubleGradKernel(const Context& dev_ctx,
auto* ddx_tensor = ddx.get_ptr();
auto* ddy_tensor = ddy.get_ptr();
auto* dx_tensor = dx.get_ptr();
DenseTensor dz_div_y;
dz_div_y.Resize(out.dims());
if (dx_tensor == nullptr || dx_tensor->dims() != out.dims()) {
dev_ctx.template Alloc<T>(&dz_div_y);
funcs::DefaultElementwiseOperator<Context,
T,
funcs::DivideFunctor<T>,
funcs::InverseDivideFunctor<T>>(
dev_ctx, grad_out, y, &dz_div_y, axis);
dx_tensor = &dz_div_y;
}
// ddOut = ddX / Y - Out * ddY / Y = (ddX - Out * ddY) / Y
// dY = Out * dX * ddY / Y - dX * ddX / Y
// dOut = - dX * ddY
@@ -195,17 +206,6 @@ void DivideDoubleGradKernel(const Context& dev_ctx,
if (ddx_tensor == nullptr && ddy_tensor == nullptr) {
dy = nullptr;
@HydrogenSulfate (Contributor) commented on Mar 8, 2024:

  1. If a None term on the right-hand side of the derived formula makes the left-hand variable impossible to compute, then under Paddle's current convention the output should be zero-filled: `FullLikeKernel<T, Context>(dev_ctx, y, Scalar(0.0), y.dtype(), dy);` — otherwise a large amount of downstream code would need to change.
  2. Setting the pointer itself to null should have no effect (it is passed by value).

The same issue applies in the other places below.

Author reply: Done

} else {
if (dx_tensor == nullptr || dx_tensor->dims() != out.dims()) {
DenseTensor dz_div_y;
dz_div_y.Resize(out.dims());
dev_ctx.template Alloc<T>(&dz_div_y);
funcs::DefaultElementwiseOperator<Context,
T,
funcs::DivideFunctor<T>,
funcs::InverseDivideFunctor<T>>(
dev_ctx, grad_out, y, &dz_div_y, axis);
dx_tensor = &dz_div_y;
}
DenseTensor tmp_dy = tmp;
A reviewer (Contributor) commented:
Line 209 can probably be deleted; `tmp_dy` is semantically unclear anyway, so it is better to just use `tmp` directly — delete it and change every `tmp_dy` below to `tmp`.

Author reply: Done

// dX / Y
@HydrogenSulfate (Contributor) commented on Mar 13, 2024:

// dX / Y ==> // pre-compute 'dX / Y' into 'tmp' for 'ddout' and/or 'dy'

Author reply: Done

funcs::DefaultElementwiseOperator<Context,
@@ -312,17 +312,6 @@ void DivideDoubleGradKernel(const Context& dev_ctx,
if (ddy_tensor == nullptr) {
dout = nullptr;
@HydrogenSulfate (Contributor) commented on Mar 11, 2024:

  1. Assigning nullptr to a pointer that was passed by value should have no effect, so the if-else check here can be simplified accordingly.
  2. dy should be assigned an all-zero matrix with the same shape as y.

Author reply: Done

} else {
if (dx_tensor == nullptr || dx_tensor->dims() != out.dims()) {
DenseTensor dz_div_y;
dz_div_y.Resize(out.dims());
dev_ctx.template Alloc<T>(&dz_div_y);
funcs::DefaultElementwiseOperator<Context,
T,
funcs::DivideFunctor<T>,
funcs::InverseDivideFunctor<T>>(
dev_ctx, grad_out, y, &dz_div_y, axis);
dx_tensor = &dz_div_y;
}
// dOut = - dX * ddY
funcs::DefaultElementwiseOperator<Context,
T,