
[NPU] Support npu op reciprocal and reciprocal grad #34531

Merged
merged 1 commit into from
Aug 3, 2021

Conversation

@limin2021 (Contributor) commented Aug 2, 2021

PR types

New features

PR changes

OPs

Describe

[NPU] Support npu op reciprocal and reciprocal grad

Notes:

  1. Supported data types:
    The data types supported on the NPU side match those on the CPU side: float, float16, double.
  2. The unit tests cover the fp64, fp32, and fp16 cases.
    When testing the fp64 case, the following error was reported:
    ExternalError: ACL error, the error code is : 500002. (at /workspace/limin-workspace/npu-paddle/Paddle/paddle/fluid/operators/npu_op_runner.cc:380) 1226: [operator < mean > error]
    Cause: the reciprocal unit test (backward) automatically adds a mean op, but mean's forward kernel ReduceMeanD does not support double or int, while the mean kernel registration claimed support for fp64 and int.
    Fix: remove the double and int type support from mean_op_npu.cc.
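As background for the backward unit test discussed above, the math that a reciprocal grad kernel computes can be sketched in NumPy. This is only an illustration of the formula dx = -out² · dout (the derivative of 1/x), not the actual NPU kernel code:

```python
import numpy as np

# Forward: out = 1 / x
x = np.array([1.0, 2.0, 4.0], dtype=np.float32)
out = np.reciprocal(x)  # [1.0, 0.5, 0.25]

# Backward: d(1/x)/dx = -1/x**2 = -out**2, so dx = -out**2 * dout.
# Expressing the gradient via `out` avoids re-reading `x` in the kernel.
dout = np.ones_like(out)
dx = -np.square(out) * dout

# Cross-check against the direct derivative -1/x**2
assert np.allclose(dx, -1.0 / np.square(x))
```

The same identity is what the backward test exercises numerically via check_grad_with_place.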

Results

  • Unit test results
    (screenshot omitted)

  • Invoking the reciprocal NPU kernel and reciprocal grad NPU kernel

fp32:
(screenshot omitted)

fp64:
(screenshot omitted)

fp16:
(screenshot omitted)

paddle-bot-old bot commented Aug 2, 2021

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

import paddle
import paddle.fluid as fluid

paddle.enable_static()
A reviewer (Contributor) commented:

  • If the unit test is not written using the API-composition, static-graph execution style, this enable_static call should be unnecessary.
  • Also check the imports above and delete any that are unused.

@limin2021 (Author) commented Aug 2, 2021:

  1. This unit test follows the npu pow op unit test, which uses the static-graph style.
  2. The unused imports have been removed.

return
self.check_grad_with_place(
self.place, ['X'], 'Out', max_relative_error=0.01)

A reviewer (Contributor) commented:

Doesn't the FP16 case need check_grad? Is this explicitly stated in the NPU operator development requirements?

@limin2021 (Author) commented Aug 2, 2021:

Yes, it is explicitly stated. See the NPU operator development docs: check_grad_with_place does not support checking gradients for the float16 data type, so no such unit test is needed.

self.dtype = np.float32

def init_kernel_type(self):
pass
A reviewer (Contributor) commented:

Is the init_kernel_type method actually used anywhere?

@limin2021 (Author) replied:

Removed.

ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();

const auto& runner = NpuOpRunner("Reciprocal", {*x}, {*out}, {});
A reviewer (Contributor) commented:

Suggest removing the blank lines between the lines above; same below.

@limin2021 (Author) replied:

Done.

def set_npu(self):
self.__class__.use_npu = True
self.place = paddle.NPUPlace(0)
self.__class__.no_need_check_grad = True
A reviewer (Contributor) commented:

If fp16 is confirmed not to need a gradient check, use the skip_check_grad_ci decorator and describe the reason (you can search the codebase for other unit tests as a reference).

@limin2021 (Author) replied:

Done.
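For readers unfamiliar with the decorator mentioned in this exchange: skip_check_grad_ci comes from Paddle's OpTest utilities and marks a test class as intentionally skipping the gradient check, requiring a stated reason. The sketch below uses a simplified mock of the decorator to show the usage pattern; the decorator body here is an illustration, not Paddle's actual implementation, and the test class name is hypothetical:

```python
# Simplified mock of Paddle's skip_check_grad_ci decorator (illustration only).
def skip_check_grad_ci(reason=None):
    # Paddle's decorator likewise insists on a human-readable reason string.
    if not isinstance(reason, str):
        raise AssertionError("The reason for skipping check_grad is required.")

    def wrapper(cls):
        # Flag the test class so CI knows the gradient check is skipped on purpose.
        cls.no_need_check_grad = True
        return cls

    return wrapper


# Hypothetical fp16 test class showing how the decorator is applied.
@skip_check_grad_ci(
    reason="check_grad_with_place does not support float16 on NPU, "
           "so the fp16 case skips the gradient check.")
class TestNPUReciprocalFP16:
    pass
```

The stated reason is what distinguishes a deliberate skip from an accidentally missing test during CI review.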

ops::ReciprocalGradNPUKernel<paddle::platform::NPUDeviceContext, int>,
ops::ReciprocalGradNPUKernel<paddle::platform::NPUDeviceContext, int64_t>,
ops::ReciprocalGradNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
A reviewer (Contributor) commented:

The PR description mentions that complex64 and complex128 can be supported, but they are not registered here; please double-check the development requirements.

@limin2021 (Author) replied:

Confirmed with the requirement owner (@彭宇琪): the data types supported on the NPU only need to match those supported on the CPU. So the final supported types are float, double, and float16.

def init_dtype(self):
self.dtype = np.float16


A reviewer (Contributor) commented:

Suggest adding a unit test for the double data type; the data types supported by the C++ operator should be covered in unit tests as much as possible.

@limin2021 (Author) commented Aug 3, 2021:

Done.
When testing the fp64 case, the following error was reported:
ExternalError: ACL error, the error code is : 500002. (at /workspace/limin-workspace/npu-paddle/Paddle/paddle/fluid/operators/npu_op_runner.cc:380) 1226: [operator < mean > error]
Cause: the reciprocal unit test (backward) automatically adds a mean op, but mean's forward kernel ReduceMeanD does not support double or int, while the mean NPU kernel registration claimed support for fp64 and int.
Fix: removed the double and int type support from mean_op_npu.cc.

@limin2021 limin2021 force-pushed the add_npu_reciprocal branch from 541df3d to 17912cf Compare August 3, 2021 04:15
@qili93 qili93 (Contributor) left a comment:

LGTM

@zhangting2020 zhangting2020 (Contributor) left a comment:

LGTM

@qili93 qili93 merged commit d7493df into PaddlePaddle:develop Aug 3, 2021