[PaddlePaddle Hackathon 4]: Add float16 data type support to the maxout operator #50976

Merged
merged 12 commits into from
Apr 27, 2023
2 changes: 2 additions & 0 deletions paddle/phi/kernels/funcs/maxouting.cu
@@ -175,9 +175,11 @@ void MaxOutGradFunctor<DeviceContext, T>::operator()(
 }

 template class MaxOutGradFunctor<phi::GPUContext, float>;
+template class MaxOutGradFunctor<phi::GPUContext, phi::dtype::float16>;
 template class MaxOutGradFunctor<phi::GPUContext, double>;

 template class MaxOutFunctor<phi::GPUContext, float>;
+template class MaxOutFunctor<phi::GPUContext, phi::dtype::float16>;
 template class MaxOutFunctor<phi::GPUContext, double>;

 }  // namespace funcs
9 changes: 7 additions & 2 deletions paddle/phi/kernels/gpu/maxout_grad_kernel.cu
@@ -15,5 +15,10 @@
 #include "paddle/phi/core/kernel_registry.h"
 #include "paddle/phi/kernels/impl/maxout_grad_kernel_impl.h"

-PD_REGISTER_KERNEL(
-    maxout_grad, GPU, ALL_LAYOUT, phi::MaxOutGradKernel, float, double) {}
+PD_REGISTER_KERNEL(maxout_grad,
+                   GPU,
+                   ALL_LAYOUT,
+                   phi::MaxOutGradKernel,
+                   float,
+                   phi::dtype::float16,
+                   double) {}
Contributor

The backward kernel may also need to be switched to FP32 compute precision to reduce precision loss.

Contributor Author (@Patrick-Star125, Mar 17, 2023)

1. Do you mean removing phi::dtype::float16 outright? Doing that seems to make the backward operator test fail.
2. How can I tell whether the precision loss would be too large, and can the computation logic be improved to reduce it?

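Contributor

The current change only registers the fp16 type for the operator; as far as I can see, you have not modified the kernel implementation.
You need to analyze the forward and backward computations and check whether any step in them loses precision under fp16. Because of runtime limits, the unit tests use fairly small shapes. In your own development environment, try increasing the shape to a data scale of, say, 1000+ and check whether the fp16 cases in the unit test still pass their precision checks.

As for question 2, the official documentation covers it in detail: https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/dev_guides/amp_precision/amp_op_dev_guide_cn.html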
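
To make the concern above concrete, here is a minimal NumPy sketch (not taken from the PR) of the kind of precision loss that accumulating in fp16 can cause. The loop is purely illustrative and is not what the maxout kernels do; the names `acc16` and `acc32` are made up for this example.

```python
import numpy as np

# Illustrative only: add 1.0 sequentially 4096 times.
n = 4096
one16 = np.float16(1.0)

acc16 = np.float16(0.0)  # accumulator kept in fp16
acc32 = np.float32(0.0)  # accumulator kept in fp32, inputs still fp16
for _ in range(n):
    acc16 = np.float16(acc16 + one16)
    acc32 = acc32 + np.float32(one16)

# fp16 has an 11-bit significand, so from 2048 upward adjacent values are
# 2 apart and adding 1.0 no longer changes the fp16 accumulator.
print(float(acc16), float(acc32))  # 2048.0 4096.0
```

This is why kernels that sum or average often accumulate in FP32 even when inputs and outputs are fp16; whether maxout needs that depends on whether it performs any such arithmetic.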

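Contributor Author

Understood. I have aligned the fp16 unit tests with the fp32 ones, so the test method and error tolerance are the same.
1. maxout iterates over the tensor in groups of the specified size and takes the maximum of each group. It performs only comparisons, no arithmetic. In both MaxOutFunctor and MaxOutGradFunctor, neither the handling of input_tensor nor the computation of output_tensor involves a reduction, so there is no overflow risk.
2. In offline tests I tried the shapes [32, 12, 128, 128], [320, 12, 128, 128], and [320, 120, 128, 128], and all of them passed. Larger tensors cannot be tested for now because the device runs out of GPU memory, but the precision should still meet the requirement.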
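
The argument in point 1 can be checked with a small NumPy sketch. `maxout_ref` below is a hypothetical reference written for this illustration (it is not the PR's kernel or the test file's `maxout_forward_naive`), and it assumes NCHW layout with consecutive channels forming a group. Because rounding to fp16 preserves ordering, taking the maximum in fp16 should give the same values as taking it in fp32 and rounding afterwards, provided the inputs are finite and within fp16 range.

```python
import numpy as np


def maxout_ref(x, groups):
    """Hypothetical NCHW maxout: max over each run of `groups` channels."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channels must be divisible by groups"
    return x.reshape(n, c // groups, groups, h, w).max(axis=2)


rng = np.random.default_rng(0)
x32 = rng.uniform(-1, 1, size=(2, 6, 5, 4)).astype(np.float32)

# Path A: round inputs to fp16 first, then compare in fp16.
out_fp16 = maxout_ref(x32.astype(np.float16), groups=2)
# Path B: compare in fp32, then round the result to fp16.
out_fp32 = maxout_ref(x32, groups=2).astype(np.float16)

same = np.array_equal(out_fp16, out_fp32)
```

The only error in the forward pass is the initial rounding of the inputs to fp16. This sketch does not cover the backward pass, where values that differ in fp32 may tie after rounding.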

8 changes: 7 additions & 1 deletion paddle/phi/kernels/gpu/maxout_kernel.cu
@@ -15,4 +15,10 @@
 #include "paddle/phi/core/kernel_registry.h"
 #include "paddle/phi/kernels/impl/maxout_kernel_impl.h"

-PD_REGISTER_KERNEL(maxout, GPU, ALL_LAYOUT, phi::MaxOutKernel, float, double) {}
+PD_REGISTER_KERNEL(maxout,
+                   GPU,
+                   ALL_LAYOUT,
+                   phi::MaxOutKernel,
+                   float,
+                   phi::dtype::float16,
+                   double) {}
35 changes: 35 additions & 0 deletions python/paddle/fluid/tests/unittests/test_maxout_op.py
@@ -136,5 +136,40 @@ def test_errors(self):
             self.assertRaises(ValueError, F.maxout, x_float32, 2, 2)


+class TestMaxOutOpFP16(TestMaxOutOp):
+    def set_attrs(self):
+        self.dtype = 'float16'

Contributor

The FP16 unit test here can inherit from TestMaxOutOp. Make a few small changes to TestMaxOutOp, such as supporting configurable dtype, shape, and attrs, to simplify the code.

See the low-precision unit test guidelines for reference: https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/dev_guides/amp_precision/amp_test_dev_guide_cn.html

Contributor Author

Done


+class TestMaxoutFP16Case1(TestMaxOutOpFP16):
+    def set_attrs(self):
+        self.axis = -1
+
+
+class TestMaxoutFP16Case2(TestMaxOutOpFP16):
+    def set_attrs(self):
+        self.axis = 3
+
+
+@unittest.skipIf(
+    not core.is_compiled_with_cuda(), "core is not compiled with CUDA"
+)
+class TestMaxoutStaticAPIFP16(unittest.TestCase):
+    def setUp(self):
+        self.x_np = np.random.uniform(-1, 1, [2, 6, 5, 4]).astype(np.float16)
+        self.groups = 2
+        self.axis = 1
+        self.place = paddle.CUDAPlace(0)
+
+    def test_static_api(self):
+        with paddle.static.program_guard(paddle.static.Program()):
+            x = paddle.static.data('X', self.x_np.shape, self.x_np.dtype)
+            out = F.maxout(x, self.groups, self.axis)
+            exe = paddle.static.Executor(self.place)
+            res = exe.run(feed={'X': self.x_np}, fetch_list=[out])
+        out_ref = maxout_forward_naive(self.x_np, self.groups, self.axis)
+        np.testing.assert_allclose(out_ref, res[0], rtol=1e-05)
Contributor

Using the fluid API is not recommended here. See how the static-graph unit test is written in PR #50832.

Contributor Author

Done. The original tests also use fluid.data; should those be changed as well?

Contributor

No need to change them.



 if __name__ == '__main__':
     unittest.main()
6 changes: 4 additions & 2 deletions python/paddle/nn/functional/activation.py
@@ -784,7 +784,7 @@ def maxout(x, groups, axis=1, name=None):

     Parameters:
         x (Tensor): The input is 4-D Tensor with shape [N, C, H, W] or [N, H, W, C], the data type
-            of input is float32 or float64.
+            of input is float16, float32 or float64.
Contributor

This API implementation has two branches, one for dynamic graph and one for static graph. Does the static-graph branch run correctly?

Contributor Author

I have added a test for the static-graph branch.

         groups (int): The groups number of maxout. `groups` specifies the
             index of channel dimension where maxout will be performed. This must be
             a factor of number of features.
@@ -819,7 +819,9 @@ def maxout(x, groups, axis=1, name=None):
     if in_dygraph_mode():
         return _C_ops.maxout(x, groups, axis)
     else:
-        check_variable_and_dtype(x, 'x', ['float32', 'float64'], 'maxout')
+        check_variable_and_dtype(
+            x, 'x', ['float16', 'float32', 'float64'], 'maxout'
+        )
         if axis not in [1, -1, 3]:
             raise ValueError(
                 "Attr(axis) should be 1 when data format is NCHW, -1 or 3 when data format is NHWC. Received "