Fix paddle.mode and paddle.bincount API #63970

Merged
merged 7 commits into from
May 9, 2024

Conversation

xingmingyyj
Contributor

PR Category

Others

PR Types

Bug fixes

Description

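The paddle.mode and paddle.bincount APIs show precision problems when a network is built and executed in static graph mode. Analysis shows the cause is the same as the problem encountered in #62801, so the fix follows the data types used in the kernels.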


paddle-bot bot commented Apr 29, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Apr 29, 2024
@xingmingyyj
Contributor Author

Additional details on the bincount error:
When the following dynamic-to-static code is executed:

......
paddle.seed(33)
obj = naive_func
dy_out = obj(in_tensor, in_params, func)

paddle.seed(33)
jit_obj = paddle.jit.to_static(obj)
st_out = jit_obj(in_tensor, in_params, func)
print("dy_out is: ", dy_out)
print("st_out is: ", st_out)

paddle.jit.save(jit_obj, path="bincount")
print("jit.save is successfully !!!")

paddle.seed(33)
jit = paddle.jit.load("bincount")
print("jit.load is successfully !!!")

paddle.seed(33)
inputs_key = sorted(in_tensor.keys())
inputs_value = []
for k in inputs_key:
    inputs_value.append(in_tensor[k])
# print('inputs_value is: ', inputs_value)
res = jit(*inputs_value)
print('jit.load res: ', res)

compare(dy_out, res, delta=1e-5, rtol=1e-6)

it fails with the following error:

Traceback (most recent call last):
  File "/home/aistudio/fix_op/Paddle/tools/fix_bitcount.py", line 106, in <module>
    res = jit(*inputs_value)
  File "/home/aistudio/fix_op/Paddle/build/python/paddle/nn/layer/layers.py", line 1429, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/aistudio/fix_op/Paddle/build/python/paddle/jit/translated_layer.py", line 1475, in __i_m_p_l__
    return _run_dygraph(self, input, program_holder)
  File "/home/aistudio/fix_op/Paddle/build/python/paddle/jit/translated_layer.py", line 1002, in _run_dygraph
    _legacy_C_ops.run_program(
ValueError: In user code:


    InvalidArgumentError: The type of data we are trying to retrieve (int32) does not match the type of data (int64) currently contained in the container.
      [Hint: Expected dtype() == phi::CppTypeToDataType<T>::Type(), but received dtype():9 != phi::CppTypeToDataType<T>::Type():7.] (at /home/aistudio/fix_op/Paddle/paddle/phi/core/dense_tensor.cc:161)
      [operator < pd_kernel.phi_kernel > error]  [operator < run_program > error]

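This shows that, in the scale operator, the tensor's actual data type does not match the data type the kernel expects.
The computation graph run by the executor is as follows: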

{
    (%0) = "data(phi_kernel)" () {dtype:(pd_op.DataType)bool,is_persistable:[false],kernel_key:<backend:GPU|layout:Undefined(AnyLayout)|dtype:int32>,kernel_name:"data",name:"_jst.0.a.0",op_name:"pd_op.data",place:(pd_op.Place)Place(gpu:0),shape:(pd_op.IntArray)[],stop_gradient:[false]} : () -> gpu_tensor<10xi32>
    (%1) = "full(phi_kernel)" () {dtype:(pd_op.DataType)int32,kernel_key:<backend:CPU|layout:Undefined(AnyLayout)|dtype:int32>,kernel_name:"full",op_name:"pd_op.full",place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Float)0} : () -> cpu_tensor<1xi32>
    (%2) = "bincount(phi_kernel)" (%0, <<NULL VALUE>>, %1) {is_persistable:[false],kernel_key:<backend:GPU|layout:NCHW|dtype:int32>,kernel_name:"bincount",op_name:"pd_op.bincount",stop_gradient:[false]} : (gpu_tensor<10xi32>, <<NULL TYPE>>, cpu_tensor<1xi32>) -> gpu_tensor<-1xi32>
    (%3) = "full(phi_kernel)" () {dtype:(pd_op.DataType)float32,kernel_key:<backend:CPU|layout:Undefined(AnyLayout)|dtype:float32>,kernel_name:"full",op_name:"pd_op.full",place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Float)1} : () -> cpu_tensor<1xf32>
    (%4) = "scale(phi_kernel)" (%2, %3) {bias:(Float)0,bias_after_scale:true,is_persistable:[false],kernel_key:<backend:GPU|layout:NCHW|dtype:int32>,kernel_name:"scale",op_name:"pd_op.scale",stop_gradient:[false]} : (gpu_tensor<-1xi32>, cpu_tensor<1xf32>) -> gpu_tensor<-1xi32>
    () = "builtin.shadow_output" (%4) {output_name:"translated_layer/scale_0.tmp_0"} : (gpu_tensor<-1xi32>) -> 
}
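To make the failure mode concrete, here is a minimal pure-Python sketch of the kind of check that fires in dense_tensor.cc. `FakeTensor` and its `data` method are hypothetical stand-ins, not Paddle APIs. The only facts taken from the log are that the container held int64 while the downstream scale kernel, selected with dtype int32, asked for int32.

```python
class FakeTensor:
    """Hypothetical stand-in for a dense tensor: values plus the dtype they were allocated with."""

    def __init__(self, values, dtype):
        self.values = list(values)
        self.dtype = dtype  # dtype the producing kernel actually allocated

    def data(self, requested_dtype):
        # Mirrors the spirit of the check in the traceback: the requested
        # element type must equal the type stored in the container.
        if requested_dtype != self.dtype:
            raise ValueError(
                f"requested {requested_dtype} but container holds {self.dtype}"
            )
        return self.values


# The bincount kernel allocates int64 output when there are no weights ...
bincount_out = FakeTensor([0, 2, 1], dtype="int64")

# ... but the graph typed the value as int32, so the consumer asks for int32.
try:
    bincount_out.data("int32")
    mismatch_detected = False
    message = ""
except ValueError as err:
    mismatch_detected = True
    message = str(err)
```

Reading the same tensor as int64 succeeds in this sketch, which is why the problem only surfaces once a downstream operator trusts the dtype recorded in the graph.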

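My guess is that this is caused by how the dtype is set in infermeta. Here weight is empty and x.dtype is int32, so the output dtype was set to int32, which does not match the following logic in the kernel.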

  if (!has_weights) {
    int64_t* output_data = dev_ctx.template Alloc<int64_t>(output);
    phi::funcs::SetConstant<Context, int64_t>()(
        dev_ctx, output, static_cast<int64_t>(0));

    KernelBincount<T, InputT, int64_t>
        <<<GET_BLOCKS(input_numel), PADDLE_CUDA_NUM_THREADS, 0, stream>>>(
            input_data, input_numel, has_weights, weights_data, output_data);
  }
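The disagreement can be summarized as two dtype rules. The sketch below is an illustration only: `inferred_dtype_before_fix` encodes the behavior guessed at above (the output follows x when weights is absent), and `kernel_dtype` encodes the no-weights branch of the kernel quoted above. All function names are hypothetical, and the with-weights branch (output follows the weights dtype) is an assumption, since that branch is not shown in this thread.

```python
from typing import Optional


def inferred_dtype_before_fix(x_dtype: str, weights_dtype: Optional[str]) -> str:
    """Output dtype as the graph recorded it before the fix (per the guess above)."""
    if weights_dtype is None:
        return x_dtype  # no weights: output dtype copied from x
    return weights_dtype  # assumption: not shown in the thread


def kernel_dtype(x_dtype: str, weights_dtype: Optional[str]) -> str:
    """Output dtype the kernel actually allocates."""
    if weights_dtype is None:
        return "int64"  # matches Alloc<int64_t> in the quoted kernel branch
    return weights_dtype  # assumption: not shown in the thread


def dtypes_agree(x_dtype: str, weights_dtype: Optional[str] = None) -> bool:
    """True when the graph's recorded dtype matches what the kernel produces."""
    return inferred_dtype_before_fix(x_dtype, weights_dtype) == kernel_dtype(
        x_dtype, weights_dtype
    )
```

Under this sketch, `dtypes_agree("int32")` is false, which corresponds to the graph above, while an int64 input would not expose the problem. Making the inferred dtype follow the kernel's rule removes the disagreement.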

kangguangli previously approved these changes Apr 30, 2024
@xingmingyyj xingmingyyj closed this May 7, 2024
@xingmingyyj xingmingyyj reopened this May 7, 2024
@xingmingyyj xingmingyyj requested a review from kangguangli May 8, 2024 11:57
@kangguangli kangguangli merged commit 0f41ea7 into PaddlePaddle:develop May 9, 2024
31 checks passed
@xingmingyyj xingmingyyj deleted the fix_mode_bincount branch May 9, 2024 11:57
co63oc pushed a commit to co63oc/Paddle that referenced this pull request May 10, 2024
* fix_infermeta

* Update binary.cc

* Update binary.cc

* Update binary.cc

* Update binary.cc

* Update binary.cc

* ci
co63oc pushed a commit to co63oc/Paddle that referenced this pull request May 11, 2024
* fix_infermeta

* Update binary.cc

* Update binary.cc

* Update binary.cc

* Update binary.cc

* Update binary.cc

* ci