
[Hackathon 3rd No.22 ] add paddle.incubate.sparse.reshape #46333

Conversation

@OccupyMars2025 (Contributor) commented Sep 20, 2022

PR types

New features

PR changes

OPs

Describe

[used AI Studio] add paddle.incubate.sparse.reshape

The predecessor of this PR is #46242; this PR is a trimmed-down version with everything unrelated to task No. 22 removed.

This PR was adapted from #45849; note that sparse reshape and sparse transpose are similar.

@paddle-bot commented Sep 20, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first; see the Paddle CI Manual for details.

@OccupyMars2025 (Contributor Author)

A "Floating-point exception" occurred, so I changed the dtype in all test cases to int64 to check whether the computation logic is correct.

[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 21, 2022

After changing the dtype in the test cases to int64, they still report a "Floating-point exception", which is strange.

[screenshot]

Check if the computation of ReshapeCooGradKernel is right

[screenshot]

Integer divide-by-zero

[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 22, 2022

It seems that the example code in the docstring of the Python API is also executed in the CI tests.

[screenshot]

[screenshot]
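Executing docstring examples in CI is essentially Python's standard doctest mechanism: examples in docstrings are run and their printed output is compared against the expected text. A minimal illustration with a hypothetical function `add` (this is not Paddle's actual CI harness):

```python
import doctest

def add(a, b):
    """Add two numbers.

    Examples:
        >>> add(2, 3)
        5
    """
    return a + b

# Collect and run the doctests embedded in add's docstring.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(add, globs={"add": add}):
    runner.run(test)
print(runner.failures, runner.tries)  # 0 1
```

If the example's output did not match (say the docstring claimed `6`), `runner.failures` would be nonzero and CI would flag it, which matches the behavior observed above.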

It also seems that the CI test checks the content of code comments.

[screenshot]

@OccupyMars2025 (Contributor Author)

// /* Caution: this is the original computation logic, which I believe is wrong.
// The logic here is: the original tensor's shape is (10, 20, 30, 40, 50),
// and one nonzero element has index (1, 2, 3, 4, 5).
// After a transpose the tensor's shape is (30, 10, 50, 20, 40),
// and this logic takes that element's new index to be (3, 1, 5, 2, 4).
// That is exactly the transpose logic: after a transpose, elements' positions in memory change.
// The logic you changed it to is actually the reshape logic: after a reshape, every element keeps its position in memory.
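The distinction drawn in this comment can be sketched in plain Python (shapes taken from the comment; the helper name `flatten` is illustrative, not the kernel's actual code):

```python
shape = (10, 20, 30, 40, 50)
idx = (1, 2, 3, 4, 5)            # index of one nonzero element

# Transpose: permute the index directly; the element moves in memory.
perm = (2, 0, 4, 1, 3)           # yields shape (30, 10, 50, 20, 40)
transposed_idx = tuple(idx[p] for p in perm)
print(transposed_idx)            # (3, 1, 5, 2, 4)

# Reshape: every element keeps its flattened (row-major) position.
def flatten(index, shape):
    f = 0
    for i, d in zip(index, shape):
        f = f * d + i
    return f

new_shape = (200, 60000)         # same element count as `shape`
f = flatten(idx, shape)
reshaped_idx = (f // new_shape[1], f % new_shape[1])
print(reshaped_idx)              # (22, 6205)
```

The transpose mapping permutes coordinates, while the reshape mapping only re-factors the unchanged flat offset into the new shape.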

@OccupyMars2025 (Contributor Author) commented Sep 22, 2022

Deleted the grad kernel test case to check whether the forward kernel is correct.

[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 22, 2022

CI failed. @zhouwei25, after the earlier CI errors I made some changes, but I cannot make sense of the current CI errors. Could a Paddle expert offer some directional advice? In python\paddle\fluid\tests\unittests\test_sparse_reshape_op.py I have disabled the backward check and kept only one test case.

[screenshot]

[screenshot]

@zhwesky2010 (Contributor) left a comment

Does CI pass now?

args : (Tensor x, int64_t[] new_shape)
output : Tensor(out)
infer_meta :
func : sparse::ReshapeInferMeta
Contributor:

This can reuse the dense ReshapeInferMeta; there is no need to add a new one.

Contributor Author:

done

const std::vector<int64_t>& new_shape,
SparseCooTensor* out) {
/*
Currently only the sparse part of the dims can be reshaped
Contributor:

Please use English comments.

Contributor Author:

done

@@ -608,3 +608,26 @@ def expm1(x, name=None):
out = paddle.incubate.sparse.expm1(sparse_x)
"""
return _C_ops.sparse_expm1(x)

@dygraph_only
def reshape(x, new_shape, name=None):
Contributor:

Keep the API parameter name consistent with the dense op: use shape, and use that name in the yaml and the C++ kernel all the way down to the lower layers.

Contributor Author:

done

// const std::vector<int>& perm,
SparseCooTensor* dx) {
EmptyLikeCooKernel<T, Context>(dev_ctx, x, dx);
std::vector<int64_t> x_shape(x.dims().size());
Contributor:

Just use the phi::vectorize function to convert the DDim to a vector; it does not need to be this complicated.

Contributor Author:

done

const int64_t* out_sparse_part_strides,
int64_t *out_indices_data) {

// for (std::size_t i = 0; i < n_dim; ++i) {
Contributor:

Delete this commented-out code.

Contributor Author:

done

for (int i = 0; i < x.sparse_dim(); ++i) {
x_sparse_part_dims.push_back(x.dims()[i]);
}
for (int i = 0; i < out_dims.size() - x.dense_dim(); ++i) {
Contributor:

So the new shape here is new_shape with x's dense_dim stripped from the tail? If the reshape only applies to the sparse_dim part, please document that in the API docs and make new_shape apply only to the sparse_dim.

Contributor Author:

done. Right, it only applies to the sparse_dim, but new_shape must be given as the full shape. For example, for a sparse COO tensor of shape (2, 3, 4, 5, 6), where (2, 3, 4) is the sparse_dim and (5, 6) is the dense_dim, to reshape the sparse_dim part to (3, 8) you must pass new_shape as (3, 8, 5, 6), not (3, 8).
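Since every nonzero element keeps its flattened (row-major) position, the new sparse indices can be derived from the old ones through strides. A minimal pure-Python sketch of that remapping (helper names are illustrative, not the actual kernel):

```python
def row_major_strides(shape):
    # strides[i] = product of shape[i+1:]
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def reshape_coo_indices(indices, old_sparse_shape, new_sparse_shape):
    """Remap COO indices when reshaping only the sparse dims:
    each nonzero keeps its flattened (row-major) position."""
    old_s = row_major_strides(old_sparse_shape)
    new_s = row_major_strides(new_sparse_shape)
    out = []
    for idx in indices:
        flat = sum(i * s for i, s in zip(idx, old_s))
        out.append(tuple((flat // s) % d
                         for s, d in zip(new_s, new_sparse_shape)))
    return out

# sparse_dim (2, 3, 4) reshaped to (3, 8); dense dims are untouched.
print(reshape_coo_indices([(1, 2, 3)], (2, 3, 4), (3, 8)))  # [(2, 7)]
```

The non-zero values travel with their indices unchanged; only the index coordinates are recomputed, which is exactly why the reshape is restricted to the sparse part of the dims.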

class TestReshape(unittest.TestCase):
# x: sparse, out: sparse
def check_result(self, x_shape, new_shape, format):
with _test_eager_guard():
Contributor:

This can be deleted now.


def test_reshape_2d(self):
self.check_result([2, 5], [10,], 'coo')
# self.check_result([10, 5], [2, 25], 'csr')
Contributor:

Does this unit test pass?

Contributor Author:

No, it does not pass.

@OccupyMars2025 (Contributor Author) commented Sep 22, 2022

Thanks for the answer. This one and only unit test fails, and I cannot fully understand the error messages, especially the following two:

[screenshot]

[screenshot]

@OccupyMars2025 (Contributor Author)

I will first make the changes you suggested and then run CI again to see whether these two errors still occur.

@OccupyMars2025 (Contributor Author) commented Sep 23, 2022

After revising according to the Paddle experts' suggestions, the same error remains:

[screenshot]

Searching the NVIDIA site gives the following explanation:

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html
[screenshot]

My guess at the cause: in the forward COO CUDA kernel, writing elements of the indices DenseTensor through an int64_t* pointer goes out of bounds???

@OccupyMars2025 (Contributor Author)

It seems that sp_out.to_dense().numpy() triggers the error, while sp_out = paddle.incubate.sparse.reshape(sp_x, new_shape) seems to compute; let me test that.

[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 23, 2022

How to fix it?

[screenshot]

The Chinese comments may be the cause of the error, so I translated them into English.

https://www.cnblogs.com/VVingerfly/p/13751289.html
[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 23, 2022

[screenshot]

But on AI Studio the dense tensor has no problems, so maybe the cause is that paddle.reshape and paddle.incubate.sparse.reshape operate on the same Paddle tensor.

[screenshot]

How to fix it? My solution is to use numpy to generate equal but distinct Paddle tensors: paddle.to_tensor(np_x).

[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 23, 2022

You need to add if paddle.is_compiled_with_cuda():

[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 24, 2022

It may look as if the numpy() method of a CUDA dense tensor raises the error, but I found that sp_out = paddle.incubate.sparse.reshape(sp_x, new_shape) actually causes it, where sp_x is a CUDA sparse tensor. The GPU forward COO kernel of sparse reshape should be checked.

[screenshot]

```python
        # dense_x = paddle.clone(origin_x.detach())
        mask = np.random.randint(0, 2, x_shape)
        np_x = np.random.randint(-100, 100, x_shape) * mask 
        
        
        ### cpu version
        dense_x = paddle.to_tensor(np_x, place=paddle.CPUPlace())
        dense_x.numpy()
        print(dense_x.numpy())
        dense_x.stop_gradient = False
        dense_x.numpy()
        # dense_out = paddle.transpose(dense_x, dims)
        dense_out = paddle.reshape(dense_x, new_shape)
        dense_out.numpy()
        print(dense_out.numpy())

        if format == "coo":
            # sp_x = origin_x.detach().to_sparse_coo(len(x_shape))
            sp_x = paddle.to_tensor(np_x, place=paddle.CPUPlace()).to_sparse_coo(len(x_shape))
        else:
            # sp_x = origin_x.detach().to_sparse_csr()
            sp_x = paddle.to_tensor(np_x, place=paddle.CPUPlace()).to_sparse_csr()
        sp_x.stop_gradient = False
        # sp_out = paddle.incubate.sparse.transpose(sp_x, dims)
        sp_out = paddle.incubate.sparse.reshape(sp_x, new_shape)

        print(10*'=', "OccupyMars2025 the following is dense_out", 10*'=')
        print("dense_out.numpy():", dense_out.numpy())
        print("dense_out:", dense_out)
        print(10*'=', "OccupyMars2025 the following is sp_out", 10*'=')
        print("sp_out:", sp_out)
        print("sp_out.to_dense():", sp_out.to_dense())
        print("sp_out.to_dense().numpy():", sp_out.to_dense().numpy())
        print(10*'=', "OccupyMars2025 the end", 10*'=')
        
        np.testing.assert_allclose(sp_out.to_dense().numpy(),
                                dense_out.numpy(),
                                rtol=1e-05)

        if paddle.is_compiled_with_cuda():
            ## cuda version
            dense_x = paddle.to_tensor(np_x, place=paddle.CUDAPlace(0))
            dense_x.numpy()
            print(dense_x.numpy())
            dense_x.stop_gradient = False
            dense_x.numpy()
            # dense_out = paddle.transpose(dense_x, dims)
            dense_out = paddle.reshape(dense_x, new_shape)
            dense_out.numpy()
            print(dense_out.numpy())

            if format == "coo":
                # sp_x = origin_x.detach().to_sparse_coo(len(x_shape))
                sp_x = paddle.to_tensor(np_x, place=paddle.CUDAPlace(0)).to_sparse_coo(len(x_shape))
            else:
                # sp_x = origin_x.detach().to_sparse_csr()
                sp_x = paddle.to_tensor(np_x, place=paddle.CUDAPlace(0)).to_sparse_csr()
            sp_x.stop_gradient = False
            # sp_out = paddle.incubate.sparse.transpose(sp_x, dims)
            sp_out = paddle.incubate.sparse.reshape(sp_x, new_shape)

            print(10*'=', "OccupyMars2025 the following is dense_out", 10*'=')
            print("dense_out.numpy():", dense_out.numpy())  # reports the error at this line
```

@OccupyMars2025 (Contributor Author)

The following picture shows that at least the CPU forward COO kernel of sparse reshape works correctly.

[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 24, 2022

This is the reason for the dense_x.grad.numpy() * mask below:

[screenshot]

            dense_out.backward()
            sp_out.backward()
            np.testing.assert_allclose(sp_x.grad.to_dense().numpy(),
                                       dense_x.grad.numpy() * mask,
                                   #    dense_x.grad.numpy(),
                                       rtol=1e-05)
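The mask is needed because the sparse gradient only materializes entries at the input's nonzero positions, while reshape's dense gradient is a pass-through defined everywhere. A small pure-Python sketch of the comparison (illustrative shapes only):

```python
# Sparsity pattern of the input (1 marks a stored nonzero).
mask = [[0, 1, 1, 1, 0],
        [1, 0, 1, 1, 0]]

# d(reshape)/dx is an identity pass-through, so the dense grad is all ones.
dense_grad = [[1] * 5 for _ in range(2)]

# A sparse COO tensor only stores gradient entries at its nonzero
# positions, so the dense grad must be masked before comparing.
masked_grad = [[g * m for g, m in zip(grow, mrow)]
               for grow, mrow in zip(dense_grad, mask)]
print(masked_grad == mask)  # True
```

Comparing sp_x.grad.to_dense() against the unmasked dense gradient would therefore always fail at the zero positions, independent of any kernel bug.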

@OccupyMars2025 (Contributor Author) commented Sep 24, 2022

There seems to be numerical instability in the backward computation on CPU. Running the test case multiple times, sometimes the two grad tensors have the same values and sometimes they differ.

[screenshot]

[screenshot]

@OccupyMars2025 (Contributor Author)

All test cases for the CPU forward COO kernel pass, so my forward computation logic is correct, but the GPU forward COO kernel does not work.

@OccupyMars2025 (Contributor Author) commented Sep 25, 2022

Points to note from the PHI operator library design document (飞桨高可复用算子库 PHI 设计文档):

Decide whether a cross-device data copy is needed.

Split compilation by training vs. inference scenarios.
For example, inference compiles neither backward kernels nor forward kernels that have Intermediate outputs.

In the long run, support a unified way of writing cross-device kernels that is intuitive and easy to use, without introducing unnecessary template parameters.

Explanation: beneath the operator library sits a Kernel Primitive API module whose long-term vision is that each operation needs only one kernel to cover multiple devices, with the device-specific code living only inside the Kernel Primitive API implementation. When kernels are reused with more complex template parameters in the future, those parameters must be kept as simple as possible.

  • For Tensor, ALL represents an illegal Backend, but for Kernel, some
  • kernels may be device-independent by nature, such as reshape;
  • and some kernels are also device-independent when implemented based on
  • primitive API.

Why not use fluid's original VarType here?
Reason 1: in fluid, DataType and VarType are peer concepts, which is a confusing design; for example, LoDTensor and FLOAT32 end up as same-level concepts even though they clearly are not. We do not want to inherit an obviously flawed design.
Reason 2: it decouples PHI from fluid, so that PHI can later be compiled independently.

Scalar is used to uniformly represent a variable of any basic data type (float, double, int, bool, etc.). (It currently also supports representing a one-element Tensor as a scalar, but that support may be dropped later.)

Take ScaleKernel as an example: its scale parameter can be passed ordinary data types such as int, float, or double. Without Scalar, a separate function interface would have to be created for each data type, greatly increasing the amount of kernel code. Scalar is therefore mainly used for a single parameter that may carry different data types, avoiding multiple overloaded functions in that scenario.
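The Scalar idea can be sketched in plain Python (hypothetical names; PHI's real Scalar is a C++ class, this only illustrates the "one signature, no overloads" point):

```python
class Scalar:
    """Wraps one value of any basic numeric type so that a single
    kernel signature can accept int, float, or bool alike."""
    def __init__(self, value):
        self.value = value

    def to_float(self):
        return float(self.value)

def scale_kernel(x, scale):
    # Accept either a raw number or a Scalar; one signature, no overloads.
    s = scale.to_float() if isinstance(scale, Scalar) else float(scale)
    return [v * s for v in x]

print(scale_kernel([1, 2, 3], Scalar(2)))   # [2.0, 4.0, 6.0]
print(scale_kernel([1, 2, 3], 0.5))         # [0.5, 1.0, 1.5]
```

In C++ the same effect comes from implicit constructors on Scalar, so every numeric type converts to one parameter type at the call site.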

When one Tensor is assigned to another, or a Tensor is returned from a function, only the pointer is copied; no real data copy happens.

Compilation decoupling:

The autograd information held here is only a pointer index, null by default:
std::unique_ptr<AbstractAutogradMeta> autograd_meta_ = nullptr;
AbstractAutogradMeta is an abstract interface that does not depend on any autograd module, so it does not affect PHI's independent compilation, while still meeting the dynamic-graph Tensor's need to hold backward information.
AutogradMeta is only set in dynamic-graph scenarios; where it is not needed, for example in static graphs, it is simply a null pointer.

DenseTensor corresponds to fluid's original LoDTensor class and is the basic implementation of Tensor. The DenseTensorMeta inside DenseTensor holds the basic members describing the Tensor, and the Allocation inside DenseTensor is fluid's original Allocation.

@zhwesky2010 (Contributor) left a comment

The errors you posted earlier are two separate problems:

  1. An out-of-bounds CUDA memory access, possibly because not enough memory was allocated; check the indexing carefully
  2. A compilation problem

@OccupyMars2025 (Contributor Author)

  1. The compilation problem: as far as I understand, I have solved it
  2. I will look further into the CUDA kernel problem

@OccupyMars2025 (Contributor Author) commented Sep 29, 2022

Maybe the "illegal memory access" error is caused by not enough memory being allocated, so I changed the CUDA kernel launch configuration from "0" to the large value "1024", but this is probably not the actual cause of the error.

[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 29, 2022

[Bug from the PaddlePaddle framework] When I run the following code, I get the following error message:

import paddle
sp_x = paddle.to_tensor([3], place=paddle.CUDAPlace(0)).to_sparse_coo(1)
sp_x.numpy()   # this line reports an error
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
No stack trace in paddle, may be caused by external reasons.

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1664432194 (unix time) try "date -d @1664432194" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 11542 (TID 0x7f1f16333740) from PID 0 ***]

Segmentation fault (core dumped)
aistudio@jupyter-242560-4589773:~/work$ 

I installed a PaddlePaddle wheel built from the branch OccupyMars2025:hackathon-3rd-task22-add-paddle.incubate.sparse.reshape-version002, but because the reported error has nothing to do with my newly added code, I think there may be a bug in the PaddlePaddle framework itself. I used the following environment to compile the source, with this command: cmake .. -DPY_VERSION=3.8 -DWITH_GPU=ON

[screenshot]

I started rebuilding from source on the develop branch to check whether the bug really comes from the PaddlePaddle framework itself (2022/9/29 15:50). At 2022/9/29 17:06 I finished the build, and the above code reported the same error, so I concluded that a bug in the develop branch had been the cause of my failed CI.

I was wrong!!!! Calling sp_x.numpy() is itself illegal; I should use sp_x.to_dense().numpy():

>>> import paddle
>>> sp_x = paddle.to_tensor([3], place=paddle.CUDAPlace(0)).to_sparse_coo(1)
W0929 09:40:46.203522 31427 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0929 09:40:46.206414 31427 gpu_resources.cc:91] device: 0, cuDNN Version: 8.1.
>>> sp_x.to_dense()
Tensor(shape=[1], dtype=int64, place=Place(gpu:0), stop_gradient=True,
       [3])
>>> sp_x.to_dense().numpy()
array([3])
>>> 

@OccupyMars2025 (Contributor Author) commented Sep 29, 2022

Now I'm certain that the CPU COO forward kernel is OK, but there are two possible sources of error:

  1. the CUDA COO forward kernel
  2. the CPU COO backward kernel
2022-09-29 15:52:35        [1, 0, 1, 1, 0]])
2022-09-29 15:52:35  y: array([[0, 1, 1, 1, 0],
2022-09-29 15:52:35        [1, 0, 1, 1, 0]])
2022-09-29 15:52:35 ----------------------------------------------------------------------
2022-09-29 15:52:35 Ran 1 test in 0.005s
2022-09-29 15:52:35 FAILED (failures=1)
2022-09-29 15:52:35 [[  0 -52 -28   0   0]
2022-09-29 15:52:35  [ 63   0  89 -93   0]]
2022-09-29 15:52:35 [  0 -52 -28   0   0  63   0  89 -93   0]
2022-09-29 15:52:35 sp_x.grad.to_dense().numpy():  [[0 1 1 0 0]
2022-09-29 15:52:35  [1 0 1 1 0]]
2022-09-29 15:52:35 dense_x.grad.numpy():  [[1 1 1 1 1]
2022-09-29 15:52:35  [1 1 1 1 1]]
2022-09-29 15:52:35 mask:  [[0 1 1 1 0]
2022-09-29 15:52:35  [1 0 1 1 0]]
2022-09-29 15:52:35 0% tests passed, 1 tests failed out of 1
2022-09-29 15:52:35 Total Test time (real) =   3.90 sec
2022-09-29 15:52:35 The following tests FAILED:
2022-09-29 15:52:35 	1292 - test_sparse_reshape_op (Failed)
2022-09-29 15:52:35 Errors while running CTest
2022-09-29 15:52:35 ========================================
2022-09-29 15:52:35 Added UT should pass three additional executions

@OccupyMars2025 OccupyMars2025 force-pushed the hackathon-3rd-task22-add-paddle.incubate.sparse.reshape-version002 branch from 4420f4f to ecae7b3 Compare October 1, 2022 08:47
@paddle-bot commented Oct 1, 2022

Sorry to inform you that, after our discussion, your PR does not yet meet the merging standard (reference: the Paddle native operator development guidelines). You may submit a new PR; we are closing this one for now. Thank you for your contribution.

Labels: contributor (External developers)
4 participants