
[Hackathon 3rd No.22 ] add paddle.incubate.sparse.reshape #46333

Conversation

@OccupyMars2025 (Contributor) commented Sep 20, 2022

PR types

New features

PR changes

OPs

Describe

[used AI Studio] add paddle.incubate.sparse.reshape

The predecessor of this PR is #46242; this PR is a trimmed-down version with everything unrelated to task No. 22 removed.

This PR was adapted from #45849; note that sparse reshape and sparse transpose are similar.

@paddle-bot commented Sep 20, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first; see the Paddle CI Manual for details.

@OccupyMars2025 (Contributor Author)

A "Floating-point exception" occurred, so I changed the dtype in all test cases to int64 to check whether the computation logic is correct.

[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 21, 2022

After changing the dtype in the test cases to int64, they still report a "Floating-point exception", which is strange.

[screenshot]

Check if the computation of ReshapeCooGradKernel is right

[screenshot]

Integer divide-by-zero

[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 22, 2022

It seems that the example code in the docstring of the Python API is also executed in the CI tests.

[screenshot]

[screenshot]
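Executing docstring examples in CI is essentially Python's standard doctest mechanism: examples in docstrings are run and their printed output is compared against the expected text. A minimal illustration with a hypothetical function `add` (this is not Paddle's actual CI harness):

```python
import doctest

def add(a, b):
    """Add two numbers.

    Examples:
        >>> add(2, 3)
        5
    """
    return a + b

# Collect and run the doctests embedded in add's docstring.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(add, globs={"add": add}):
    runner.run(test)
print(runner.failures, runner.tries)  # 0 1
```

If the example's output did not match (say the docstring claimed `6`), `runner.failures` would be nonzero and CI would flag it, which matches the behavior observed above.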

It also seems that the CI test checks the content of code comments.

[screenshot]

@OccupyMars2025 (Contributor Author)

// /* Caution: this is the original computation logic, which I believe is wrong.
// The logic here is: the original tensor's shape is (10, 20, 30, 40, 50),
// and one nonzero element has index (1, 2, 3, 4, 5).
// After a transpose the tensor's shape is (30, 10, 50, 20, 40),
// and this logic takes that element's new index to be (3, 1, 5, 2, 4).
// That is exactly the transpose logic: after a transpose, elements' positions in memory change.
// The logic you changed it to is actually the reshape logic: after a reshape, every element keeps its position in memory.
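The distinction drawn in this comment can be sketched in plain Python (shapes taken from the comment; the helper name `flatten` is illustrative, not the kernel's actual code):

```python
shape = (10, 20, 30, 40, 50)
idx = (1, 2, 3, 4, 5)            # index of one nonzero element

# Transpose: permute the index directly; the element moves in memory.
perm = (2, 0, 4, 1, 3)           # yields shape (30, 10, 50, 20, 40)
transposed_idx = tuple(idx[p] for p in perm)
print(transposed_idx)            # (3, 1, 5, 2, 4)

# Reshape: every element keeps its flattened (row-major) position.
def flatten(index, shape):
    f = 0
    for i, d in zip(index, shape):
        f = f * d + i
    return f

new_shape = (200, 60000)         # same element count as `shape`
f = flatten(idx, shape)
reshaped_idx = (f // new_shape[1], f % new_shape[1])
print(reshaped_idx)              # (22, 6205)
```

The transpose mapping permutes coordinates, while the reshape mapping only re-factors the unchanged flat offset into the new shape.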

@OccupyMars2025 (Contributor Author) commented Sep 22, 2022

Deleted the grad kernel test case to check whether the forward kernel is correct.

[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 22, 2022

CI failed. @zhouwei25, after the earlier CI errors I made some changes, but I cannot make sense of the current CI errors. Could a Paddle expert offer some directional advice? In python\paddle\fluid\tests\unittests\test_sparse_reshape_op.py I have disabled the backward check and kept only one test case.

[screenshot]

[screenshot]

@zhwesky2010 (Contributor) left a comment

Does CI pass now?

args : (Tensor x, int64_t[] new_shape)
output : Tensor(out)
infer_meta :
func : sparse::ReshapeInferMeta
Contributor:

This can reuse the dense ReshapeInferMeta; there is no need to add a new one.

Contributor Author:

done

const std::vector<int64_t>& new_shape,
SparseCooTensor* out) {
/*
Currently only the sparse part of the dims can be reshaped
Contributor:

Please use English comments.

Contributor Author:

done

@@ -608,3 +608,26 @@ def expm1(x, name=None):
out = paddle.incubate.sparse.expm1(sparse_x)
"""
return _C_ops.sparse_expm1(x)

@dygraph_only
def reshape(x, new_shape, name=None):
Contributor:

Keep the API parameter name consistent with the dense op: use shape, and use that name in the yaml and the C++ kernel all the way down to the lower layers.

Contributor Author:

done

// const std::vector<int>& perm,
SparseCooTensor* dx) {
EmptyLikeCooKernel<T, Context>(dev_ctx, x, dx);
std::vector<int64_t> x_shape(x.dims().size());
Contributor:

Just use the phi::vectorize function to convert the DDim to a vector; it does not need to be this complicated.

Contributor Author:

done

const int64_t* out_sparse_part_strides,
int64_t *out_indices_data) {

// for (std::size_t i = 0; i < n_dim; ++i) {
Contributor:

Delete this commented-out code.

Contributor Author:

done

for (int i = 0; i < x.sparse_dim(); ++i) {
x_sparse_part_dims.push_back(x.dims()[i]);
}
for (int i = 0; i < out_dims.size() - x.dense_dim(); ++i) {
Contributor:

So the new shape here is new_shape with x's dense_dim stripped from the tail? If the reshape only applies to the sparse_dim part, please document that in the API docs and make new_shape apply only to the sparse_dim.

Contributor Author:

done. Right, it only applies to the sparse_dim, but new_shape must be given as the full shape. For example, for a sparse COO tensor of shape (2, 3, 4, 5, 6), where (2, 3, 4) is the sparse_dim and (5, 6) is the dense_dim, to reshape the sparse_dim part to (3, 8) you must pass new_shape as (3, 8, 5, 6), not (3, 8).
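Since every nonzero element keeps its flattened (row-major) position, the new sparse indices can be derived from the old ones through strides. A minimal pure-Python sketch of that remapping (helper names are illustrative, not the actual kernel):

```python
def row_major_strides(shape):
    # strides[i] = product of shape[i+1:]
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def reshape_coo_indices(indices, old_sparse_shape, new_sparse_shape):
    """Remap COO indices when reshaping only the sparse dims:
    each nonzero keeps its flattened (row-major) position."""
    old_s = row_major_strides(old_sparse_shape)
    new_s = row_major_strides(new_sparse_shape)
    out = []
    for idx in indices:
        flat = sum(i * s for i, s in zip(idx, old_s))
        out.append(tuple((flat // s) % d
                         for s, d in zip(new_s, new_sparse_shape)))
    return out

# sparse_dim (2, 3, 4) reshaped to (3, 8); dense dims are untouched.
print(reshape_coo_indices([(1, 2, 3)], (2, 3, 4), (3, 8)))  # [(2, 7)]
```

The non-zero values travel with their indices unchanged; only the index coordinates are recomputed, which is exactly why the reshape is restricted to the sparse part of the dims.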

class TestReshape(unittest.TestCase):
# x: sparse, out: sparse
def check_result(self, x_shape, new_shape, format):
with _test_eager_guard():
Contributor:

This can be deleted now.


def test_reshape_2d(self):
self.check_result([2, 5], [10,], 'coo')
# self.check_result([10, 5], [2, 25], 'csr')
Contributor:

Does this unit test pass?

Contributor Author:

No, it does not pass.

@OccupyMars2025 (Contributor Author) commented Sep 22, 2022

Thanks for the answer. This one and only unit test fails, and I cannot fully understand the error messages, especially the following two:

[screenshot]

[screenshot]

@OccupyMars2025 (Contributor Author)

I will first make the changes you suggested and then run CI again to see whether these two errors still occur.

@OccupyMars2025 (Contributor Author) commented Sep 23, 2022

After revising according to the Paddle experts' suggestions, the same error remains:

[screenshot]

Searching the NVIDIA site gives the following explanation:

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html
[screenshot]

My guess at the cause: in the forward COO CUDA kernel, writing elements of the indices DenseTensor through an int64_t* pointer goes out of bounds???

@OccupyMars2025 (Contributor Author)

It seems that sp_out.to_dense().numpy() triggers the error, while sp_out = paddle.incubate.sparse.reshape(sp_x, new_shape) seems to compute; let me test that.

[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 23, 2022

How to fix it?

[screenshot]

The Chinese comments may be the cause of the error, so I translated them into English.

https://www.cnblogs.com/VVingerfly/p/13751289.html
[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 23, 2022

[screenshot]

But on AI Studio the dense tensor has no problems, so maybe the cause is that paddle.reshape and paddle.incubate.sparse.reshape operate on the same Paddle tensor.

[screenshot]

How to fix it? My solution is to use numpy to generate equal but distinct Paddle tensors: paddle.to_tensor(np_x).

[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 23, 2022

You need to add if paddle.is_compiled_with_cuda():

[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 24, 2022

It may look as if the numpy() method of a CUDA dense tensor raises the error, but I found that sp_out = paddle.incubate.sparse.reshape(sp_x, new_shape) actually causes it, where sp_x is a CUDA sparse tensor. The GPU forward COO kernel of sparse reshape should be checked.

[screenshot]

```python
        # dense_x = paddle.clone(origin_x.detach())
        mask = np.random.randint(0, 2, x_shape)
        np_x = np.random.randint(-100, 100, x_shape) * mask 
        
        
        ### cpu version
        dense_x = paddle.to_tensor(np_x, place=paddle.CPUPlace())
        dense_x.numpy()
        print(dense_x.numpy())
        dense_x.stop_gradient = False
        dense_x.numpy()
        # dense_out = paddle.transpose(dense_x, dims)
        dense_out = paddle.reshape(dense_x, new_shape)
        dense_out.numpy()
        print(dense_out.numpy())

        if format == "coo":
            # sp_x = origin_x.detach().to_sparse_coo(len(x_shape))
            sp_x = paddle.to_tensor(np_x, place=paddle.CPUPlace()).to_sparse_coo(len(x_shape))
        else:
            # sp_x = origin_x.detach().to_sparse_csr()
            sp_x = paddle.to_tensor(np_x, place=paddle.CPUPlace()).to_sparse_csr()
        sp_x.stop_gradient = False
        # sp_out = paddle.incubate.sparse.transpose(sp_x, dims)
        sp_out = paddle.incubate.sparse.reshape(sp_x, new_shape)

        print(10*'=', "OccupyMars2025 the following is dense_out", 10*'=')
        print("dense_out.numpy():", dense_out.numpy())
        print("dense_out:", dense_out)
        print(10*'=', "OccupyMars2025 the following is sp_out", 10*'=')
        print("sp_out:", sp_out)
        print("sp_out.to_dense():", sp_out.to_dense())
        print("sp_out.to_dense().numpy():", sp_out.to_dense().numpy())
        print(10*'=', "OccupyMars2025 the end", 10*'=')
        
        np.testing.assert_allclose(sp_out.to_dense().numpy(),
                                dense_out.numpy(),
                                rtol=1e-05)

        if paddle.is_compiled_with_cuda():
            ## cuda version
            dense_x = paddle.to_tensor(np_x, place=paddle.CUDAPlace(0))
            dense_x.numpy()
            print(dense_x.numpy())
            dense_x.stop_gradient = False
            dense_x.numpy()
            # dense_out = paddle.transpose(dense_x, dims)
            dense_out = paddle.reshape(dense_x, new_shape)
            dense_out.numpy()
            print(dense_out.numpy())

            if format == "coo":
                # sp_x = origin_x.detach().to_sparse_coo(len(x_shape))
                sp_x = paddle.to_tensor(np_x, place=paddle.CUDAPlace(0)).to_sparse_coo(len(x_shape))
            else:
                # sp_x = origin_x.detach().to_sparse_csr()
                sp_x = paddle.to_tensor(np_x, place=paddle.CUDAPlace(0)).to_sparse_csr()
            sp_x.stop_gradient = False
            # sp_out = paddle.incubate.sparse.transpose(sp_x, dims)
            sp_out = paddle.incubate.sparse.reshape(sp_x, new_shape)

            print(10*'=', "OccupyMars2025 the following is dense_out", 10*'=')
            print("dense_out.numpy():", dense_out.numpy())  # reports the error at this line
```

@OccupyMars2025 (Contributor Author)

The following picture shows that at least the CPU forward COO kernel of sparse reshape works correctly.

[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 24, 2022

This is the reason for the dense_x.grad.numpy() * mask below:

[screenshot]

            dense_out.backward()
            sp_out.backward()
            np.testing.assert_allclose(sp_x.grad.to_dense().numpy(),
                                       dense_x.grad.numpy() * mask,
                                   #    dense_x.grad.numpy(),
                                       rtol=1e-05)
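The mask is needed because the sparse gradient only materializes entries at the input's nonzero positions, while reshape's dense gradient is a pass-through defined everywhere. A small pure-Python sketch of the comparison (illustrative shapes only):

```python
# Sparsity pattern of the input (1 marks a stored nonzero).
mask = [[0, 1, 1, 1, 0],
        [1, 0, 1, 1, 0]]

# d(reshape)/dx is an identity pass-through, so the dense grad is all ones.
dense_grad = [[1] * 5 for _ in range(2)]

# A sparse COO tensor only stores gradient entries at its nonzero
# positions, so the dense grad must be masked before comparing.
masked_grad = [[g * m for g, m in zip(grow, mrow)]
               for grow, mrow in zip(dense_grad, mask)]
print(masked_grad == mask)  # True
```

Comparing sp_x.grad.to_dense() against the unmasked dense gradient would therefore always fail at the zero positions, independent of any kernel bug.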

@OccupyMars2025 (Contributor Author) commented Sep 24, 2022

There seems to be numerical instability in the backward computation on CPU. Running the test case multiple times, sometimes the two grad tensors have the same values and sometimes they differ.

[screenshot]

[screenshot]

@OccupyMars2025 (Contributor Author)

All test cases for the CPU forward COO kernel pass, so my forward computation logic is correct, but the GPU forward COO kernel does not work.

@OccupyMars2025 (Contributor Author) commented Sep 25, 2022

Points to note from the PHI operator library design document (飞桨高可复用算子库 PHI 设计文档):

Decide whether a cross-device data copy is needed.

Split compilation by training vs. inference scenarios.
For example, inference compiles neither backward kernels nor forward kernels that have Intermediate outputs.

In the long run, support a unified way of writing cross-device kernels that is intuitive and easy to use, without introducing unnecessary template parameters.

Explanation: beneath the operator library sits a Kernel Primitive API module whose long-term vision is that each operation needs only one kernel to cover multiple devices, with the device-specific code living only inside the Kernel Primitive API implementation. When kernels are reused with more complex template parameters in the future, those parameters must be kept as simple as possible.

  • For Tensor, ALL represents an illegal Backend, but for Kernel, some
  • kernels may be device-independent by nature, such as reshape;
  • and some kernels are also device-independent when implemented based on
  • primitive API.

Why not use fluid's original VarType here?
Reason 1: in fluid, DataType and VarType are peer concepts, which is a confusing design; for example, LoDTensor and FLOAT32 end up as same-level concepts even though they clearly are not. We do not want to inherit an obviously flawed design.
Reason 2: it decouples PHI from fluid, so that PHI can later be compiled independently.

Scalar is used to uniformly represent a variable of any basic data type (float, double, int, bool, etc.). (It currently also supports representing a one-element Tensor as a scalar, but that support may be dropped later.)

Take ScaleKernel as an example: its scale parameter can be passed ordinary data types such as int, float, or double. Without Scalar, a separate function interface would have to be created for each data type, greatly increasing the amount of kernel code. Scalar is therefore mainly used for a single parameter that may carry different data types, avoiding multiple overloaded functions in that scenario.
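The Scalar idea can be sketched in plain Python (hypothetical names; PHI's real Scalar is a C++ class, this only illustrates the "one signature, no overloads" point):

```python
class Scalar:
    """Wraps one value of any basic numeric type so that a single
    kernel signature can accept int, float, or bool alike."""
    def __init__(self, value):
        self.value = value

    def to_float(self):
        return float(self.value)

def scale_kernel(x, scale):
    # Accept either a raw number or a Scalar; one signature, no overloads.
    s = scale.to_float() if isinstance(scale, Scalar) else float(scale)
    return [v * s for v in x]

print(scale_kernel([1, 2, 3], Scalar(2)))   # [2.0, 4.0, 6.0]
print(scale_kernel([1, 2, 3], 0.5))         # [0.5, 1.0, 1.5]
```

In C++ the same effect comes from implicit constructors on Scalar, so every numeric type converts to one parameter type at the call site.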

When one Tensor is assigned to another, or a Tensor is returned from a function, only the pointer is copied; no real data copy happens.

Compilation decoupling:

The autograd information held here is only a pointer index, null by default:
std::unique_ptr<AbstractAutogradMeta> autograd_meta_ = nullptr;
AbstractAutogradMeta is an abstract interface that does not depend on any autograd module, so it does not affect PHI's independent compilation, while still meeting the dynamic-graph Tensor's need to hold backward information.
AutogradMeta is only set in dynamic-graph scenarios; where it is not needed, for example in static graphs, it is simply a null pointer.

DenseTensor corresponds to fluid's original LoDTensor class and is the basic implementation of Tensor. The DenseTensorMeta inside DenseTensor holds the basic members describing the Tensor, and the Allocation inside DenseTensor is fluid's original Allocation.

@zhwesky2010 (Contributor) left a comment

The errors you posted earlier are two separate problems:

  1. An out-of-bounds CUDA memory access, possibly because not enough memory was allocated; check the indexing carefully
  2. A compilation problem

@OccupyMars2025 (Contributor Author)

  1. The compilation problem: as far as I understand, I have solved it
  2. I will look further into the CUDA kernel problem

@OccupyMars2025 (Contributor Author) commented Sep 29, 2022

Maybe the "illegal memory access" error is caused by not enough memory being allocated, so I changed the CUDA kernel launch configuration from "0" to the large value "1024", but this is probably not the actual cause of the error.

[screenshot]

@OccupyMars2025 (Contributor Author) commented Sep 29, 2022

[Bug from the PaddlePaddle framework] When I run the following code, I get the following error message:

import paddle
sp_x = paddle.to_tensor([3], place=paddle.CUDAPlace(0)).to_sparse_coo(1)
sp_x.numpy()   # this line reports an error
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
No stack trace in paddle, may be caused by external reasons.

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1664432194 (unix time) try "date -d @1664432194" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 11542 (TID 0x7f1f16333740) from PID 0 ***]

Segmentation fault (core dumped)
aistudio@jupyter-242560-4589773:~/work$ 

I installed a PaddlePaddle wheel built from the branch OccupyMars2025:hackathon-3rd-task22-add-paddle.incubate.sparse.reshape-version002, but because the reported error has nothing to do with my newly added code, I think there may be a bug in the PaddlePaddle framework itself. I used the following environment to compile the source, with this command: cmake .. -DPY_VERSION=3.8 -DWITH_GPU=ON

[screenshot]

I started rebuilding from source on the develop branch to check whether the bug really comes from the PaddlePaddle framework itself (2022/9/29 15:50). At 2022/9/29 17:06 I finished the build, and the above code reported the same error, so I concluded that a bug in the develop branch had been the cause of my failed CI.

I was wrong!!!! Calling sp_x.numpy() is itself illegal; I should use sp_x.to_dense().numpy():

>>> import paddle
>>> sp_x = paddle.to_tensor([3], place=paddle.CUDAPlace(0)).to_sparse_coo(1)
W0929 09:40:46.203522 31427 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0929 09:40:46.206414 31427 gpu_resources.cc:91] device: 0, cuDNN Version: 8.1.
>>> sp_x.to_dense()
Tensor(shape=[1], dtype=int64, place=Place(gpu:0), stop_gradient=True,
       [3])
>>> sp_x.to_dense().numpy()
array([3])
>>> 

@OccupyMars2025 (Contributor Author) commented Sep 29, 2022

Now I'm certain that the CPU COO forward kernel is OK, but there are two possible sources of error:

  1. the CUDA COO forward kernel
  2. the CPU COO backward kernel
2022-09-29 15:52:35        [1, 0, 1, 1, 0]])
2022-09-29 15:52:35  y: array([[0, 1, 1, 1, 0],
2022-09-29 15:52:35        [1, 0, 1, 1, 0]])
2022-09-29 15:52:35 ----------------------------------------------------------------------
2022-09-29 15:52:35 Ran 1 test in 0.005s
2022-09-29 15:52:35 FAILED (failures=1)
2022-09-29 15:52:35 [[  0 -52 -28   0   0]
2022-09-29 15:52:35  [ 63   0  89 -93   0]]
2022-09-29 15:52:35 [  0 -52 -28   0   0  63   0  89 -93   0]
2022-09-29 15:52:35 sp_x.grad.to_dense().numpy():  [[0 1 1 0 0]
2022-09-29 15:52:35  [1 0 1 1 0]]
2022-09-29 15:52:35 dense_x.grad.numpy():  [[1 1 1 1 1]
2022-09-29 15:52:35  [1 1 1 1 1]]
2022-09-29 15:52:35 mask:  [[0 1 1 1 0]
2022-09-29 15:52:35  [1 0 1 1 0]]
2022-09-29 15:52:35 0% tests passed, 1 tests failed out of 1
2022-09-29 15:52:35 Total Test time (real) =   3.90 sec
2022-09-29 15:52:35 The following tests FAILED:
2022-09-29 15:52:35 	1292 - test_sparse_reshape_op (Failed)
2022-09-29 15:52:35 Errors while running CTest
2022-09-29 15:52:35 ========================================
2022-09-29 15:52:35 Added UT should pass three additional executions

@OccupyMars2025 OccupyMars2025 force-pushed the hackathon-3rd-task22-add-paddle.incubate.sparse.reshape-version002 branch from 4420f4f to ecae7b3 Compare October 1, 2022 08:47
@paddle-bot commented Oct 1, 2022

Sorry to inform you that, after our discussion, your PR does not yet meet the merging standard (reference: the Paddle native operator development guidelines). You may submit a new PR; we are closing this one for now. Thank you for your contribution.

Labels: contributor (External developers)
4 participants