[Hackathon 3rd No.22 ] add paddle.incubate.sparse.reshape #46333
Conversation
Your PR has been submitted successfully. Thank you for contributing to this open-source project!
// /* Caution: this is the original computation logic, which I believe is incorrect,
Can the CI pass at the moment?
paddle/phi/api/yaml/sparse_ops.yaml
Outdated
args : (Tensor x, int64_t[] new_shape)
output : Tensor(out)
infer_meta :
  func : sparse::ReshapeInferMeta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can reuse the dense ReshapeInferMeta; there is no need to add a new one.
done
const std::vector<int64_t>& new_shape,
SparseCooTensor* out) {
  /*
  Currently only the sparse part dims can be reshaped
Please use English comments.
done
@@ -608,3 +608,26 @@ def expm1(x, name=None):
            out = paddle.incubate.sparse.expm1(sparse_x)
    """
    return _C_ops.sparse_expm1(x)


@dygraph_only
def reshape(x, new_shape, name=None):
Keep the API parameter name consistent with the dense API and use `shape`; also pass this name all the way down through the yaml and the C++ kernel.
done
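For reference, the renamed Python entry point would presumably look like the sketch below (hypothetical; it mirrors the diff fragment above and assumes the generated `_C_ops.sparse_reshape` binding follows the same naming pattern as `_C_ops.sparse_expm1`):

```python
@dygraph_only
def reshape(x, shape, name=None):
    # Parameter renamed from `new_shape` to `shape`, matching the dense
    # paddle.reshape API; the same name is used in the yaml and C++ kernel.
    return _C_ops.sparse_reshape(x, shape)
```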
// const std::vector<int>& perm,
SparseCooTensor* dx) {
  EmptyLikeCooKernel<T, Context>(dev_ctx, x, dx);
  std::vector<int64_t> x_shape(x.dims().size());
Just use the phi::vectorize function to convert the DDim into a vector; there is no need to make it this complicated.
done
const int64_t* out_sparse_part_strides,
int64_t* out_indices_data) {

  // for (std::size_t i = 0; i < n_dim; ++i) {
Delete the commented-out code.
done
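For context, here is a minimal NumPy sketch of the stride arithmetic such a kernel performs on the COO indices (a hypothetical illustration, not the actual PHI kernel): each nonzero's sparse-part coordinates are flattened into a linear offset under the old sparse-part shape and then decomposed again under the new sparse-part shape.

```python
import numpy as np

def remap_coo_indices(indices, old_sparse_shape, new_sparse_shape):
    """indices: (old_sparse_dim, nnz) array of COO coordinates."""
    # Flatten each column of coordinates into a single linear offset.
    flat = np.ravel_multi_index(tuple(indices), old_sparse_shape)
    # Decompose the linear offsets with the new sparse-part strides.
    return np.array(np.unravel_index(flat, new_sparse_shape))

old_idx = np.array([[0, 1, 1],   # coordinates of 3 nonzeros in a
                    [2, 0, 3]])  # (2, 4) sparse part
print(remap_coo_indices(old_idx, (2, 4), (4, 2)))
# [[1 2 3]
#  [0 0 1]]
```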
for (int i = 0; i < x.sparse_dim(); ++i) {
  x_sparse_part_dims.push_back(x.dims()[i]);
}
for (int i = 0; i < out_dims.size() - x.dense_dim(); ++i) {
So the new shape here is new_shape with x's dense_dim part removed from its tail? If reshape only applies to the sparse_dim part, please state that in the API documentation, and make new_shape take effect only on the sparse_dim part.
done. Right, it only takes effect on the sparse_dim part, but new_shape must be specified as the full shape. For example, for a sparse COO tensor with shape (2, 3, 4, 5, 6), where (2, 3, 4) is the sparse_dim part and (5, 6) is the dense_dim part, if I want to reshape the sparse_dim part to (3, 8), then new_shape must be specified as (3, 8, 5, 6), not (3, 8).
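To make that constraint concrete, here is a small usage sketch with hypothetical values (it assumes the API is exposed as `paddle.incubate.sparse.reshape`, as in this PR's title, and follows the semantics described above):

```python
import paddle

# Dense tensor of shape (2, 3, 4, 5, 6); the COO conversion keeps the
# first 3 dims as sparse_dim and the last 2 dims (5, 6) as dense_dim.
dense_x = paddle.randn([2, 3, 4, 5, 6])
sp_x = dense_x.to_sparse_coo(sparse_dim=3)

# Only the sparse_dim part (2, 3, 4) is reshaped to (3, 8), but the full
# target shape must still be passed, with the dense_dim part (5, 6) kept.
sp_out = paddle.incubate.sparse.reshape(sp_x, [3, 8, 5, 6])   # OK
# paddle.incubate.sparse.reshape(sp_x, [3, 8])                # not allowed
```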
class TestReshape(unittest.TestCase):
    # x: sparse, out: sparse
    def check_result(self, x_shape, new_shape, format):
        with _test_eager_guard():
This can be removed now.
    def test_reshape_2d(self):
        self.check_result([2, 5], [10,], 'coo')
        # self.check_result([10, 5], [2, 25], 'csr')
Does this unit test pass?
No, it does not pass.
I'll first finish the changes you suggested and then run the CI again to see whether these two errors are still reported.
After revising according to the PaddlePaddle experts' suggestions, the same errors still occur, as follows. Searching the NVIDIA documentation gives this explanation: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html. My guess at the possible cause: in the forward COO CUDA kernel, when elements of the indices DenseTensor are assigned through an int64_t* pointer, the access goes out of bounds???
How to fix it? The Chinese comments may be the cause of the error, so I translated them into English.
It may seem that the numpy() method of a CUDA dense tensor reports the error, but I found that actually ...
All test cases for the CPU forward COO kernel pass, so my forward computation logic is correct, but the GPU forward COO kernel does not work.
Key points to keep in mind from the design document of PHI, the PaddlePaddle highly reusable operator library: decide whether a cross-device data copy is needed; split compilation by training and inference scenarios; over the long term, support a unified way of writing cross-device kernels that is intuitive and easy to use and does not introduce unnecessary template parameters. Explanation: below the operator library there is also the Kernel Primitive API module, whose long-term vision is that each operation needs only one kernel to adapt to multiple devices, with the truly device-specific code living only in the Kernel Primitive API implementation; in the future, when reusing kernels that take relatively complex template parameters, the parameters should be kept as simple as possible.
Why not use the original fluid VarType here? Scalar is used to uniformly represent variables with different underlying data types (float, double, int, bool, etc.). (It currently also supports representing a Tensor with a single element as a scalar, but this support may be dropped later.) Take ScaleKernel as an example: its scale parameter can be passed plain data types such as int, float, or double. Without Scalar, a separate function interface would be needed for each data type, which would greatly increase the amount of kernel code; Scalar is therefore mainly used for a single parameter that can take different data types, avoiding the need to write multiple overloaded functions in that scenario. When one Tensor is assigned to another Tensor, or a Tensor is returned from a function, only the pointer is copied and no real data copy happens. Compilation decoupling: the autograd information carried here is just a pointer index and is empty by default. DenseTensor corresponds to the original LoDTensor class in fluid and is the basic implementation of Tensor; the DenseTensorMeta inside DenseTensor contains the basic members describing the Tensor, and the Allocation inside DenseTensor is the original fluid Allocation.
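As a concrete illustration of the Scalar point above, the same `scale` argument of the public `paddle.scale` API accepts different plain Python numeric types without separate overloads (a small sketch):

```python
import paddle

x = paddle.to_tensor([1.0, 2.0, 3.0])
# The same `scale` parameter takes an int or a float; on the C++ side it
# arrives as a single Scalar argument instead of one overload per type.
print(paddle.scale(x, scale=2))    # int
print(paddle.scale(x, scale=0.5))  # float
```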
The errors you posted earlier are two separate problems:
- a CUDA out-of-bounds memory access, possibly because not enough memory was allocated; check the indexing carefully
- a compilation problem
[Bug from the PaddlePaddle framework] When I run the following code, I get the error message below.

import paddle
sp_x = paddle.to_tensor([3], place=paddle.CUDAPlace(0)).to_sparse_coo(1)
sp_x.numpy()  # this line reports an error

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
No stack trace in paddle, may be caused by external reasons.
----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
[TimeInfo: *** Aborted at 1664432194 (unix time) try "date -d @1664432194" if you are using GNU date ***]
[SignalInfo: *** SIGSEGV (@0x0) received by PID 11542 (TID 0x7f1f16333740) from PID 0 ***]
Segmentation fault (core dumped)
aistudio@jupyter-242560-4589773:~/work$
I installed a PaddlePaddle wheel package built from the branch OccupyMars2025:hackathon-3rd-task22-add-paddle.incubate.sparse.reshape-version002, but since the reported error has nothing to do with my newly added code, I think there may be a bug in the PaddlePaddle framework itself. I used the following environment to compile the source code, and this command to compile:
Now I'm certain that the CPU COO forward kernel is OK.
2022-09-29 15:52:35 [1, 0, 1, 1, 0]])
2022-09-29 15:52:35 y: array([[0, 1, 1, 1, 0],
2022-09-29 15:52:35 [1, 0, 1, 1, 0]])
2022-09-29 15:52:35 ----------------------------------------------------------------------
2022-09-29 15:52:35 Ran 1 test in 0.005s
2022-09-29 15:52:35 FAILED (failures=1)
2022-09-29 15:52:35 [[ 0 -52 -28 0 0]
2022-09-29 15:52:35 [ 63 0 89 -93 0]]
2022-09-29 15:52:35 [ 0 -52 -28 0 0 63 0 89 -93 0]
2022-09-29 15:52:35 sp_x.grad.to_dense().numpy(): [[0 1 1 0 0]
2022-09-29 15:52:35 [1 0 1 1 0]]
2022-09-29 15:52:35 dense_x.grad.numpy(): [[1 1 1 1 1]
2022-09-29 15:52:35 [1 1 1 1 1]]
2022-09-29 15:52:35 mask: [[0 1 1 1 0]
2022-09-29 15:52:35 [1 0 1 1 0]]
2022-09-29 15:52:35 0% tests passed, 1 tests failed out of 1
2022-09-29 15:52:35 Total Test time (real) = 3.90 sec
2022-09-29 15:52:35 The following tests FAILED:
2022-09-29 15:52:35 1292 - test_sparse_reshape_op (Failed)
2022-09-29 15:52:35 Errors while running CTest
2022-09-29 15:52:35 ========================================
2022-09-29 15:52:35 Added UT should pass three additional executions
4420f4f to ecae7b3
Sorry, after our repeated discussions your PR has not yet met the standard for merging. Please read the PaddlePaddle native operator development specification; you can submit a new PR. We will close this PR for now. Thank you for your contribution.
PR types
New features
PR changes
OPs
Describe
[used AI Studio] add paddle.incubate.sparse.reshape
The predecessor of this PR is #46242; this PR is a streamlined version with everything unrelated to task No. 22 removed.
This PR is based on #45849, and sparse reshape and sparse transpose are similar.