optimization for dot(csr.T, dense) = rsp #8611
Conversation
src/operator/tensor/dot-inl.h
Outdated
@@ -573,6 +577,18 @@ inline void DotCsrDnsDnsImpl(const OpContext& ctx,
  });
}

struct MarkCsrColKernel {
Can we reuse this kernel and give it a better name?
https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/util/tensor_util-inl.h#L37
src/operator/tensor/dot-inl.h
Outdated
      row_idx_out, prefix_sum, num_rows);

num_threads = mxnet_op::get_num_threads<cpu>(ret->shape()[0]);
dim_t seg_len = (ret->shape()[0] + num_threads - 1) / num_threads;
Should we replace ret->shape()[0] with nnr instead?
https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/indexing_op.h#L661
I think not: this range should cover the number of columns of the csr input, i.e. the number of rows of the output, so nnr is not helpful for this kernel.
Yes, the performance of the original dot(csr.T, dense) is not good. The same holds for scipy: it is not always faster than the dense dot, even with very sparse data.
@@ -375,28 +375,32 @@ struct DotCsrTransDnsRspByRowBlocks {
 * \brief
The documentation should be updated. Please also add benchmark results for before and after this change.
This reverts commit f79d22d.
* optimization for dot(csr.T, dense) = rsp * remove unnecessary headers * load balance * minor fix and update comments * resolve * trigger * trigger
Hi @eric-haibin-lin @ZiyueHuang, can the storage type of dot(csr.T, rsp) be rsp?
It's now indeed rsp:
Oh, I know, thanks.
Description
Use a prefix sum to compute nnr so that the row_sparse output can be allocated at exactly the right size. Currently, dot(csr.T, dense) = rsp allocates a dense output and then casts it to row_sparse, without freeing the unused memory.
I benchmarked with run_benchmark(context, lhs="csr", rhs="default", lhs_trans=True, ...) in mxnet/benchmark/python/sparse/dot.py. Please correct me if I'm wrong. But is dot(csr.T, dense) = rsp in master really this slow? It might be because others were using my machine at the same time.
Performance of the original dot(csr.T, dense) = rsp:
After this PR:
This implements a feature requested in #8168.
cc @eric-haibin-lin
Checklist
Essentials
Passed code style checking (make lint)
Changes
Comments