Operators for sum(csr, axis=0) and sum(csr, axis=1) #8174
Conversation
bool dispatched = false;
const bool invalid_ctx = dev_mask != mshadow::cpu::kDevMask;
const auto dispatch_ex =
    invalid_ctx ? DispatchMode::kFComputeFallback : DispatchMode::kFComputeEx;
Does this operator only work on cpu?
nvm let's focus on cpu for now :)
@@ -45,12 +45,16 @@ Defined in )code";
}

MXNET_OPERATOR_REGISTER_REDUCE(sum)
.add_alias("_sparse_sum")
Use MXNET_ADD_SPARSE_OP_ALIAS(sum)
Yes, MXNET_ADD_SPARSE_OP_ALIAS() is the greatest macro ever invented!
Fixed
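For reference, a hedged sketch of what the registration looks like after the change, assuming MXNET_ADD_SPARSE_OP_ALIAS(name) simply expands to .add_alias("_sparse_" #name):

// hedged sketch: the macro replaces the hand-written "_sparse_sum" alias
MXNET_OPERATOR_REGISTER_REDUCE(sum)
MXNET_ADD_SPARSE_OP_ALIAS(sum)   // same effect as .add_alias("_sparse_sum")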
.add_alias("sum_axis")
.describe(R"code(Computes the sum of array elements over given axes.

.. Note::

  `sum` and `sum_axis` are equivalent.
  For CSRNDArray summation along axis 0 and axis 1 is supported.
  Setting keepdims or exclude to True with CSRNDArray will cause
  fallback to dense operator.
Try to avoid Python-specific terms in operator documentation since it's shared by all language bindings. I suggest replacing CSRNDArray with "ndarray of csr storage type".
rsp.indices = [0, 1]
rsp.values = [[ 0., 1., 0.],
              [ 2., 0., 3.]]

# cast to csr storage type
- csr = cast_storage(default, 'csr')
+ csr = cast_storage(dense, 'csr')
good catch! Thanks for fixing this
def test_sparse_sum_axis():
    def test_variations():
        dim0 = 30
        dim1 = 1000
Is it necessary to test with dim = 1000? We try to keep the unit test suite lightweight (except when an operator uses a different kernel optimized for a big shape, like cast_storage).
Changed dim to 100
        dim0 = 30
        dim1 = 1000
        axes = [0, 1]
        densities = [0, 0.01, 0.1, 0.2, 0.5]
I think densities = [0, 0.5, 1] should cover all cases.
Changed
// only dense output storage type is supported
CHECK_EQ(output->storage_type(), kDefaultStorage);

CHECK_NE(req, kWriteInplace);
Since the result is dense, I think kWriteInplace and kAddTo work fine for sum. Usually, if the output is sparse, then we only support kWriteTo and kNullOp. So this check is not necessary, I think.
Fixed.
const RType* in_indptr, const IType* in_idx,
const DType* in_data,
const int64_t num_rows) {
  DType sum, residual;
I think each thread should handle multiple output elements (a range) instead of just one, so that you perform fewer binary searches. We can use the temp resource to store the temporary sum results, and invoke the kernel with the number of CPU threads instead of the size of the output. Also, if nnz per row is fewer than 16, then I guess a linear search instead of a binary search would be faster.
You can look at the cast_storage GPU implementation to see how to request a temp resource.
It would be more consistent to have num_rows as RType. It will also avoid signed/unsigned problems when RType varies in the future.
@eric-haibin-lin Great suggestion! I have changed the logic to handle a range instead of an individual element. @cjolivier01 Good catch! I am using RType and IType for num_rows now.
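To illustrate the range-per-thread idea discussed above, here is a hedged sketch (hypothetical names, not the PR's merged kernel) of an axis-0 kernel in which each thread owns a contiguous block of output columns, does one binary search per row to find the start of that block, and then scans linearly; it assumes the dense output was zero-filled beforehand (kWriteTo):

// Hypothetical sketch, not the merged kernel: thread `tid` owns output
// columns [col_begin, col_end). One binary search per row locates the
// first stored entry in that column range; the rest is a linear scan.
template <int req>
struct SumCsrAxis0RangeSketch {
  template <typename RType, typename IType, typename DType>
  MSHADOW_XINLINE static void Map(int tid, DType* out_data,
                                  const RType* in_indptr, const IType* in_idx,
                                  const DType* in_data, const RType num_rows,
                                  const IType num_cols, const int num_threads) {
    const IType block = (num_cols + num_threads - 1) / num_threads;
    const IType col_begin = static_cast<IType>(tid) * block;
    if (col_begin >= num_cols) return;
    const IType col_end =
        (col_begin + block < num_cols) ? col_begin + block : num_cols;
    for (RType i = 0; i < num_rows; ++i) {
      const RType row_begin = in_indptr[i];
      const RType row_end = in_indptr[i + 1];  // cached once per row
      // binary search: first entry whose column index is >= col_begin
      RType lo = row_begin, hi = row_end;
      while (lo < hi) {
        const RType mid = lo + (hi - lo) / 2;  // overflow-safe midpoint
        if (in_idx[mid] < col_begin) lo = mid + 1; else hi = mid;
      }
      // linear scan over the columns owned by this thread
      for (RType k = lo; k < row_end && in_idx[k] < col_end; ++k) {
        // assumes kWriteTo with a zero-filled output; no other thread
        // writes these columns, so no synchronization is needed
        out_data[in_idx[k]] += in_data[k];
      }
    }
  }
};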
CHECK_EQ(in_attrs->size(), 1);
CHECK_EQ(out_attrs->size(), 1);
const ReduceAxesParam& param = nnvm::get<ReduceAxesParam>(attrs.parsed);
const auto& in_stype = in_attrs->at(0);
For primitive types such as int, float, etc., there is no need to use a const reference. In addition, auto is not good for code readability. Use specific types if the type name is short.
const bool invalid_ctx = dev_mask != mshadow::cpu::kDevMask;
const auto dispatch_ex =
    invalid_ctx ? DispatchMode::kFComputeFallback : DispatchMode::kFComputeEx;
if (!dispatched && in_stype == kDefaultStorage) {
Add some comments for each condition check for easy understanding of the dispatching logic.
Added comments
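For concreteness, a hedged sketch of what the commented dispatch logic might look like; the exact conditions and helper calls in the merged code may differ (storage_type_assign and dispatch_fallback are the usual operator_common.h helpers):

// Hedged sketch, not the exact merged code.
bool dispatched = false;
// The sparse kernel (FComputeEx) is CPU-only for now, so on any other
// device fall back to the dense FCompute path.
const bool invalid_ctx = dev_mask != mshadow::cpu::kDevMask;
const DispatchMode dispatch_ex =
    invalid_ctx ? DispatchMode::kFComputeFallback : DispatchMode::kFComputeEx;
if (!dispatched && in_stype == kDefaultStorage) {
  // dense input -> dense output, handled by the regular dense kernel
  dispatched = storage_type_assign(&out_attrs->at(0), kDefaultStorage,
                                   dispatch_mode, DispatchMode::kFCompute);
}
if (!dispatched && in_stype == kCSRStorage &&
    param.axis.ndim() == 1 && (param.axis[0] == 0 || param.axis[0] == 1) &&
    !param.keepdims && !param.exclude) {
  // csr input summed along axis 0 or 1 -> dense output via the sparse
  // kernel; keepdims/exclude are not supported by it
  dispatched = storage_type_assign(&out_attrs->at(0), kDefaultStorage,
                                   dispatch_mode, dispatch_ex);
}
if (!dispatched) {
  // everything else falls back to the dense implementation
  dispatched = dispatch_fallback(out_attrs, dispatch_mode);
}
return dispatched;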
template <int req>
struct SumCsrKernel<req, 0> {
  template <typename RType, typename IType, typename DType>
  MSHADOW_XINLINE static void Map(int j, DType* out_data,
Add a comment here explaining the meaning of j so that it's easy to understand how this is parallelized.
Added comments
})
}

if (!input.storage_initialized()) {
Need to fill zeros for the output here.
Good catch! I have fixed it by filling the output with zeros for kWriteTo and kWriteInplace together.
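A minimal sketch of that fix, assuming the Fill helper from init_op.h (the merged code may differ in detail):

// Hedged sketch: the kernels only add the stored csr values into the dense
// output, so for kWriteTo / kWriteInplace the output must be zeroed first;
// kAddTo keeps the existing values and kNullOp does nothing.
if (req == kNullOp) return;
if (req == kWriteTo || req == kWriteInplace) {
  Fill<false>(s, output->data(), req, 0);
}
if (!input.storage_initialized()) return;  // all-zero input: nothing to add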
MSHADOW_IDX_TYPE_SWITCH(input.aux_type(kIdx), IType, {
  MSHADOW_TYPE_SWITCH(input.dtype(), DType, {
    MXNET_ASSIGN_REQ_SWITCH(req, req_type, {
      auto in_indptr = input.aux_data(kIndPtr).dptr<RType>();
Add const if possible.
Please reduce the use of ‘auto’.
Changed!
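For illustration, the kind of change being asked for (a hedged example; the exact lines in the merged code may differ):

// before: auto hides the pointer type and drops const
//   auto in_indptr = input.aux_data(kIndPtr).dptr<RType>();
// after: explicit type, const because the input is never written
const RType* in_indptr = input.aux_data(kIndPtr).dptr<RType>();
const IType* in_idx = input.aux_data(kIdx).dptr<IType>();
const DType* in_data = input.data().dptr<DType>();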
mshadow::red::sum::SetInitValue(sum, residual);
const IType jval = static_cast<IType>(j);
for (RType i = 0; i < num_rows; ++i) {
  if (in_indptr[i] >= in_indptr[i + 1]) continue;
Is it possible that in_indptr[i] > in_indptr[i+1]?
IType end = in_indptr[i + 1] - 1;
IType mid;
while (start <= end) {
  mid = (start + end) / 2;
Use mid = start + (end - start) / 2 to avoid overflow when start + end exceeds the range of the index type.
DispatchMode* dispatch_mode,
std::vector<int>* in_attrs,
std::vector<int>* out_attrs) {
  CHECK_EQ(in_attrs->size(), 1);
Please use 1U since size() returns unsigned (size_t)
fixed.
// in_idx[in_indptr[i+1]]
// The assumption here is in_idx for each row is sorted
IType start = in_indptr[i];
IType end = in_indptr[i + 1] - 1;
Can you please cache the in_indptr[i + 1] value? It is used a lot and can introduce a lot of superfluous instructions.
fixed
MSHADOW_IDX_TYPE_SWITCH(input.aux_type(kIndPtr), RType, {
  MSHADOW_TYPE_SWITCH(input.dtype(), DType, {
    MXNET_ASSIGN_REQ_SWITCH(req, req_type, {
      auto in_indptr = input.aux_data(kIndPtr).dptr<RType>();
Make non-output pointers const please
fixed.
mshadow::Stream<xpu>* s = ctx.get_stream<xpu>();
const NDArrayStorageType istype = inputs[0].storage_type();
if (istype == kCSRStorage) {
  CHECK_EQ(inputs[0].shape().ndim(), 2U) << "sum(csr) op only supports"
You don't need the other <<. In fact, it causes an extra function call every time this function is executed, regardless of whether the check succeeds or fails, so try to keep the number of << calls low.
fixed.
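For example, a single stream insertion instead of several chained << calls (the continuation of the message text here is hypothetical, only to show the form):

// one << call; the string literal after the quoted fragment is hypothetical
CHECK_EQ(inputs[0].shape().ndim(), 2U)
    << "sum(csr) op only supports 2-dimensional csr input";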
CHECK_EQ(in_attrs->size(), 1U);
CHECK_EQ(out_attrs->size(), 1U);
const ReduceAxesParam& param = nnvm::get<ReduceAxesParam>(attrs.parsed);
int& in_stype = in_attrs->at(0);
Use int or const int. No need to use a reference for primitive types in C++.
I can use int for in_stype. I cannot use it for out_stype, though, because I need a reference to modify out_attrs; it breaks otherwise.
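A small sketch of the distinction (hedged, not the exact merged lines):

// The input storage type is only read, so a plain value copy is fine.
const int in_stype = in_attrs->at(0);
// The output storage type is written back into out_attrs (e.g. by
// storage_type_assign), so it must remain a non-const reference.
int& out_stype = out_attrs->at(0);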
* Add Infer storage for sparse slice operator
* Remove unused files
* Indentation fix and add gpu test for fallback
* Change sum builtin to py_sum
* Add sum_axis(csr,axis=0)=dense and sum(csr,axis=1)=dense operator
* Documentation changes for sparse
* Add fallback unittest for keepdims and exclude
* PR review based changes:
  * Fix CHECK_NE
  * Change in_stype to int
  * Using const int instead of int
  * Initialize mid with the start
@eric-haibin-lin @reminisce @cjolivier01 @piiswrong