Operators for sum(csr, axis=0) and sum(csr, axis=1) #8174
Conversation
bool dispatched = false;
const bool invalid_ctx = dev_mask != mshadow::cpu::kDevMask;
const auto dispatch_ex =
    invalid_ctx ? DispatchMode::kFComputeFallback : DispatchMode::kFComputeEx;
Does this operator only work on cpu?
nvm let's focus on cpu for now :)
@@ -45,12 +45,16 @@ Defined in )code";
}

MXNET_OPERATOR_REGISTER_REDUCE(sum)
.add_alias("_sparse_sum")
Use MXNET_ADD_SPARSE_OP_ALIAS(sum)
Yes, MXNET_ADD_SPARSE_OP_ALIAS() is the greatest macro ever invented!
Fixed
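For reference, a hedged sketch of what the registration looks like after the change, assuming MXNET_ADD_SPARSE_OP_ALIAS(name) simply expands to .add_alias("_sparse_" #name):

// hedged sketch: the macro replaces the hand-written "_sparse_sum" alias
MXNET_OPERATOR_REGISTER_REDUCE(sum)
MXNET_ADD_SPARSE_OP_ALIAS(sum)   // same effect as .add_alias("_sparse_sum")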
.add_alias("sum_axis")
.describe(R"code(Computes the sum of array elements over given axes.

.. Note::

  `sum` and `sum_axis` are equivalent.
  For CSRNDArray summation along axis 0 and axis 1 is supported.
  Setting keepdims or exclude to True with CSRNDArray will cause
  fallback to dense operator.
Try to avoid Python-specific terms in operator documentation since it's shared by all language bindings. I suggest replacing CSRNDArray with "ndarray of csr storage type".
rsp.indices = [0, 1]
rsp.values = [[ 0., 1., 0.],
              [ 2., 0., 3.]]

# cast to csr storage type
- csr = cast_storage(default, 'csr')
+ csr = cast_storage(dense, 'csr')
good catch! Thanks for fixing this
def test_sparse_sum_axis():
    def test_variations():
        dim0 = 30
        dim1 = 1000
Is it necessary to test with dim = 1000? We try to keep the unit test suite lightweight (except when an operator uses a different kernel optimized for a big shape, like cast_storage).
Changed dim to 100
        dim0 = 30
        dim1 = 1000
        axes = [0, 1]
        densities = [0, 0.01, 0.1, 0.2, 0.5]
I think densities = [0, 0.5, 1] should cover all cases.
Changed
// only dense output storage type is supported
CHECK_EQ(output->storage_type(), kDefaultStorage);

CHECK_NE(req, kWriteInplace);
Since the result is dense, I think kWriteInplace and kAddTo work fine for sum. Usually, if the output is sparse, then we only support kWriteTo and kNullOp. So this check is not necessary, I think.
Fixed.
const RType* in_indptr, const IType* in_idx,
const DType* in_data,
const int64_t num_rows) {
  DType sum, residual;
I think each thread should handle multiple output elements (a range) instead of just one, so that you perform fewer binary searches. We can use the temp resource to store the temporary sum results, and invoke the kernel with the number of CPU threads instead of the size of the output. Also, if nnz per row is fewer than 16, then I guess a linear search instead of a binary search would be faster.
You can look at the cast_storage GPU implementation to see how to request a temp resource.
It would be more consistent to have num_rows as RType. It will also avoid signed/unsigned problems when RType varies in the future.
@eric-haibin-lin Great suggestion! I have changed the logic to handle a range instead of an individual element. @cjolivier01 Good catch! I am using RType and IType for num_rows now.
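To illustrate the range-per-thread idea discussed above, here is a hedged sketch (hypothetical names, not the PR's merged kernel) of an axis-0 kernel in which each thread owns a contiguous block of output columns, does one binary search per row to find the start of that block, and then scans linearly; it assumes the dense output was zero-filled beforehand (kWriteTo):

// Hypothetical sketch, not the merged kernel: thread `tid` owns output
// columns [col_begin, col_end). One binary search per row locates the
// first stored entry in that column range; the rest is a linear scan.
template <int req>
struct SumCsrAxis0RangeSketch {
  template <typename RType, typename IType, typename DType>
  MSHADOW_XINLINE static void Map(int tid, DType* out_data,
                                  const RType* in_indptr, const IType* in_idx,
                                  const DType* in_data, const RType num_rows,
                                  const IType num_cols, const int num_threads) {
    const IType block = (num_cols + num_threads - 1) / num_threads;
    const IType col_begin = static_cast<IType>(tid) * block;
    if (col_begin >= num_cols) return;
    const IType col_end =
        (col_begin + block < num_cols) ? col_begin + block : num_cols;
    for (RType i = 0; i < num_rows; ++i) {
      const RType row_begin = in_indptr[i];
      const RType row_end = in_indptr[i + 1];  // cached once per row
      // binary search: first entry whose column index is >= col_begin
      RType lo = row_begin, hi = row_end;
      while (lo < hi) {
        const RType mid = lo + (hi - lo) / 2;  // overflow-safe midpoint
        if (in_idx[mid] < col_begin) lo = mid + 1; else hi = mid;
      }
      // linear scan over the columns owned by this thread
      for (RType k = lo; k < row_end && in_idx[k] < col_end; ++k) {
        // assumes kWriteTo with a zero-filled output; no other thread
        // writes these columns, so no synchronization is needed
        out_data[in_idx[k]] += in_data[k];
      }
    }
  }
};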
CHECK_EQ(in_attrs->size(), 1);
CHECK_EQ(out_attrs->size(), 1);
const ReduceAxesParam& param = nnvm::get<ReduceAxesParam>(attrs.parsed);
const auto& in_stype = in_attrs->at(0);
For primitive types such as int, float, etc., there is no need to use a const reference. In addition, auto is not good for code readability. Use specific types if the type name is short.
const bool invalid_ctx = dev_mask != mshadow::cpu::kDevMask;
const auto dispatch_ex =
    invalid_ctx ? DispatchMode::kFComputeFallback : DispatchMode::kFComputeEx;
if (!dispatched && in_stype == kDefaultStorage) {
Add some comments for each condition check for easy understanding of the dispatching logic.
Added comments
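For concreteness, a hedged sketch of what the commented dispatch logic might look like; the exact conditions and helper calls in the merged code may differ (storage_type_assign and dispatch_fallback are the usual operator_common.h helpers):

// Hedged sketch, not the exact merged code.
bool dispatched = false;
// The sparse kernel (FComputeEx) is CPU-only for now, so on any other
// device fall back to the dense FCompute path.
const bool invalid_ctx = dev_mask != mshadow::cpu::kDevMask;
const DispatchMode dispatch_ex =
    invalid_ctx ? DispatchMode::kFComputeFallback : DispatchMode::kFComputeEx;
if (!dispatched && in_stype == kDefaultStorage) {
  // dense input -> dense output, handled by the regular dense kernel
  dispatched = storage_type_assign(&out_attrs->at(0), kDefaultStorage,
                                   dispatch_mode, DispatchMode::kFCompute);
}
if (!dispatched && in_stype == kCSRStorage &&
    param.axis.ndim() == 1 && (param.axis[0] == 0 || param.axis[0] == 1) &&
    !param.keepdims && !param.exclude) {
  // csr input summed along axis 0 or 1 -> dense output via the sparse
  // kernel; keepdims/exclude are not supported by it
  dispatched = storage_type_assign(&out_attrs->at(0), kDefaultStorage,
                                   dispatch_mode, dispatch_ex);
}
if (!dispatched) {
  // everything else falls back to the dense implementation
  dispatched = dispatch_fallback(out_attrs, dispatch_mode);
}
return dispatched;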
template <int req>
struct SumCsrKernel<req, 0> {
  template <typename RType, typename IType, typename DType>
  MSHADOW_XINLINE static void Map(int j, DType* out_data,
Add a comment here explaining the meaning of j so that it's easy to understand how this is parallelized.
Added comments
})
}

if (!input.storage_initialized()) {
Need to fill zeros for the output here.
Good catch! I have fixed it by filling the output with zeros for kWriteTo and kWriteInplace together.
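A minimal sketch of that fix, assuming the Fill helper from init_op.h (the merged code may differ in detail):

// Hedged sketch: the kernels only add the stored csr values into the dense
// output, so for kWriteTo / kWriteInplace the output must be zeroed first;
// kAddTo keeps the existing values and kNullOp does nothing.
if (req == kNullOp) return;
if (req == kWriteTo || req == kWriteInplace) {
  Fill<false>(s, output->data(), req, 0);
}
if (!input.storage_initialized()) return;  // all-zero input: nothing to add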
MSHADOW_IDX_TYPE_SWITCH(input.aux_type(kIdx), IType, {
  MSHADOW_TYPE_SWITCH(input.dtype(), DType, {
    MXNET_ASSIGN_REQ_SWITCH(req, req_type, {
      auto in_indptr = input.aux_data(kIndPtr).dptr<RType>();
Add const if possible.
Please reduce the use of ‘auto’.
Changed!
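For illustration, the kind of change being asked for (a hedged example; the exact lines in the merged code may differ):

// before: auto hides the pointer type and drops const
//   auto in_indptr = input.aux_data(kIndPtr).dptr<RType>();
// after: explicit type, const because the input is never written
const RType* in_indptr = input.aux_data(kIndPtr).dptr<RType>();
const IType* in_idx = input.aux_data(kIdx).dptr<IType>();
const DType* in_data = input.data().dptr<DType>();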
mshadow::red::sum::SetInitValue(sum, residual);
const IType jval = static_cast<IType>(j);
for (RType i = 0; i < num_rows; ++i) {
  if (in_indptr[i] >= in_indptr[i + 1]) continue;
Is it possible that in_indptr[i] > in_indptr[i+1]?
IType end = in_indptr[i + 1] - 1;
IType mid;
while (start <= end) {
  mid = (start + end) / 2;
Use mid = start + (end - start) / 2 to avoid overflow when start + end exceeds the range of the index type.
DispatchMode* dispatch_mode,
std::vector<int>* in_attrs,
std::vector<int>* out_attrs) {
  CHECK_EQ(in_attrs->size(), 1);
Please use 1U since size() returns unsigned (size_t)
fixed.
// in_idx[in_indptr[i+1]]
// The assumption here is in_idx for each row is sorted
IType start = in_indptr[i];
IType end = in_indptr[i + 1] - 1;
Can you please cache the in_indptr[i + 1] value? It is used a lot and can introduce a lot of superfluous instructions.
fixed
MSHADOW_IDX_TYPE_SWITCH(input.aux_type(kIndPtr), RType, {
  MSHADOW_TYPE_SWITCH(input.dtype(), DType, {
    MXNET_ASSIGN_REQ_SWITCH(req, req_type, {
      auto in_indptr = input.aux_data(kIndPtr).dptr<RType>();
Make non-output pointers const please
fixed.
mshadow::Stream<xpu>* s = ctx.get_stream<xpu>();
const NDArrayStorageType istype = inputs[0].storage_type();
if (istype == kCSRStorage) {
  CHECK_EQ(inputs[0].shape().ndim(), 2U) << "sum(csr) op only supports"
You don't need the other <<. In fact, it causes an extra function call every time this function is executed, regardless of whether the check succeeds or fails, so try to keep the number of << calls low.
fixed.
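For example, a single stream insertion instead of several chained << calls (the continuation of the message text here is hypothetical, only to show the form):

// one << call; the string literal after the quoted fragment is hypothetical
CHECK_EQ(inputs[0].shape().ndim(), 2U)
    << "sum(csr) op only supports 2-dimensional csr input";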
CHECK_EQ(in_attrs->size(), 1U);
CHECK_EQ(out_attrs->size(), 1U);
const ReduceAxesParam& param = nnvm::get<ReduceAxesParam>(attrs.parsed);
int& in_stype = in_attrs->at(0);
Use int or const int. No need to use a reference for primitive types in C++.
I can use int for in_stype. I cannot use it for out_stype, though, because I need a reference to modify out_attrs; it breaks otherwise.
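A small sketch of the distinction (hedged, not the exact merged lines):

// The input storage type is only read, so a plain value copy is fine.
const int in_stype = in_attrs->at(0);
// The output storage type is written back into out_attrs (e.g. by
// storage_type_assign), so it must remain a non-const reference.
int& out_stype = out_attrs->at(0);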
* Add Infer storage for sparse slice operator
* Remove unused files
* Indentation fix and add gpu test for fallback
* Change sum builtin to py_sum
* Add sum_axis(csr,axis=0)=dense and sum(csr,axis=1)=dense operator
* Documentation changes for sparse
* Add fallback unittest for keepdims and exclude
* PR review based changes:
  * Fix CHECK_NE
  * Change in_stype to int
  * Using const int instead of int
  * Initialize mid with the start
@eric-haibin-lin @reminisce @cjolivier01 @piiswrong