This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Operators for sum(csr, axis=0) and sum(csr, axis=1) #8174

Merged
merged 15 commits into apache:master on Oct 13, 2017

Conversation

anirudh2290 (Member) commented Oct 8, 2017

  • Adds operators sum(csr, axis=0) and sum(csr, axis=1), both producing a dense output (semantics sketched below)
  • Tried a csr matrix of shape 128*100M at 0.1% density and was able to perform the sum along both axis 0 and axis 1
  • Allocation fails for the 128*100M shape with a dense NDArray; for 1M and 10M the sparse operator gives a 300X and 1200X speedup respectively at 0.1% density (uniform distribution)
  • Completes one TODO in "A Todo List for the Sparse Feature (CPU)" #8168

@eric-haibin-lin @reminisce @cjolivier01 @piiswrong
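
For reference, here is a minimal standalone sketch of the semantics being added, using plain CSR arrays rather than the PR's actual kernels (names and layout here are illustrative):

#include <cstdint>
#include <vector>

// Minimal CSR sum sketch. indptr has num_rows + 1 entries; indices/data
// hold the column index and value of each stored element, sorted per row.
std::vector<double> csr_sum(const std::vector<int64_t>& indptr,
                            const std::vector<int64_t>& indices,
                            const std::vector<double>& data,
                            int64_t num_rows, int64_t num_cols, int axis) {
  if (axis == 1) {  // one sum per row
    std::vector<double> out(num_rows, 0.0);
    for (int64_t i = 0; i < num_rows; ++i)
      for (int64_t k = indptr[i]; k < indptr[i + 1]; ++k)
        out[i] += data[k];
    return out;
  }
  std::vector<double> out(num_cols, 0.0);  // axis == 0: one sum per column
  for (int64_t k = 0; k < indptr[num_rows]; ++k)
    out[indices[k]] += data[k];
  return out;
}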

bool dispatched = false;
const bool invalid_ctx = dev_mask != mshadow::cpu::kDevMask;
const auto dispatch_ex =
invalid_ctx ? DispatchMode::kFComputeFallback : DispatchMode::kFComputeEx;
Member

Does this operator only work on cpu?

Member

nvm let's focus on cpu for now :)

@@ -45,12 +45,16 @@ Defined in )code";
}

MXNET_OPERATOR_REGISTER_REDUCE(sum)
.add_alias("_sparse_sum")
Member

Use MXNET_ADD_SPARSE_OP_ALIAS(sum)

Member

Yes, MXNET_ADD_SPARSE_OP_ALIAS() is the greatest macro ever invented!

Member Author

Fixed
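
For context, the macro the reviewers are referring to boils down to registering a "_sparse_"-prefixed alias. Roughly (the exact expansion is paraphrased from operator_common.h and should be treated as an assumption):

// Approximate definition: stringify the op name with a "_sparse_" prefix
// and register it as an alias on the same NNVM op entry.
#define MXNET_ADD_SPARSE_OP_ALIAS(__name$) .add_alias("_sparse_" #__name$)

// With it, the hand-written .add_alias("_sparse_sum") on the sum
// registration becomes MXNET_ADD_SPARSE_OP_ALIAS(sum), keeping the
// naming convention in one place.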

.add_alias("sum_axis")
.describe(R"code(Computes the sum of array elements over given axes.

.. Note::

`sum` and `sum_axis` are equivalent.
For CSRNDArray, summation along axis 0 and axis 1 is supported.
Setting keepdims or exclude to True with CSRNDArray will cause
a fallback to the dense operator.
Member

Try to avoid Python-specific terms in operator documentation since it's shared by all language bindings. I suggest replacing CSRNDArray with "ndarray of csr storage type".

rsp.indices = [0, 1]
rsp.values = [[ 0., 1., 0.],
[ 2., 0., 3.]]

# cast to csr storage type
csr = cast_storage(default, 'csr')
csr = cast_storage(dense, 'csr')
Member

good catch! Thanks for fixing this

def test_sparse_sum_axis():
def test_variations():
dim0 = 30
dim1 = 1000
Member

Is it necessary to test with dim = 1000? We try to keep the unit test suite lightweight (except when an operator uses a different kernel optimized for a big shape, like cast_storage).

Member Author

Changed dim to 100

dim0 = 30
dim1 = 1000
axes = [0, 1]
densities = [0, 0.01, 0.1, 0.2, 0.5]
Member

I think densities = [0, 0.5, 1] should cover all cases

Member Author

Changed

// only dense output storage type is supported
CHECK_EQ(output->storage_type(), kDefaultStorage);

CHECK_NE(req, kWriteInplace);
Member

Since the result is dense, I think kWriteInplace and kAddTo work fine for sum. Usually, if the output is sparse, then we only support kWriteTo and kNullOp. So I think this check is not necessary.

Member Author

Fixed.
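
For background on the req discussion above, OpReqType controls how a kernel stores each result; a hedged illustration of the convention (not code from this PR):

// Mirrors mxnet::OpReqType: kNullOp skips the write, kWriteTo and
// kWriteInplace overwrite, kAddTo accumulates into the existing value.
enum OpReqType { kNullOp, kWriteTo, kWriteInplace, kAddTo };

inline void assign(float* out, OpReqType req, float val) {
  switch (req) {
    case kNullOp:                      break;  // leave output untouched
    case kWriteTo:
    case kWriteInplace: *out = val;    break;  // dense output: overwrite is safe
    case kAddTo:        *out += val;   break;  // accumulate
  }
}

Since sum(csr) produces a dense output, all four modes are representable, which is why the kWriteInplace check could be dropped.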

const RType* in_indptr, const IType* in_idx,
const DType* in_data,
const int64_t num_rows) {
DType sum, residual;
Member

I think each thread should handle a range of output elements instead of just one so that you perform fewer binary searches. We can use the temp resource to store the temporary sum results, and invoke the kernel with num_cpu_thread instead of size_of_output. Also, if nnz per row is fewer than 16, I guess a linear search would be faster than a binary search.

Member

You can look at the cast_storage gpu implementation to see how to request a temp resource.

Member

It would be more consistent to have num_rows as RType. It would also avoid signed/unsigned problems when RType varies in the future.

Member Author

@eric-haibin-lin Great suggestion! I have changed the logic to handle a range instead of an individual element. @cjolivier01 Good catch! I am using RType and IType for num_rows now.
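
A hedged sketch of the reworked axis=0 strategy (function and parameter names are illustrative, not the final PR code): each thread takes a contiguous block of output columns and scans each row's sorted indices once per block, which also covers the linear-search case for short rows:

#include <algorithm>

// Thread `tid` of `num_threads` accumulates a contiguous block of output
// columns; `out` must be zero-initialized by the caller. Each row's index
// range is read once per block instead of once per output column.
template <typename RType, typename IType, typename DType>
void sum_csr_axis0_range(int tid, int num_threads, DType* out,
                         const RType* indptr, const IType* idx,
                         const DType* data, RType num_rows, IType num_cols) {
  const IType seg = (num_cols + num_threads - 1) / num_threads;
  const IType col_begin = static_cast<IType>(tid) * seg;
  const IType col_end = std::min<IType>(col_begin + seg, num_cols);
  for (RType i = 0; i < num_rows; ++i) {
    const RType row_end = indptr[i + 1];  // cached once per row
    for (RType k = indptr[i]; k < row_end; ++k) {
      const IType c = idx[k];
      if (c >= col_begin && c < col_end) out[c] += data[k];
      if (c >= col_end) break;            // indices are sorted within a row
    }
  }
}

For long rows one could first binary-search for col_begin before the scan, as discussed above.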

CHECK_EQ(in_attrs->size(), 1);
CHECK_EQ(out_attrs->size(), 1);
const ReduceAxesParam& param = nnvm::get<ReduceAxesParam>(attrs.parsed);
const auto& in_stype = in_attrs->at(0);
Contributor

For primitive types such as int, float, etc., there is no need to use a const reference. In addition, auto is not good for code readability. Use specific types if the type name is short.

const bool invalid_ctx = dev_mask != mshadow::cpu::kDevMask;
const auto dispatch_ex =
invalid_ctx ? DispatchMode::kFComputeFallback : DispatchMode::kFComputeEx;
if (!dispatched && in_stype == kDefaultStorage) {
Contributor

Add some comments for each condition check for easier understanding of the dispatching logic.

Member Author

Added comments
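
For readers following the thread, the commented dispatch logic follows this general pattern (storage_type_assign and dispatch_fallback are helpers assumed from operator_common.h; the exact conditions are paraphrased):

// dense input -> dense output through the standard FCompute path
if (!dispatched && in_stype == kDefaultStorage) {
  dispatched = storage_type_assign(&out_attrs->at(0), kDefaultStorage,
                                   dispatch_mode, DispatchMode::kFCompute);
}
// csr input with axis 0 or 1 and no keepdims/exclude -> dense output
// through FComputeEx (dispatch_ex degrades to fallback off-CPU, see above)
if (!dispatched && in_stype == kCSRStorage
    /* && axis is {0} or {1} && !param.keepdims && !param.exclude */) {
  dispatched = storage_type_assign(&out_attrs->at(0), kDefaultStorage,
                                   dispatch_mode, dispatch_ex);
}
// everything else: fall back to the dense implementation
if (!dispatched) {
  dispatched = dispatch_fallback(out_attrs, dispatch_mode);
}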

template <int req>
struct SumCsrKernel<req, 0> {
template <typename RType, typename IType, typename DType>
MSHADOW_XINLINE static void Map(int j, DType* out_data,
Contributor

Add a comment here explaining the meaning of j so that it's easy to understand how this is parallelized.

Member Author

Added comments

})
}

if (!input.storage_initialized()) {
Contributor

Need to fill the output with zeros here.

Member Author

Good catch! I have fixed it by filling the output with zeros for kWriteTo and kWriteInplace together.
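
The resulting guard at the top of the compute body looks roughly like this (Fill is assumed from init_op.h; treat the exact call as a sketch):

// A csr ndarray with no stored elements sums to all zeros, so for
// kWriteTo/kWriteInplace the dense output must still be cleared;
// kAddTo and kNullOp leave the output as-is.
if (!input.storage_initialized()) {
  if (req == kWriteTo || req == kWriteInplace) {
    Fill<false>(s, output->data(), req, 0);
  }
  return;
}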

MSHADOW_IDX_TYPE_SWITCH(input.aux_type(kIdx), IType, {
MSHADOW_TYPE_SWITCH(input.dtype(), DType, {
MXNET_ASSIGN_REQ_SWITCH(req, req_type, {
auto in_indptr = input.aux_data(kIndPtr).dptr<RType>();
Contributor

Add const if possible.

Member

Please reduce the use of ‘auto’.

Member Author

Changed!

mshadow::red::sum::SetInitValue(sum, residual);
const IType jval = static_cast<IType>(j);
for (RType i = 0; i < num_rows; ++i) {
if (in_indptr[i] >= in_indptr[i + 1]) continue;
Contributor

Is it possible that in_indptr[i] > in_indptr[i+1]?

IType end = in_indptr[i + 1] - 1;
IType mid;
while (start <= end) {
mid = (start + end) / 2;
Contributor

Use mid = start + (end - start) / 2.
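
Both review points combined give an inner search along these lines (a minimal sketch, not the exact PR code; the caller is assumed to skip empty rows, and RType is assumed signed so -1 can flag a miss):

// Binary-search row i of the sorted CSR index array for column j.
// Returns the position of j within idx, or -1 if row i has no such entry.
template <typename RType, typename IType>
RType find_col(const RType* indptr, const IType* idx, RType i, IType j) {
  RType start = indptr[i];
  RType end = indptr[i + 1] - 1;                  // indptr[i + 1] read only once
  while (start <= end) {
    const RType mid = start + (end - start) / 2;  // avoids start + end overflow
    if (idx[mid] == j) return mid;
    if (idx[mid] < j) start = mid + 1;
    else end = mid - 1;
  }
  return -1;
}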

DispatchMode* dispatch_mode,
std::vector<int>* in_attrs,
std::vector<int>* out_attrs) {
CHECK_EQ(in_attrs->size(), 1);
Member

Please use 1U since size() returns an unsigned type (size_t).

Member Author

fixed.

// in_idx[in_indptr[i+1]]
// The assumption here is in_idx for each row is sorted
IType start = in_indptr[i];
IType end = in_indptr[i + 1] - 1;
Member

Can you please cache the in_indptr[i + 1] value? It is used a lot and can introduce a lot of superfluous instructions.

Member Author

fixed

MSHADOW_IDX_TYPE_SWITCH(input.aux_type(kIndPtr), RType, {
MSHADOW_TYPE_SWITCH(input.dtype(), DType, {
MXNET_ASSIGN_REQ_SWITCH(req, req_type, {
auto in_indptr = input.aux_data(kIndPtr).dptr<RType>();
Member

Make non-output pointers const please

Member Author

fixed.

mshadow::Stream<xpu>* s = ctx.get_stream<xpu>();
const NDArrayStorageType istype = inputs[0].storage_type();
if (istype == kCSRStorage) {
CHECK_EQ(inputs[0].shape().ndim(), 2U) << "sum(csr) op only supports"
Member

You don’t need the other <<. In fact, this causes an extra function call every time this function is executed, regardless of whether the check succeeds or fails, so try to keep the number of << calls low.

Member Author

fixed.
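
For instance, merging the message into a single streamed literal (wording illustrative):

// one << regardless of whether the check fires
CHECK_EQ(inputs[0].shape().ndim(), 2U)
    << "sum(csr) op only supports 2D inputs";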

CHECK_EQ(in_attrs->size(), 1U);
CHECK_EQ(out_attrs->size(), 1U);
const ReduceAxesParam& param = nnvm::get<ReduceAxesParam>(attrs.parsed);
int& in_stype = in_attrs->at(0);
Contributor

Use int or const int. There is no need to use a reference for primitive types in C++.

Member Author

I can use int for in_stype. I cannot use it for out_stype, though, because I need a reference to modify out_attrs; it breaks otherwise.
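
The distinction, sketched with the storage_type_assign helper assumed from operator_common.h:

const int in_stype = in_attrs->at(0);  // read-only: a plain copy is fine
int& out_stype = out_attrs->at(0);     // must alias the vector element so the
                                       // assignment is visible to the caller
dispatched = storage_type_assign(&out_stype, kDefaultStorage,
                                 dispatch_mode, dispatch_ex);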

piiswrong merged commit 46ec178 into apache:master Oct 13, 2017
cjolivier01 pushed a commit to cjolivier01/mxnet that referenced this pull request Oct 13, 2017
* Add Infer storage for sparse slice operator

* Remove unused files

* Indentation fix and add gpu test for fallback

* Change sum builtin to py_sum

* Add sum_axis(csr,axis=0)=dense and sum(csr,axis=1)=dense operator

* Documentation changes for sparse

* Add fallback unittest for keepdims and exclude

* PR review based changes

* Fix CHECK_NE

* Change in_stype to int

* Using const int instead of int

* Initialize mid with the start
piiswrong pushed a commit that referenced this pull request Oct 14, 2017 (#8232)
piiswrong pushed a commit to piiswrong/mxnet that referenced this pull request Oct 14, 2017 (apache#8232)
piiswrong pushed a commit that referenced this pull request Oct 14, 2017 (#8232)
crazy-cat pushed a commit to crazy-cat/incubator-mxnet that referenced this pull request Oct 26, 2017
crazy-cat pushed a commit to crazy-cat/incubator-mxnet that referenced this pull request Oct 26, 2017 (apache#8232)