Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Non-blocking row_sparse_pull. Fix incorrect indices generated by device kvstore.row_sparse_pull #9887

Merged
merged 15 commits into from
Mar 5, 2018

Conversation

eric-haibin-lin
Copy link
Member

@eric-haibin-lin eric-haibin-lin commented Feb 26, 2018

Description

This PR adds async execution support for kv.row_sparse_pull.
The operation was blocking because it requires unique row_ids, whose shape cannot be inferred ahead of time. This PR stores the unique row_ids in a row_sparse NDArray, whose data shape can be changed at run time when executed asynchronously.

Checklist

Essentials

  • Passed code style checking (make lint)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Removed use_copy param in BroadcastRowSparse - this is essentially the same as calling Broadcast
  • Removed CopyRetainedRowsToGPU and always use SparseRetain because CopyRetainedRowsToGPU has high invocation overhead and SparseRetain has improved performance
  • Revised test cases to test against shapes/dtypes commonly used by users
  • Tested example/sparse/linear_classification with dist_sync kvstore
  • Tested tests/nightly/dist_sync_kvstore.py
  • Tested tests/python/gpu/test_kvstore.py
  • Tested tests/python/unittest/test_kvstore.py

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

Copy link
Member

@rahul003 rahul003 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just wanted to talk about if it is safe to use local variables for ndarrays. I see that at many places. The one I marked below is just one example. Is there a chance they can go out of scope, be freed, and cause segfaults?

}
CHECK_EQ(num_vals, 1) << "RowSparsePull with multiple values is not supported yet";
NDArray& indices = target_val_rowids[0].second;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to use local variable ndarray?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be fine, they're passed by value when pushed to engine

@eric-haibin-lin
Copy link
Member Author

@reminisce could you help review?

IType *dptr = out.data().dptr<IType>();
common::ParallelSort(dptr, dptr + num_elements,
engine::OpenMP::Get()->GetRecommendedOMPThreadCount());
const size_t num_selected_out = std::unique(dptr, dptr + num_elements) - dptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::unique only removes duplicates of elements present consecutively. Is that what's expected?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes because inputs are sorted first

Copy link
Contributor

@reminisce reminisce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -76,7 +76,7 @@ def check_row_sparse_pull(kv, count):
for i in range(count):
vals.append(mx.nd.zeros(shape).tostype('row_sparse'))
row_id = np.random.randint(num_rows, size=num_rows)
row_ids.append(mx.nd.array(row_id))
row_ids.append(mx.nd.array(row_id).reshape((2,2)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why reshape?
One suggestion: (2, 2) is too hard-coded. If shape is changed in the beginning, the test would fail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right.. Made it less hard-coded by calculating the dimension based on total number of elements.
I'm testing 2-D rowids explicitly because that seems to be a common use case, many users forget to reshape it to 1-D before passing it to kvstore. It is now supported with this PR.

@eric-haibin-lin
Copy link
Member Author

eric-haibin-lin commented Mar 5, 2018

This PR also fixed a bug in the original GPU "Unique" implementation, where the same pointer is passed as both the input/output to cub::unique. This results in incorrect output when the number of rowids is large. Fixed by storing the input/output in separate buffers.

@eric-haibin-lin
Copy link
Member Author

@ZiyueHuang please view the changes for gpu unique

@eric-haibin-lin eric-haibin-lin changed the title Non-blocking row_sparse_pull Non-blocking row_sparse_pull. Fix incorrect indices generated by device kvstore.row_sparse_pull Mar 5, 2018
@ZiyueHuang
Copy link
Member

@eric-haibin-lin Thanks for the fix!

Looks good.

@eric-haibin-lin eric-haibin-lin merged commit 02dd89a into apache:master Mar 5, 2018
CoinCheung added a commit to CoinCheung/incubator-mxnet that referenced this pull request Mar 18, 2018
… by device kvstore.row_sparse_pull (apache#9887)"

This reverts commit 02dd89a.
jinhuang415 pushed a commit to jinhuang415/incubator-mxnet that referenced this pull request Mar 30, 2018
…ce kvstore.row_sparse_pull (apache#9887)

* nonblocking Kvstore (apache#195)

* draft

* rm use_copy. fix dist kvstore. TODO: fix dtype

* fix dtype, shape

* remove reshape

* cleanup

* fix compilation

* rsp draft

* update param name

* doc update and small refactoring

* minor updates

* enhance test case with 2-D rowids

* update gpu tests

* rewrite gpu unique kernels

* update gpu tests

* update reshape test/

* fix lint

* update test for py3
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
…ce kvstore.row_sparse_pull (apache#9887)

* nonblocking Kvstore (apache#195)

* draft

* rm use_copy. fix dist kvstore. TODO: fix dtype

* fix dtype, shape

* remove reshape

* cleanup

* fix compilation

* rsp draft

* update param name

* doc update and small refactoring

* minor updates

* enhance test case with 2-D rowids

* update gpu tests

* rewrite gpu unique kernels

* update gpu tests

* update reshape test/

* fix lint

* update test for py3
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
…ce kvstore.row_sparse_pull (apache#9887)

* nonblocking Kvstore (apache#195)

* draft

* rm use_copy. fix dist kvstore. TODO: fix dtype

* fix dtype, shape

* remove reshape

* cleanup

* fix compilation

* rsp draft

* update param name

* doc update and small refactoring

* minor updates

* enhance test case with 2-D rowids

* update gpu tests

* rewrite gpu unique kernels

* update gpu tests

* update reshape test/

* fix lint

* update test for py3
@eric-haibin-lin eric-haibin-lin deleted the kvstore-pr branch September 18, 2018 23:33
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants