[RELEASE] cugraph v0.16 #1228
[gpuCI] Auto-merge branch-0.15 to branch-0.16 [skip ci]
* added cpp packages to dev env * changelog
* Update CMakeLists.txt * Update CHANGELOG.md
* FIX Fix notebook error handling * DOC Changelog update
…types (#1178) * Minor update to comment to describe array sizes. * Changed graph container to use smart pointers, added arg for instantiating legacy types and switch statements for it to factory function. * Added PR 1152 to CHANGELOG.md * Removing unnecessary .get() call on unique_ptr instance * Using make_unique() instead of new * Updated to call drop() correctly after cudf API update. * Added args to support calling get_vertex_identifiers(). * Style fixes, removed commented out code meant for a future change. * Updated comment with description of new 'identifiers' arg. * Safety commit, still WIP, does not compile - updates for 2D graph support and upcoming 2D shuffle support * safety commit, does not pass tests: updated enough to be able to run the MG Louvain test. * Updated call_louvain() to use the new graph_t types. Still WIP, needs louvain updates to compile. * WIP: updates for incorporating new 2D shuffle data, still does not pass test. * Adding updates from iroy30 for calling shuffle from louvain.py * Updated to extract and pass the partition_t info and call the graph_t ctor. Now having a problem finding the right subcommunicator. * Updates to set up subcomms - having a problem with something needed by subcomms not being initialized: "address not mapped to object at address (nil)" * Added p2p flag to comms initialize() to enable initialization of UCX endpoints needed for MG test. * some proposed cleanup * safety commit: committing with debug prints to allow other team members to debug in parallel. * new technique for factory * safety commit: more updates to address problems instantiating graph_t (using num edges for partition instead of global for edgelist) and for debugging (print statments). * Changing how row and col rank are obtained, added debug prints for edge lists info * Fixes to partition_t get_matrix_partition_major/minor methods based on feedback. 
* Update shuffle.py * Integrating changes from iroy30 to produce "option 1" shuffle output by default, with an option to enable "option 2", temporarily enabled graph expensive checks for debugging. * Addressed review feedback: made var names consistent, fixed weights=None bug in cython code, added copyright to shuffle.py, changed how ranks are retrieved from the raft handle. * Removed debug prints. * Added PR 1163 to CHANGELOG.md * Removed extra newlines accidentally added to clean up diff in the PR, updated comment in cython code. * Added specific newlines back so file does not differ unnecessarily. * Disabled graph_t expensive check that was left enabled for debugging. * Added code path in call_louvain to support legacy graph types, to be removed when migration to graph_t types is complete. * Updates based on feedback from PR 1163: code cleanup/removed unused union members, consolidated legacy enum types, updated comments, initial support added for 64-bit vertex types (untested) * plumbed bool set based on running renumbering to set sorted_by_degree flag in graph container. * Added PR 1178 to CHANGELOG.md, C++ style fixes. * Addressed PR review feedback: added support for proper edge_t in cython wrapper and removed unnecessary vertex_t/edge_t int64,int32 combinations. Co-authored-by: Rick Ratzel <[email protected]> Co-authored-by: Chuck Hastings <[email protected]> Co-authored-by: Iroy30 <[email protected]>
* installing raft headers under cugraph
* cuhornet tag * Update CHANGELOG.md
* fix notebooks for recent cudf changes * updated docs * changelog * fix notebooks for recent cudf changes * updated docs * changelog * reset * flake8 * fixed typo in function name * fixed typo * clean. this notebook can take 12+ hours to run. making data set small for nightly testing Co-authored-by: BradReesWork <[email protected]>
…ions (#1196) * WIP: initial commit, incomplete * Initial version that builds using a new cython C++ util to init subcomms. Still have an issue with error stating comms not initialized when trying to init subcomms. * WIP update to move subcomm init to user-facing comms init call. * move subcomm init to comms init * updates * Removing FIXMEs and old code now that subcomms init is moved to comms init. * Added PR 1196 to CHANGELOG.md, removed additional obsolete FIXME * Addressed FIXME by removing redundant prows and pcols args to populate_graph_container() call and using values obtained directly from handle instead. * C++ style check updates * Updated docs and comments based on review feedback. Minor consolidation of get_n_workers() call, but still need FIXME for that addressed. * flake8 updates Co-authored-by: Rick Ratzel <[email protected]> Co-authored-by: Ishika Roy <[email protected]>
* Update build.sh * Update CHANGELOG.md * Update meta.yaml * Update cugraph_dev_cuda10.1.yml * Update cugraph_dev_cuda10.2.yml * Update cugraph_dev_cuda11.0.yml * Update meta.yaml
* Remove deprecated call to gpu matrix * Update CHANGELOG
* Adding CUDA architecture code for aarch64 and adding structure to reflect cudf and cuml CMake files * FAISS was not needed for cugraph, bad copy and paste :( * updating changelog * Adding code=compute_ to CMAKE_CUDA_FLAGS * Adding GENCODE 80 to GUNROCK commands in CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt Giving a try to Gunrock's CUDA_AUTODETECT_GENCODE feature. * Update CMakeLists.txt conditionally select gunrock gencodes * Update utils.py * Update katz_centrality_test.cu Co-authored-by: Alex Fender <[email protected]>
* pagerank 2D cython/python infrastructure * sgpu pagerank edits * edits * add namespace * pull branch0.16 * update test * review updates * clang * updatelocal_verts * review changes * review changes * Update mg_pagerank_wrapper.pyx * Update mg_pagerank_wrapper.pyx * Update pagerank.py * Update mg_pagerank_wrapper.pyx * rename edge_attr * Add renaming of edge_attr * Update CMakeLists.txt * flake8 * Update graph.py * update graph.py to rename edge_attr Co-authored-by: Alex Fender <[email protected]>
* add minimal update to create a PR * pagerank 2D cython/python infrastructure * 2D infra- bfs and sssp * add a work around for source (or destination) == self case for isend/irecv * fix a warning * remove dummy change log * sgpu pagerank edits * edits * add namespace * pull branch0.16 * update test * in copy_v_transform_reduce_in|out_nbr, implement missing communication alnog the minor direction * bug fix (assertion failure)\n * bug fix in copy_v_transform_reduce_in_out_nbr.cuh * clang-format * enforce consistency in variable naming related to subcommunicators * bug fix (graph construction) * bug fix (vertex_partition_segment_offsets) * review updates * clang * bug fix (caching comm_rank in partition_t object) * updatelocal_verts * bug fix (scale dangling_sum by damping factor) * remove transform_reduce_v_with_adj_matrix_row * replace device_vector with device_uvector in sssp * bfs updates to 2D infra * sssp 2D integration * sssp * flake8 * clang * add host_scalalr_bcast to comm_utils * remove unnecessary include * bug fix in update_frontier_v_push_if_out_nbr * bug fix in VertexFrontier declaration * add debug print for pagerank sum * remove dummy code * bug fix in assert * fix timing bug with isend/irecv * fix compile error * review updates * review updates * Revert "fix compile error" This reverts commit 900fd11. * Revert "fix timing bug with isend/irecv" This reverts commit e0e696a. * Revert "bug fix in assert" This reverts commit 97b98ed. * Revert "remove dummy code" This reverts commit facc70c. * Revert "add debug print for pagerank sum" This reverts commit c479b6d. * Revert "bug fix in VertexFrontier declaration" This reverts commit 44e3e10. * Revert "bug fix in update_frontier_v_push_if_out_nbr" This reverts commit dd80001. * Revert "remove unnecessary include" This reverts commit c55dbfb. * Revert "add host_scalalr_bcast to comm_utils" This reverts commit 6430ad5. * Revert "replace device_vector with device_uvector in sssp" This reverts commit d6b2e58. 
* Revert "remove transform_reduce_v_with_adj_matrix_row" This reverts commit 21d4e10. * Revert "bug fix (scale dangling_sum by damping factor)" This reverts commit 15818f7. * Revert "bug fix (caching comm_rank in partition_t object)" This reverts commit bd2dd83. * Revert "bug fix (vertex_partition_segment_offsets)" This reverts commit a006b99. * Revert "bug fix (graph construction)" This reverts commit 59fadef. * Revert "enforce consistency in variable naming related to subcommunicators" This reverts commit 790549f. * Revert "clang-format" This reverts commit 761f7aa. * Revert "bug fix in copy_v_transform_reduce_in_out_nbr.cuh" This reverts commit f874f65. * Revert "bug fix (assertion failure)\n" This reverts commit a33c2d1. * Revert "in copy_v_transform_reduce_in|out_nbr, implement missing communication alnog the minor direction" This reverts commit 6e1b152. * Revert "fix a warning" This reverts commit 25607ca. * Revert "add a work around for source (or destination) == self case for isend/irecv" This reverts commit 2be9e5f. * revert * clang * update tests and predecessor * update doc * update edge weights * remove partition row/col size * remove partition row/col size * remove partition row/col size * remove partition row/col size Co-authored-by: Seunghwa Kang <[email protected]>
…SSP (#1174) * add minimal update to create a PR * pagerank 2D cython/python infrastructure * 2D infra- bfs and sssp * add a work around for source (or destination) == self case for isend/irecv * fix a warning * remove dummy change log * sgpu pagerank edits * edits * add namespace * pull branch0.16 * update test * in copy_v_transform_reduce_in|out_nbr, implement missing communication alnog the minor direction * bug fix (assertion failure)\n * bug fix in copy_v_transform_reduce_in_out_nbr.cuh * clang-format * enforce consistency in variable naming related to subcommunicators * bug fix (graph construction) * bug fix (vertex_partition_segment_offsets) * review updates * clang * bug fix (caching comm_rank in partition_t object) * updatelocal_verts * bug fix (scale dangling_sum by damping factor) * remove transform_reduce_v_with_adj_matrix_row * replace device_vector with device_uvector in sssp * bfs updates to 2D infra * sssp 2D integration * sssp * flake8 * clang * add host_scalalr_bcast to comm_utils * remove unnecessary include * review changes * review changes * bug fix in update_frontier_v_push_if_out_nbr * bug fix in VertexFrontier declaration * add debug print for pagerank sum * remove dummy code * bug fix in assert * fix timing bug with isend/irecv * fix compile error * fix debug compile error * add missing cudaStreamSynchronize * guard raft::grid_1d_thread_t * compile error fix * SG bug fix (calling get_rank() on uninitialized comms) * BFS bug fix * fix a PageRank bug * pattern accelerator bug fix (found testing SSSP) * Update mg_pagerank_wrapper.pyx * review updates * bug fix in BFS communication * review updates * Revert "fix compile error" This reverts commit 900fd11. * Revert "fix timing bug with isend/irecv" This reverts commit e0e696a. * Revert "bug fix in assert" This reverts commit 97b98ed. * Revert "remove dummy code" This reverts commit facc70c. * Revert "add debug print for pagerank sum" This reverts commit c479b6d. 
* Revert "bug fix in VertexFrontier declaration" This reverts commit 44e3e10. * Revert "bug fix in update_frontier_v_push_if_out_nbr" This reverts commit dd80001. * Revert "remove unnecessary include" This reverts commit c55dbfb. * Revert "add host_scalalr_bcast to comm_utils" This reverts commit 6430ad5. * Revert "replace device_vector with device_uvector in sssp" This reverts commit d6b2e58. * Revert "remove transform_reduce_v_with_adj_matrix_row" This reverts commit 21d4e10. * Revert "bug fix (scale dangling_sum by damping factor)" This reverts commit 15818f7. * Revert "bug fix (caching comm_rank in partition_t object)" This reverts commit bd2dd83. * Revert "bug fix (vertex_partition_segment_offsets)" This reverts commit a006b99. * Revert "bug fix (graph construction)" This reverts commit 59fadef. * Revert "enforce consistency in variable naming related to subcommunicators" This reverts commit 790549f. * Revert "clang-format" This reverts commit 761f7aa. * Revert "bug fix in copy_v_transform_reduce_in_out_nbr.cuh" This reverts commit f874f65. * Revert "bug fix (assertion failure)\n" This reverts commit a33c2d1. * Revert "in copy_v_transform_reduce_in|out_nbr, implement missing communication alnog the minor direction" This reverts commit 6e1b152. * Revert "fix a warning" This reverts commit 25607ca. * Revert "add a work around for source (or destination) == self case for isend/irecv" This reverts commit 2be9e5f. 
* revert * clang * update tests and predecessor * Update mg_pagerank_wrapper.pyx * fix the mess-up in merging with unmerged PRs * transitioning from UCX send/recv to NCCL send/recv * remove temporary code * replace UCX backend with NCCL backend for GPU memory P2P in update_frontier_v_push_if_out_nbr * bug fix for potential hang * remove debug prints * fix a new bug introduced in sssp bug fix * Update pagerank.py * Update mg_pagerank_wrapper.pyx * update doc * update edge weights * rename edge_attr * Add renaming of edge_attr * Update CMakeLists.txt * flake8 * Update graph.py * update graph.py to rename edge_attr * bug fix handling edge weights * update change log * fixed outdated comments * clang-format * remove debug statement * fix comments & add cosmetic updates * fix a simple mistake in cosmetic updates Co-authored-by: Ishika Roy <[email protected]> Co-authored-by: Iroy30 <[email protected]> Co-authored-by: Alex Fender <[email protected]>
* update to use collective_utils.cuh * Updated to call drop() correctly after cudf API update. * Added args to support calling get_vertex_identifiers(). * Style fixes, removed commented out code meant for a future change. * Updated comment with description of new 'identifiers' arg. * MNMG Louvain, with debug code, working on SG * add additional functions to graph_view * fix algorithm API to support old and new graphs * fixed confusing variable names * reorder functions to be consistent with elsewhere * fix a compiler error * accomodate the change of raft allgahterv's input parameter displs type from int[] to size_t[] * update change log * update change log * update RAFT tag * Safety commit, still WIP, does not compile - updates for 2D graph support and upcoming 2D shuffle support * safety commit, does not pass tests: updated enough to be able to run the MG Louvain test. * temporary commit of copy_to_adj_matrix_row.cuh and collective_utils.cuh to checkout another branch * Updated call_louvain() to use the new graph_t types. Still WIP, needs louvain updates to compile. * fix a bug in matrix partitioning ranges * Merged lastest branch, got things compiling * temporary commit * fix errors in previous merge conflicts * extend copy_to_adj_matrix_row.cuh for MNMG * rename collective_utils.cuh to comm_utils.cuh * rename copy_v_transform_reduce_nbr.cuh to copy_v_transform_reduce_in_out_nbr.cuh * merge copy_to_adj_matrix_row.cuh & copy_to_adj_matrix_col.cuh * extend copy_v_transform_reduce_(in|out)_nbr for MNMG * extend Bucket for MNMG * add get_vertex_partition_size * add more explicit instantiation cases for BFS, SSSP, PageRank, KatzCentrality * WIP: updates for incorporating new 2D shuffle data, still does not pass test. 
* some code cleanup in preparation for MNMG testing * Adding updates from iroy30 for calling shuffle from louvain.py * extend update_frontier_v_push_if_out_nbr.cuh for MNMG * wrap debugging calls in ifdef DEBUG * delete spurious comment * Updated to extract and pass the partition_t info and call the graph_t ctor. Now having a problem finding the right subcommunicator. * Updates to set up subcomms - having a problem with something needed by subcomms not being initialized: "address not mapped to object at address (nil)" * Added p2p flag to comms initialize() to enable initialization of UCX endpoints needed for MG test. * code refinement * refactor copy_v_transform_reduce_in|out_nbr * bug fix (thanks Rick) * clang-format * bug fix * safety commit: committing with debug prints to allow other team members to debug in parallel. * clean up a few things * safety commit: more updates to address problems instantiating graph_t (using num edges for partition instead of global for edgelist) and for debugging (print statments). * Changing how row and col rank are obtained, added debug prints for edge lists info * Fixes to partition_t get_matrix_partition_major/minor methods based on feedback. * bug fixes * bug fix * latest updates * fix to get latest pattern accelerators to work correctly * Update shuffle.py * Integrating changes from iroy30 to produce "option 1" shuffle output by default, with an option to enable "option 2", temporarily enabled graph expensive checks for debugging. 
* add minimal update to create a PR * pagerank 2D cython/python infrastructure * 2D infra- bfs and sssp * debugging * add a work around for source (or destination) == self case for isend/irecv * fix a warning * remove dummy change log * more debugging * debugging * sgpu pagerank edits * more louvain debugging * edits * debugging * add namespace * debugging * pull branch0.16 * update test * in copy_v_transform_reduce_in|out_nbr, implement missing communication alnog the minor direction * debugging * bug fix (assertion failure)\n * fix merge issues * bug fix in copy_v_transform_reduce_in_out_nbr.cuh * clang-format * fix bug in cython graph creation * debugging * enforce consistency in variable naming related to subcommunicators * bug fix (graph construction) * bug fix (vertex_partition_segment_offsets) * review updates * clang * bug fix (caching comm_rank in partition_t object) * updatelocal_verts * debugging comms * bug fix (scale dangling_sum by damping factor) * remove transform_reduce_v_with_adj_matrix_row * replace device_vector with device_uvector in sssp * bfs updates to 2D infra * sssp 2D integration * sssp * flake8 * clang * add host_scalalr_bcast to comm_utils * remove unnecessary include * review changes * review changes * bug fix in update_frontier_v_push_if_out_nbr * bug fix in VertexFrontier declaration * add debug print for pagerank sum * remove dummy code * bug fix in assert * fix timing bug with isend/irecv * fix compile error * fix debug compile error * add missing cudaStreamSynchronize * guard raft::grid_1d_thread_t * compile error fix * SG bug fix (calling get_rank() on uninitialized comms) * update versions of raft and cuco * latest debugging * BFS bug fix * fix a PageRank bug * pattern accelerator bug fix (found testing SSSP) * more debugging * Update mg_pagerank_wrapper.pyx * review updates * bug fix in BFS communication * review updates * Revert "fix compile error" This reverts commit 900fd11. 
* Revert "fix timing bug with isend/irecv" This reverts commit e0e696a. * Revert "bug fix in assert" This reverts commit 97b98ed. * Revert "remove dummy code" This reverts commit facc70c. * Revert "add debug print for pagerank sum" This reverts commit c479b6d. * Revert "bug fix in VertexFrontier declaration" This reverts commit 44e3e10. * Revert "bug fix in update_frontier_v_push_if_out_nbr" This reverts commit dd80001. * Revert "remove unnecessary include" This reverts commit c55dbfb. * Revert "add host_scalalr_bcast to comm_utils" This reverts commit 6430ad5. * Revert "replace device_vector with device_uvector in sssp" This reverts commit d6b2e58. * Revert "remove transform_reduce_v_with_adj_matrix_row" This reverts commit 21d4e10. * Revert "bug fix (scale dangling_sum by damping factor)" This reverts commit 15818f7. * Revert "bug fix (caching comm_rank in partition_t object)" This reverts commit bd2dd83. * Revert "bug fix (vertex_partition_segment_offsets)" This reverts commit a006b99. * Revert "bug fix (graph construction)" This reverts commit 59fadef. * Revert "enforce consistency in variable naming related to subcommunicators" This reverts commit 790549f. * Revert "clang-format" This reverts commit 761f7aa. * Revert "bug fix in copy_v_transform_reduce_in_out_nbr.cuh" This reverts commit f874f65. * Revert "bug fix (assertion failure)\n" This reverts commit a33c2d1. * Revert "in copy_v_transform_reduce_in|out_nbr, implement missing communication alnog the minor direction" This reverts commit 6e1b152. * Revert "fix a warning" This reverts commit 25607ca. * Revert "add a work around for source (or destination) == self case for isend/irecv" This reverts commit 2be9e5f. 
* revert * clang * update tests and predecessor * working MNMG Louvain on Karate and Dolphin with 2 GPUs * turn off debugging * clean up some output * support compiling on systems without libcu++ * Update mg_pagerank_wrapper.pyx * debugging 2 x N case * use default * use default if prows not specified * disable check for libcu++, not working * update changelog * fix some unit testing * rename shuffle2 back to shuffle, debug some unit test stuff * fix clang issues * update from Ishika/Rick * somehow lost shuffle update * undo last merge * add some synchronization calls, turn on debugging, to tryand isolate 2 x 4 error * remove some old debugging from Rick * fix the mess-up in merging with unmerged PRs * transitioning from UCX send/recv to NCCL send/recv * get new cuco from Jake to retrieve libcu++ * Jake's technique needs to be applied to our CMakefile * debugging * update cuco version to latest * fix clang formatting * move MurmurHash again... * update raft version * manually merge branch-0.16 * shuffle no longer takes prows/pcols, pass to init instead * working version * code cleanup * remove some old print statements * fix clang-format issues * update cuco revision * revert to running karate test Co-authored-by: Seunghwa Kang <[email protected]> Co-authored-by: Rick Ratzel <[email protected]> Co-authored-by: Iroy30 <[email protected]> Co-authored-by: Ishika Roy <[email protected]> Co-authored-by: Charles Hastings <[email protected]> Co-authored-by: Charles Hastings <[email protected]> Co-authored-by: Charles Hastings <[email protected]> Co-authored-by: Charles Hastings <[email protected]>
* Added new Medium entry * updated to match cuML * copied to match cuML * updated list of MG algorithms * converted to Markdown * updated reference to new markdown file * removed rst file * should really be a HTML file (next release) * copy pdf files over * updates * removed ref to cuml * changelog * addressing review issues * migrated to RST Co-authored-by: BradReesWork <[email protected]>
* update dask docs * changelog
* Added code to ensure CUDA 10.2 or higher is used for MG Louvain. * Added PR 1222 to CHANGELOG.md * Temporarily disabling test that sporadically fails on centos7, deferring investigation to 0.17 * Updating libcudacxx to tag 1.3.0 (since 1.3.0-rc0 is no longer available) Co-authored-by: Rick Ratzel <[email protected]>
* Temporarily disabling all C++ tests for 0.16 due to intermittent failures from what appears to be an issue with Thrust (which does not appear to affect the Python API or notebooks). These will be re-enabled once this issue is resolved in 0.17. * Added PR 1233 to CHANGELOG.md Co-authored-by: Rick Ratzel <[email protected]>
Codecov Report
@@           Coverage Diff            @@
##            main    #1228   +/-   ##
========================================
  Coverage       ?   56.59%
========================================
  Files          ?       62
  Lines          ?     2564
  Branches       ?        0
========================================
  Hits           ?     1451
  Misses         ?     1113
  Partials       ?        0

Continue to review full report at Codecov.
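As a quick sanity check on the report figures, the coverage percentage can be recomputed from the hit and line counts. The sketch below is only illustrative: the helper name `coverage_percent` is hypothetical and not part of Codecov or cugraph, and it assumes coverage is simply hits divided by total lines, which holds here because partials are 0.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Return line coverage as a percentage rounded to two decimals.

    Hypothetical helper for illustration; assumes coverage = hits / lines.
    """
    if lines <= 0:
        raise ValueError("lines must be positive")
    return round(100.0 * hits / lines, 2)


# Figures copied from the Codecov report in this PR.
hits, misses, lines = 1451, 1113, 2564

# Partials are 0, so hits and misses should account for every line.
assert hits + misses == lines

print(coverage_percent(hits, lines))  # 56.59
```

The result matches the 56.59% shown in the report.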
Require `ucx-proc=*=gpu`
❄️ Code freeze for `branch-0.16` and v0.16 release

What does this mean?
Only critical/hotfix level issues should be merged into `branch-0.16` until release (merging of this PR).

What is the purpose of this PR?
`branch-0.16` into `main` for the release