…t using it (#1361)
Added initial infrastructure for MG C++ testing and a Pagerank MG test using it.
<s>Still a WIP, need to:</s>
* <s>Shuffle step is currently failing</s>
* <s>`graph_t` ctor expensive check is failing</s>
* <s>Finish comparison code to reference SG Pagerank results</s>
* <s>Fix the `#include` guard hack in `test_utilities.hpp`</s>
* <s>Lots of cleanup</s>
* <s>Refactor common steps into proper `SetUp()` and `TearDown()` functions</s>
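The comparison step against reference SG PageRank results can be sketched as follows. This sketch assumes the MG PageRank values have already been gathered to a single rank and ordered by vertex ID. The helper name `pagerank_results_match` and the default tolerances are illustrative assumptions, not the values or code used in the actual test.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical test helper: returns true if every MG PageRank value matches the
// SG reference within tolerance. Assumes mg_result was already gathered to one
// rank and is ordered by vertex ID, like sg_reference.
bool pagerank_results_match(std::vector<double> const& sg_reference,
                            std::vector<double> const& mg_result,
                            double rel_tol = 1e-5,
                            double abs_tol = 1e-8)
{
  assert(rel_tol >= 0.0 && abs_tol >= 0.0);
  // Different vertex counts mean the MG graph was built or gathered incorrectly.
  if (sg_reference.size() != mg_result.size()) { return false; }
  for (std::size_t i = 0; i < sg_reference.size(); ++i) {
    double const diff = std::fabs(sg_reference[i] - mg_result[i]);
    // Relative tolerance for typical scores, absolute floor for near-zero ones,
    // since MG reductions may sum in a different order than SG.
    double const bound = std::max(abs_tol, rel_tol * std::fabs(sg_reference[i]));
    if (diff > bound) { return false; }
  }
  return true;
}
```

A tolerance-based check, rather than exact equality, is used here because floating-point sums computed across GPUs generally do not reproduce the SG summation order bit for bit.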
closes #1136
Authors:
- Rick Ratzel (@rlratzel)
- Seunghwa Kang (@seunghwak)
Approvers:
- Brad Rees (@BradReesWork)
- Andrei Schaffer (@aschaffer)
- Chuck Hastings (@ChuckHastings)
URL: #1361
Is your feature request related to a problem? Please describe.
We use Dask + UCX + NCCL to run multi-GPU analytics (Dask for launching processes, UCX for point-to-point (P2P) communication, and NCCL for collectives). UCX endpoints are set up during Dask initialization.
We currently rely on the Python frontend to test multi-GPU analytics.
In C++-only testing, Dask (process launching) and UCX (P2P) are not available. We can use MPI to fill both roles in C++ testing.
Describe the solution you'd like
cuML folks have solved this problem, and we can generally follow their solution.
RAFT comms can be configured to work with different backends.
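As a rough illustration of what configurable backends buy us, here is a minimal, self-contained sketch in which algorithm code depends only on an abstract communicator. The names `comms_backend`, `self_comms`, and `normalize_across_ranks` are hypothetical and are not RAFT's actual API; in practice a UCX + NCCL backend or an MPI + NCCL backend would sit behind the same kind of interface.

```cpp
#include <cassert>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical backend-agnostic communicator interface (NOT RAFT's actual API).
// Algorithm code talks only to this interface; different backends implement it.
class comms_backend {
 public:
  virtual ~comms_backend() = default;
  virtual int get_rank() const = 0;
  virtual int get_size() const = 0;
  // Element-wise sum across all ranks; every rank ends up with the result.
  virtual void allreduce_sum(std::vector<double>& values) const = 0;
};

// Trivial single-process backend: one rank, so a reduction is a no-op.
class self_comms final : public comms_backend {
 public:
  int get_rank() const override { return 0; }
  int get_size() const override { return 1; }
  void allreduce_sum(std::vector<double>&) const override {}
};

// Example "algorithm": scale this rank's scores so that the scores on all
// ranks together sum to 1. It never needs to know which backend is in use.
void normalize_across_ranks(std::vector<double>& local_scores, comms_backend const& comms)
{
  assert(comms.get_size() >= 1);
  // Each rank contributes its local sum; the allreduce yields the global sum.
  std::vector<double> total{std::accumulate(local_scores.begin(), local_scores.end(), 0.0)};
  comms.allreduce_sum(total);
  if (total[0] == 0.0) { throw std::domain_error("scores sum to zero"); }
  for (double& s : local_scores) { s /= total[0]; }
}
```

Because `normalize_across_ranks` only sees the interface, the same test body could run under either backend; only the communicator object constructed at setup time changes.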
An MPI-based RAFT comms implementation is now available (https://github.com/rapidsai/raft/pull/63/files). RAFT comms in Python uses UCX + NCCL as its backend, whereas this MPI-based RAFT comms uses MPI + NCCL.
For C++-only testing, we can create a `raft::comms::mpi_comms` object instead of a `raft::comms::std_comms` object:
- `raft::comms::std_comms`: https://github.com/rapidsai/raft/blob/branch-0.16/cpp/include/raft/comms/std_comms.hpp
- `raft::comms::mpi_comms`: https://github.com/rapidsai/raft/blob/branch-0.16/cpp/include/raft/comms/mpi_comms.hpp
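When tests are launched with `mpirun` instead of Dask, each process typically has to select its own GPU before the MPI-based comms object is created. Below is a minimal sketch, assuming one process per GPU and ranks placed on nodes in contiguous blocks. `device_for_rank` is a hypothetical helper; in real test code its inputs would come from `MPI_Comm_rank` and `cudaGetDeviceCount`, and its result would be passed to `cudaSetDevice`.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper for C++-only MG tests launched with mpirun: choose the
// GPU this process should use. Assumes one process per GPU and contiguous rank
// placement per node, so rank % gpus_per_node is unique within each node.
int device_for_rank(int world_rank, int gpus_per_node)
{
  if (world_rank < 0) { throw std::invalid_argument("world_rank must be non-negative"); }
  if (gpus_per_node <= 0) { throw std::invalid_argument("no GPUs visible on this node"); }
  int const device = world_rank % gpus_per_node;
  assert(device >= 0 && device < gpus_per_node);
  return device;
}
```

If rank placement is not contiguous, a node-local rank obtained with `MPI_Comm_split_type(..., MPI_COMM_TYPE_SHARED, ...)` is a safer input than the world rank.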
`test/CMakeLists.txt` needs to be updated accordingly, and we need an example of multi-GPU C++ testing.
We can reference the cuML repo to implement this.
e.g.
https://github.com/rapidsai/cuml/blob/branch-0.16/cpp/test/CMakeLists.txt#L78