Some MTMG code cleanup and small optimizations #3894
Conversation
```cpp
std::vector<std::tuple<vertex_t*, vertex_t const*, size_t>> dst_copies;
std::vector<std::tuple<weight_t*, weight_t const*, size_t>> wgt_copies;
std::vector<std::tuple<edge_t*, edge_t const*, size_t>> edge_id_copies;
std::vector<std::tuple<edge_type_t*, edge_type_t const*, size_t>> edge_type_copies;
```
Should we maintain 5 variables, or would one variable storing (input_start_offset, output_start_offset, size) triplets be sufficient?
Good suggestion, I'll look into that for the next push.
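A minimal sketch of what that consolidation could look like (the struct and helper names below are hypothetical, not from the PR). One vector of (input_start_offset, output_start_offset, size) descriptors drives the copies for all five arrays; `std::copy` stands in here for the actual device-side copy:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical copy descriptor replacing the five per-array tuple vectors.
// The same triplet applies to src, dst, wgt, edge_id, and edge_type alike,
// since all arrays are copied in lockstep.
struct copy_range_t {
  std::size_t input_start_offset;   // read position in the caller's array
  std::size_t output_start_offset;  // write position in the staging buffer
  std::size_t size;                 // number of elements to copy
};

// Apply the shared ranges to one array; call once per (optional) array.
template <typename T>
void apply_copies(std::vector<copy_range_t> const& ranges, T* out, T const* in)
{
  for (auto const& r : ranges) {
    std::copy(in + r.input_start_offset,
              in + r.input_start_offset + r.size,
              out + r.output_start_offset);
  }
}
```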
```cpp
while (count > 0) {
  size_t copy_count = std::min(count, (src_.back().size() - current_pos_));

  src_copies.push_back(
    std::make_tuple(src_.back().begin() + current_pos_, src.begin() + pos, copy_count));
  dst_copies.push_back(
    std::make_tuple(dst_.back().begin() + current_pos_, dst.begin() + pos, copy_count));
  if (wgt)
    wgt_copies.push_back(
      std::make_tuple(wgt_->back().begin() + current_pos_, wgt->begin() + pos, copy_count));
  if (edge_id)
    edge_id_copies.push_back(std::make_tuple(
      edge_id_->back().begin() + current_pos_, edge_id->begin() + pos, copy_count));
  if (edge_type)
    edge_type_copies.push_back(std::make_tuple(
      edge_type_->back().begin() + current_pos_, edge_type->begin() + pos, copy_count));

  count -= copy_count;
  pos += copy_count;
  current_pos_ += copy_count;
}
```
What happens if `count` = 1000, `src_.back().size()` = 100, and `current_pos_` = 0? At the end of the first iteration, `copy_count` = 100, `count` = 900, `pos` = 100, `current_pos_` = 100. From the second iteration on, `copy_count` = 0 and the loop never finishes, or am I missing something? Shouldn't we allocate additional buffers and reset `current_pos_` to 0 for this loop to finish?
Yes... I'm not sure how I missed that. The original code had that logic; I imagine I accidentally deleted it. I'll add it back in.
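A minimal sketch of the corrected loop, assuming a hypothetical `add_new_buffer()` helper that appends a fresh chunk to each per-array buffer list. Without this check, `copy_count` stays 0 once the current buffer fills and the loop spins forever:

```cpp
while (count > 0) {
  if (current_pos_ == src_.back().size()) {  // current buffer is exhausted
    add_new_buffer();                        // hypothetical: push a fresh chunk
    current_pos_ = 0;                        // start writing at its beginning
  }

  size_t copy_count = std::min(count, src_.back().size() - current_pos_);

  // ... record src/dst/wgt/edge_id/edge_type copies as before ...

  count -= copy_count;
  pos += copy_count;
  current_pos_ += copy_count;
}
```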
```diff
-  handle.raft_handle().sync_stream();
+  handle.raft_handle().sync_stream(handle.get_stream());
```
If we add get_stream() to mtmg::handle, what about adding sync_stream to mtmg::handle as well?
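A sketch of what such a forwarder could look like (member names are assumed; this is not the PR's actual `mtmg::handle_t` layout):

```cpp
#include <raft/core/handle.hpp>
#include <rmm/cuda_stream_view.hpp>

class handle_t {
 public:
  rmm::cuda_stream_view get_stream() const;  // accessor added in this PR

  // New: synchronize this thread's stream without callers reaching
  // into raft_handle() directly.
  void sync_stream() const { raft_handle_.sync_stream(get_stream()); }

 private:
  raft::handle_t const& raft_handle_;
};
```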
```diff
@@ -153,11 +154,12 @@ class resource_manager_t {
   auto pos = local_rank_map_.find(rank);
   RAFT_CUDA_TRY(cudaSetDevice(pos->second.value()));

   raft::handle_t tmp_handle;

   size_t n_streams{16};
```
Why 16?
I needed that many for one of the tests I ran :-)
I'll make that a parameter. Any suggestion on a good default?
Maybe # of GPUs? (assuming 1 stream per thread and # threads == # GPUs)
Each GPU will have its own pool of streams. The pool so far is used by different thread ranks copying data to the GPU independently.
I've added it as a parameter.
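A sketch of plumbing the pool size through as a parameter rather than hard-coding 16; the function name and signature here are illustrative, not the PR's actual `create_instance_manager` API:

```cpp
#include <raft/core/handle.hpp>
#include <rmm/cuda_stream_pool.hpp>

#include <cstddef>
#include <memory>

// One stream pool per GPU; thread ranks copying data to that GPU each
// draw a stream from the pool, so n_streams bounds the copy concurrency.
raft::handle_t make_gpu_handle(std::size_t n_streams)
{
  auto stream_pool = std::make_shared<rmm::cuda_stream_pool>(n_streams);
  return raft::handle_t{rmm::cuda_stream_per_thread, stream_pool};
}
```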
cpp/src/mtmg/vertex_result.cu (outdated)

```diff
@@ -97,7 +97,7 @@ rmm::device_uvector<result_t> vertex_result_view_t<result_t>::gather(
     return vertex_partition.local_vertex_partition_offset_from_vertex_nocheck(v);
   });

-  thrust::gather(handle.raft_handle().get_thrust_policy(),
+  thrust::gather(rmm::exec_policy(handle.get_stream()),
```
What about adding `(mtmg::)handle.get_thrust_policy()`?
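A sketch of such a forwarder as a member of `mtmg::handle_t` (hypothetical; it assumes the `get_stream()` accessor this PR introduces):

```cpp
#include <rmm/exec_policy.hpp>

// Inside mtmg::handle_t (hypothetical): lets call sites write
// handle.get_thrust_policy() instead of rmm::exec_policy(handle.get_stream()).
rmm::exec_policy get_thrust_policy() const
{
  return rmm::exec_policy{get_stream()};
}
```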
Looks good to me (though some documentation explaining the reason behind the 4 in the code would be helpful).
cpp/tests/mtmg/threaded_test.cu (outdated)

```diff
@@ -107,7 +107,7 @@ class Tests_Multithreaded
   ncclGetUniqueId(&instance_manager_id);

   auto instance_manager = resource_manager.create_instance_manager(
-    resource_manager.registered_ranks(), instance_manager_id);
+    resource_manager.registered_ranks(), instance_manager_id, 4);
```
What is 4 here?
I made this a constant and added a comment describing why it's 4 in the latest push.
/merge
Added some missing documentation.

A couple of optimizations:

- Reworked the `append` logic to keep the mutex lock held only long enough to compute what needs to be copied and where (see the sketch below).
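A minimal sketch of that locking pattern, reusing the hypothetical `copy_range_t`/`apply_copies` helpers sketched earlier in this thread; `lock_`, `buffer_`, and `reserve_ranges()` are assumed names, not the PR's actual members:

```cpp
#include <mutex>
#include <vector>

void append_sketch(std::vector<int> const& src)
{
  std::vector<copy_range_t> ranges;
  {
    std::lock_guard<std::mutex> guard(lock_);  // short critical section:
    ranges = reserve_ranges(src.size());       // compute offsets, grow buffers
  }                                            // lock released here

  // The element copies run without the lock, so concurrent appenders
  // only serialize on the cheap bookkeeping above.
  apply_copies(ranges, buffer_.data(), src.data());
}
```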