[REVIEW] adding test graphs - part 2 #1603
Conversation
Codecov Report

@@           Coverage Diff            @@
##   branch-21.06    #1603      +/-  ##
========================================
  Coverage            ?      60.03%
========================================
  Files               ?          80
  Lines               ?        3551
  Branches            ?           0
========================================
  Hits                ?        2132
  Misses              ?        1419
  Partials            ?           0

Continue to review the full report at Codecov.
std::vector<rmm::device_uvector<vertex_t>> srcs_v{};
std::vector<rmm::device_uvector<vertex_t>> dsts_v{};

srcs_v.push_back(std::move(d_src_v));
Consider using emplace_back() instead of push_back(), as it's more efficient for rvalue references (which is the case here, because of std::move()).
OBE (overtaken by events) with the next push.
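For illustration, here is a minimal host-side sketch of the suggested change, using std::vector<int> as a stand-in for rmm::device_uvector<vertex_t> (the names are illustrative, not the PR's):

#include <utility>
#include <vector>

int main()
{
  std::vector<std::vector<int>> srcs_v{};
  std::vector<int> d_src_v{0, 1, 2};

  // push_back(std::move(...)) already selects the rvalue overload;
  // emplace_back forwards its argument and constructs the element in place.
  srcs_v.emplace_back(std::move(d_src_v));

  return 0;
}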
raft::copy(
  copy_weights_v.begin(), d_weights_v.begin(), d_weights_v.size(), handle.get_stream());

weights_v.push_back(std::move(d_weights_v));
Consider using emplace_back() instead of push_back(), as it's more efficient for rvalue references (which is the case here, because of std::move()).
OBE with next push.
rmm::device_uvector<size_t> indices_v(count, handle.get_stream());

handle.get_stream_view().synchronize();
Why do we need this?
Residual from some earlier debugging. Deleted.
size_t count = thrust::count_if(rmm::exec_policy(handle.get_stream()),
                                random_iterator,
                                random_iterator + num_vertices * num_vertices,
Just want to note that the Gnp model can be way more expensive than the Gnm model if num_vertices is large (though it is much simpler to implement, since we don't need to worry about duplicates). Shouldn't we add gnp to the function name, so users can expect this to go over num_vertices * num_vertices potential edges?
Changed the function name. Added a gnm function as well, although it is not implemented in this PR.
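To illustrate the cost concern, here is a hedged, host-only sketch (not the PR's implementation; the variable names and RNG scheme are made up) showing that a G(n,p) generator does O(n^2) work regardless of how many edges survive:

#include <thrust/count.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/random.h>

#include <cstddef>
#include <cstdint>
#include <iostream>

int main()
{
  std::size_t num_vertices = 1000;
  float p                  = 0.01f;
  std::uint64_t seed       = 42;

  // G(n,p): every one of the n * n candidate pairs is tested against p, even
  // though only about p * n^2 edges are expected to survive. A G(n,m)
  // generator would instead sample m edges directly, at the cost of having
  // to handle duplicates.
  auto count = thrust::count_if(
    thrust::host,
    thrust::make_counting_iterator<std::size_t>(0),
    thrust::make_counting_iterator<std::size_t>(num_vertices * num_vertices),
    [=](std::size_t index) {
      thrust::default_random_engine rng(seed);
      rng.discard(index);
      thrust::uniform_real_distribution<float> dist(0.0f, 1.0f);
      return dist(rng) < p;
    });

  std::cout << "edges kept: " << count << std::endl;
  return 0;
}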
thrust::make_zip_iterator(thrust::make_tuple(src_v.begin(), src_v.end())),
[num_vertices] __device__(size_t index) {
  size_t src = index / num_vertices;
  size_t dst = index % num_vertices;
This will break if vertex_t is 64 bit.
Technically, it will break if num_vertices > (2^31 - 1); if vertex_t is 64 bits but the values could be stored in 32 bits, it would still work.
Since we don't want this variation called when num_vertices > (2^31 - 1) anyway (even if it worked, we wouldn't want to do that much computation), I just added a CUGRAPH_EXPECTS at the beginning of the function.
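A hedged sketch of what such a guard might look like (the exact condition and message in the PR may differ; this fragment assumes <limits> is included):

// Guard the O(V^2) G(n,p) path against vertex counts that overflow 32 bits
// (and against prohibitively large amounts of work).
CUGRAPH_EXPECTS(
  num_vertices <= static_cast<size_t>(std::numeric_limits<int32_t>::max()),
  "num_vertices too large for the V * V candidate-edge G(n,p) generator");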
rmm::device_uvector<vertex_t> &d_src_v,
rmm::device_uvector<vertex_t> &d_dst_v,
vertex_t vertex_id_offset,
uint64_t seed)
Have you checked that this function works if num_vertices is not a power of two?
IIRC, this function works if vertices are in the range [0, 2^scale), but I am not sure this still works otherwise. Need to check.
I guess this will still work (the scrambled vertex IDs are no longer contiguous integers, but that does not matter if we renumber or allow having many isolated vertices), but it needs a double check.
Added a unit test for the scramble function. The result is correct (it validates properly). Your assertion is correct that the resulting IDs are no longer contiguous. As you observe, this will result in a collection of isolated vertices if we do not renumber, although renumbering corrects that.
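A small host-only toy (not the actual cugraph/Graph500 scramble function) showing why any bijection over [0, 2^scale) leaves gaps when num_vertices < 2^scale:

#include <cstdint>
#include <iostream>

// Toy stand-in for a bijective scramble over [0, 2^scale). An odd multiplier
// modulo 2^scale is invertible, so distinct inputs map to distinct outputs.
std::uint64_t toy_scramble(std::uint64_t v, int scale)
{
  std::uint64_t mask = (std::uint64_t{1} << scale) - 1;
  return (v * 0x9E3779B97F4A7C15ULL) & mask;
}

int main()
{
  // With num_vertices = 5 and 2^scale = 8, the scrambled IDs stay unique but
  // are scattered across [0, 8), leaving gaps (isolated vertices) unless we
  // renumber afterwards.
  for (std::uint64_t v = 0; v < 5; ++v) {
    std::cout << v << " -> " << toy_scramble(v, 3) << "\n";
  }
  return 0;
}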
template <typename T>
void append_all(raft::handle_t const &handle,
                std::vector<rmm::device_uvector<T>> &&input,
                rmm::device_uvector<T> &output)
Should we instead return an rmm::device_uvector<T> rather than taking rmm::device_uvector<T>& output? Unless the output is in-out or there is another strong reason to do so, returning a value is more functional and side-effect-free than taking an lvalue reference.
Done in next push.
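For reference, a hedged sketch of the return-by-value shape suggested above (the PR's actual signature and internals may differ; header paths are those of the 21.06-era libraries):

#include <raft/cudart_utils.h>
#include <raft/handle.hpp>
#include <rmm/device_uvector.hpp>

#include <vector>

template <typename T>
rmm::device_uvector<T> append_all(raft::handle_t const &handle,
                                  std::vector<rmm::device_uvector<T>> &&input)
{
  size_t total{0};
  for (auto const &v : input) { total += v.size(); }

  rmm::device_uvector<T> output(total, handle.get_stream());
  size_t offset{0};
  for (auto const &v : input) {
    // raft::copy(dst, src, count, stream) appends each chunk in turn
    raft::copy(output.data() + offset, v.data(), v.size(), handle.get_stream());
    offset += v.size();
  }
  return output;  // moved out (or elided); no in-out parameter, no side effects
}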
size_t number_of_edges{0};

if (optional_d_weights) {
So, should we always remove all multi-edges, or would it be better to make this an option? Graph500 input edge lists can have multi-edges (and self-loops), and removing those is a task in the graph construction step.
Added a flag to control this behavior in next push.
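One common Thrust pattern for such a flag (illustrative only: remove_multi_edges is a made-up name, and the unweighted case is shown):

#include <rmm/exec_policy.hpp>
#include <thrust/distance.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/sort.h>
#include <thrust/tuple.h>
#include <thrust/unique.h>

// ... inside the construction path, assuming d_src_v/d_dst_v device_uvectors:
if (remove_multi_edges) {  // hypothetical flag
  auto first =
    thrust::make_zip_iterator(thrust::make_tuple(d_src_v.begin(), d_dst_v.begin()));
  // sort (src, dst) pairs so duplicates become adjacent, then drop them
  thrust::sort(rmm::exec_policy(handle.get_stream()), first, first + d_src_v.size());
  auto last =
    thrust::unique(rmm::exec_policy(handle.get_stream()), first, first + d_src_v.size());
  auto new_size = static_cast<size_t>(thrust::distance(first, last));
  d_src_v.resize(new_size, handle.get_stream());
  d_dst_v.resize(new_size, handle.get_stream());
  // with weights, zip the weight iterator in as well (or use sort_by_key /
  // unique_by_key) so each surviving edge keeps its weight
}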
rmm::device_uvector<vertex_t> &&d_src_v,
rmm::device_uvector<vertex_t> &&d_dst_v,
std::optional<rmm::device_uvector<weight_t>> &&optional_d_weights_v)
{
I think the code below is simpler and more efficient (in memory footprint & the amount of data movement).
auto offset = d_src_v.size();
d_src_v.resize(offset * 2, handle.get_stream_view());
d_dst_v.resize(d_src_v.size(), handle.get_stream_view());
// mirror each (src, dst) edge as (dst, src) in the newly added second half
thrust::copy(rmm::exec_policy(handle.get_stream_view()),
             d_dst_v.begin(), d_dst_v.begin() + offset, d_src_v.begin() + offset);
thrust::copy(rmm::exec_policy(handle.get_stream_view()),
             d_src_v.begin(), d_src_v.begin() + offset, d_dst_v.begin() + offset);
if (optional_d_weights_v) {
  optional_d_weights_v->resize(d_src_v.size(), handle.get_stream_view());
  // duplicate each weight for its mirrored edge
  thrust::copy(rmm::exec_policy(handle.get_stream_view()),
               optional_d_weights_v->begin(),
               optional_d_weights_v->begin() + offset,
               optional_d_weights_v->begin() + offset);
}
return std::make_tuple(std::move(d_src_v), std::move(d_dst_v), std::move(optional_d_weights_v));
Implemented this version.
thrust::make_counting_iterator<size_t>(0),
[base_vertex_id, num_vertices, invalid_vertex] __device__(size_t index) {
  size_t graph_index = index / (num_vertices * num_vertices);
  size_t local_index = index % (num_vertices * num_vertices);
This will break if vertex_t is 64 bit.
Added a check (like the one above).
@gpucibot merge
This work was originally targeted at the WCC effort, but has been expanded a bit. It supersedes #1545, which I will close.
The goal is to make it easier to construct test graphs: testing the capabilities of different graph algorithms requires a variety of graphs. This PR better organizes the graph generation components and introduces utilities to help compose graphs out of multiple components.
This PR introduces the following capabilities:
- an Erdos-Renyi G(n,p) graph generator (a gnm variant is declared but not implemented in this PR)
- vertex ID scrambling, with a unit test
- utilities for appending and combining edge lists from multiple graph components
- optional symmetrization and multi-edge removal during test graph construction
Closes #1543