Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] adding test graphs - part 2 #1603

Merged
merged 11 commits into from
Jun 3, 2021

Conversation

ChuckHastings
Copy link
Collaborator

This effort was originally targeted toward the WCC effort, but has been expanded a bit. This supersedes #1545 which I will close.

The goal here is to create a means for constructing test graphs in an easier fashion. Testing the capabilities of different graph algorithms might require a variety of graphs. The objective of this PR is to better organize the graph generation components and to introduce some components to help in composing graphs out of multiple components.

This PR introduces the following capabilities:

  • Create an ER graph
  • Create a collection of Complete Graphs
  • Create a collection of 2D mesh graphs
  • Create a collection of 3D mesh graphs
  • Create a random path graph (connect all vertices with a single randomly ordered path)
  • Translate vertex ids of a graph
  • Combine multiple edge lists into a single graph

Closes #1543

@ChuckHastings ChuckHastings requested review from a team as code owners May 13, 2021 23:20
@ChuckHastings ChuckHastings added 3 - Ready for Review improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels May 13, 2021
@ChuckHastings ChuckHastings requested a review from a team as a code owner May 14, 2021 01:25
@codecov-commenter
Copy link

codecov-commenter commented May 14, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.06@575677f). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.06    #1603   +/-   ##
===============================================
  Coverage                ?   60.03%           
===============================================
  Files                   ?       80           
  Lines                   ?     3551           
  Branches                ?        0           
===============================================
  Hits                    ?     2132           
  Misses                  ?     1419           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 575677f...6e24eaf. Read the comment docs.

std::vector<rmm::device_uvector<vertex_t>> srcs_v{};
std::vector<rmm::device_uvector<vertex_t>> dsts_v{};

srcs_v.push_back(std::move(d_src_v));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using emplace_back() instead of push_back(), as it's more efficient for rvalue references (which is the case here, because of std::move()).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OBE with next push.

raft::copy(
copy_weights_v.begin(), d_weights_v.begin(), d_weights_v.size(), handle.get_stream());

weights_v.push_back(std::move(d_weights_v));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using emplace_back() instead of push_back(), as it's more efficient for rvalue references (which is the case here, because of std::move()).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OBE with next push.


rmm::device_uvector<size_t> indices_v(count, handle.get_stream());

handle.get_stream_view().synchronize();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Residual from some earlier debugging. Deleted.


size_t count = thrust::count_if(rmm::exec_policy(handle.get_stream()),
random_iterator,
random_iterator + num_vertices * num_vertices,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to note that the Gnp model can be way more expensive than the Gnm model if num_vertices is large (but much simpler to implement as we don't need to worry about duplicates).

Shouldn't we add gnp in the function name so users can expect this will go over num_vertices * num_vertices potential edges?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed function name. Add a gnm function as well, although not implemented in this PR.

thrust::make_zip_iterator(thrust::make_tuple(src_v.begin(), src_v.end())),
[num_vertices] __device__(size_t index) {
size_t src = index / num_vertices;
size_t dst = index % num_vertices;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break if vertex_t is 64 bit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, it will break if num_vertices > (2^31 - 1). If vertex_t is 64 bits but could be stored in 32 bits it would still work.

Since we don't want this variation called if num_vertices > (2^31 - 1)... even if it worked we don't want to do that much computation... I just added a CUGRAPH_EXPECTS at the beginning of the function.

rmm::device_uvector<vertex_t> &d_src_v,
rmm::device_uvector<vertex_t> &d_dst_v,
vertex_t vertex_id_offset,
uint64_t seed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked that this function works if num_vertices is not a power of two?

IIRC, this function works if vertices are in the range [0, 2^scale), but I am not sure this still works otherwise. Need to check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this will still work (but the scrambled vertex IDs are no longer contiguous integers but does not matter if we renumber or allow having many isolated vertices... but needs to double check).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a unit test for the scramble function. The result is correct (it validates properly). Your assertion is correct regarding the resulting ids no longer being contiguous. This will (as you observe) result in a collection of isolated vertices if we do not renumber; although if we renumber that will be corrected.

template <typename T>
void append_all(raft::handle_t const &handle,
std::vector<rmm::device_uvector<T>> &&input,
rmm::device_uvector<T> &output)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we better return rmm::device_uvector instead of taking rmm::device_uvector<T>& output. Unless the ouput is in-out or there is another strong reason to do so, returning is more functional and side-effects-free than taking an lvalue reference.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in next push


size_t number_of_edges{0};

if (optional_d_weights) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, should we always remove all multi-edges or better make this as an option? Graph500 input edge lists can possibly have multi-edges (and self-loops) and removing those is a task in the graph construction step.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a flag to control this behavior in next push.

rmm::device_uvector<vertex_t> &&d_src_v,
rmm::device_uvector<vertex_t> &&d_dst_v,
std::optional<rmm::device_uvector<weight_t>> &&optional_d_weights_v)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code below is simpler and more efficient (in memory footprint & the amount of data movement).

auto offset = d_src_v.size();
d_src_v.resize(offset * 2, handle.get_stream_view());
d_dst_v.resize(d_src_v.size(), handle.get_stream_view());
thrust::copy(rmm::exec_policy(handle.get_stream_view()), d_dst_v.begin(), d_dst_v.begin() + offset, d_src_v.begin() + offset);
thrust::copy(rmm::exec_policy(handle.get_stream_view()), d_src_v.begin(), d_src_v.begin() + offset, d_dst_v.begin() + offset);
if (optional_d_weights_v) {
  optional_d_weights_v.resize(d_src_v.size(), handle.get_stream_view());
  thrust::copy(rmm::exec_policy(handle.get_stream_view()), optional_d_weight_v.begin(), optinoal_d_weight_v.begin() + offset, optional_d_weight_v.begin() + offset);
}

return std::make_tuple(std::move(d_src_v), std::move(d_dst_v), optional_d_weights_v ? std::move(optional_d_weights_v) : std::nullopt);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented this version.

thrust::make_counting_iterator<size_t>(0),
[base_vertex_id, num_vertices, invalid_vertex] __device__(size_t index) {
size_t graph_index = index / (num_vertices * num_vertices);
size_t local_index = index % (num_vertices * num_vertices);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break if vertex_t is 64 bit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a check (like above)

@BradReesWork BradReesWork changed the title Fea test graphs2 [REVIEW] adding test graphs - part 2 Jun 1, 2021
@BradReesWork BradReesWork linked an issue Jun 2, 2021 that may be closed by this pull request
3 tasks
@ChuckHastings ChuckHastings removed a link to an issue Jun 2, 2021
3 tasks
@ChuckHastings ChuckHastings mentioned this pull request Jun 2, 2021
3 tasks
@aschaffer aschaffer self-requested a review June 2, 2021 21:50
@BradReesWork
Copy link
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 4e20f73 into rapidsai:branch-21.06 Jun 3, 2021
@ChuckHastings ChuckHastings deleted the fea_test_graphs2 branch July 29, 2021 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Support generation of test graphs as a library feature (C++)
5 participants