Add R-mat generator #1411

Merged
merged 13 commits into from
Mar 2, 2021
Changes from 7 commits
1 change: 1 addition & 0 deletions cpp/CMakeLists.txt
@@ -420,6 +420,7 @@ add_library(cugraph SHARED
src/components/connectivity.cu
src/centrality/katz_centrality.cu
src/centrality/betweenness_centrality.cu
src/experimental/generate_rmat_edgelist.cu
src/experimental/graph.cu
src/experimental/graph_view.cu
src/experimental/coarsen_graph.cu
81 changes: 81 additions & 0 deletions cpp/include/experimental/graph_generator.hpp
@@ -0,0 +1,81 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once

#include <raft/handle.hpp>
#include <rmm/device_uvector.hpp>

#include <cstdint>
#include <tuple>

namespace cugraph {
namespace experimental {

/**
* @brief generate an edge list for an R-mat graph.
*
* @p scramble_vertex_ids needs to be set to `true` to generate a graph conforming to the Graph 500
* specification (note that scrambling does not affect cuGraph's graph construction performance, so
* this is generally unnecessary). If `edge_factor` is given (e.g. Graph 500), set @p num_edges to
* (size_t{1} << @p scale) * `edge_factor`. To generate an undirected graph, set @p b == @p c and @p
* clip_and_flip = true. All the resulting edges will be placed in the lower triangular part
* (including the diagonal) of the graph adjacency matrix.
*
* For multi-GPU generation with `P` GPUs, @p seed should be set to different values in different
* GPUs to avoid every GPU generating the same set of edges. @p num_edges should be adjusted as
* well; e.g. assuming `edge_factor` is given, set @p num_edges = (size_t{1} << @p scale) *
* `edge_factor` / `P` + (rank < (((size_t{1} << @p scale) * `edge_factor`) % P) ? 1 : 0).
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param scale Scale factor to set the number of vertices in the graph. Vertex IDs have values in
* [0, V), where V = 1 << @p scale.
* @param num_edges Number of edges to generate.
* @param a a, b, c, d (= 1.0 - (a + b + c)) in the R-mat graph generator (visit https://graph500.org
* for additional details). a, b, c, d should be non-negative and a + b + c should be no larger
* than 1.0.
* @param b a, b, c, d (= 1.0 - (a + b + c)) in the R-mat graph generator (visit https://graph500.org
* for additional details). a, b, c, d should be non-negative and a + b + c should be no larger
* than 1.0.
* @param c a, b, c, d (= 1.0 - (a + b + c)) in the R-mat graph generator (visit https://graph500.org
* for additional details). a, b, c, d should be non-negative and a + b + c should be no larger
* than 1.0.
* @param seed Seed value for the random number generator.
* @param clip_and_flip Flag controlling whether to generate edges only in the lower triangular part
Member

The way I understand this is that when clip_and_flip is false this returns a directed graph. When it is true it can be seen as an undirected one but not the kind of format we use in cugraph.

I think we would benefit from exposing an option that generates the undirected graph inputs that we expect in cugraph.

Contributor Author

@seunghwak seunghwak Feb 18, 2021

Yes, that is correct: if clip-and-flip is set to true, all the edges are in the lower triangular part of the graph adjacency matrix, and they need to be symmetrized for cuGraph use. My plan is to symmetrize the output after this R-mat generator, which keeps the R-mat generator's behavior close to Graph 500. Any thoughts?

Member

I am fine with doing symmetrize separately, I just recall that the symmetrize step was done at the python level with cudf (concat + group_by) before. Do we have that feature at C++ level now? If not, we would need to add it before we can use this for C++ testing for instance.

Contributor Author

I don't think we have the feature yet. I think symmetrize, transpose, and triangle counting are related problems and can be addressed together.
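
The symmetrization step discussed in this thread is not part of this PR. As a rough host-side illustration only, the sketch below mirrors every edge and then removes duplicates. The name `symmetrize_edgelist` is hypothetical, plain `std::vector` stands in for device memory, and sort plus unique is used in place of the cudf concat and group_by mentioned above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, not a cuGraph API: appends the flipped copy (dst, src)
// of every input edge, then sorts and drops duplicates so that each directed
// edge appears exactly once in the result.
template <typename vertex_t>
std::vector<std::pair<vertex_t, vertex_t>> symmetrize_edgelist(
  std::vector<std::pair<vertex_t, vertex_t>> edges)
{
  auto const num_input_edges = edges.size();
  // Reserve up front so references into `edges` stay valid while appending.
  edges.reserve(2 * num_input_edges);
  for (std::size_t i = 0; i < num_input_edges; ++i) {
    edges.emplace_back(edges[i].second, edges[i].first);  // flipped copy
  }
  std::sort(edges.begin(), edges.end());
  edges.erase(std::unique(edges.begin(), edges.end()), edges.end());
  return edges;
}
```

Applied to a clip-and-flip output, this would turn the lower-triangular edge list into one that contains both directions of every off-diagonal edge. Whether duplicates should be dropped or kept as multi-edges is a design choice this sketch does not settle.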

* (including the diagonal) of the graph adjacency matrix (if set to `true`) or not (if set to
* `false`).
* @param scramble_vertex_ids Flag controlling whether to scramble vertex ID bits (if set to `true`)
* or not (if set to `false`); scrambling vertex ID bits breaks correlation between vertex ID values
* and vertex degrees. The scramble code here follows the algorithm in the Graph 500 reference
* implementation version 3.0.0.
* @return std::tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<vertex_t>> A tuple of
* rmm::device_uvector objects for edge source vertex IDs and edge destination vertex IDs.
*/
template <typename vertex_t>
std::tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<vertex_t>> generate_rmat_edgelist(
raft::handle_t const& handle,
size_t scale,
size_t num_edges,
double a = 0.57,
double b = 0.19,
double c = 0.19,
uint64_t seed = 0,
bool clip_and_flip = false,
bool scramble_vertex_ids = false);

} // namespace experimental
} // namespace cugraph
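
The per-rank edge count given in the doc comment for multi-GPU generation can be written out as a small host-side helper. This is only a sketch of that formula: `rmat_num_edges_for_rank` and its parameter names are hypothetical and not part of the header above.

```cpp
#include <cstddef>

// Hypothetical helper, not part of the cuGraph API: the number of edges one
// rank should request so that `num_ranks` ranks together generate exactly
// (2^scale) * edge_factor edges.
inline std::size_t rmat_num_edges_for_rank(std::size_t scale,
                                           std::size_t edge_factor,
                                           std::size_t num_ranks,
                                           std::size_t rank)
{
  std::size_t const total_edges = (std::size_t{1} << scale) * edge_factor;
  // The first (total_edges % num_ranks) ranks each take one extra edge.
  return total_edges / num_ranks + (rank < total_edges % num_ranks ? 1 : 0);
}
```

Each rank would then pass the returned value as `num_edges` together with a rank-specific `seed`; deriving the seed as a base seed plus the rank is one possible scheme and is an assumption here, not something the header prescribes.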
149 changes: 149 additions & 0 deletions cpp/src/experimental/generate_rmat_edgelist.cu
@@ -0,0 +1,149 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include <experimental/scramble.cuh>

#include <experimental/graph_generator.hpp>
#include <utilities/error.hpp>

#include <rmm/thrust_rmm_allocator.h>
#include <raft/handle.hpp>
#include <raft/random/rng.cuh>
#include <rmm/device_uvector.hpp>

#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>

#include <algorithm>
#include <limits>
#include <tuple>

namespace cugraph {
namespace experimental {

template <typename vertex_t>
std::tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<vertex_t>> generate_rmat_edgelist(
raft::handle_t const& handle,
size_t scale,
size_t num_edges,
double a,
double b,
double c,
uint64_t seed,
bool clip_and_flip,
bool scramble_vertex_ids)
{
CUGRAPH_EXPECTS((size_t{1} << scale) <= static_cast<size_t>(std::numeric_limits<vertex_t>::max()),
"Invalid input argument: scale too large for vertex_t.");
CUGRAPH_EXPECTS((a >= 0.0) && (b >= 0.0) && (c >= 0.0) && (a + b + c <= 1.0),
"Invalid input argument: a, b, c should be non-negative and a + b + c should not "
"be larger than 1.0.");

raft::random::Rng rng(seed + 10);
// to limit memory footprint (1024 is a tuning parameter)
auto max_edges_to_generate_per_iteration =
static_cast<size_t>(handle.get_device_properties().multiProcessorCount) * 1024;
rmm::device_uvector<float> rands(
std::min(num_edges, max_edges_to_generate_per_iteration) * 2 * scale, handle.get_stream());

rmm::device_uvector<vertex_t> srcs(num_edges, handle.get_stream());
rmm::device_uvector<vertex_t> dsts(num_edges, handle.get_stream());

size_t num_edges_generated{0};
while (num_edges_generated < num_edges) {
auto num_edges_to_generate =
std::min(num_edges - num_edges_generated, max_edges_to_generate_per_iteration);
auto pair_first = thrust::make_zip_iterator(thrust::make_tuple(srcs.begin(), dsts.begin())) +
num_edges_generated;
rng.uniform<float, size_t>(
rands.data(), num_edges_to_generate * 2 * scale, 0.0f, 1.0f, handle.get_stream());
thrust::transform(
rmm::exec_policy(handle.get_stream())->on(handle.get_stream()),
thrust::make_counting_iterator(size_t{0}),
thrust::make_counting_iterator(num_edges_to_generate),
pair_first,
// if a + b == 0.0, a_norm is irrelevant, if (1.0 - (a+b)) == 0.0, c_norm is irrelevant
[scale,
clip_and_flip,
rands = rands.data(),
a_plus_b = a + b,
a_norm = (a + b) > 0.0 ? a / (a + b) : 0.0,
c_norm = (1.0 - (a + b)) > 0.0 ? c / (1.0 - (a + b)) : 0.0] __device__(auto i) {
vertex_t src{0};
vertex_t dst{0};
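// visit every bit position from scale - 1 down to 0 (inclusive); stopping at bit != 0 would
// leave the least significant bit unset and generate only even vertex IDs
for (size_t bit = scale; bit-- > 0;) {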
auto r0 = rands[i * 2 * scale + 2 * bit];
auto r1 = rands[i * 2 * scale + 2 * bit + 1];
auto src_bit_set = r0 > a_plus_b;
auto dst_bit_set = r1 > (src_bit_set ? c_norm : a_norm);
if (clip_and_flip) {
if (src == dst) {
if (!src_bit_set && dst_bit_set) {
src_bit_set = !src_bit_set;
dst_bit_set = !dst_bit_set;
}
}
}
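// shift in vertex_t width; shifting the int literal 1 overflows once bit >= 31
src += src_bit_set ? static_cast<vertex_t>(vertex_t{1} << bit) : vertex_t{0};
dst += dst_bit_set ? static_cast<vertex_t>(vertex_t{1} << bit) : vertex_t{0};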
}
return thrust::make_tuple(src, dst);
});
num_edges_generated += num_edges_to_generate;
}

if (scramble_vertex_ids) {
rands.resize(0, handle.get_stream());
rands.shrink_to_fit(handle.get_stream());

auto pair_first = thrust::make_zip_iterator(thrust::make_tuple(srcs.begin(), dsts.begin()));
thrust::transform(rmm::exec_policy(handle.get_stream())->on(handle.get_stream()),
pair_first,
pair_first + srcs.size(),
pair_first,
[scale] __device__(auto pair) {
return thrust::make_tuple(scramble(thrust::get<0>(pair), scale),
scramble(thrust::get<1>(pair), scale));
});
}

return std::make_tuple(std::move(srcs), std::move(dsts));
}

// explicit instantiation

template std::tuple<rmm::device_uvector<int32_t>, rmm::device_uvector<int32_t>>
generate_rmat_edgelist<int32_t>(raft::handle_t const& handle,
size_t scale,
size_t num_edges,
double a,
double b,
double c,
uint64_t seed,
bool clip_and_flip,
bool scramble_vertex_ids);

template std::tuple<rmm::device_uvector<int64_t>, rmm::device_uvector<int64_t>>
generate_rmat_edgelist<int64_t>(raft::handle_t const& handle,
size_t scale,
size_t num_edges,
double a,
double b,
double c,
uint64_t seed,
bool clip_and_flip,
bool scramble_vertex_ids);

} // namespace experimental
} // namespace cugraph
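
For readers who want to trace the per-bit quadrant selection without a GPU, the following is a minimal host-side sketch of the same idea. It is not the PR's implementation: `rmat_edgelist_host` is a hypothetical name, `std::mt19937_64` replaces `raft::random::Rng`, and a plain loop replaces `thrust::transform`. The sketch visits every bit position from `scale - 1` down to 0.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Host-side illustration only (hypothetical function, not a cuGraph API).
template <typename vertex_t>
std::vector<std::pair<vertex_t, vertex_t>> rmat_edgelist_host(std::size_t scale,
                                                              std::size_t num_edges,
                                                              double a,
                                                              double b,
                                                              double c,
                                                              std::uint64_t seed,
                                                              bool clip_and_flip)
{
  std::mt19937_64 gen(seed);
  std::uniform_real_distribution<double> dist(0.0, 1.0);

  // P(src bit == 0) = a + b; the dst bit is then drawn conditionally on the src bit.
  double const a_plus_b = a + b;
  double const a_norm   = a_plus_b > 0.0 ? a / a_plus_b : 0.0;
  double const c_norm   = (1.0 - a_plus_b) > 0.0 ? c / (1.0 - a_plus_b) : 0.0;

  std::vector<std::pair<vertex_t, vertex_t>> edges;
  edges.reserve(num_edges);
  for (std::size_t i = 0; i < num_edges; ++i) {
    vertex_t src{0};
    vertex_t dst{0};
    for (std::size_t bit = scale; bit-- > 0;) {
      bool src_bit_set = dist(gen) > a_plus_b;
      bool dst_bit_set = dist(gen) > (src_bit_set ? c_norm : a_norm);
      // While still on the diagonal, fold the upper-right quadrant into the lower-left one.
      if (clip_and_flip && (src == dst) && !src_bit_set && dst_bit_set) {
        src_bit_set = true;
        dst_bit_set = false;
      }
      if (src_bit_set) { src += vertex_t{1} << bit; }
      if (dst_bit_set) { dst += vertex_t{1} << bit; }
    }
    edges.emplace_back(src, dst);
  }
  return edges;
}
```

With `clip_and_flip` enabled, the first bit at which `src` and `dst` differ is always set in `src`, so every generated edge satisfies `src >= dst`, which is the lower triangular part including the diagonal.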
88 changes: 88 additions & 0 deletions cpp/src/experimental/scramble.cuh
@@ -0,0 +1,88 @@
/* Copyright (C) 2009-2010 The Trustees of Indiana University. */
/* */
/* Use, modification and distribution is subject to the Boost Software */
/* License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at */
/* http://www.boost.org/LICENSE_1_0.txt) */
/* */
/* Authors: Jeremiah Willcock */
/* Andrew Lumsdaine */

/*
* Copyright (c) 2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once

#include <cassert>
#include <cstdint>
#include <type_traits>

/* Apply a permutation to scramble vertex numbers; a randomly generated
* permutation is not used because applying it at scale is too expensive. */
template <typename vertex_t>
__device__ std::enable_if_t<sizeof(vertex_t) == 8, vertex_t> scramble(vertex_t value, size_t lgN)
{
assert(std::is_unsigned<vertex_t>::value || lgN < 64);
assert(value >= 0);

constexpr uint64_t scramble_value0{606610977102444280}; // randomly generated
constexpr uint64_t scramble_value1{11680327234415193037u}; // randomly generated; exceeds INT64_MAX, hence the unsigned suffix

auto v = static_cast<uint64_t>(value);
Collaborator

The common part of these 3 specializations could be factored out in a template function; e.g.,

template<typename precision_t, typename brev_wrapper_t>
__device__ decltype(auto) generate_v(precision_t const& value, 
 precision_t const& scramble0,
 precision_t const& scramble1, 
 precision_t const& mask0,
 precision_t const& mask1,
 size_t lgN,
 brev_wrapper_t brev)
{
  auto v = static_cast<precision_t>(value);
  v += scramble0 + scramble1;
  v *= (scramble0 | mask0);
  v = brev(v) >> (8*sizeof(precision_t) - lgN);
  v *= (scramble1 | mask1);
  v = brev(v) >> (8*sizeof(precision_t) - lgN);
 return v;
}

Then in each specialization initialize the scrambles, masks, and the appropriate brev lambda; and pass all to generate_v(...). Something like that. That way the underlying logic is factored out and only moving parts are kept separate.

Collaborator

And brev can be constructed in-place as a __device__ generic lambda (if possible... I think there might be some issues with nvcc compiling generic device lambdas). But, if possible, something like:

auto brev = [] __device__ (auto v){
 return __brevll(v);
};
//...
auto v = generate_v(..., brev);

Obviously, different specializations of scramble(...) would instantiate different versions of brev, accordingly, etc.

Contributor Author

OK, thanks for pointing this out, and I refactored the code to remove redundancy.

v += scramble_value0 + scramble_value1;
v *= (scramble_value0 | uint64_t{0x4519840211493211});
v = __brevll(v) >> (64 - lgN);
v *= (scramble_value1 | uint64_t{0x3050852102C843A5});
v = __brevll(v) >> (64 - lgN);
return static_cast<vertex_t>(v);
}

/* Apply a permutation to scramble vertex numbers; a randomly generated
* permutation is not used because applying it at scale is too expensive. */
template <typename vertex_t>
__device__ std::enable_if_t<sizeof(vertex_t) == 4, vertex_t> scramble(vertex_t value, size_t lgN)
{
assert(std::is_unsigned<vertex_t>::value || lgN < 32);
assert(value >= 0);

constexpr uint32_t scramble_value0{282475248}; // randomly generated
constexpr uint32_t scramble_value1{2617694917}; // randomly generated

auto v = static_cast<uint32_t>(value);
v += scramble_value0 + scramble_value1;
v *= (scramble_value0 | uint32_t{0x11493211});
v = __brev(v) >> (32 - lgN);
v *= (scramble_value1 | uint32_t{0x02C843A5});
v = __brev(v) >> (32 - lgN);
return static_cast<vertex_t>(v);
}

/* Apply a permutation to scramble vertex numbers; a randomly generated
* permutation is not used because applying it at scale is too expensive. */
template <typename vertex_t>
__device__ std::enable_if_t<sizeof(vertex_t) == 2, vertex_t> scramble(vertex_t value, size_t lgN)
{
assert(std::is_unsigned<vertex_t>::value || lgN < 16);
assert(value >= 0);

constexpr uint32_t scramble_value0{0}; // randomly generated
constexpr uint32_t scramble_value1{8620}; // randomly generated

auto v = static_cast<uint16_t>(value);
v += scramble_value0 + scramble_value1;
v *= (scramble_value0 | uint16_t{0x3211});
v = static_cast<uint16_t>(__brev(v) >> 16) >> (16 - lgN);
v *= (scramble_value1 | uint16_t{0x43A5});
v = static_cast<uint16_t>(__brev(v) >> 16) >> (16 - lgN);
return static_cast<vertex_t>(v);
}
8 changes: 8 additions & 0 deletions cpp/tests/CMakeLists.txt
@@ -321,6 +321,14 @@ set(MST_TEST_SRC
ConfigureTest(MST_TEST "${MST_TEST_SRC}")


###################################################################################################
# - Experimental R-mat graph generation tests -----------------------------------------------------

set(EXPERIMENTAL_GENERATE_RMAT_TEST_SRCS
"${CMAKE_CURRENT_SOURCE_DIR}/experimental/generate_rmat_test.cpp")

ConfigureTest(EXPERIMENTAL_GENERATE_RMAT_TEST "${EXPERIMENTAL_GENERATE_RMAT_TEST_SRCS}" "")

###################################################################################################
# - Experimental Graph tests ----------------------------------------------------------------------

18 changes: 2 additions & 16 deletions cpp/tests/experimental/coarsen_graph_test.cpp
@@ -36,20 +36,6 @@
#include <type_traits>
#include <vector>

-template <typename vertex_t>
-std::enable_if_t<std::is_signed<vertex_t>::value, bool> is_valid_vertex(vertex_t num_vertices,
-vertex_t v)
-{
-return (v >= 0) && (v < num_vertices);
-}
-
-template <typename vertex_t>
-std::enable_if_t<std::is_unsigned<vertex_t>::value, bool> is_valid_vertex(vertex_t num_vertices,
-vertex_t v)
-{
-return v < num_vertices;
-}
-
template <typename vertex_t, typename edge_t, typename weight_t>
void check_coarsened_graph_results(edge_t* org_offsets,
vertex_t* org_indices,
@@ -68,13 +54,13 @@ void check_coarsened_graph_results(edge_t* org_offsets,
ASSERT_TRUE(std::count_if(org_indices,
org_indices + org_offsets[num_org_vertices],
[num_org_vertices](auto nbr) {
-return !is_valid_vertex(num_org_vertices, nbr);
+return !cugraph::test::is_valid_vertex(num_org_vertices, nbr);
}) == 0);
ASSERT_TRUE(std::is_sorted(coarse_offsets, coarse_offsets + num_coarse_vertices));
ASSERT_TRUE(std::count_if(coarse_indices,
coarse_indices + coarse_offsets[num_coarse_vertices],
[num_coarse_vertices](auto nbr) {
-return !is_valid_vertex(num_coarse_vertices, nbr);
+return !cugraph::test::is_valid_vertex(num_coarse_vertices, nbr);
}) == 0);
ASSERT_TRUE(num_coarse_vertices <= num_org_vertices);
