Improve graph primitives performance on graphs with widely varying vertex degrees (#1447)

Partially addresses Issue #1442

Update the graph primitives used by PageRank, Katz Centrality, BFS, and SSSP to launch three different kernels based on vertex degree (one per degree segment) to address the thread divergence issue. In addition, reduce the memory footprint of the VertexFrontier class used by BFS and SSSP.

The following results highlight the performance improvement from this optimization.

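R-mat graph with 2^25 vertices and 2^25 * 32 edges (seconds, before => after):
  - PageRank, unweighted, without personalization: 7.66 => 1.07
  - PageRank, unweighted, with personalization: 7.42 => 1.08
  - PageRank, weighted, without personalization: 8.83 => 1.36
  - PageRank, weighted, with personalization: 8.83 => 1.39
  - Katz, unweighted: 1.08 => 0.243
  - Katz, weighted: 1.94 => 0.275
  - BFS: 1.32 => 0.251
R-mat graph with 2^25 vertices and 2^25 * 16 edges (seconds, before => after):
  - SSSP: 1.89 => 0.317 (measured with an edge factor of 16 because, before this change, memory allocation failed with an edge factor of 32)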

With the reduced memory footprint, SSSP now also runs on the R-mat graph with 2^25 vertices and 2^25 * 32 edges, taking 0.514 seconds.

Additional optimizations are still needed to reach the target performance:

1. Add BFS- and SSSP-specific optimizations. The current implementation assumes general reduction operations, but BFS can pick any source vertex when a vertex is discovered by multiple source vertices, and SSSP picks the one with the minimum edge weight. These pure-function reduction operations allow additional optimizations.
2. Launch the three kernels in multiple streams to recover parallelism when the frontier is relatively small. Currently the three kernels are queued in a single stream, which can reduce parallelism by up to 3x.

Authors:
  - Seunghwa Kang (https://github.com/seunghwak)

Approvers:
  - Alex Fender (https://github.com/afender)
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Brad Rees (https://github.com/BradReesWork)

URL: #1447
seunghwak authored Apr 7, 2021
1 parent b442f3b commit 1b34e26
Showing 25 changed files with 1,276 additions and 845 deletions.
11 changes: 6 additions & 5 deletions cpp/include/experimental/graph.hpp
@@ -88,12 +88,12 @@ class graph_t<vertex_t, edge_t, weight_t, store_transposed, multi_gpu, std::enab
offsets,
indices,
weights,
-vertex_partition_segment_offsets_,
+adj_matrix_partition_segment_offsets_,
partition_,
this->get_number_of_vertices(),
this->get_number_of_edges(),
this->get_graph_properties(),
-vertex_partition_segment_offsets_.size() > 0,
+adj_matrix_partition_segment_offsets_.size() > 0,
false);
}

@@ -105,9 +105,10 @@ class graph_t<vertex_t, edge_t, weight_t, store_transposed, multi_gpu, std::enab
partition_t<vertex_t> partition_{};

std::vector<vertex_t>
-vertex_partition_segment_offsets_{}; // segment offsets within the vertex partition based on
-// vertex degree, relevant only if
-// sorted_by_global_degree_within_vertex_partition is true
+adj_matrix_partition_segment_offsets_{}; // segment offsets within the vertex partition based
+// on vertex degree, relevant only if
+// sorted_by_global_degree_within_vertex_partition is
+// true
};

// single-GPU version
2 changes: 2 additions & 0 deletions cpp/include/experimental/graph_functions.hpp
@@ -251,6 +251,8 @@ void unrenumber_local_int_vertices(
vertex_t local_int_vertex_last,
bool do_expensive_check = false);

+// FIXME: We may add unrenumber_int_rows(or cols) as this will require communication only within a
+// sub-communicator and potentially be more efficient.
/**
* @brief Unrenumber (possibly non-local) internal vertices to external vertices based on the
* providied @p renumber_map_labels.
27 changes: 23 additions & 4 deletions cpp/include/experimental/graph_view.hpp
@@ -301,7 +301,7 @@ class graph_view_t<vertex_t,
std::vector<edge_t const*> const& adj_matrix_partition_offsets,
std::vector<vertex_t const*> const& adj_matrix_partition_indices,
std::vector<weight_t const*> const& adj_matrix_partition_weights,
-std::vector<vertex_t> const& vertex_partition_segment_offsets,
+std::vector<vertex_t> const& adj_matrix_partition_segment_offsets,
partition_t<vertex_t> const& partition,
vertex_t number_of_vertices,
edge_t number_of_edges,
@@ -431,6 +431,17 @@ class graph_view_t<vertex_t,
: vertex_t{0};
}

+std::vector<vertex_t> get_local_adj_matrix_partition_segment_offsets(size_t partition_idx) const
+{
+return adj_matrix_partition_segment_offsets_.size() > 0
+? std::vector<vertex_t>(
+adj_matrix_partition_segment_offsets_.begin() +
+partition_idx * (detail::num_segments_per_vertex_partition + 1),
+adj_matrix_partition_segment_offsets_.begin() +
+(partition_idx + 1) * (detail::num_segments_per_vertex_partition + 1))
+: std::vector<vertex_t>{};
+}
+
// FIXME: this function is not part of the public stable API. This function is mainly for pattern
// accelerator implementation. This function is currently public to support the legacy
// implementations directly accessing CSR/CSC data, but this function will eventually become
@@ -499,9 +510,10 @@ class graph_view_t<vertex_t,
partition_t<vertex_t> partition_{};

std::vector<vertex_t>
-vertex_partition_segment_offsets_{}; // segment offsets within the vertex partition based on
-// vertex degree, relevant only if
-// sorted_by_global_degree_within_vertex_partition is true
+adj_matrix_partition_segment_offsets_{}; // segment offsets within the vertex partition based
+// on vertex degree, relevant only if
+// sorted_by_global_degree_within_vertex_partition is
+// true
};

// single-GPU version
@@ -612,6 +624,13 @@ class graph_view_t<vertex_t,
return vertex_t{0};
}

+std::vector<vertex_t> get_local_adj_matrix_partition_segment_offsets(
+size_t adj_matrix_partition_idx) const
+{
+assert(adj_matrix_partition_idx == 0);
+return segment_offsets_.size() > 0 ? segment_offsets_ : std::vector<vertex_t>{};
+}
+
// FIXME: this function is not part of the public stable API.This function is mainly for pattern
// accelerator implementation. This function is currently public to support the legacy
// implementations directly accessing CSR/CSC data, but this function will eventually become