Change the renumber_sampled_edgelist function behavior. (#3762)
There was a misalignment between the `renumber_sampled_edgelist` function behavior and what PyG and DGL need.

This PR fixes this.

Authors:
  - Seunghwa Kang (https://github.com/seunghwak)

Approvers:
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: #3762
seunghwak authored Aug 3, 2023
1 parent f4627f8 commit f6543f6
Showing 5 changed files with 521 additions and 435 deletions.
25 changes: 13 additions & 12 deletions cpp/include/cugraph/graph_functions.hpp
@@ -922,15 +922,16 @@ rmm::device_uvector<vertex_t> select_random_vertices(
* This function renumbers sampling function (e.g. uniform_neighbor_sample) outputs satisfying the
* following requirements.
*
- * 1. Say @p edgelist_srcs has N unique vertices. These N unique vertices will be mapped to [0, N).
- * 2. Among the N unique vertices, an original vertex with a smaller attached hop number will be
- * renumbered to a smaller vertex ID than any other original vertices with a larger attached hop
- * number (if @p edgelist_hops.has_value() is true). If a single vertex is attached to multiple hop
- * numbers, the minimum hop number is used.
- * 3. Say @p edgelist_dsts has M unique vertices that appear only in @p edgelist_dsts (the set of M
- * unique vertices does not include any vertices that appear in @p edgelist_srcs). Then, these M
- * unique vertices will be mapped to [N, N + M).
- * 4. If label_offsets.has_value() is ture, edge lists for different labels will be renumbered
+ * 1. If @p edgelist_hops is valid, we can consider (vertex ID, flag=src, hop) triplets for each
+ * vertex ID in @p edgelist_srcs and (vertex ID, flag=dst, hop) triplets for each vertex ID in @p
+ * edgelist_dsts. From these triplets, we can find the minimum (hop, flag) pairs for every unique
+ * vertex ID (hop is the primary key and flag is the secondary key, flag=src is considered smaller
+ * than flag=dst if hop numbers are same). Vertex IDs with smaller (hop, flag) pairs precede vertex
+ * IDs with larger (hop, flag) pairs in renumbering. Ordering can be arbitrary among the vertices
+ * with the same (hop, flag) pairs.
+ * 2. If @p edgelist_hops is invalid, unique vertex IDs in @p edgelist_srcs precede vertex IDs that
+ * appear only in @p edgelist_dsts.
+ * 3. If label_offsets.has_value() is ture, edge lists for different labels will be renumbered
* separately.
*
* This function is single-GPU only (we are not aware of any practical multi-GPU use cases).
@@ -940,10 +941,10 @@ rmm::device_uvector<vertex_t> select_random_vertices(
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param edgelist_srcs A vector storing original edgelist source vertices.
- * @param edgelist_hops An optional pointer to the array storing hops for each edge list source
- * vertices (size = @p edgelist_srcs.size()).
* @param edgelist_dsts A vector storing original edgelist destination vertices (size = @p
* edgelist_srcs.size()).
+ * @param edgelist_hops An optional pointer to the array storing hops for each edge list (source,
+ * destination) pairs (size = @p edgelist_srcs.size() if valid).
* @param label_offsets An optional tuple of unique labels and the input edge list (@p
* edgelist_srcs, @p edgelist_hops, and @p edgelist_dsts) offsets for the labels (siez = # unique
* labels + 1).
@@ -962,8 +963,8 @@ std::tuple<rmm::device_uvector<vertex_t>,
renumber_sampled_edgelist(
raft::handle_t const& handle,
rmm::device_uvector<vertex_t>&& edgelist_srcs,
- std::optional<raft::device_span<int32_t const>> edgelist_hops,
rmm::device_uvector<vertex_t>&& edgelist_dsts,
+ std::optional<raft::device_span<int32_t const>> edgelist_hops,
std::optional<std::tuple<raft::device_span<label_t const>, raft::device_span<size_t const>>>
label_offsets,
bool do_expensive_check = false);
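
The new ordering rules in the doc comment above can be illustrated with a small host-side sketch. This is not the cuGraph implementation: `reference_renumber_map` is a hypothetical helper that runs on plain `std::vector` inputs, ignores `label_offsets`, and breaks ties among equal (hop, flag) pairs by original vertex ID (the doc comment allows any tie order). When hops are absent it treats every hop as 0, which reduces to "sources precede destination-only vertices".

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical host-side reference, for illustration only.
// Returns renumber_map, where renumber_map[new_id] == original vertex ID.
std::vector<int64_t> reference_renumber_map(
  std::vector<int64_t> const& srcs,
  std::vector<int64_t> const& dsts,
  std::optional<std::vector<int32_t>> const& hops)
{
  // Minimum (hop, flag) pair per unique vertex; flag 0 = src, flag 1 = dst,
  // so src sorts before dst when hop numbers are equal.
  std::map<int64_t, std::pair<int32_t, int>> min_key;
  auto update = [&min_key](int64_t v, int32_t hop, int flag) {
    std::pair<int32_t, int> key{hop, flag};
    auto result = min_key.emplace(v, key);
    if (!result.second && key < result.first->second) { result.first->second = key; }
  };

  for (std::size_t i = 0; i < srcs.size(); ++i) {
    // Assumption: with no hop array, use hop 0 for every edge.
    int32_t hop = hops ? (*hops)[i] : 0;
    update(srcs[i], hop, 0);
    update(dsts[i], hop, 1);
  }

  // Collect unique vertices (already in ascending ID order from std::map).
  std::vector<int64_t> renumber_map;
  renumber_map.reserve(min_key.size());
  for (auto const& kv : min_key) { renumber_map.push_back(kv.first); }

  // Order by (hop, flag); stable_sort keeps ascending ID order within ties.
  std::stable_sort(renumber_map.begin(), renumber_map.end(),
                   [&min_key](int64_t a, int64_t b) {
                     return min_key.at(a) < min_key.at(b);
                   });
  return renumber_map;
}
```

For edges (5, 9), (5, 7), (9, 3) with hops 0, 0, 1, vertex 9 first appears as a hop-0 destination, so it is ordered ahead of vertex 3 even though it is also a hop-1 source. Under the old rules every source vertex, including 9, would have been numbered before any destination-only vertex.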
2 changes: 1 addition & 1 deletion cpp/src/c_api/uniform_neighbor_sampling.cpp
@@ -236,9 +236,9 @@ struct uniform_neighbor_sampling_functor : public cugraph::c_api::abstract_funct
std::tie(src, dst, renumber_map, renumber_map_offsets) = cugraph::renumber_sampled_edgelist(
handle_,
std::move(src),
+ std::move(dst),
hop ? std::make_optional(raft::device_span<int32_t const>{hop->data(), hop->size()})
: std::nullopt,
- std::move(dst),
std::make_optional(std::make_tuple(
raft::device_span<label_t const>{edge_label->data(), edge_label->size()},
raft::device_span<size_t const>{offsets->data(), offsets->size()})),