Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename primitive functions. #2234

Merged
merged 23 commits into from
Apr 28, 2022
Merged

Conversation

seunghwak
Copy link
Contributor

@seunghwak seunghwak commented Apr 19, 2022

Partially address #2003
Rename

  • copy_v_transform_reduce_in|out_nbr => transform_reduce_incoming|outgoing_e_of_v
  • copy_v_transform_reduce_key_aggregated_out_nbr => transform_reduce_key_aggregated_outgoing_e_of_v
  • update_frontier_v_push_if_out_nbr => update_frontier_from_outgoing_e_of_v

@seunghwak seunghwak added 2 - In Progress improvement Improvement / enhancement to an existing function breaking Breaking change labels Apr 19, 2022
@seunghwak seunghwak added this to the 22.06 milestone Apr 19, 2022
@seunghwak seunghwak requested a review from a team as a code owner April 19, 2022 17:42
@seunghwak seunghwak self-assigned this Apr 19, 2022
@seunghwak seunghwak changed the title [WIP][skip-ci] Rename primitive functions. [WIP] Rename primitive functions. Apr 19, 2022
@seunghwak seunghwak requested a review from a team as a code owner April 19, 2022 21:41
@seunghwak seunghwak changed the title [WIP] Rename primitive functions. Rename primitive functions. Apr 19, 2022
@ChuckHastings
Copy link
Collaborator

rerun tests

* vertex_partition_range_offsets[row_comm_size * i + row_comm_rank] and b_i =
* vertex_partition_range_offsets[row_comm_size * i + row_comm_rank + 1]. c is
* vertex_partition_range_offsets[row_comm_size * col_comm_rank] and d =
* vertex_partition_range_offsets[row_comm_size * (col_comm_rank + 1)].
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we rename this to specify that it is a range of vertex ids?
vertex_id_ranges perhaps?

Copy link
Contributor Author

@seunghwak seunghwak Apr 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more consistent with the rest, we may need to rename vertex_partition_range to vertex_id_partition_range but I feel like this is a bit too lengthy (I thought vertex_partition somewhat implies partitioning of vertex ID ranges but it seems like this implication is not clear enough).

I added additional comments below to make this point clearer. Do you think this is sufficient or we need to further discuss about renaming?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this explanation is clear enough.

EdgeOp e_op,
T init,
VertexValueOutputIterator vertex_value_output_first)
void transform_reduce_e_of_v(raft::handle_t const& handle,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just use transform_reduce_edges here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -985,7 +982,7 @@ template <typename GraphViewType,
typename EdgeOp,
typename T,
typename VertexValueOutputIterator>
void copy_v_transform_reduce_out_nbr(
void transform_reduce_outgoing_e_of_v(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transform_reduce_outgoing_edges?
The of_v part seems unnecessary to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is for per vertex reduction (previously copy_v_transform_reduce_in|out_nbr).

I thought this two level naming scheme is confusing,

So, for global reduction, we use _e or _v

and per vertex reduction, we use incoming/outgoing_e_of_v

Besides this clarification, do you have other suggestions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would read transform_reduce_outgoing_e_of_v as 'transform reduce outgoing edges of vertices'.
The 'of vertices' doesn't really communicate that a per source reduction is being done. We should think of something that does make this clear.

We could name it exactly as transform_reduce_e to denote that every edge is being iterated on. The fact that this variant is doing per source reduction could be explained by differing function signature. This could be confusing to some users.

We could also use transform_reduce_outgoing_edges_per_source or '..._e_per_src'.

Copy link
Collaborator

@kaatish kaatish Apr 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point was that if we are talking in terms of edges, in my opinion we should use the term source/destination (src/dst) instead of vertex.
Perhaps I am overlooking some nuances when dealing with transposed (csc) graphs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I like the per_src idea. I will make this change.

Copy link
Collaborator

@kaatish kaatish Apr 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will using src nomenclature still make sense even when the graph is transposed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume so.

But transform_reduce_outgoing_e_of_v and transform_reduce_e_per_src may have slightly different mental models.

I initially hoped outgoing_e_of_v to be interpreted as "for every vertex, iterate over outgoing edges and reduce the edge functor outputs per vertex"; so this is more vertex centric.

"e_per_src" might be interpreted as iterate over every directed edge, apply a functor to each edge, than reduce functor outputs per source vertex ID; so this is more edge centric.

Both are pretty much doing the same thing, but nuances might be slightly different, and I feel like the latter is a bit clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether the graph is transposed or not affects efficiency of the primitive implementation. Per source reduction can be more efficient with a source-major format while per destination reduction can be more efficient with a destination-major format (by avoiding atomics, and some primitives allow only a source-major format... this is more to avoid implementing something that will not be used in the foreseeable future).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern about per_src nomenclature is about 0-(in|out)-degree vertices.

So these primitives are designed to serve properties_of_v = some_initial_value + sum_over_incoming|outgoing_e (calculated per edge value) (e.g. PageRank, Katz centrality, HITS, summing per vertex incoming/outgoing edge weights).

This doesn't align very well with the edge-centric mental model.

@@ -204,7 +202,7 @@ template <typename GraphViewType,
typename ReduceOp,
typename T,
typename VertexValueOutputIterator>
void copy_v_transform_reduce_key_aggregated_out_nbr(
void transform_reduce_key_aggregated_outgoing_e_of_v(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transform_reduce_key_aggregated_outgoing_edges?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -105,7 +105,7 @@ std::tuple<result_t, size_t> hits(raft::handle_t const& handle,
}
for (size_t iter = 0; iter < max_iterations; ++iter) {
// Update current destination authorities property
copy_v_transform_reduce_in_nbr(
transform_reduce_incoming_e_of_v(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transform_reduce_incoming_edges?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -222,7 +222,7 @@ void bfs(raft::handle_t const& handle,
e_op.prev_visited_flags = prev_visited_flags.data();
}

update_frontier_v_push_if_out_nbr(
update_frontier_from_outgoing_e_of_v(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are using the sources/destinations nomenclature then this could be renamed to be
update_frontier_from_destinations because the frontier will only contain vertices that are destinations as opposed to sources.
Having said that update_frontier_from_outgoing_edges is fine too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about update_frontier_from_outgoing_e to be more consistent with the rest? (we are using e to denote edges so I hope this abbreviation is not too confusing).

I initially used e_of_v instead of just e borrowing this convention from other primitives performing per vertex reduction, but here, we are not performing per vertex reduction. Starting from an input frontier, iterate over all the outgoing edges from the vertices in the frontier, then, compute a new frontier by filtering out destination vertices of the outgoing edges.

Something like compute_new_frontier_from_outgoing_e_of_v_in_input_frontier might be more descriptive (but too verbose) and I assume update_frontier part reasonably covers both compute_new_frontier and of_v_in_input_frontier.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about update_frontier_from_outgoing_e to be more consistent with the rest?

Yes, this works

@seunghwak seunghwak added the DO NOT MERGE Hold off on merging; see PR for details label Apr 26, 2022
@seunghwak
Copy link
Contributor Author

@ChuckHastings @kaatish Added "do not merge" label.

Still needs to figure out how to properly re-name copy_v_transform_reduce_in|out_nbr (or other two level primitives).

transform_reduce_incoming|outgoing_e_of_v may not be interpreted as per vertex reduction.
transform_reduce_e_per_src|dst may have an issue with 0 in|out-degree vertices.

I'm now thinking more about for_each_v_transform_reduce_incoming|outgoing_e... but still thinking about better names.

@seunghwak seunghwak removed the DO NOT MERGE Hold off on merging; see PR for details label Apr 27, 2022
@codecov-commenter
Copy link

codecov-commenter commented Apr 28, 2022

Codecov Report

Merging #2234 (ae1de41) into branch-22.06 (38be932) will increase coverage by 0.48%.
The diff coverage is n/a.

❗ Current head ae1de41 differs from pull request most recent head 161f634. Consider uploading reports for the commit 161f634 to get more accurate results

@@               Coverage Diff                @@
##           branch-22.06    #2234      +/-   ##
================================================
+ Coverage         70.82%   71.30%   +0.48%     
================================================
  Files               170      171       +1     
  Lines             11036    11169     +133     
================================================
+ Hits               7816     7964     +148     
+ Misses             3220     3205      -15     
Impacted Files Coverage Δ
python/cugraph/cugraph/centrality/__init__.py 100.00% <ø> (ø)
...thon/cugraph/cugraph/centrality/katz_centrality.py 88.23% <ø> (-1.24%) ⬇️
python/cugraph/cugraph/dask/common/part_utils.py 19.23% <ø> (-0.94%) ⬇️
python/cugraph/cugraph/gnn/graph_store.py 82.60% <ø> (+11.77%) ⬆️
python/cugraph/cugraph/structure/hypergraph.py 90.00% <ø> (ø)
python/cugraph/cugraph/tests/dask/test_mg_bfs.py 0.00% <ø> (ø)
python/cugraph/cugraph/tests/dask/test_mg_comms.py 0.00% <ø> (ø)
...cugraph/cugraph/tests/dask/test_mg_connectivity.py 0.00% <ø> (ø)
...ython/cugraph/cugraph/tests/dask/test_mg_degree.py 0.00% <ø> (ø)
python/cugraph/cugraph/tests/dask/test_mg_hits.py 0.00% <ø> (ø)
... and 27 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2956d56...161f634. Read the comment docs.

@ChuckHastings
Copy link
Collaborator

@gpucibot merge

@rapids-bot rapids-bot bot merged commit c8caf7c into rapidsai:branch-22.06 Apr 28, 2022
@seunghwak seunghwak deleted the enh_prim_names branch August 11, 2022 23:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Breaking change improvement Improvement / enhancement to an existing function
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants