Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix MG weighted similarity test failure #4054

Merged
merged 13 commits into from
Jan 10, 2024

Conversation

seunghwak
Copy link
Contributor

MG weighted similarity tests assume symmetric graphs (undirected). When we remove multi-edges, we pick arbitrary edges and this can lead to an asymmetry in edge weights. This PR adds an additional flag to keep minimum value edges in remove_multi_edges and use this function if the input graph is symmetric to maintain weight symmetry.

Applying the non-breaking label as this PR does not change existing code behavior if keep_min_value_edge is not provided.

@seunghwak seunghwak added bug Something isn't working non-breaking Non-breaking change labels Dec 8, 2023
@seunghwak seunghwak added this to the 24.02 milestone Dec 8, 2023
@seunghwak seunghwak self-assigned this Dec 8, 2023
@seunghwak seunghwak requested a review from a team as a code owner December 8, 2023 21:19
@@ -1024,6 +1029,11 @@ remove_self_loops(raft::handle_t const& handle,
* @param edgelist_weights Optional list of edge weights
* @param edgelist_edge_ids Optional list of edge ids
* @param edgelist_edge_types Optional list of edge types
* @param keep_min_value_edge Flag indicating whether to keep an arbitrary edge (false) or the
* minimum value edge (true) among the edges in a multi-edge. Relevant only if @p
* edgelist_wegihts.has_value() | @p edgelist_edge_ids.has_value() | @p
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wegihts => weights

* true, the edge with the minimum value will be selected. The edge weights will be first compared
* (if @p edgelist_weights.has_value() is true); edge IDs will be compared next (if @p
* edgelist_edge_ids.has_value() is true); and edge types (if @p edgelist_edge_types.has_value() is
* true) will compared last.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had thought about this when I put this function in. The decision to select an arbitrary edge was both simpler, and didn't assume what criteria would be desired. But it does limit the usefulness of the feature.

It seems like there are more reasonable choices than just the minimum edge. Here some thoughts:

  1. Arbitrary (original code, or this code with the flag set to false)
  2. Minimum edge weight (this code with the flag set to true)
  3. Maximum edge weight
  4. Sum of edge weights
  5. Average edge weight

We also support other edge properties (currently we use edge type and edge id), but the data structures and primitives support arbitrary properties.

Would this better be handled by passing in some sort of struct that creates the sorting and reduction criteria? Making it an optional would allow the arbitrary behavior if std::nullopt or use the struct if we want some sort of specific criteria.

Thinking off the top of my head (leaving out lots of details), perhaps something like:

struct edge_reduction_t {
   ...
   template ...
   auto key_first() { // function that returns iterator to the first sort key }
   auto value_first() { // function that returns iterator to the first value key }
   auto reduce_function(auto key, auto value) { // returns the reduced edge }
};

Doesn't quite cover all of the cases (averaging would require both summing values and counting values so we could divide at the end... so perhaps there's an optional transform at the end.

I'd be fine marking this with a FIXME and letting the smallest value fix our immediate problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I was also thinking about edge ID & type... If we want to maintain a symmetric graph, should we also assume that the reverse edge to have the same ID & type? (or this can really be a case by case thing?)

We can discuss what options are really necessary in cuGraph's context. NetworkX supports "arbitrary".

"If both edges exist in digraph and their edge data is different, only one edge is created with an arbitrary choice of which edge data to use."
https://networkx.org/documentation/stable/reference/classes/generated/networkx.DiGraph.to_undirected.html

And I wasn't sure how much should we go beyond this without clear use cases, but now we have at least one use case (maintaining weight symmetry). We should clearly make updates if we can identify more use cases (e.g. anything related to edge ID & types).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created an issue to track this idea. Definitely need to see what kind of use cases would drive this feature before we spend to much time on it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to maintain a symmetric graph, should we also assume that the reverse edge to have the same ID & type? (or this can really be a case by case thing?)

Sorry to interject on this late but the python API has a tangential issue. In fact, the python API does not symmetrize edges that have edge_ids under the assumption that each edge has a unique edge id and will throw an exception if the user attempts to create such undirected graph. If the reverse edge has the same ID, wouldn't that violate uniqueness of the edge_ids?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this really depends on the application context. If a user considers an undirected edge as a single edge, representing this single edge as two edges with the opposite direction is just an internal implementation issue and those two edges may have the same edge ID. If a user considers a directed graph which happens to be symmetric, then, the two edges may better have two different edge IDs. A similar issue can happen with edge types.

I assume this needs more in-depth discussion.

@@ -1038,6 +1048,7 @@ remove_multi_edges(raft::handle_t const& handle,
rmm::device_uvector<vertex_t>&& edgelist_dsts,
std::optional<rmm::device_uvector<weight_t>>&& edgelist_weights,
std::optional<rmm::device_uvector<edge_t>>&& edgelist_edge_ids,
std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types);
std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types,
bool keep_min_value_edge = false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this flag also be exposed to the CAPI? In fact the new cugraph_graph_create_sg takes drop_multi_edges however the users on the higher stack (C, PLC, python) won't be able to control this parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, yes, if the graph is symmetric, we need to set keep_min_value_edge to true to maintain symmetry. See a68868f for the update.

@ChuckHastings
Copy link
Collaborator

/merge

@rapids-bot rapids-bot bot merged commit 5e8e9b5 into rapidsai:branch-24.02 Jan 10, 2024
75 checks passed
@seunghwak seunghwak deleted the bug_mg_weighted_similarity branch May 22, 2024 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cuGraph non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants