Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nx-cugraph: add triangles and clustering algorithms #4093

Merged
merged 7 commits into from
Jan 19, 2024

Conversation

eriknw
Copy link
Contributor

@eriknw eriknw commented Jan 15, 2024

NetworkX tests are somewhat underspecified regarding how to handle self-loops for these algorithms. Also, I'm not sure if transitivity is supposed to work on directed graphs.

Once #4071 is merged, it should be easy to add is_bipartite function (and maybe others?).

@eriknw eriknw requested a review from a team as a code owner January 15, 2024 18:00
@eriknw eriknw added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 15, 2024
@eriknw
Copy link
Contributor Author

eriknw commented Jan 17, 2024

Note that transitivity and is_bipartite could use e.g. total_triangles if PLC had a faster way to implement it. is_bipartite only needs to know whether there are any triangles, which in GraphBLAS could be done as any_pair(L @ U.T).new(mask=L.S).reduce_scalar(any)

Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, just a couple of questions/feedback.

]


@networkx_algorithm(plc="triangle_count", version_added="24.02")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subjective feedback: I remember we discussed this but FWIW this is still a bit surprising. I have to take a beat and remember that these params are metadata and not important to functionality when reading the code. I suspect this could result in another issue like this someday since it's not self-documenting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it help to rename plc to _plc to signal that this parameter doesn't matter if you don't know what it's for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to use version_added to update the docstring, but I haven't fully baked that yet.

We could also make plc a code comment, which is a less useful, but also potentially less confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, a leading underscore could help emphasize these are "internal" params. I was mainly bringing it up for awareness, and this can (and probably should, unless you'd actually prefer to address it here) be done in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh, so we have a reason to merge all open nx-cugraph PRs ;)

return 0
degree = int((G.src_indices == nodes).sum())
return 2 * numer / (degree * (degree - 1))
# What about self-edges?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leaves me curious; what happens with self-edges? I'm guessing the degree value for a node with a self-edge might be artificially high, but I hate to guess. Is this a limitation we need to address (perhaps as a FIXME) and/or something we need to include in extra_docstrings? Maybe at a minimum can you update the comment to elaborate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NetworkX is underspecified. There are no tests with self-edges. Perhaps we could add a test to match NetworkX, or we could try to upstream tests to NetworkX. The latter will require the NetworkX team being intentional about what is supposed to happen in this case. Should we ignore self-edges or not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eriknw and I discussed offline and both think self-edges probably should not be included.

However, there's a discussion about this started here which could change this (probably in a future PR if not answered soon).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to ignore selfloops and comparison tests added.

@@ -26,6 +26,6 @@ def is_bipartite(G):
G = _to_graph(G)
# Counting triangles may not be the fastest way to do this, but it is simple.
node_ids, triangles, is_single_node = _triangles(
G, None, symmetrize="union" if G.is_directed else None
G, None, symmetrize="union" if G.is_directed() else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this discovered by the new tests? :)

Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the tests.

@rlratzel
Copy link
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 52ab54f into rapidsai:branch-24.02 Jan 19, 2024
106 checks passed
rapids-bot bot pushed a commit that referenced this pull request Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants