Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add many graph generators to nx-cugraph #3954

Merged
merged 23 commits into from
Oct 31, 2023

Conversation

eriknw
Copy link
Contributor

@eriknw eriknw commented Oct 24, 2023

Also, better handle dtypes for edge values passed to pylibcugraph, which only takes float32 and float64 atm.

I also defined index_dtype (currently int32) to globally control the dtype of indices.

Also, better handle dtypes for edge values passed to pylibcugraph,
which only takes float32 and float64 atm.
Currently, node values aren't used for any values, the only thing they
are used for is converting to and from networkx, which we do just fine.
@eriknw
Copy link
Contributor Author

eriknw commented Oct 25, 2023

I just updated this PR to allow node values to be numpy arrays (not just cupy arrays) so we can handle str and object dtypes. The only thing we do with node values is convert to/from networkx, which we do just fine.

@eriknw eriknw marked this pull request as draft October 25, 2023 19:23
@eriknw
Copy link
Contributor Author

eriknw commented Oct 25, 2023

Switched to Draft, b/c some work is needed to pass networkx tests. This is 95% finished and can be probably be (mostly) reviewed.

@eriknw eriknw marked this pull request as ready for review October 26, 2023 02:39
@eriknw eriknw added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Oct 27, 2023
@eriknw
Copy link
Contributor Author

eriknw commented Oct 27, 2023

This is now passing CI :)

@eriknw eriknw changed the title Add a few (mostly "classic") graph generators to nx-cugraph Add many graph generators to nx-cugraph Oct 27, 2023
Copy link
Contributor Author

@eriknw eriknw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines -23 to +27
# from . import convert_matrix
# from .convert_matrix import *
from . import convert_matrix
from .convert_matrix import *

# from . import generators
# from .generators import *
from . import generators
from .generators import *
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Told you this would be coming soon ;)
#3848 (comment)

Comment on lines -59 to +64
node_values: dict[AttrKey, cp.ndarray[NodeValue]]
node_masks: dict[AttrKey, cp.ndarray[bool]]
node_values: dict[AttrKey, any_ndarray[NodeValue]]
node_masks: dict[AttrKey, any_ndarray[bool]]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how disruptive it would be to allow numpy arrays for edge values too (in a new PR). It may not be bad at all, and we can raise if it has an incompatible dtype.

It could certainly be handy, but supporting this more broadly raises many other questions--how do we let users control whether data is numpy or cupy (for example, when converting from networkx)?

Comment on lines +388 to +392
def add_nodes_from(self, nodes_for_adding: Iterable[NodeKey], **attr) -> None:
if self._N != 0:
raise NotImplementedError(
"add_nodes_from is not implemented for graph that already has nodes."
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method was added to be compatible with a networkx decorator, which is why it's only partially implemented.

Comment on lines 608 to 609
...
# Should we warn?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we warn if integer values are being (safely, if no operations are done) converted to float?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought would be "no", but I don't have a strong opinion. I'm guessing this is possibly a warning many users would see?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can warn if we ever have any algorithms that could result in different results. Right now we don't, so let's not warn.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a code comment based on my previous comment.

@@ -541,12 +625,60 @@ def _get_plc_graph(
do_expensive_check=False,
)

def _sort_edge_indices(self, primary="src"):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came in handy twice when working on this PR:

  • to generate the small and classic graph datasets
  • when testing generators in networkx

It's optionally used by nxcg.to_networkx.

Comment on lines +51 to +54
# We need to renumber indices--np.searchsorted to the rescue!
kwargs["id_to_key"] = nodes.tolist()
src_indices = cp.array(np.searchsorted(nodes, src_array), index_dtype)
dst_indices = cp.array(np.searchsorted(nodes, dst_array), index_dtype)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.searchsorted is magical here to do renumbering for us (after doing np.unique)!

}
kwargs["edge_values"] = edge_values

if graph_class.is_multigraph() and edge_key is not None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, the multigraph tests in networkx all seem to use string edge values, which we don't support, so this isn't well covered (yet).

Comment on lines 22 to 24
_IS_NX32_OR_LESS = nx.__version__[:3] <= "3.2" and (
len(nx.__version__) <= 3 or not nx.__version__[3].isdigit()
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't pretty... I wonder whether we should just depend on packaging (already a test dependency) to make this easier if we expect to do different things w.r.t. networkx versions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards just keeping this in place over adding a dependency.

Comment on lines +27 to +29
def _ensure_int(n):
"""Ensure n is integral."""
return op.index(n)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use this trick elsewhere in the code; I'll probably add a better error message here and use this more places, which captures intent better than op.index(n) (I always add a comment to capture intent).

Comment on lines +112 to +113
else:
graph_class = G.__class__
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure if it's a good idea for us to support instances of nxcg.Graph as create_using= with networkx semantics... but they are networkx semantics.

Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in reviewing. Looks good, I just had a few comments/questions. I think the theme of my review is mainly related to testing.

python/nx-cugraph/lint.yaml Show resolved Hide resolved
python/nx-cugraph/nx_cugraph/classes/graph.py Show resolved Hide resolved
Comment on lines 608 to 609
...
# Should we warn?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought would be "no", but I don't have a strong opinion. I'm guessing this is possibly a warning many users would see?

edge_dtype = np.dtype(edge_dtype)
if edge_array.dtype != edge_dtype:
edge_array = edge_array.astype(edge_dtype)
# PLC doesn't handle int edge weights right now, so cast int to float
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any tests for these new conditions too? It might be nice to see the size limits verified with test code that looks like what a user would try to use for creating a graph.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No direct tests yet, but iirc this might get exercised by networkx tests. Wouldn't it be nice to have coverage reports that could answer easily this question 😉 ?

In general, we are largely blazing a path ahead, implementing fast, and trying to write good code, and tests are slow to catch up. You're welcome to write some if you'd like ;) . I expect proactive testing will catch up more next year, since maybe we rely on networkx tests too heavily. For now, "best effort" + networkx tests + add regression tests for regressions is my strategy. Convenient coverage reports would be handy to reveal completely untested code.

python/nx-cugraph/nx_cugraph/convert.py Outdated Show resolved Hide resolved
Comment on lines 22 to 24
_IS_NX32_OR_LESS = nx.__version__[:3] <= "3.2" and (
len(nx.__version__) <= 3 or not nx.__version__[3].isdigit()
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards just keeping this in place over adding a dependency.

create_using=None,
edge_key=None,
):
"""cudf.DataFrame inputs also supported."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this since I think this would be good coverage, especially for cudf and "pam"

@eriknw
Copy link
Contributor Author

eriknw commented Oct 31, 2023

Ooh, networkx recently added tadpole_graph, which looks easy enough to do: networkx/networkx#6999

@rlratzel
Copy link
Contributor

/merge

@rapids-bot rapids-bot bot merged commit d6c7fa1 into rapidsai:branch-23.12 Oct 31, 2023
73 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants