Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW]Add remote storage support #2859

Merged

Conversation

VibhuJawa
Copy link
Member

@VibhuJawa VibhuJawa commented Oct 28, 2022

Add remote CuGraphRemoteStore support to enable using of graph service with DGL .

  • Transfer via UCX to device
  • Add node_parquet_data
  • Add edge_parquet_data
  • node_storage
  • edge_storage
  • node attributes
  • edge attributes
  • extract_subgraph
  • uniform sampling

Tests:

  • How to run tests in CI !!!
  • add_node_parquet_data
  • add_edge_parquet_data
  • node_storage
  • edge_storage
  • node attributes
  • edge attributes
  • extract_subgraph
  • uniform sampling

This PR depends upon:
#2855
#2850

rlratzel and others added 30 commits October 25, 2022 23:39
…to graph creation extensions, changed how unloading extensions work to unload by unique module name, make numpy and cupy optional modules in the client, updated Value and ValueWrapper to handle doubles and lists (recursive), added tests for new extension changes.
…one, added test for extension modules returning results to client GPUs (code not done yet).
…(), updated tests, added test for upcoming ability to load extensions via import paths.
…kage path (eg. import foo.bar.baz), updated test.
…ns, added tests for error conditions, added benchmark for call_extension test.
@VibhuJawa VibhuJawa changed the title [WIP]Add remote storage support [REVIEW]Add remote storage support Nov 1, 2022
@VibhuJawa VibhuJawa marked this pull request as ready for review November 1, 2022 15:37
@VibhuJawa
Copy link
Member Author

rerun tests

@codecov-commenter
Copy link

codecov-commenter commented Nov 3, 2022

Codecov Report

Base: 62.76% // Head: 60.75% // Decreases project coverage by -2.00% ⚠️

Coverage data is based on head (5235e46) compared to base (5c15341).
Patch coverage: 35.53% of modified lines in pull request are covered.

❗ Current head 5235e46 differs from pull request most recent head bf49417. Consider uploading reports for the commit bf49417 to get more accurate results

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.12    #2859      +/-   ##
================================================
- Coverage         62.76%   60.75%   -2.01%     
================================================
  Files               117      122       +5     
  Lines              6451     6824     +373     
================================================
+ Hits               4049     4146      +97     
- Misses             2402     2678     +276     
Impacted Files Coverage Δ
.../gnn/dgl_extensions/service_extensions/add_data.py 0.00% <0.00%> (ø)
.../gnn/dgl_extensions/service_extensions/sampling.py 0.00% <0.00%> (ø)
...ugraph/gnn/dgl_extensions/cugraph_service_store.py 17.79% <17.79%> (ø)
...thon/cugraph/cugraph/dask/sampling/random_walks.py 22.91% <22.91%> (ø)
...graph/cugraph/gnn/dgl_extensions/utils/add_data.py 25.00% <25.00%> (-62.50%) ⬇️
...raph/cugraph/gnn/dgl_extensions/feature_storage.py 48.78% <39.34%> (-30.71%) ⬇️
...ugraph/cugraph/gnn/dgl_extensions/cugraph_store.py 84.24% <55.55%> (-4.65%) ⬇️
...graph/cugraph/gnn/dgl_extensions/utils/sampling.py 87.17% <85.71%> (-0.44%) ⬇️
...ph/cugraph/gnn/dgl_extensions/utils/feature_map.py 87.50% <87.50%> (ø)
python/cugraph/cugraph/sampling/random_walks.py 92.42% <92.15%> (-4.13%) ⬇️
... and 4 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly minor things, looks fine to me otherwise.

"are more than one node types."
)
)
ntype = ntypes[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does ntype get used after this line?

Copy link
Member Author

@VibhuJawa VibhuJawa Nov 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, Yup, this was a bug. I now pass it down here.

result = self.pg.get_vertex_data(
vertex_ids=indices, columns=self.columns, types=self.types_to_fetch
)
else:
result = self.pg.get_edge_data(
edge_ids=indices, columns=self.columns, types=self.types_to_fetch

We previously had a bug in PG which got fixed by this #2523 but i missed linking it so this slipped through.

I have added tests here:

def test_get_node_storage_ntypes():
node_ser = cudf.Series([1, 2, 3])
feat_ser = cudf.Series([1.0, 1.0, 1.0])
df = cudf.DataFrame({"node_ids": node_ser, "feat": feat_ser})
pg = PropertyGraph()
gs = CuGraphStore(pg, backend_lib="cupy")
gs.add_node_data(df, "node_ids", ntype="nt.a")
node_ser = cudf.Series([4, 5, 6])
feat_ser = cudf.Series([2.0, 2.0, 2.0])
df = cudf.DataFrame({"node_ids": node_ser, "feat": feat_ser})
gs.add_node_data(df, "node_ids", ntype="nt.b")
# All indices from a single ntype
output_ar = gs.get_node_storage(key="feat", ntype="nt.a").fetch([1, 2, 3])
cp.testing.assert_array_equal(cp.asarray([1, 1, 1], dtype=cp.float32), output_ar)
# Indices from other ntype are ignored
output_ar = gs.get_node_storage(key="feat", ntype="nt.b").fetch([1, 2, 5])
cp.testing.assert_array_equal(cp.asarray([2.0], dtype=cp.float32), output_ar)
def test_get_edge_storage_gs(dataset1_CuGraphStore):
etype = "('user', 'relationship', 'user')"
fs = dataset1_CuGraphStore.get_edge_storage("relationships_k", etype)
relationship_t = fs.fetch([6, 7, 8], device="cuda")
relationships_df = create_df_from_dataset(
dataset1["relationships"][0], dataset1["relationships"][1]
)
cudf_ar = relationships_df["relationship_type"].iloc[[0, 1, 2]].values
assert cp.allclose(cudf_ar, relationship_t)

)
)

etype = etypes[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does etype get used after this line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests here, similar to ntype.

def test_get_edge_storage_gs(dataset1_CuGraphStore):
etype = "('user', 'relationship', 'user')"
fs = dataset1_CuGraphStore.get_edge_storage("relationships_k", etype)
relationship_t = fs.fetch([6, 7, 8], device="cuda")
relationships_df = create_df_from_dataset(
dataset1["relationships"][0], dataset1["relationships"][1]
)
cudf_ar = relationships_df["relationship_type"].iloc[[0, 1, 2]].values
assert cp.allclose(cudf_ar, relationship_t)

@VibhuJawa
Copy link
Member Author

@rlratzel , All reviews have been addressed. Please give this another look

python/cugraph/cugraph/gnn/__init__.py Show resolved Hide resolved

# TODO: Cant send dlpack or cupy arrays or numpys arrays
# through extensions
# Ask Rick
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should create an issue for this and mention it here, remove "ask Rick"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved with bf49417

edge_data = self.__client.get_graph_edge_data(
id_or_ids=edge_ids or -1,
id_or_ids=ids,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason for this change? There is no functional change here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This throws an error when you have array like objects . I dont like or when being used with objects that can be array like.

import numpy as np
np.asarray([1,2]) or -1
Traceback (most recent call last):
  File "main.py", line 2, in <module>
    np.asarray([1,2]) or -1
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

@VibhuJawa
Copy link
Member Author

@alexbarghi-nv , @rlratzel , Addressed all requested reviews, let me know.

Copy link
Member

@alexbarghi-nv alexbarghi-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@rlratzel
Copy link
Contributor

rlratzel commented Nov 9, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 4bdc242 into rapidsai:branch-22.12 Nov 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants