CuGraph+PyG Wrappers and Loaders #2567

Merged: 84 commits, merged on Aug 31, 2022

Commits:
93f6ff9
Address CuGraphStorage reviews
VibhuJawa Jun 30, 2022
71824bd
plc graph creation
alexbarghi-nv Jul 7, 2022
7a31e80
remove useless files
alexbarghi-nv Jul 7, 2022
4203728
fix style
alexbarghi-nv Jul 7, 2022
c25d197
remove whitespace
alexbarghi-nv Jul 7, 2022
51c2f98
add tests
alexbarghi-nv Jul 7, 2022
05bac91
remove cufile
alexbarghi-nv Jul 7, 2022
31b1715
add newlines
alexbarghi-nv Jul 7, 2022
1ce7fe3
test fixes
alexbarghi-nv Jul 8, 2022
17e12fd
working ndata,edata with all the updates
VibhuJawa Jul 8, 2022
71eae74
fixed style checks
VibhuJawa Jul 8, 2022
052b635
style
alexbarghi-nv Jul 8, 2022
3b51278
style
alexbarghi-nv Jul 8, 2022
5a938eb
add type checking for weights
alexbarghi-nv Jul 8, 2022
12ebf68
Add CuGraphStore tests
VibhuJawa Jul 8, 2022
d70e43f
Fixed Style
VibhuJawa Jul 8, 2022
739b252
add sg bfs, fix arg in mg bfs
alexbarghi-nv Jul 11, 2022
e3e1038
remove useless files
alexbarghi-nv Jul 11, 2022
7d7fad4
neighbor sampling using stored plc graph
alexbarghi-nv Jul 11, 2022
d95fac7
style
alexbarghi-nv Jul 11, 2022
81dcb07
style
alexbarghi-nv Jul 11, 2022
80b9041
Added graph_store.py and test_graph_store.py
VibhuJawa Jul 11, 2022
834f8a8
stop hanging for mg bfs, sampling
alexbarghi-nv Jul 11, 2022
568cc86
style
alexbarghi-nv Jul 11, 2022
18b3515
cugraph
alexbarghi-nv Jul 11, 2022
5df2d63
fix pytorch import error
VibhuJawa Jul 13, 2022
c6d1fbf
Fix to(device) for cupy
VibhuJawa Jul 13, 2022
66aa800
minor
alexbarghi-nv Jul 15, 2022
3324b81
remove cufile
alexbarghi-nv Jul 15, 2022
5b795bf
Merge branch 'address_CuGraphStorage' of https://github.com/vibhujawa…
alexbarghi-nv Jul 15, 2022
89bf8e8
Merge branch 'branch-22.08' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Jul 15, 2022
0f4e3a8
add cugraph pyg code
alexbarghi-nv Jul 15, 2022
0913be8
Added options to extract_subgraph() to bypass renumbering and adding …
rlratzel Jul 15, 2022
b7b7a7d
flake8 fixes.
rlratzel Jul 15, 2022
9998a95
Added code and tests for PG.num_vertices_with_properties attribute, w…
rlratzel Jul 16, 2022
eb7d928
Added code and test for handling no vertex data when accessing num_ve…
rlratzel Jul 16, 2022
5ee2ddd
Merge branch 'branch-22.08-pg_updates_for_gnns' of https://github.com…
alexbarghi-nv Jul 18, 2022
b487909
cugraph-pyg
alexbarghi-nv Jul 21, 2022
2ce5bcc
blergh
alexbarghi-nv Jul 21, 2022
8422def
commit changes
alexbarghi-nv Jul 22, 2022
d46570b
minor
alexbarghi-nv Jul 27, 2022
c99dd98
Fix merge conflict
alexbarghi-nv Jul 27, 2022
37d08bb
Merge branch 'branch-22.08' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Aug 1, 2022
f6fd431
fix non-deterministic bug in uniform neighborhood sampling (bad memor…
ChuckHastings Aug 2, 2022
1aa0e2d
fix clang-format issues
ChuckHastings Aug 2, 2022
9152d2e
Merge branch 'fix_uniform_neighborhood_sample_bug' into cugraph-pyg
alexbarghi-nv Aug 2, 2022
928dd5f
clean up and add graph sage example notebook
alexbarghi-nv Aug 3, 2022
f5ab405
remove garbage from notebook
alexbarghi-nv Aug 3, 2022
0bbf2c8
Merge branch 'branch-22.08' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Aug 11, 2022
590141d
updates, mg, hetero
alexbarghi-nv Aug 11, 2022
1e4bde0
remove old api classes
alexbarghi-nv Aug 11, 2022
87ec8a4
Merge branch 'branch-22.10' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Aug 11, 2022
4e5ac2d
more fixes and cleanup
alexbarghi-nv Aug 11, 2022
c0ed9a7
move notebook to correct location
alexbarghi-nv Aug 11, 2022
f7b8362
update notebook
alexbarghi-nv Aug 11, 2022
8b1ec2d
remove cudf storage
alexbarghi-nv Aug 11, 2022
ffb1276
unit tests for sg
alexbarghi-nv Aug 16, 2022
1d60e8d
remove garbage file
alexbarghi-nv Aug 16, 2022
9198076
revert yaml change
alexbarghi-nv Aug 16, 2022
a0b0ca2
revert egonet change
alexbarghi-nv Aug 16, 2022
411d1ea
add experimental warnings to loaders
alexbarghi-nv Aug 16, 2022
9f568c3
test fixes
alexbarghi-nv Aug 17, 2022
7c1ad11
fix style
alexbarghi-nv Aug 17, 2022
5a1a5ac
add the sg sampling fix
alexbarghi-nv Aug 17, 2022
6e81cd8
rename functions
alexbarghi-nv Aug 17, 2022
cb086db
style fix
alexbarghi-nv Aug 17, 2022
736b633
fixes for graph sage
alexbarghi-nv Aug 17, 2022
855c9fa
Drop __init__ and add __post_init__ to simplify constructor for CuGra…
alexbarghi-nv Aug 24, 2022
c3292c5
Change assert statement to proper error message.
alexbarghi-nv Aug 24, 2022
582ce09
clean up notebook and disable tests
alexbarghi-nv Aug 24, 2022
b97e381
Merge branch 'cugraph-pyg-new-api' of https://github.com/alexbarghi-n…
alexbarghi-nv Aug 24, 2022
2049d79
remove hardcoded type feature
alexbarghi-nv Aug 24, 2022
3410e9f
make backend private
alexbarghi-nv Aug 24, 2022
f9371fb
Merge branch 'cugraph-pyg-new-api' of https://github.com/alexbarghi-n…
alexbarghi-nv Aug 24, 2022
93ac58a
simplify uniform_neighbor_sample call
alexbarghi-nv Aug 24, 2022
68bd410
add additional comments for _get_edge_index
alexbarghi-nv Aug 24, 2022
bb5c15e
Merge branch 'cugraph-pyg-new-api' of https://github.com/alexbarghi-n…
alexbarghi-nv Aug 24, 2022
3386ba6
add edge type check for single edge optimized code
alexbarghi-nv Aug 24, 2022
ecbe7f6
remove the fake groupby in favor of a future better solution
alexbarghi-nv Aug 24, 2022
5a5c02d
don't access the __dict__
alexbarghi-nv Aug 24, 2022
e79e172
style fix
alexbarghi-nv Aug 24, 2022
a59aa20
Merge branch 'cugraph-pyg-new-api' of https://github.com/alexbarghi-n…
alexbarghi-nv Aug 24, 2022
ce00bc1
set to list
alexbarghi-nv Aug 25, 2022
cdcfa0d
change to cupy.arange
alexbarghi-nv Aug 25, 2022
368 changes: 368 additions & 0 deletions notebooks/gnn/pyg_hetero_mag.ipynb
@@ -0,0 +1,368 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PyG+cuGraph Heterogeneous MAG Example\n",
"# Skip notebook test\n",
"\n",
"### Requires installation of PyG"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import rmm\n",
"\n",
"# Use an RMM memory pool: 5 GB initial size, growing up to 20 GB\n",
"rmm.reinitialize(pool_allocator=True, initial_pool_size=5 * 10**9, maximum_pool_size=20 * 10**9)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load MAG into CPU Memory"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cugraph\n",
"import cudf\n",
"from ogb.nodeproppred import NodePropPredDataset\n",
"\n",
"dataset = NodePropPredDataset(name='ogbn-mag')\n",
"\n",
"# data is a (graph, label) tuple: data[0] holds the graph dicts, data[1] the label dict\n",
"data = dataset[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create PropertyGraph from MAG Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Partially Load the Vertex Data (just ids)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cudf\n",
"import dask_cudf\n",
"import cugraph\n",
"from cugraph.experimental import MGPropertyGraph\n",
"from cugraph.experimental import PropertyGraph\n",
"pG = PropertyGraph()\n",
"\n",
"vertex_offsets = {}\n",
"last_offset = 0\n",
"\n",
"for node_type, num_nodes in data[0]['num_nodes_dict'].items():\n",
"    vertex_offsets[node_type] = last_offset\n",
"    last_offset += num_nodes\n",
"\n",
"    blank_df = cudf.DataFrame({'id': range(vertex_offsets[node_type], vertex_offsets[node_type] + num_nodes)})\n",
"    blank_df.id = blank_df.id.astype('int32')\n",
"    if isinstance(pG, MGPropertyGraph):\n",
"        blank_df = dask_cudf.from_cudf(blank_df, npartitions=2)\n",
"    pG.add_vertex_data(blank_df, vertex_col_name='id', type_name=node_type)\n",
"\n",
"vertex_offsets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add the Remaining Node Features"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for node_type, node_features in data[0]['node_feat_dict'].items():\n",
"    vertex_offset = vertex_offsets[node_type]\n",
"\n",
"    feature_df = cudf.DataFrame(node_features)\n",
"    feature_df.columns = [str(c) for c in range(feature_df.shape[1])]\n",
"    feature_df['id'] = range(vertex_offset, vertex_offset + node_features.shape[0])\n",
"    feature_df.id = feature_df.id.astype('int32')\n",
"    if isinstance(pG, MGPropertyGraph):\n",
"        feature_df = dask_cudf.from_cudf(feature_df, npartitions=2)\n",
"\n",
"    pG.add_vertex_data(feature_df, vertex_col_name='id', type_name=node_type)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add the Edges"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for edge_key, eidx in data[0]['edge_index_dict'].items():\n",
"    node_type_src, edge_type, node_type_dst = edge_key\n",
"    print(node_type_src, edge_type, node_type_dst)\n",
"    vertex_offset_src = vertex_offsets[node_type_src]\n",
"    vertex_offset_dst = vertex_offsets[node_type_dst]\n",
"    # Shift local ids into the global id space (vectorized, avoids a Python loop over every edge)\n",
"    src = eidx[0] + vertex_offset_src\n",
"    dst = eidx[1] + vertex_offset_dst\n",
"\n",
"    edge_df = cudf.DataFrame({'src': src, 'dst': dst})\n",
"    edge_df.src = edge_df.src.astype('int32')\n",
"    edge_df.dst = edge_df.dst.astype('int32')\n",
"    edge_df['type'] = edge_type\n",
"    if isinstance(pG, MGPropertyGraph):\n",
"        edge_df = dask_cudf.from_cudf(edge_df, npartitions=2)\n",
"\n",
"    # Adding backwards edges is currently required in both the cuGraph PG and PyG APIs.\n",
"    pG.add_edge_data(edge_df, vertex_col_names=['src','dst'], type_name=edge_type)\n",
"    pG.add_edge_data(edge_df, vertex_col_names=['dst','src'], type_name=f'{edge_type}_bw')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add the Target Variable"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_df = cudf.DataFrame(data[1]['paper'], columns=['y'])\n",
"y_df['id'] = range(vertex_offsets['paper'], vertex_offsets['paper'] + len(y_df))\n",
"y_df.id = y_df.id.astype('int32')\n",
"if isinstance(pG, MGPropertyGraph):\n",
"    y_df = dask_cudf.from_cudf(y_df, npartitions=2)\n",
"\n",
"pG.add_vertex_data(y_df, vertex_col_name='id', type_name='paper')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Construct a Graph Store, Feature Store, and Loaders"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from cugraph.gnn.pyg_extensions.data.cugraph_store import to_pyg\n",
"\n",
"feature_store, graph_store = to_pyg(pG)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from cugraph.gnn.pyg_extensions import CuGraphLinkNeighborLoader\n",
"loader = CuGraphLinkNeighborLoader(\n",
"    data=(feature_store, graph_store),\n",
"    edge_label_index='writes',\n",
"    shuffle=True,\n",
"    num_neighbors=[10, 25],\n",
"    batch_size=50,\n",
")\n",
"\n",
"test_loader = CuGraphLinkNeighborLoader(\n",
"    data=(feature_store, graph_store),\n",
"    edge_label_index='writes',\n",
"    shuffle=True,\n",
"    num_neighbors=[10, 25],\n",
"    batch_size=50,\n",
")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create the Network"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"edge_types = [attr.edge_type for attr in graph_store.get_all_edge_attrs()]\n",
"edge_types"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
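"num_classes = pG.get_vertex_data(columns=['y'])['y'].max() + 1\n",
"if isinstance(pG, MGPropertyGraph):\n",
"    num_classes = num_classes.compute()\n",
"# Cast to a Python int so it can be passed as out_channels to the model\n",
"num_classes = int(num_classes)\n",
"num_classes"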
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn.functional as F\n",
"\n",
"from torch_geometric.nn import HeteroConv, Linear, SAGEConv\n",
"\n",
"class HeteroGNN(torch.nn.Module):\n",
"    def __init__(self, edge_types, hidden_channels, out_channels, num_layers):\n",
"        super().__init__()\n",
"\n",
"        # One HeteroConv per layer, wrapping a separate SAGEConv for each edge type\n",
"        self.convs = torch.nn.ModuleList()\n",
"        for _ in range(num_layers):\n",
"            conv = HeteroConv({\n",
"                edge_type: SAGEConv((-1, -1), hidden_channels)\n",
"                for edge_type in edge_types\n",
"            })\n",
"            self.convs.append(conv)\n",
"\n",
"        self.lin = Linear(hidden_channels, out_channels)\n",
"\n",
"    def forward(self, x_dict, edge_index_dict):\n",
"        for conv in self.convs:\n",
"            x_dict = conv(x_dict, edge_index_dict)\n",
"            x_dict = {key: F.leaky_relu(x) for key, x in x_dict.items()}\n",
"        return self.lin(x_dict['paper'])\n",
"\n",
"\n",
"model = HeteroGNN(edge_types, hidden_channels=64, out_channels=num_classes,\n",
"                  num_layers=2).cuda()\n",
"\n",
"with torch.no_grad():  # Initialize lazy modules.\n",
"    data = next(iter(loader))\n",
"    out = model(data.x_dict, data.edge_index_dict)\n",
"\n",
"optimizer = torch.optim.Adam(model.parameters(), lr=0.005, weight_decay=0.001)\n",
"\n",
"num_batches = 5\n",
"def train():\n",
"    model.train()\n",
"    total_loss = 0.0\n",
"    for b_i, data in enumerate(loader):\n",
"        if b_i == num_batches:\n",
"            break\n",
"\n",
"        # Clear gradients before every step so they do not accumulate across batches\n",
"        optimizer.zero_grad()\n",
"        out = model(data.x_dict, data.edge_index_dict)\n",
"        loss = F.cross_entropy(out, data.y_dict['paper'])\n",
"        loss.backward()\n",
"        optimizer.step()\n",
"        total_loss += float(loss)\n",
"\n",
"    # Average loss over the sampled batches\n",
"    return total_loss / num_batches\n",
"\n",
"\n",
"@torch.no_grad()\n",
"def test():\n",
"    model.eval()\n",
"    test_iter = iter(test_loader)\n",
"\n",
"    acc = 0.0\n",
"    for _ in range(2*num_batches):\n",
"        data = next(test_iter)\n",
"        pred = model(data.x_dict, data.edge_index_dict).argmax(dim=-1)\n",
"\n",
"        # len(data['paper']) counts stored attributes, not nodes, so divide by the number of predictions\n",
"        acc += float((pred == data['paper'].y).sum()) / pred.numel()\n",
"    return acc / (2*num_batches)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Train the Network"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for epoch in range(1, 101):\n",
"    loss = train()\n",
"    train_acc = test()\n",
"    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, Train: {train_acc:.4f}')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.7 ('base')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "f708a36acfaef0acf74ccd43dfb58100269bf08fb79032a1e0a6f35bd9856f51"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
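The notebook stores every MAG node type in a single PropertyGraph, so it shifts each type's local ids (0..n-1) into a shared global id space using running offsets, and adds every edge type twice: once forward and once reversed under a `_bw` type name. A minimal numpy sketch of that bookkeeping, independent of cuGraph; the helpers `compute_offsets` and `to_global_edges` are hypothetical and the toy counts are made up for illustration:

```python
import numpy as np


def compute_offsets(num_nodes_dict):
    """Assign each node type a contiguous block of global ids (in dict order)."""
    offsets, last = {}, 0
    for node_type, n in num_nodes_dict.items():
        offsets[node_type] = last
        last += n
    return offsets, last  # last == total number of vertices


def to_global_edges(edge_key, edge_index, offsets):
    """Shift a (2, E) local edge index into global ids and add reversed edges.

    Returns {edge_type: (src, dst), edge_type + '_bw': (dst, src)} as int32
    arrays, mirroring the forward and backward edge types added in the notebook.
    """
    src_type, edge_type, dst_type = edge_key
    src = (edge_index[0] + offsets[src_type]).astype(np.int32)
    dst = (edge_index[1] + offsets[dst_type]).astype(np.int32)
    return {edge_type: (src, dst), f'{edge_type}_bw': (dst, src)}


# Toy example: 3 authors and 2 papers
offsets, total = compute_offsets({'author': 3, 'paper': 2})
edges = to_global_edges(('author', 'writes', 'paper'),
                        np.array([[0, 2], [1, 0]]), offsets)
print(offsets, total)   # {'author': 0, 'paper': 3} 5
print(edges['writes'])  # author 0 -> global 4 (paper 1), author 2 -> global 3 (paper 0)
```

Keeping the offsets in a dict makes the mapping reversible: subtracting `offsets[node_type]` from a global id recovers the per-type id that the OGB labels and features are indexed by.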
@@ -123,7 +123,10 @@ def uniform_neighbor_sample(input_graph,
         raise TypeError("fanout_vals must be a list, "
                         f"got: {type(fanout_vals)}")
 
-    weight_t = input_graph.edgelist.edgelist_df["value"].dtype
+    if 'value' in input_graph.edgelist.edgelist_df:
+        weight_t = input_graph.edgelist.edgelist_df["value"].dtype
+    else:
+        weight_t = 'float32'
 
     # start_list uses "external" vertex IDs, but if the graph has been
     # renumbered, the start vertex IDs must also be renumbered.
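The hunk above lets `uniform_neighbor_sample` handle unweighted graphs: when the edge list has no `"value"` column, the weight type falls back to `'float32'` instead of failing on the missing column. The same check can be sketched with pandas standing in for cuDF (an assumption for illustration); `edge_weight_dtype` is a hypothetical helper, not part of cuGraph:

```python
import numpy as np
import pandas as pd


def edge_weight_dtype(edgelist_df):
    """Return the edge-weight dtype, defaulting to 'float32' for unweighted graphs.

    `in` on a DataFrame tests column labels, matching the check in the diff.
    """
    if 'value' in edgelist_df:
        return edgelist_df['value'].dtype
    return 'float32'


weighted = pd.DataFrame({'src': [0, 1], 'dst': [1, 2],
                         'value': np.array([0.5, 1.0], dtype='float64')})
unweighted = pd.DataFrame({'src': [0, 1], 'dst': [1, 2]})

print(edge_weight_dtype(weighted))    # float64
print(edge_weight_dtype(unweighted))  # float32
```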
14 changes: 14 additions & 0 deletions python/cugraph/cugraph/gnn/pyg_extensions/__init__.py
@@ -0,0 +1,14 @@
# Copyright (c) 2019-2022, NVIDIA CORPORATION.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from cugraph.gnn.pyg_extensions.data import to_pyg