refactor(data): simplify remote backend `num_nodes` computation #5307

mananshah99 · 2022-08-29T18:02:54Z

This PR introduces remote_backend_utils (currently just a set of functions, though I am open to making it a static class), which defines common utilities to be used across remote backends. The first (and perhaps most useful) function is num_nodes, which infers the number of nodes for a given node type or edge type from the attributes held in a feature store and a graph store. This significantly simplifies internal code and some external interfaces.
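To make the idea concrete, here is a minimal sketch of how a node count could be inferred from store attributes. It does not use the actual PyG classes: `ToyTensorAttr`, `ToyEdgeAttr`, and `infer_num_nodes` are hypothetical stand-ins, and the lookup order (edge sizes first, then feature tensors) and the tuple returned for an edge type are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

NodeType = str
EdgeType = Tuple[str, str, str]


@dataclass(frozen=True)
class ToyTensorAttr:
    """Hypothetical stand-in for a feature-store attribute."""
    group_name: str  # node type the feature tensor belongs to
    attr_name: str
    num_rows: int  # first dimension of the stored tensor


@dataclass(frozen=True)
class ToyEdgeAttr:
    """Hypothetical stand-in for a graph-store attribute."""
    edge_type: EdgeType
    size: Optional[Tuple[int, int]]  # (num_src_nodes, num_dst_nodes), if known


def infer_num_nodes(
    tensor_attrs: List[ToyTensorAttr],
    edge_attrs: List[ToyEdgeAttr],
    query: Union[NodeType, EdgeType],
) -> Union[int, Tuple[int, int]]:
    """Return a node count for a node type, or (src, dst) counts for an edge type."""
    if isinstance(query, tuple):  # edge type: resolve both endpoints
        src, _, dst = query
        return (infer_num_nodes(tensor_attrs, edge_attrs, src),
                infer_num_nodes(tensor_attrs, edge_attrs, dst))

    # 1. Look for an edge whose recorded size covers this node type.
    for edge_attr in edge_attrs:
        if edge_attr.size is None:
            continue
        if edge_attr.edge_type[0] == query:
            return edge_attr.size[0]
        if edge_attr.edge_type[-1] == query:
            return edge_attr.size[1]

    # 2. Fall back to the row count of a feature tensor of this node type.
    for tensor_attr in tensor_attrs:
        if tensor_attr.group_name == query:
            return tensor_attr.num_rows

    raise ValueError(f"Cannot infer the number of nodes for {query!r}")
```

In this sketch a node type that appears only in the feature store (no edges) is still resolvable, which is one reason a utility spanning both stores can be convenient.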


codecov · 2022-08-29T19:24:27Z

Codecov Report

Merging #5307 (5f8171a) into master (96fbf43) will increase coverage by 0.03%.
The diff coverage is 88.00%.

@@            Coverage Diff             @@
##           master    #5307      +/-   ##
==========================================
+ Coverage   83.33%   83.36%   +0.03%     
==========================================
  Files         337      338       +1     
  Lines       18641    18633       -8     
==========================================
- Hits        15535    15534       -1     
+ Misses       3106     3099       -7

Impacted Files	Coverage Δ
torch_geometric/data/data.py	`91.98% <ø> (+0.62%)`	⬆️
torch_geometric/data/hetero_data.py	`95.96% <ø> (+1.42%)`	⬆️
torch_geometric/data/lightning_datamodule.py	`48.82% <ø> (ø)`
torch_geometric/testing/graph_store.py	`100.00% <ø> (ø)`
torch_geometric/loader/neighbor_loader.py	`94.77% <75.00%> (-0.17%)`	⬇️
torch_geometric/data/remote_backend_utils.py	`84.84% <84.84%> (ø)`
torch_geometric/data/feature_store.py	`88.81% <100.00%> (+0.65%)`	⬆️
torch_geometric/data/graph_store.py	`92.85% <100.00%> (+0.72%)`	⬆️
torch_geometric/loader/link_neighbor_loader.py	`94.47% <100.00%> (-0.24%)`	⬇️
... and 2 more


yaoyaowd · 2022-08-29T22:23:35Z

torch_geometric/data/remote_backend.py

+from torch_geometric.typing import EdgeType, NodeType
+
+
+def num_nodes(


I actually like the previous implementation for each feature store and graph store better. It is more intuitive to me than getting all stores and querying the nodes with a query (which sounds more heavyweight).
Could we just expose num_nodes and sizes for features and edges?

mananshah99 · 2022-08-29

Sure, we can expose num_nodes for features and edges, but there are instances where an edge index is not part of a graph (e.g. link prediction), or where we just want to get the number of nodes from a group name rather than a full edge index type. For this reason, I think having num_nodes as a common interface improves simplicity. I also don't think it's too heavyweight to query all the edge attributes and node attributes, since these are small data structures and are somewhat bounded.

However, if you think this overhead would be significant, we can expose methods in the feature store and graph store to get a TensorAttr/EdgeAttr from a partial specification, and query these directly (instead of listing all of them and iterating). This can be done as an improvement in a separate PR. Wdyt?

yaoyaowd · 2022-08-29

After a second look, I think the divergence comes from whether we want to introduce this new remote_backend or consolidate the logic in GraphStore, i.e., we can consider GraphStore to be a backend that can be remote, and then have an API like

    class GraphStore:
        def num_nodes(Union[NodeType, EdgeType]):

We can also hide FeatureStore behind GraphStore, or keep FeatureStore metadata as part of the information in GraphStore.

If we want to introduce remote_backend, you may want to think more about whether it is a Backend object or a helper that provides utility functions to talk to FeatureStore and GraphStore.
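For comparison, a rough sketch of the consolidated alternative, where the graph store itself answers node-count queries. `ToyGraphStore` and `put_edge_size` are hypothetical names, not the PyG `GraphStore` interface, and the parameter name `query` is an assumption since the signature above leaves it unnamed.

```python
from typing import Dict, Tuple, Union

NodeType = str
EdgeType = Tuple[str, str, str]


class ToyGraphStore:
    """Hypothetical graph store that answers node-count queries itself."""

    def __init__(self) -> None:
        # Maps each edge type to its (num_src_nodes, num_dst_nodes) size.
        self._sizes: Dict[EdgeType, Tuple[int, int]] = {}

    def put_edge_size(self, edge_type: EdgeType, size: Tuple[int, int]) -> None:
        self._sizes[edge_type] = size

    def num_nodes(
        self, query: Union[NodeType, EdgeType]
    ) -> Union[int, Tuple[int, int]]:
        if isinstance(query, tuple):  # edge type: direct dictionary lookup
            return self._sizes[query]
        # Node type: find any edge type that has it as source or destination.
        for (src, _, dst), (num_src, num_dst) in self._sizes.items():
            if src == query:
                return num_src
            if dst == query:
                return num_dst
        raise KeyError(f"No edge in the store touches node type {query!r}")
```

One consequence of this shape is that a node type with no edges in the store cannot be resolved from the graph store alone.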

mananshah99 · 2022-08-30

A few clarifications:

Yes, a GraphStore can be remote.

We do not want to hide FeatureStore behind GraphStore; ideally, a feature store owns features, and a graph store owns the graph. The complications arise because there can be edges in the graph store with no corresponding features in the feature store, etc., as mentioned above.

remote_backend is not a Backend object; it's really a helper providing utility functions to talk to FeatureStore and GraphStore (as in the comment at the top of remote_backend.py). However, I agree that the name is confusing. I can change it to remote_backend_util.py.


rusty1s · 2022-08-30T12:05:28Z

torch_geometric/data/remote_backend_utils.py

+
+    # 1. Check GraphStore:
+    edge_attrs = graph_store.get_all_edge_attrs()
+    for edge_attr in edge_attrs:


Any opinion on avoiding the for-loop here? Isn't how the GraphStore/FeatureStore saves the edge attributes an implementation detail? If they are kept in some hash map, we should be able to leverage that.

mananshah99 · 2022-08-30

Yeah, I think the graph store and feature store should expose (optional) methods to obtain a TensorAttr/EdgeAttr from the first member of the corresponding dataclass. But I wanted to leave that for another PR.
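A minimal sketch of what such an optional lookup might look like, assuming the store indexes its attributes by the first dataclass member (here a group name). `ToyTensorAttr`, `IndexedAttrRegistry`, and its methods are hypothetical and only illustrate replacing the linear scan with a hash-map lookup.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List


@dataclass(frozen=True)
class ToyTensorAttr:
    """Hypothetical stand-in for a feature-store attribute."""
    group_name: str  # first member: used as the index key
    attr_name: str
    num_rows: int


class IndexedAttrRegistry:
    """Keeps attributes in a hash map keyed by their first member."""

    def __init__(self) -> None:
        self._by_group: DefaultDict[str, List[ToyTensorAttr]] = defaultdict(list)

    def register(self, attr: ToyTensorAttr) -> None:
        self._by_group[attr.group_name].append(attr)

    def attrs_for_group(self, group_name: str) -> List[ToyTensorAttr]:
        # Average O(1) lookup instead of iterating over every attribute.
        return list(self._by_group.get(group_name, []))

    def num_nodes(self, group_name: str) -> int:
        attrs = self._by_group.get(group_name)
        if not attrs:
            raise KeyError(f"No attributes registered for {group_name!r}")
        return attrs[0].num_rows
```

Whether a backend can offer this cheaply depends on how it stores attributes, which is why the comment above suggests making such methods optional.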

mananshah99 and others added 6 commits August 29, 2022 18:01

init

0b976ee

Merge branch 'master' of github.com:pyg-team/pytorch_geometric into r…

015743e

…emote_backend_1

update

b111754

[pre-commit.ci] auto fixes from pre-commit.com hooks

9a36b3a

for more information, see https://pre-commit.ci

update

f23586b

[pre-commit.ci] auto fixes from pre-commit.com hooks

5c35cd6

for more information, see https://pre-commit.ci

mananshah99 changed the title ~~draft: remote backend utilities~~ refactor(data): simplify remote backend num_nodes computation Aug 29, 2022

update

6545ced

mananshah99 marked this pull request as ready for review August 29, 2022 19:28

mananshah99 added 3 commits August 29, 2022 19:33

more cleanup

aebf0d5

update

6741272

fix

aacc361

rusty1s assigned mananshah99 Aug 29, 2022

rusty1s added 0 - Priority P0 refactor loader labels Aug 29, 2022

mananshah99 requested review from rusty1s and yaoyaowd August 29, 2022 20:11

yaoyaowd reviewed Aug 29, 2022


wsad1 reviewed Aug 30, 2022


mananshah99 added 4 commits August 30, 2022 06:57

update, rename

db02f60

Merge branch 'master' into remote_backend_1

aaa719c

flake8

bb2f284

Merge branch 'remote_backend_1' of github.com:pyg-team/pytorch_geomet…

252466a

…ric into remote_backend_1

wsad1 approved these changes Aug 30, 2022


rusty1s approved these changes Aug 30, 2022


mananshah99 added 2 commits August 30, 2022 22:18

updates

6a637b5

Merge branch 'master' into remote_backend_1

5f8171a

mananshah99 merged commit be471ee into master Aug 30, 2022

mananshah99 deleted the remote_backend_1 branch August 30, 2022 22:26

mananshah99 mentioned this pull request Sep 20, 2022

[Roadmap] Remote Backend Support and Integration 🚀 #4806

Open

14 tasks
