[Roadmap] Remote Backend Support and Integration 🚀 #4806

mananshah99 · 2022-06-15T00:34:59Z

Motivation

PyG currently requires users to store graphs (and associated node + edge features) in Data and HeteroData objects, which are accepted by loaders to run forward/backward passes on an accelerator of choice. This abstraction, however, does not scale to large graphs (or large feature tensors), which can quickly exhaust CPU DRAM, even though the GPU only needs enough VRAM to hold each sampled subgraph and its associated node and edge features. Instead, one can imagine storing graph features (and the graph itself) in "remote backends", which provide fixed operators that integrate cleanly with downstream PyG samplers and loaders.

The goal of this roadmap is to track the integration of native remote backend support into PyG. At a high level, this will be accomplished by introducing the feature store, graph store, and sampler abstractions into PyG. For more freeform discussion, please visit the #scalability channel in the PyG Slack community.

Implementation

Abstractions: `FeatureStore`, `GraphStore`, `Sampler`

Let Data and HeteroData implement the FeatureStore abstraction (Let Data and HeteroData implement FeatureStore #4807)
Define a GraphStore abstraction that is intended to hold an edge_index in memory (GraphStore definition + Data and HeteroData integration #4816)
Let Data and HeteroData implement the GraphStore abstraction (GraphStore definition + Data and HeteroData integration #4816)
Modify NeighborLoader to call FeatureStore and GraphStore methods instead of their Data/HeteroData counterparts. Note that this will require moving filtering of data into the feature store. The new interface will look like data: Union[Union[Data, HeteroData], Tuple[FeatureStore, GraphStore]] (Let NeighborLoader accept Tuple[FeatureStore, GraphStore] #4817, GraphStore: support COO layouts, refactor conversion logic #4883)
Implement BaseSampler and refactor existing samplers behind a common interface (refactor(sampler): consolidate sampling interface, part 1 #5312, refactor(sampler): consolidate link neighbor sampling interface, part 2 #5365, refactor(sampler): clean up sampler attributes, part 3 #5402)
Introduce NodeLoader and LinkLoader, refactor existing loaders behind loader + sampler interface (refactor(loader): NodeLoader and LinkLoader as base implementation classes, part 4 #5404, refactor(loader): HGTLoader behind NodeLoader interface, part 5 #5418)
Support (optional) methods to obtain a TensorAttr or EdgeAttr from a FeatureStore/GraphStore from their first dataclass attribute, and refactor existing computations that get all (tensors, edges) and subsequently filter to use these methods.
Support variable samplers in LightningNodeData and LightningLinkData
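
To illustrate how the abstractions listed above could fit together, here is a minimal sketch in plain Python. It is not PyG's actual API: the class names `InMemoryFeatureStore` and `InMemoryGraphStore`, the simplified `TensorAttr`/`EdgeAttr` dataclasses, and the `sample_one_hop` helper are all hypothetical, and nested lists stand in for tensors. The point is only that a sampler reads edges from the graph store and then fetches features for just the sampled nodes from the feature store.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TensorAttr:
    """Key identifying one feature tensor (hypothetical, simplified)."""
    group_name: str  # e.g. a node type such as "paper"
    attr_name: str   # e.g. "x"


@dataclass(frozen=True)
class EdgeAttr:
    """Key identifying one edge set (hypothetical, simplified)."""
    edge_type: Tuple[str, str, str]  # (source type, relation, destination type)
    layout: str = "coo"


class InMemoryFeatureStore:
    """Stand-in for a remote feature backend; rows are fetched by index."""

    def __init__(self) -> None:
        self._tensors: Dict[TensorAttr, List[List[float]]] = {}

    def put_tensor(self, rows: List[List[float]], attr: TensorAttr) -> None:
        self._tensors[attr] = rows

    def get_tensor(self, attr: TensorAttr,
                   index: Optional[List[int]] = None) -> List[List[float]]:
        rows = self._tensors[attr]
        # Only the requested rows are returned, mimicking a remote lookup.
        return rows if index is None else [rows[i] for i in index]


class InMemoryGraphStore:
    """Stand-in for a remote graph backend holding COO edge indices."""

    def __init__(self) -> None:
        self._edges: Dict[EdgeAttr, Tuple[List[int], List[int]]] = {}

    def put_edge_index(self, edge_index: Tuple[List[int], List[int]],
                       attr: EdgeAttr) -> None:
        self._edges[attr] = edge_index

    def get_edge_index(self, attr: EdgeAttr) -> Tuple[List[int], List[int]]:
        return self._edges[attr]


def sample_one_hop(graph_store, feature_store, edge_attr, tensor_attr, seeds):
    """Return the seeds plus their out-neighbors, and only those nodes' features."""
    row, col = graph_store.get_edge_index(edge_attr)
    seed_set = set(seeds)
    nodes = sorted(seed_set | {c for r, c in zip(row, col) if r in seed_set})
    return nodes, feature_store.get_tensor(tensor_attr, index=nodes)


# Usage: a loader would accept the (feature_store, graph_store) pair.
fs, gs = InMemoryFeatureStore(), InMemoryGraphStore()
x = TensorAttr("paper", "x")
cites = EdgeAttr(("paper", "cites", "paper"))
fs.put_tensor([[0.0], [1.0], [2.0], [3.0]], x)
gs.put_edge_index(([0, 0, 2], [1, 3, 3]), cites)
nodes, feats = sample_one_hop(gs, fs, cites, x, seeds=[0])
```

Keeping the two stores separate means either one can be swapped for a remote implementation without touching the sampler.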

Implementations

Implement a concrete FeatureStore, GraphStore, and Sampler with a popular backend to provide example usage. Some thoughts here include a Ray RandomAccessDataset for a feature store and a Neo4j graph for a graph store.
Implement a validation class that operates on Tuple[FeatureStore, GraphStore] to perform basic sanity checks (in a similar way that Data and HeteroData do today)
Implement sampling from edges in the HGTSampler
Implement (to the extent possible) samplers in torch_geometric/loader (e.g. GraphSAINT, ShaDow) behind the sampler interface, enabling (a) easy extension to sampling from edges and (b) ease of extension to remote backends in the future.

Code Health

Implement a remote backend utility class to consolidate common methods across feature and graph stores (refactor(data): simplify remote backend num_nodes computation #5307)
Consolidate conditionals for Data, HeteroData, and Tuple[FeatureStore, GraphStore] throughout the PyG codebase into a single conditional. This should be possible since both Data and HeteroData are FeatureStores and GraphStores.
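
One way the consolidation could look is sketched below, assuming stand-in classes rather than the real PyG ones. The helper name `as_stores` is hypothetical. Because a Data-like object is both a feature store and a graph store, every input can be normalized to a pair up front, and downstream code then needs only the store interfaces.

```python
class FeatureStore:
    """Stand-in base class (not the PyG class)."""


class GraphStore:
    """Stand-in base class (not the PyG class)."""


class Data(FeatureStore, GraphStore):
    """Stand-in for Data/HeteroData, which implement both interfaces."""


def as_stores(data):
    """Normalize a Data-like object or a (FeatureStore, GraphStore) tuple
    into a (feature_store, graph_store) pair."""
    if isinstance(data, tuple):
        feature_store, graph_store = data
    else:
        # A Data-like object plays both roles.
        feature_store = graph_store = data
    if not isinstance(feature_store, FeatureStore):
        raise TypeError("expected a FeatureStore")
    if not isinstance(graph_store, GraphStore):
        raise TypeError("expected a GraphStore")
    return feature_store, graph_store


# Usage: both input forms yield the same kind of pair.
d = Data()
pair_from_data = as_stores(d)
fs, gs = FeatureStore(), GraphStore()
pair_from_tuple = as_stores((fs, gs))
```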


wsad1 · 2022-06-18T06:45:50Z

I think we should add this point to the roadmap

Implement a concrete FeatureStore using some "popular" storage backend.

This will help us "test" the interface, and also demonstrate how people can build concrete FeatureStores. WDYT?

wsad1 · 2022-06-18T09:31:36Z

Also we could add

Since FeatureStore and MaterializedGraph are independent, it would be nice to have validate(FeatureStore, MaterializedGraph), which checks things like: 1. MaterializedGraph only connects node types present in FeatureStore; 2. max(edge_index) is bounded by the number of nodes in FeatureStore.

Validate will mostly be an abstract class, with implementations overriding __call__(FeatureStore, MaterializedGraph).
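
A rough sketch of the two checks, under simplifying assumptions: instead of real store objects, `__call__` here takes a `num_nodes` dict (what the feature store would report per node type) and an `edges` dict (the COO edge indices the graph would hold). `BasicValidate` and both argument shapes are hypothetical, chosen only to keep the example self-contained.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

EdgeType = Tuple[str, str, str]           # (source type, relation, destination type)
EdgeIndex = Tuple[List[int], List[int]]   # COO (row, col)


class Validate(ABC):
    """Abstract validator; implementations override __call__."""

    @abstractmethod
    def __call__(self, num_nodes: Dict[str, int],
                 edges: Dict[EdgeType, EdgeIndex]) -> List[str]:
        """Return a list of human-readable problems (empty if valid)."""


class BasicValidate(Validate):
    def __call__(self, num_nodes, edges):
        errors = []
        for edge_type, (row, col) in edges.items():
            src, _, dst = edge_type
            for node_type, ids in ((src, row), (dst, col)):
                if node_type not in num_nodes:
                    # Check 1: edges may only connect known node types.
                    errors.append(f"{edge_type}: unknown node type {node_type!r}")
                elif ids and max(ids) >= num_nodes[node_type]:
                    # Check 2: indices must be bounded by the node count.
                    errors.append(
                        f"{edge_type}: index {max(ids)} out of range "
                        f"for {node_type!r} ({num_nodes[node_type]} nodes)")
        return errors


validate = BasicValidate()
ok = validate({"paper": 3}, {("paper", "cites", "paper"): ([0, 1], [1, 2])})
bad = validate({"paper": 3}, {("author", "writes", "paper"): ([0], [5])})
```

Returning a list of problems rather than raising on the first one lets a caller report everything that is wrong in one pass; raising would be an equally reasonable design.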

rusty1s · 2022-06-19T11:12:29Z

Yes, @wsad1. I think these are good points. One thing we could do to showcase is to have a short example/tutorial on how to connect to a neo4j graph database or similar.

rusty1s · 2022-06-19T16:20:33Z

Can we also add some clean up tasks here? For example, relying more on FeatureStore and MaterializedGraph interfaces than BaseData.

mananshah99 · 2022-06-20T01:11:41Z

@rusty1s @wsad1 thanks for those suggestions, agreed on both fronts. Will incorporate tomorrow :)

Padarn · 2022-07-03T07:38:04Z

@mananshah99 just interested in what you're planning for

Implement a concrete FeatureStore and GraphStore with a popular backend to provide example usage

What backend are you thinking of supporting?

Padarn · 2022-07-03T08:03:08Z

(also I slightly updated the description to link to graphstore, hope you don't mind)

mananshah99 · 2022-09-20T23:20:40Z

Hi folks, this roadmap has been updated a bit to describe latest changes and a few potential further directions (cc @Padarn, I hope this helps address some of your questions as well). Feel free to add on, or let me know if you have any questions/comments/concerns!

Derek-Wds · 2023-08-28T20:06:50Z

Hi team, I wonder if the current remote backend can support edge features. It would be great if we could access edge features such as multi-class labels in remote resources such as DBs.

rusty1s · 2023-08-30T09:29:26Z

cc @mananshah99

AlexMRuch · 2024-01-08T21:26:48Z

I love seeing Ray and Neo4j on these items! 😄

Are there any updates on these items? I don't see anything listed in the repo or in the docs.

I saw an example of using Kuzu for a Remote GraphStore (via feature_store, graph_store = db.get_torch_geometric_remote_backend(mp.cpu_count())) but not anything for Neo4j (except https://neo4j.com/docs/graph-data-science-client/current/tutorials/import-sample-export-gnn/, which is more complicated than the Kuzu counterpart).

Thanks in advance!

mananshah99 added the feature label Jun 15, 2022

mananshah99 self-assigned this Jun 15, 2022

rusty1s added 0 - Priority P0 data labels Jun 15, 2022

rusty1s changed the title ~~[Roadmap] Feature Store Integration~~ [Roadmap] Feature Store Integration 🚀 Jun 15, 2022

rusty1s changed the title ~~[Roadmap] Feature Store Integration 🚀~~ [Roadmap] FeatureStore Integration 🚀 Jun 15, 2022

mananshah99 changed the title ~~[Roadmap] FeatureStore Integration 🚀~~ [Roadmap] FeatureStore and GraphStore Integration 🚀 Jun 28, 2022

rusty1s added roadmap and removed feature labels Aug 8, 2022

mananshah99 changed the title ~~[Roadmap] FeatureStore and GraphStore Integration 🚀~~ [Roadmap] Remote Backend Support and Integration 🚀 Sep 20, 2022

ivaylobah pinned this issue Sep 26, 2022

ray6080 mentioned this issue Feb 14, 2023: v0.0.3 Roadmap kuzudb/kuzu#1286 (Closed, 13 tasks)

rusty1s unpinned this issue Jun 23, 2023
