[Roadmap] Remote Backend Support and Integration 🚀 #4806
Comments
I think we should add this point to the roadmap
This will help us "test" the interface, and also demonstrate how people can build concrete
Also we could add
Yes, @wsad1. I think these are good points. One thing we could do to showcase is to have a short example/tutorial on how to connect to a neo4j graph database or similar.
Can we also add some clean up tasks here? For example, relying more on
@mananshah99 just interested what you're planning for
(also I slightly updated the description to link to graphstore, hope you don't mind)
Hi folks, this roadmap has been updated a bit to describe the latest changes and a few potential further directions (cc @Padarn, I hope this helps address some of your questions as well). Feel free to add on, or let me know if you have any questions/comments/concerns!
Hi team, I wonder if the current remote backend can support edge features. It would be great if we could access edge features such as multi-class labels in remote resources such as DBs.
cc @mananshah99
I love seeing Ray and Neo4j on these items! 😄 Are there any updates on these items? I don't see anything listed in the repo or in the docs. I saw an example of using Kuzu for a Remote GraphStore (via
Thanks in advance!
Motivation
PyG currently requires users to store graphs (and associated node + edge features) in Data and HeteroData objects, which are accepted by loaders to run forward/backward passes on an accelerator of choice. This abstraction, however, does not scale to large graphs (or large feature tensors), which can quickly oversubscribe CPU DRAM (even though the GPU VRAM requirement is only the memory consumption of each sampled subgraph and its associated node and edge features). Indeed, one can imagine storing graph features (and the graph itself) in "remote backends", which provide fixed operators that can be used to integrate cleanly with downstream PyG samplers and loaders.
The goal of this roadmap is to track the integration of native remote backend support into PyG. At a high level, this will be accomplished by introducing the feature store, graph store, and sampler abstractions into PyG. For more freeform discussion, please visit the #scalability channel in the PyG Slack community.
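To make the motivation concrete, here is a minimal sketch of the idea in plain Python. It is not PyG's API: the class `ToyRemoteFeatureStore`, its `get` method, and the `rows_fetched` counter are hypothetical names made up for illustration, and a dictionary stands in for the remote backend. The point is only that a batch needs the rows of the sampled nodes, not the whole feature matrix.

```python
from typing import Dict, List, Sequence


class ToyRemoteFeatureStore:
    """Hypothetical stand-in for a remote backend (not a PyG class)."""

    def __init__(self, rows: Dict[int, List[float]]) -> None:
        # Pretend this mapping lives in a database rather than in local DRAM.
        self._rows = rows
        # Bookkeeping so the example can show how much was transferred.
        self.rows_fetched = 0

    def get(self, node_ids: Sequence[int]) -> List[List[float]]:
        """The single 'fixed operator': return feature rows for the given ids."""
        self.rows_fetched += len(node_ids)
        return [self._rows[i] for i in node_ids]


# A feature matrix with 1,000 nodes and 2 features per node.
store = ToyRemoteFeatureStore({i: [float(i), float(i) * 2.0] for i in range(1000)})

# A sampler would normally produce these ids; here they are hard-coded.
sampled_node_ids = [3, 17, 42]
batch_features = store.get(sampled_node_ids)
```

Only three of the 1,000 rows are touched for this batch, which mirrors the observation above that the accelerator only ever needs the sampled subgraph and its features.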
Implementation
Abstractions: FeatureStore, GraphStore, Sampler
Data and HeteroData implement the FeatureStore abstraction (Let Data and HeteroData implement FeatureStore #4807)
GraphStore abstraction that is intended to hold an edge_index in memory (GraphStore definition + Data and HeteroData integration #4816)
Data and HeteroData implement the GraphStore abstraction (GraphStore definition + Data and HeteroData integration #4816)
NeighborLoader to call FeatureStore and GraphStore methods instead of their Data/HeteroData counterparts. Note that this will require moving filtering of data into the feature store. The new interface will look like data: Union[Union[Data, HeteroData], Tuple[FeatureStore, GraphStore]] (Let NeighborLoader accept Tuple[FeatureStore, GraphStore] #4817, GraphStore: support COO layouts, refactor conversion logic #4883)
BaseSampler and refactor existing samplers behind a common interface (refactor(sampler): consolidate sampling interface, part 1 #5312, refactor(sampler): consolidate link neighbor sampling interface, part 2 #5365, refactor(sampler): clean up sampler attributes, part 3 #5402)
NodeLoader and LinkLoader, refactor existing loaders behind loader + sampler interface (refactor(loader): NodeLoader and LinkLoader as base implementation classes, part 4 #5404, refactor(loader): HGTLoader behind NodeLoader interface, part 5 #5418)
TensorAttr or EdgeAttr from a FeatureStore / GraphStore from their first dataclass attribute, and refactor existing computations that get all (tensors, edges) and subsequently filter to use these methods.
LightningNodeData and LightningLinkData
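The division of labour described in the items above (a sampler reads structure from a graph store, and the loader then asks the feature store for only the sampled rows) can be sketched in plain Python. Everything here is an illustrative assumption rather than PyG code: `ToyFeatureStore`, `ToyGraphStore`, `ToyNeighborSampler`, and `load_batch` are hypothetical names, and the edge list is a simple COO-style list of (source, destination) pairs.

```python
import random
from typing import Dict, List, Sequence, Tuple


class ToyFeatureStore:
    """Hypothetical feature store: returns feature rows by node id."""

    def __init__(self, features: Dict[int, List[float]]) -> None:
        self._features = features

    def get(self, node_ids: Sequence[int]) -> List[List[float]]:
        # Filtering happens inside the store, not in the loader.
        return [self._features[i] for i in node_ids]


class ToyGraphStore:
    """Hypothetical graph store holding a COO-style edge list."""

    def __init__(self, edges: List[Tuple[int, int]]) -> None:
        self._edges = edges

    def in_neighbors(self, node: int) -> List[int]:
        return [src for src, dst in self._edges if dst == node]


class ToyNeighborSampler:
    """One-hop neighbor sampler that only talks to the graph store."""

    def __init__(self, graph_store: ToyGraphStore, num_neighbors: int, seed: int = 0) -> None:
        self._graph_store = graph_store
        self._num_neighbors = num_neighbors
        self._rng = random.Random(seed)

    def sample(self, seed_nodes: Sequence[int]) -> Tuple[List[int], List[Tuple[int, int]]]:
        nodes = list(seed_nodes)
        edges: List[Tuple[int, int]] = []
        for dst in seed_nodes:
            candidates = self._graph_store.in_neighbors(dst)
            k = min(self._num_neighbors, len(candidates))
            for src in self._rng.sample(candidates, k):
                edges.append((src, dst))
                if src not in nodes:
                    nodes.append(src)
        return nodes, edges


def load_batch(feature_store: ToyFeatureStore, sampler: ToyNeighborSampler,
               seed_nodes: Sequence[int]) -> dict:
    """Loader step: sample structure first, then fetch only the needed features."""
    nodes, edges = sampler.sample(seed_nodes)
    return {"nodes": nodes, "edges": edges, "x": feature_store.get(nodes)}


graph_store = ToyGraphStore([(1, 0), (2, 0), (3, 0), (0, 1)])
feature_store = ToyFeatureStore({i: [float(i)] for i in range(4)})
sampler = ToyNeighborSampler(graph_store, num_neighbors=2)
batch = load_batch(feature_store, sampler, seed_nodes=[0])
```

In this sketch the loader never touches the full feature table or edge list directly, which is why swapping either store for a remote implementation would not change `load_batch`.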
Implementations
FeatureStore, GraphStore, and Sampler with a popular backend to provide example usage. Some thoughts here include a Ray RandomAccessDataset for a feature store and a Neo4j graph for a graph store.
Tuple[FeatureStore, GraphStore] to perform basic sanity checks (in a similar way that Data and HeteroData do today)
HGTSampler
torch_geometric/loader (e.g. GraphSAINT, ShaDow) behind the sampler interface, enabling (a) easy extension to sampling from edges and (b) ease of extension to remote backends in the future.
Code Health
num_nodes computation #5307)
Data, HeteroData, and Tuple[FeatureStore, GraphStore] throughout the PyG codebase into a single conditional. This should be possible since both Data and HeteroData are FeatureStores and GraphStores
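As a rough illustration of the last point, the sketch below shows how one conditional can normalise either kind of input into a (feature store, graph store) pair when the in-memory object implements both interfaces. This is plain Python with hypothetical names (`FeatureStoreLike`, `GraphStoreLike`, `InMemoryGraph`, `RemoteFeatures`, `RemoteEdges`, `as_stores`, `count_edges`); it is not how PyG itself is written.

```python
from typing import Dict, List, Tuple, Union


class FeatureStoreLike:
    """Hypothetical feature store interface."""

    def get_features(self, node_id: int) -> List[float]:
        raise NotImplementedError


class GraphStoreLike:
    """Hypothetical graph store interface."""

    def get_edges(self) -> List[Tuple[int, int]]:
        raise NotImplementedError


class InMemoryGraph(FeatureStoreLike, GraphStoreLike):
    """Plays the role of an in-memory object that is both kinds of store."""

    def __init__(self, features: Dict[int, List[float]], edges: List[Tuple[int, int]]) -> None:
        self._features = features
        self._edges = edges

    def get_features(self, node_id: int) -> List[float]:
        return self._features[node_id]

    def get_edges(self) -> List[Tuple[int, int]]:
        return self._edges


class RemoteFeatures(FeatureStoreLike):
    """Stand-in for a remote feature backend."""

    def get_features(self, node_id: int) -> List[float]:
        return [float(node_id)]


class RemoteEdges(GraphStoreLike):
    """Stand-in for a remote graph backend."""

    def get_edges(self) -> List[Tuple[int, int]]:
        return [(0, 1), (1, 2), (2, 0)]


DataLike = Union[InMemoryGraph, Tuple[FeatureStoreLike, GraphStoreLike]]


def as_stores(data: DataLike) -> Tuple[FeatureStoreLike, GraphStoreLike]:
    """The single conditional: a tuple is unpacked, anything else is both stores."""
    if isinstance(data, tuple):
        feature_store, graph_store = data
    else:
        feature_store = graph_store = data
    return feature_store, graph_store


def count_edges(data: DataLike) -> int:
    """Downstream code no longer branches on the input type."""
    _, graph_store = as_stores(data)
    return len(graph_store.get_edges())


in_memory = InMemoryGraph({0: [1.0], 1: [2.0]}, [(0, 1)])
remote = (RemoteFeatures(), RemoteEdges())
```

Here `count_edges(in_memory)` and `count_edges(remote)` go through the same code path after `as_stores`, which is the kind of consolidation the item above describes.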