-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(loader): NodeLoader
and LinkLoader
as base implementation classes, part 4
#5404
Conversation
…ric into remote_backend_4
for more information, see https://pre-commit.ci
NodeLoader
and LinkLoader
as base implementation classes, part 5NodeLoader
and LinkLoader
as base implementation classes, part 4
Codecov Report
@@ Coverage Diff @@
## master #5404 +/- ##
==========================================
+ Coverage 83.35% 83.36% +0.01%
==========================================
Files 343 345 +2
Lines 18802 18821 +19
==========================================
+ Hits 15672 15690 +18
- Misses 3130 3131 +1
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
# Initialize sampler with keyword arguments: | ||
# NOTE sampler is an attribute of 'DataLoader': | ||
self.node_sampler = node_sampler | ||
if initialize_sampler: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a use case for not doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, see LightningNodeData
and LightningLinkData
for examples of this (we initialize a sampler once, and use it for multiple dataloaders).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh I see. Lightning
always gives caes I hadn't thought of. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good to me. I feel there is some over-complication with sampler
and sampler_kwargs
but happy to defer to you if you feel strongly about it.
…5418) This PR builds off of #5404 by refactoring `HGTLoader` behind the sampler and loader interface; the simplicity of this refactor also shows the flexibility of the interface. In doing so, it defines `HGTSampler` that inherits from `BaseSampler`, and uses `HGTSampler` as part of the `HGTLoader(NodeLoader)` construction. With this setup, it should also be trivial to add an HGT link-level loader, but this task is left as a TODO.
This PR continues the effort to consolidate PyG's sampling interface in preparation for moving
sample(...)
behind theGraphStore
interface. This effort is somewhat large in scope and will be broken into multiple PRs for ease of review. It builds off of #5402, and makes a significant move to abstract data loading behind adata: Union[Data, HeteroData, Tuple[FeatureStore, GraphStore]]
and asampler: BaseSampler
.It does so by introducing two base implementation classes:
NodeLoader
andLinkLoader
.NodeLoader
performs sampling from nodes (usingsample_from_nodes
), andLinkLoader
does the same from edges (usingsample_from_edges
). They both expose parameters in their initializers that are intended for loading (that is, the process of using a sampler to get subgraphs, using a feature fetcher to get features, and joining these together to construct aHeteroData
object to pass downstream). Samplers are intended to expose parameters that are used for sampling (that are particular to the sampling method).The implementations of
NeighborLoader
andLinkNeighborLoader
are now very simple: they pass theNeighborSampler
and any necessary initialization parameters directly in__init__
, with no other change.