refactor(loader): NodeLoader and LinkLoader as base implementation classes, part 4 #5404

Merged
merged 18 commits into from
Sep 13, 2022

mananshah99
Contributor

@mananshah99 mananshah99 commented Sep 9, 2022

This PR continues the effort to consolidate PyG's sampling interface in preparation for moving sample(...) behind the GraphStore interface. This effort is somewhat large in scope and will be broken into multiple PRs for ease of review. It builds off of #5402, and makes a significant move to abstract data loading behind a data: Union[Data, HeteroData, Tuple[FeatureStore, GraphStore]] and a sampler: BaseSampler.

It does so by introducing two base implementation classes: NodeLoader and LinkLoader. NodeLoader performs sampling from nodes (using sample_from_nodes), and LinkLoader does the same from edges (using sample_from_edges). They both expose parameters in their initializers that are intended for loading (that is, the process of using a sampler to get subgraphs, using a feature fetcher to get features, and joining these together to construct a HeteroData object to pass downstream). Samplers are intended to expose parameters that are used for sampling (that are particular to the sampling method).
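To make the split between loading and sampling concrete, here is a minimal pure-Python sketch. It is not the code in this PR: `ToyBaseSampler`, `OneHopSampler`, and `ToyNodeLoader` are hypothetical stand-ins, plain dicts replace `Data`/`FeatureStore`/`GraphStore`, and only the method name `sample_from_nodes` is taken from the description above. The loader owns the loading parameters (batch size, features), and the sampler owns everything specific to the sampling method.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Sequence, Tuple

Edge = Tuple[int, int]


class ToyBaseSampler(ABC):
    """Hypothetical sampler interface: sampling-specific state only."""

    @abstractmethod
    def sample_from_nodes(self, seeds: Sequence[int]) -> Tuple[List[int], List[Edge]]:
        """Return the sampled node ids and the sampled edges."""


class OneHopSampler(ToyBaseSampler):
    """Toy sampler: each seed plus all of its direct neighbors."""

    def __init__(self, adjacency: Dict[int, List[int]]):
        self.adjacency = adjacency

    def sample_from_nodes(self, seeds):
        nodes = list(seeds)
        edges = []
        for seed in seeds:
            for nbr in self.adjacency.get(seed, []):
                edges.append((nbr, seed))  # edge points from neighbor to seed
                if nbr not in nodes:
                    nodes.append(nbr)
        return nodes, edges


class ToyNodeLoader:
    """Toy loader: batches seed nodes and delegates sampling to the sampler."""

    def __init__(self, features: Dict[int, List[float]], sampler: ToyBaseSampler,
                 input_nodes: Sequence[int], batch_size: int = 1):
        self.features = features
        self.sampler = sampler
        self.input_nodes = list(input_nodes)
        self.batch_size = batch_size

    def __iter__(self) -> Iterator[dict]:
        for start in range(0, len(self.input_nodes), self.batch_size):
            seeds = self.input_nodes[start:start + self.batch_size]
            nodes, edges = self.sampler.sample_from_nodes(seeds)  # 1. sample a subgraph
            x = {n: self.features[n] for n in nodes}              # 2. fetch features
            yield {"nodes": nodes, "edges": edges, "x": x}        # 3. join into one batch


adjacency = {0: [1, 2], 1: [2], 2: []}
features = {0: [0.0], 1: [1.0], 2: [2.0]}
loader = ToyNodeLoader(features, OneHopSampler(adjacency), input_nodes=[0, 1])
batches = list(loader)
```

Swapping in a different sampling strategy then only means passing a different `ToyBaseSampler` subclass; the loader itself does not change.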

The implementations of NeighborLoader and LinkNeighborLoader are now very simple: they pass the NeighborSampler and any necessary initialization parameters directly in __init__, with no other change.
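As a rough illustration of how thin such a subclass can be, the sketch below builds a sampler in `__init__` and hands it to the base loader. All class names are hypothetical, the graph is a plain adjacency dict, and the toy sampler deterministically keeps the first `num_neighbors` neighbors, whereas a real neighbor sampler would draw them at random. It is a sketch under those assumptions, not the implementation in this PR.

```python
from typing import Dict, Iterator, List, Sequence


class SketchNeighborSampler:
    """Toy one-hop sampler that keeps at most `num_neighbors` neighbors per seed."""

    def __init__(self, data: Dict[int, List[int]], num_neighbors: int):
        self.adjacency = data
        self.num_neighbors = num_neighbors

    def sample_from_nodes(self, seeds: Sequence[int]) -> List[int]:
        nodes = set(seeds)
        for seed in seeds:
            # Assumption: deterministic truncation instead of random sampling.
            nodes.update(self.adjacency.get(seed, [])[: self.num_neighbors])
        return sorted(nodes)


class SketchNodeLoader:
    """Toy base loader: batches seeds and calls whatever sampler it was given."""

    def __init__(self, data, node_sampler, input_nodes: Sequence[int], batch_size: int = 1):
        self.data = data
        self.node_sampler = node_sampler
        self.input_nodes = list(input_nodes)
        self.batch_size = batch_size

    def __iter__(self) -> Iterator[List[int]]:
        for start in range(0, len(self.input_nodes), self.batch_size):
            seeds = self.input_nodes[start:start + self.batch_size]
            yield self.node_sampler.sample_from_nodes(seeds)


class SketchNeighborLoader(SketchNodeLoader):
    """Thin subclass: only constructs the sampler and forwards everything else."""

    def __init__(self, data, num_neighbors: int, input_nodes: Sequence[int], batch_size: int = 1):
        sampler = SketchNeighborSampler(data, num_neighbors)
        super().__init__(data, node_sampler=sampler,
                         input_nodes=input_nodes, batch_size=batch_size)


data = {0: [1, 2, 3], 1: [0], 2: [], 3: []}
out = list(SketchNeighborLoader(data, num_neighbors=2, input_nodes=[0, 1], batch_size=2))
```

All batching logic lives in the base class, so the subclass carries no iteration code of its own.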

@mananshah99 mananshah99 self-assigned this Sep 9, 2022
@mananshah99 mananshah99 changed the base branch from master to remote_backend_4 September 9, 2022 22:57
@github-actions github-actions bot added loader and removed data labels Sep 9, 2022
@mananshah99 mananshah99 changed the title from "refactor(loader): NodeLoader and LinkLoader as base implementation classes, part 5" to "refactor(loader): NodeLoader and LinkLoader as base implementation classes, part 4" Sep 9, 2022
codecov bot commented Sep 9, 2022

Codecov Report

Merging #5404 (a49d967) into master (373c0ef) will increase coverage by 0.01%.
The diff coverage is 91.38%.

❗ Current head a49d967 differs from pull request most recent head f92f263. Consider uploading reports for the commit f92f263 to get more accurate results

@@            Coverage Diff             @@
##           master    #5404      +/-   ##
==========================================
+ Coverage   83.35%   83.36%   +0.01%     
==========================================
  Files         343      345       +2     
  Lines       18802    18821      +19     
==========================================
+ Hits        15672    15690      +18     
- Misses       3130     3131       +1     
| Impacted Files | Coverage Δ |
|---|---|
| `torch_geometric/sampler/utils.py` | 97.01% <ø> (ø) |
| `torch_geometric/loader/utils.py` | 83.10% <86.11%> (+1.85%) ⬆️ |
| `torch_geometric/loader/node_loader.py` | 91.83% <91.83%> (ø) |
| `torch_geometric/loader/link_loader.py` | 94.02% <94.02%> (ø) |
| `torch_geometric/loader/__init__.py` | 100.00% <100.00%> (ø) |
| `torch_geometric/loader/link_neighbor_loader.py` | 100.00% <100.00%> (+9.09%) ⬆️ |
| `torch_geometric/loader/neighbor_loader.py` | 100.00% <100.00%> (+9.09%) ⬆️ |
| `torch_geometric/nn/models/basic_gnn.py` | 89.26% <100.00%> (ø) |
| `torch_geometric/sampler/neighbor_sampler.py` | 92.76% <0.00%> (-0.66%) ⬇️ |

# Initialize sampler with keyword arguments:
# NOTE sampler is an attribute of 'DataLoader':
self.node_sampler = node_sampler
if initialize_sampler:

Contributor

Is there a use case for not doing this?

Contributor Author

Yeah, see LightningNodeData and LightningLinkData for examples of this (we initialize a sampler once, and use it for multiple dataloaders).

Contributor

Ahh, I see. Lightning always gives cases I hadn't thought of. Thanks!
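For readers following this thread, the reuse pattern can be sketched as below: the sampler is built once and the same instance is passed to several loaders. `CountingSampler` and `SharedSamplerLoader` are hypothetical names used only for illustration; this does not reproduce `LightningNodeData` or the `initialize_sampler` logic under review.

```python
from typing import Dict, Iterator, List, Sequence


class CountingSampler:
    """Toy sampler that counts how many times one has been constructed."""

    instances = 0

    def __init__(self, adjacency: Dict[int, List[int]]):
        CountingSampler.instances += 1
        self.adjacency = adjacency

    def sample_from_nodes(self, seeds: Sequence[int]) -> List[int]:
        nodes = set(seeds)
        for seed in seeds:
            nodes.update(self.adjacency.get(seed, []))
        return sorted(nodes)


class SharedSamplerLoader:
    """Toy loader that accepts an already-built sampler instead of creating one."""

    def __init__(self, node_sampler: CountingSampler, input_nodes: Sequence[int]):
        self.node_sampler = node_sampler
        self.input_nodes = list(input_nodes)

    def __iter__(self) -> Iterator[List[int]]:
        for seed in self.input_nodes:
            yield self.node_sampler.sample_from_nodes([seed])


# Build the sampler once, then reuse it for the train and validation loaders.
sampler = CountingSampler({0: [1], 1: [2], 2: []})
train_loader = SharedSamplerLoader(sampler, input_nodes=[0])
val_loader = SharedSamplerLoader(sampler, input_nodes=[1, 2])

train_batches = list(train_loader)
val_batches = list(val_loader)
```

Any setup cost in the sampler's constructor is then paid once, however many loaders are created from it.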

Base automatically changed from remote_backend_4 to master September 12, 2022 18:14
Member

@rusty1s rusty1s left a comment

Looks mostly good to me. I feel there is some over-complication with sampler and sampler_kwargs but happy to defer to you if you feel strongly about it.

@mananshah99 mananshah99 merged commit 69f85c2 into master Sep 13, 2022
@mananshah99 mananshah99 deleted the remote_backend_5 branch September 13, 2022 06:23
mananshah99 added a commit that referenced this pull request Sep 13, 2022
…5418)

This PR builds off of #5404 by refactoring `HGTLoader` behind the sampler and loader interface; the simplicity of this refactor also shows the flexibility of the interface. In doing so, it defines `HGTSampler` that inherits from `BaseSampler`, and uses `HGTSampler` as part of the `HGTLoader(NodeLoader)` construction. With this setup, it should also be trivial to add an HGT link-level loader, but this task is left as a TODO.