refactor(loader): `NodeLoader` and `LinkLoader` as base implementation classes, part 4 #5404

mananshah99 · 2022-09-09T22:57:40Z

This PR continues the effort to consolidate PyG's sampling interface in preparation for moving `sample(...)` behind the `GraphStore` interface. The effort is fairly large in scope and is split into multiple PRs for ease of review. This PR builds on #5402 and abstracts data loading behind two arguments: `data: Union[Data, HeteroData, Tuple[FeatureStore, GraphStore]]` and `sampler: BaseSampler`.

It does so by introducing two base implementation classes: `NodeLoader` and `LinkLoader`. `NodeLoader` samples from nodes (using `sample_from_nodes`), and `LinkLoader` samples from edges (using `sample_from_edges`). Both expose initializer parameters that concern loading, that is, the process of using a sampler to get subgraphs, using a feature fetcher to get features, and joining the two to construct a `HeteroData` object to pass downstream. Samplers, in turn, expose the parameters that concern sampling and are particular to the sampling method.

The implementations of `NeighborLoader` and `LinkNeighborLoader` are now very simple: they pass a `NeighborSampler` and any necessary initialization parameters directly in `__init__`, with no other change.
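
To make the loader/sampler split concrete, here is a minimal, dependency-free sketch. It is not the PyG implementation: every class in it (`ToyBaseSampler`, `ToyNeighborSampler`, `ToyNodeLoader`, `ToyNeighborLoader`) is a hypothetical stand-in, the "graph" is a plain adjacency dict, and batches are plain dicts rather than `HeteroData` objects. Only the method name `sample_from_nodes` is taken from the description above.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List, Sequence


class ToyBaseSampler(ABC):
    """Hypothetical stand-in for a sampler interface (not PyG's BaseSampler)."""

    @abstractmethod
    def sample_from_nodes(self, seeds: Sequence[int]) -> List[int]:
        """Return the node ids of the subgraph sampled around `seeds`."""


class ToyNeighborSampler(ToyBaseSampler):
    """Sampling-specific parameters (here `num_neighbors`) live on the sampler."""

    def __init__(self, adj: Dict[int, List[int]], num_neighbors: int):
        self.adj = adj
        self.num_neighbors = num_neighbors

    def sample_from_nodes(self, seeds: Sequence[int]) -> List[int]:
        nodes = list(seeds)
        for seed in seeds:
            # Deterministic toy "sampling": keep the first `num_neighbors`
            # neighbors of each seed (a real sampler would pick randomly).
            for nbr in self.adj.get(seed, [])[:self.num_neighbors]:
                if nbr not in nodes:
                    nodes.append(nbr)
        return nodes


class ToyNodeLoader:
    """Loading-specific parameters (features, seeds, batch size) live here."""

    def __init__(self, features: Dict[int, Any], sampler: ToyBaseSampler,
                 input_nodes: Sequence[int], batch_size: int = 1):
        self.features = features
        self.sampler = sampler
        self.input_nodes = list(input_nodes)
        self.batch_size = batch_size

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        for start in range(0, len(self.input_nodes), self.batch_size):
            seeds = self.input_nodes[start:start + self.batch_size]
            # 1) sample a subgraph, 2) fetch features, 3) join them.
            nodes = self.sampler.sample_from_nodes(seeds)
            yield {"nodes": nodes, "x": [self.features[n] for n in nodes]}


class ToyNeighborLoader(ToyNodeLoader):
    """Thin subclass: build the sampler and hand it to the base loader."""

    def __init__(self, adj: Dict[int, List[int]], features: Dict[int, Any],
                 num_neighbors: int, input_nodes: Sequence[int],
                 batch_size: int = 1):
        sampler = ToyNeighborSampler(adj, num_neighbors)
        super().__init__(features, sampler, input_nodes, batch_size)
```

In this sketch, iterating over `ToyNeighborLoader(adj, features, num_neighbors=2, input_nodes=[0, 3])` yields one dict per seed batch. Swapping in a different sampling strategy only requires another `ToyBaseSampler` subclass; the loader itself stays unchanged.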

codecov · 2022-09-09T23:02:21Z

Codecov Report

Merging #5404 (a49d967) into master (373c0ef) will increase coverage by 0.01%.
The diff coverage is 91.38%.

❗ Current head a49d967 differs from pull request most recent head f92f263. Consider uploading reports for the commit f92f263 to get more accurate results

@@            Coverage Diff             @@
##           master    #5404      +/-   ##
==========================================
+ Coverage   83.35%   83.36%   +0.01%     
==========================================
  Files         343      345       +2     
  Lines       18802    18821      +19     
==========================================
+ Hits        15672    15690      +18     
- Misses       3130     3131       +1

| Impacted Files | Coverage Δ | |
|---|---|---|
| torch_geometric/sampler/utils.py | `97.01% <ø> (ø)` | |
| torch_geometric/loader/utils.py | `83.10% <86.11%> (+1.85%)` | ⬆️ |
| torch_geometric/loader/node_loader.py | `91.83% <91.83%> (ø)` | |
| torch_geometric/loader/link_loader.py | `94.02% <94.02%> (ø)` | |
| torch_geometric/loader/__init__.py | `100.00% <100.00%> (ø)` | |
| torch_geometric/loader/link_neighbor_loader.py | `100.00% <100.00%> (+9.09%)` | ⬆️ |
| torch_geometric/loader/neighbor_loader.py | `100.00% <100.00%> (+9.09%)` | ⬆️ |
| torch_geometric/nn/models/basic_gnn.py | `89.26% <100.00%> (ø)` | |
| torch_geometric/sampler/neighbor_sampler.py | `92.76% <0.00%> (-0.66%)` | ⬇️ |

Padarn · 2022-09-10T02:17:06Z

torch_geometric/loader/node_loader.py

+        # Initialize sampler with keyword arguments:
+        # NOTE sampler is an attribute of 'DataLoader':
+        self.node_sampler = node_sampler
+        if initialize_sampler:


Is there a use case for not doing this?

Yeah, see `LightningNodeData` and `LightningLinkData` for examples of this (we initialize a sampler once and use it for multiple dataloaders).

Ahh, I see. Lightning always gives cases I hadn't thought of. Thanks!
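
The use case from this thread (one sampler built once and shared by several dataloaders) can be sketched as follows. This is an illustration under assumptions, not the Lightning or PyG code: `ToySampler`, `ToyLoader`, and `make_loaders` are hypothetical names, and the sampler simply echoes its seeds.

```python
from typing import Dict, Iterator, List, Sequence


class ToySampler:
    """Hypothetical sampler that counts how often it is constructed."""

    instances = 0

    def __init__(self, num_neighbors: int):
        ToySampler.instances += 1
        self.num_neighbors = num_neighbors

    def sample_from_nodes(self, seeds: Sequence[int]) -> List[int]:
        # Placeholder logic: a real sampler would expand the seeds here.
        return list(seeds)


class ToyLoader:
    """Hypothetical loader that receives an already-initialized sampler."""

    def __init__(self, sampler: ToySampler, input_nodes: Sequence[int]):
        self.sampler = sampler
        self.input_nodes = list(input_nodes)

    def __iter__(self) -> Iterator[List[int]]:
        for node in self.input_nodes:
            yield self.sampler.sample_from_nodes([node])


def make_loaders(sampler: ToySampler,
                 splits: Dict[str, Sequence[int]]) -> Dict[str, ToyLoader]:
    """Create one loader per split, all sharing the same sampler object."""
    return {name: ToyLoader(sampler, nodes) for name, nodes in splits.items()}
```

Calling `make_loaders(ToySampler(num_neighbors=2), {"train": [0, 1], "val": [2]})` builds the sampler once; both loaders then hold a reference to the same object, which is the situation in which a loader should skip its own sampler initialization.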

rusty1s

Looks mostly good to me. I feel there is some over-complication with sampler and sampler_kwargs but happy to defer to you if you feel strongly about it.

…5418) This PR builds off of #5404 by refactoring `HGTLoader` behind the sampler and loader interface; the simplicity of this refactor also shows the flexibility of the interface. In doing so, it defines `HGTSampler` that inherits from `BaseSampler`, and uses `HGTSampler` as part of the `HGTLoader(NodeLoader)` construction. With this setup, it should also be trivial to add an HGT link-level loader, but this task is left as a TODO.

mananshah99 added 8 commits September 9, 2022 01:58

init

7cb0d86

Merge branch 'master' of github.com:pyg-team/pytorch_geometric into r…

c940682

…emote_backend_4

update

300f18a

Merge branch 'master' into remote_backend_4

898fbc2

changelog

36f53db

Merge branch 'remote_backend_4' of github.com:pyg-team/pytorch_geomet…

8af333f

…ric into remote_backend_4

update

8e6b3cb

init

55187f2

mananshah99 added 0 - Priority P0 data labels Sep 9, 2022

mananshah99 self-assigned this Sep 9, 2022

mananshah99 changed the base branch from master to remote_backend_4 September 9, 2022 22:57

github-actions bot added loader and removed data labels Sep 9, 2022

mananshah99 and others added 2 commits September 9, 2022 22:58

duplicate code

8495b41

[pre-commit.ci] auto fixes from pre-commit.com hooks

c7a58e7

for more information, see https://pre-commit.ci

mananshah99 changed the title ~~refactor(loader): NodeLoader and LinkLoader as base implementation classes, part 5~~ refactor(loader): NodeLoader and LinkLoader as base implementation classes, part 4 Sep 9, 2022

Padarn reviewed Sep 10, 2022

rusty1s added the refactor label Sep 11, 2022

Base automatically changed from remote_backend_4 to master September 12, 2022 18:14

mananshah99 added 3 commits September 12, 2022 18:19

merge

2641289

update

f56dbb2

update

666483f

mananshah99 mentioned this pull request Sep 12, 2022

refactor(loader): HGTLoader behind NodeLoader interface, part 5 #5418

Merged

mananshah99 requested review from rusty1s, wsad1 and a team September 12, 2022 23:08

yaoyaowd reviewed Sep 12, 2022

rusty1s approved these changes Sep 13, 2022

mananshah99 added 3 commits September 13, 2022 05:53

comments

699723f

remove num_neighbors attr

355f88c

Merge branch 'master' of github.com:pyg-team/pytorch_geometric into r…

bcc6341

…emote_backend_5

github-actions bot added benchmark nn labels Sep 13, 2022

mananshah99 added 2 commits September 13, 2022 06:07

flake8

a49d967

raw strings

f92f263

mananshah99 merged commit 69f85c2 into master Sep 13, 2022

mananshah99 deleted the remote_backend_5 branch September 13, 2022 06:23

mananshah99 mentioned this pull request Sep 20, 2022

[Roadmap] Remote Backend Support and Integration 🚀 #4806

Open

14 tasks

Padarn Sep 10, 2022

mananshah99 Sep 12, 2022

Padarn Sep 13, 2022

rusty1s left a comment
