Link-level `NeighborLoader` #4026

rusty1s · 2022-02-08T09:42:23Z

🚀 The feature, motivation and pitch

Currently, NeighborLoader is designed for node-level tasks, and there is no option for mini-batching in link-level tasks.

To achieve this, users currently rely on a simple but hacky workaround, first utilized in ogbl-citation2 in this example.

The idea is simple: for input_nodes, we pass in both the source and destination nodes of every link we want to do link prediction on (both positive and negative):

loader = NeighborLoader(data, input_nodes=edge_label_index.view(-1), ...)

Nonetheless, PyG should provide a dedicated class to perform mini-batching on link-level tasks, re-using functionality from NeighborLoader under the hood. An API could look like:

class LinkLevelNeighborLoader(
    data,
    input_edges=...,
    input_edge_labels=...,
    with_negative_sampling=True,
    **kwargs,
)

NOTE: This workaround currently only works for homogeneous graphs!
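
A minimal sketch of what the flattening in the workaround does, assuming edge_label_index has shape [2, num_edges] and is stored contiguously. Plain Python lists stand in for tensors, and flatten_row_major is a hypothetical helper, not a PyG function. It shows that position i of the flattened seeds is the source of link i and position num_edges + i is its destination:

```python
def flatten_row_major(edge_label_index):
    """Mimic `tensor.view(-1)` on a contiguous [2, num_edges] tensor."""
    src_row, dst_row = edge_label_index
    return list(src_row) + list(dst_row)


# Three candidate links: (0 -> 1), (1 -> 2), (0 -> 3).
edge_label_index = [[0, 1, 0],
                    [1, 2, 3]]
num_edges = len(edge_label_index[0])

# All sources come first, followed by all destinations.
seeds = flatten_row_major(edge_label_index)

# Recover the (source, destination) pair of every link from the flat seeds.
pairs = [(seeds[i], seeds[num_edges + i]) for i in range(num_edges)]
```

Since the two endpoints of a link end up num_edges positions apart, any batching on top of this layout presumably has to take care to keep both endpoints in the same batch.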

@RexYing @JiaxuanYou


Jeriousman · 2022-03-04T02:09:17Z

How is the progress on the LinkLevelNeighborLoader?

rusty1s · 2022-03-04T21:19:16Z

We will post here once we make progress. Sorry for the delay.

Padarn · 2022-03-08T11:03:36Z

Hey @rusty1s want a hand with this one?

rusty1s · 2022-03-08T11:18:08Z

Help is always good, thank you! Let me know how we want to proceed with this. @RexYing might have further thoughts.

Padarn · 2022-03-08T14:48:33Z

If I'm picking it up, I would plan to start with the proposed API at the top of this issue and see how it would look for regular and heterogeneous graphs. I need to play around a bit to understand the requirements. With a working example (even if a bit of a hack) we can align on the rest of the implementation details?

What do you think?

rusty1s · 2022-03-09T07:59:59Z

We could follow along with that in a similar fashion as in your label masked prop PR, and discuss as we go :)

Padarn · 2022-03-09T10:05:42Z

Sounds good to me :-)

Padarn · 2022-03-13T03:14:45Z

Hey @rusty1s, I'm trying to understand something in the existing NeighborLoader for heterogeneous graphs: It looks to me that it is set up to only work with a single type of node being sampled. Is this intentional? I couldn't find a clear description of this in the docstring.

As for this change, after reading the code, reading the issue and playing around, here is my rough plan (in order of steps):

1. build a LinkNeighborLoader which works for homogeneous graphs and doesn't have negative sampling, but wraps the hack above
2. introduce negative sampling
3. adapt to work for heterogeneous graphs
4. create example of use
5. any cleaning/optimisation or functional enhancements we need for it to be useful as a first version

There are a couple of random questions in my mind which I may not worry about too much right now but feel free to comment on:

Do we want to make sure sampling from both ends of a link always happens? Or are there cases where one might only want to follow the direction of a link?
Should 'num neighbours' apply to the link as a whole, or to each node attached to the link?

What do you think?

Padarn · 2022-03-13T03:32:44Z

Sorry, one more question: in the hack in PositiveLinkNeighborSampler, am I right in thinking that we don't do deduplication before sampling?

rusty1s · 2022-03-13T08:44:22Z

> It looks to me that it is set up to only work with a single type of node being sampled. Is this intentional?

Yes, this is intentional. There rarely exist use cases where we perform node classification across different node types. The reason we currently restrict it is more due to implementation details though, as we somehow need to map the indices produced by the underlying PyTorch DataLoader to the respective node types once again. The underlying C++/CUDA sampling procedure can handle multiple node types though (it expects a dictionary of node indices for a subset of node types).

I think your roadmap is super useful. Thanks a lot for setting this up. Regarding your questions:

> Do we want to make sure sampling from both ends of a link always happens? Or are there cases where one might only want to follow the direction of a link?

I think the user still needs to specify the links to compute embeddings for. That's what I originally meant with the input_edges/input_links argument. If it is not set, the sampler will iterate over all edges present in the data. Alternatively, we make it a required argument.

> Should 'num neighbours' apply to the link as a whole, or to each node attached to the link?

Yes, I think so. In the end, we simply sample num_neighbors from both source and destination node for each edge.

> In the hack in PositiveLinkNeighborSampler, am I right in thinking that we don't do deduplication before sampling?

Can you explain what you mean?

Padarn · 2022-03-13T09:23:41Z

> Can you explain what you mean?

My understanding from https://github.com/snap-stanford/ogb/blob/master/examples/linkproppred/citation2/sampler.py#L17-L41

batch = torch.cat([row[edge_idx], col[edge_idx]], dim=0)

Is that we take the nodes from start and end of each edge in the batch and then do neighbourhood expansion. But there may be duplicate nodes in the result of this cat.

rusty1s · 2022-03-13T09:28:25Z

I agree. I think the implementation is easier if we do not merge duplicated nodes, and the gains in efficiency may be negligible. This also aligns with the intuition that each example in a batch is isolated from each other.
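
To illustrate the point about duplicates, here is a minimal sketch in plain Python, with lists instead of tensors. seeds_for_batch is a hypothetical helper that mirrors the torch.cat line quoted above; it is not the OGB implementation itself. Because duplicates are kept, the j-th edge of a batch can always be read off at positions j and batch_size + j of the seed list:

```python
def seeds_for_batch(row, col, edge_idx):
    """Sources of the batch edges followed by their destinations, duplicates kept."""
    return [row[e] for e in edge_idx] + [col[e] for e in edge_idx]


row = [0, 1, 1, 2]
col = [1, 0, 2, 1]
edge_idx = [0, 2]  # the batch holds edges (0 -> 1) and (1 -> 2)

seeds = seeds_for_batch(row, col, edge_idx)
batch_size = len(edge_idx)

# Endpoints of the j-th batch edge sit at fixed positions in the seed list.
batch_edges = [(seeds[j], seeds[batch_size + j]) for j in range(batch_size)]

# Node 1 is the destination of one edge and the source of the other.
has_duplicates = len(set(seeds)) < len(seeds)
```

Merging the duplicates would shrink the seed list, but would then require an extra mapping from each edge to the positions of its endpoints.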

Padarn · 2022-03-13T09:31:53Z

Okay agreed. Thanks for the thoughts!

rusty1s · 2022-04-08T19:20:12Z

A first prototype was integrated via #4396, see loader.LinkNeighborLoader (thanks to @Padarn). Any feedback is highly appreciated. It supports homogeneous and heterogeneous link prediction tasks. A current limitation is that it does not support internal negative sampling yet. We are working on it.

shishixuezi · 2022-05-11T03:28:27Z

Hello, thanks for adding this feature!

I have a small question. If I use the result of RandomLinkSplit to generate batches with LinkNeighborLoader, it produces an IndexError.

I think it may be due to the edge_label_index attribute generated by RandomLinkSplit. Its shape is [2, num_edges], but in LinkNeighborLoader, if the key of the split attribute is an edge attribute, it only selects indices along dimension zero, which causes the IndexError.

Do you have some suggestions for this case? Thank you very much!

Padarn · 2022-05-11T03:34:02Z

Hi @shishixuezi, thanks for the report. Could you provide a short example of what exactly you're doing so I can take a look?

shishixuezi · 2022-05-11T04:40:52Z

Hello, @Padarn Thank you for your reply. I created a toy case, please check. Thank you very much!

import torch
from torch_geometric.data import Data
from torch_geometric.loader import LinkNeighborLoader
import torch_geometric.transforms as T


def main():
    edge_index = torch.tensor([[0, 1, 1, 2, 0, 1, 2],
                               [1, 0, 2, 1, 3, 3, 3]], dtype=torch.long)
    x = torch.tensor([[-1], [0], [1], [4]], dtype=torch.float)
    edge_attr = torch.tensor([[1.0], [2.0], [1.0], [1.0], [1.0], [1.0], [1.0]], dtype=torch.float)
    data = Data(x=x, edge_index=edge_index, edge_attr=edge_attr)

    transform = T.Compose([
        T.NormalizeFeatures(),
        T.ToDevice('cuda' if torch.cuda.is_available() else 'cpu'),
        T.RandomLinkSplit(num_val=0.1, num_test=0.05, is_undirected=False,
                          add_negative_train_samples=False, neg_sampling_ratio=0.0,
                          key='edge_attr')])

    train_data, val_data, test_data = transform(data)

    # No Problem
    # loader = LinkNeighborLoader(data, num_neighbors=[2]*2)

    # Cause Error
    loader = LinkNeighborLoader(train_data, num_neighbors=[2]*2)

    print(next(iter(loader)))


if __name__ == '__main__':
    main()

Padarn · 2022-05-11T08:04:00Z

Thanks for the example, I'll take a look as soon as I get a chance.

Padarn · 2022-05-12T13:32:33Z

So I see the problem, but I can't yet think of a clean fix. A workaround you could use for now:

train_data.edge_attr_index = train_data.edge_attr_index.t()
loader = LinkNeighborLoader(train_data, num_neighbors=[2]*2)

I'll raise a MR with a potential fix.
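
The shape mismatch behind this error can be sketched in plain Python, with nested lists standing in for tensors. This only illustrates the indexing problem described above and is not the loader's actual code; select_dim0 and transpose are hypothetical helpers:

```python
def select_dim0(tensor, indices):
    """Pick entries along the first dimension, like `tensor[indices]`."""
    return [tensor[i] for i in indices]


def transpose(matrix):
    """Turn a [2, num_edges] layout into [num_edges, 2]."""
    return [list(column) for column in zip(*matrix)]


edge_attr_index = [[0, 1, 1, 2],
                   [1, 0, 2, 3]]  # shape [2, num_edges]
edge_ids = [0, 3]                 # edges we want to keep

try:
    select_dim0(edge_attr_index, edge_ids)
    failed = False
except IndexError:
    # Edge id 3 is out of range for a first dimension of size 2.
    failed = True

# After transposing, the first dimension runs over edges and the lookup works.
selected = select_dim0(transpose(edge_attr_index), edge_ids)
```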

rusty1s · 2022-05-12T19:47:03Z

Fixed via #4629.

shishixuezi · 2022-05-13T01:42:45Z

Wow, so cool! Thank you very much! @Padarn @rusty1s

kamibrumi · 2022-07-18T14:49:18Z

Hi, I'm trying to import the LinkNeighborLoader in JupyterLab but I'm getting this error:
---> 35 from torch_geometric.loader import LinkNeighborLoader

ImportError: cannot import name 'LinkNeighborLoader' from 'torch_geometric.loader' (/Users/cbrumar/.local/share/virtualenvs/gnn-TG0lFQrB/lib/python3.9/site-packages/torch_geometric/loader/__init__.py)

I'm using PyTorch Geometric version 2.0.4 and I installed it using pip.

Edit: I am using PyTorch version 1.11.0.

rusty1s · 2022-07-18T19:28:06Z

You need to install PyG from master or from a nightly build.

kamibrumi · 2022-07-22T15:15:18Z

Thank you, @rusty1s!

YoavLotem · 2022-07-25T11:46:53Z

Hey!
First of all, thanks for this amazing tool. I appreciate your work.

I'm currently working on an edge classification problem in an environment in which I can't use the LinkNeighborLoader due to unrelated constraints.

I didn't fully understand the previous workaround:
Should I use loader = NeighborLoader(data, input_nodes=edge_label_index.view(-1), ...)?
Or should I use the PositiveLinkNeighborSampler from this example?

Thanks in advance.

rusty1s · 2022-07-25T20:18:16Z

If you cannot use the new LinkNeighborLoader interface, it is recommended to follow the OGB example.

YoavLotem · 2022-07-27T09:06:20Z

Thanks for your response.

How can I use the 'PositiveLinkNeighborSampler' in the OGB example to sample a subgraph for specific edges, similar to 'edge_label_index' in 'LinkNeighborLoader'?

rusty1s · 2022-07-27T11:51:42Z

You can save it in the constructor, initialize edge_idx as torch.arange(edge_label_index.size(1)), and then access subsets of it inside sample.
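
A rough sketch of that suggestion in plain Python, covering only how the seed nodes are chosen; the neighbor expansion itself is omitted. EdgeSubsetSampler is a hypothetical stand-in for a sampler like the one in the OGB example, with lists in place of tensors:

```python
class EdgeSubsetSampler:
    """Hypothetical sampler restricted to the edges in `edge_label_index`."""

    def __init__(self, edge_label_index, batch_size):
        # Save the edges of interest in the constructor.
        self.edge_label_index = edge_label_index
        # Plays the role of torch.arange(edge_label_index.size(1)).
        self.edge_idx = list(range(len(edge_label_index[0])))
        self.batch_size = batch_size

    def sample(self, edge_idx):
        """Return the seed nodes (sources, then destinations) for a subset of edges."""
        src, dst = self.edge_label_index
        return [src[e] for e in edge_idx] + [dst[e] for e in edge_idx]

    def __iter__(self):
        # Walk over edge_idx in consecutive chunks of batch_size.
        for start in range(0, len(self.edge_idx), self.batch_size):
            yield self.sample(self.edge_idx[start:start + self.batch_size])


sampler = EdgeSubsetSampler([[4, 5, 6], [7, 8, 9]], batch_size=2)
batches = list(sampler)
```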

francyya · 2022-12-16T21:29:11Z

@rusty1s When I apply the LinkNeighborLoader to training data with negative samples, the target labels turn out to be 0, 1 and 2. I'm trying to understand how the target label 2 shows up, given that there are only two class labels for the link prediction task. Thanks.

rusty1s · 2022-12-17T11:43:29Z

When using LinkNeighborLoader with negative samples, we will automatically add zero labels for these negative edges, and adjust the initial edge_label by incrementing it by one.
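
A minimal sketch of this relabelling in plain Python, assuming binary input labels. shift_labels_with_negatives is a hypothetical helper and not part of the PyG API, and appending the negatives at the end is an assumption made for illustration:

```python
def shift_labels_with_negatives(edge_label, num_negatives):
    """Shift the given labels up by one and label sampled negatives with 0."""
    shifted = [label + 1 for label in edge_label]
    return shifted + [0] * num_negatives


edge_label = [0, 1, 1, 0]  # original binary labels of the input edges
labels = shift_labels_with_negatives(edge_label, num_negatives=2)

# Original label 0 becomes 1, original label 1 becomes 2, negatives are 0.
classes = sorted(set(labels))
```

With binary input labels this yields the three classes 0, 1 and 2 reported above.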

brovatten · 2023-08-12T21:41:10Z

Hi,

When using disjoint=True with torch-sparse, I am advised to use pyg-lib. But when I enable pyg-lib through torch_geometric.typing by setting

torch_geometric.typing.WITH_PYG_LIB = True
torch_geometric.typing.WITH_TORCH_SPARSE = False

it throws me this error:
AttributeError: '_OpNamespace' 'pyg' object has no attribute 'neighbor_sample'

reference code:

from torch_geometric.loader import LinkNeighborLoader
LinkNeighborLoader(data=data, num_neighbors=[10], batch_size=1, disjoint=True, shuffle=False)

torch 2.0.1. What could be the issue?

rusty1s · 2023-08-14T07:22:44Z

What does

import torch
import pyg_lib
print(pyg_lib.__version__)
print(torch.ops.pyg.neighbor_sample)

return?

rusty1s added bug feature labels Feb 8, 2022

rusty1s self-assigned this Feb 8, 2022

rusty1s added 0 - Priority P0 and removed bug labels Feb 8, 2022

rusty1s assigned RexYing Feb 8, 2022

rusty1s mentioned this issue Feb 8, 2022

Link Prediction on Heterogeneous Graphs with Heterogeneous Graph Learning #3958

Closed

Padarn mentioned this issue Mar 14, 2022

sample doesn't work correctly when nodes are repeated in input rusty1s/pytorch_sparse#208

Closed

Padarn mentioned this issue Apr 2, 2022

Link level NeighborLoader #4396

Merged

4 tasks

rusty1s linked a pull request Apr 8, 2022 that will close this issue

Link level NeighborLoader #4396

Merged

4 tasks

rusty1s closed this as completed in #4396 Apr 8, 2022

Padarn mentioned this issue Apr 10, 2022

Add random negative sampling to LinkNeighborLoader #4446

Merged

Padarn mentioned this issue May 12, 2022

Fix dimension in edge filter selection #4629

Merged
