
sample doesn't work correctly when nodes are repeated in input #208

Closed · Padarn opened this issue Mar 14, 2022 · 11 comments

Padarn (Contributor) commented Mar 14, 2022

Working on pyg-team/pytorch_geometric#4026, I discovered that this sampling function does not work properly when the input_node array contains duplicates. In that case the number of samples can exceed the number of edges, and the index i doesn't do what you'd expect.

Another strange behaviour appears when calling the function as

import torch

# args: colptr, row, input_node, num_neighbors, replace, directed
torch.ops.torch_sparse.neighbor_sample(
    torch.tensor([0, 1, 2, 3], dtype=torch.long),  # colptr
    torch.tensor([1, 0, 1], dtype=torch.long),     # row
    torch.tensor([0, 2], dtype=torch.long),        # input_node (seed nodes)
    [1],                                           # num_neighbors
    False,                                         # replace
    False,                                         # directed
)

The returned values are

(tensor([0, 2, 1]), tensor([2, 2, 0]), tensor([0, 1, 2]), tensor([0, 2, 1]))

but the (row, col) combinations (2, 0) and (0, 2) don't make sense; they're not part of the original adjacency matrix.

I'm happy to dive in a bit deeper and figure out a fix (at least for the first issue; I'm not sure yet whether the second one is actually a problem), but thought I'd check first whether this is known, or whether my understanding is incorrect.

rusty1s (Owner) commented Mar 14, 2022

Can you explain the first issue a bit more? I'm not yet sure I understand it.

I think the second issue is not an issue at all, since you need to take the remapping of node indices into account. That is, node 0 stays node 0, but node 1 becomes node 2 (as indicated by the first output [0, 2, 1]). As such, the edge (2, 0) corresponds to the original edge (1, 0).
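
To make the remapping concrete, here is a quick check against the outputs above (a sketch; I'm reading the four outputs as (node, row, col, edge), with row/col indexing into node):

import torch

node = torch.tensor([0, 2, 1])  # sampled nodes, in local order
row = torch.tensor([2, 2, 0])
col = torch.tensor([0, 1, 2])

# Map the local (row, col) pairs back to global node ids:
print(node[row])  # tensor([1, 1, 0])
print(node[col])  # tensor([0, 2, 1])
# i.e. the global edges (1, 0), (1, 2), (0, 1) -- exactly the input edges.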

Padarn (Contributor, Author) commented Mar 14, 2022

Ohh I see, so row/col are relative to the nodes in the output? If so, then agreed.

And if this is the case... then the first issue is also a non-issue. Sorry for the false alarm!

Padarn (Contributor, Author) commented Mar 15, 2022

Here is an example of what I am seeing:

import torch
from torch_geometric.data import Data
from torch_geometric.loader import NeighborLoader

edge_index = torch.tensor([[0, 1, 1],
                           [1, 0, 2]], dtype=torch.long)
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)
data = Data(x=x, edge_index=edge_index)

# Seed nodes with duplicates: node 0 appears three times.
loader = NeighborLoader(data, input_nodes=torch.tensor([0, 1, 2, 0, 0]),
                        batch_size=5, num_neighbors=[5], directed=True)
next(iter(loader))

returns a data object with

data.x
>> tensor([[-1.],
        [ 0.],
        [ 1.],
        [-1.],
        [-1.]])

data.edge_index
>> tensor([[1, 0, 1, 1, 1],
        [0, 1, 2, 3, 4]])

I'm not convinced this is correct. The edges are correct in the sense that no new connectivity is added, but this subgraph has a higher 'in degree' for node 1, so the aggregation of messages will be different.

Thoughts?

rusty1s (Owner) commented Mar 15, 2022

Interesting. I think this is correct in the sense that there is no bug in the code. Notably, the in-degree statistics are also correct, which ensures that GNN ops should work correctly as well.
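
For instance, a quick sanity check on the sampled subgraph above (a sketch using the returned edge_index):

import torch

edge_index = torch.tensor([[1, 0, 1, 1, 1],
                           [0, 1, 2, 3, 4]])
# In-degree of every local node in the sampled subgraph:
print(torch.bincount(edge_index[1], minlength=5))  # tensor([1, 1, 1, 1, 1])
# Every copy of node 0 (locals 0, 3 and 4) keeps in-degree 1, matching the
# original graph, so aggregation at each seed node is unchanged.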

It is nonetheless a bit weird that we share data across duplicated input nodes (we may eventually want to fix that), but it shouldn't block us from implementing a link-level neighbor loader.

Padarn (Contributor, Author) commented Mar 16, 2022 via email

rusty1s (Owner) commented Mar 16, 2022

Sounds good to me. Thanks for being so careful! Currently, the neighbor_sampler does not create an isolated graph for every input node, but merges them together so that shared nodes can benefit from each other. We might need to add an option to disable this for link-level tasks.
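
For reference, per-seed isolation can already be emulated from user land (a sketch, not a library feature): run one loader per seed so that duplicated seeds get independent subgraphs.

import torch
from torch_geometric.loader import NeighborLoader

# `data` as defined in the example above; note the duplicated seeds.
seeds = [0, 1, 2, 0, 0]
subgraphs = [
    next(iter(NeighborLoader(data, input_nodes=torch.tensor([seed]),
                             batch_size=1, num_neighbors=[5],
                             directed=True)))
    for seed in seeds
]
# Each element is an independent Data object, so duplicated seeds no
# longer share sampled nodes; Batch.from_data_list could merge them.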

Padarn (Contributor, Author) commented Mar 16, 2022

Good thought. I probably won't have time until the weekend for the changes in this repo; I'll take a look then.

Padarn (Contributor, Author) commented Mar 19, 2022

Do you think it's fair to say that sample in this library (in neighbour_sample_cpu) is actually a reverse sample, in the sense that edges are followed backwards? (This makes sense in the context of how it is used, but the naming is confusing me a bit.)

rusty1s (Owner) commented Mar 19, 2022

Yes, that is correct. If we think about the message passing flow of a GNN, we actually start sampling from our destination nodes and sample new source nodes.
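
A tiny illustration of that backward direction (a sketch on a hypothetical one-edge graph in CSC form):

import torch

# Single directed edge 0 -> 1 over two nodes, in CSC (colptr, row) form:
colptr = torch.tensor([0, 0, 1], dtype=torch.long)  # node 1 has one in-edge
row = torch.tensor([0], dtype=torch.long)           # ...coming from node 0

node, row_out, col_out, edge = torch.ops.torch_sparse.neighbor_sample(
    colptr, row, torch.tensor([1], dtype=torch.long), [1], False, False)
print(node)  # expected tensor([1, 0]): seeding on the destination node 1
             # reaches the source node 0 by following the edge backwards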

Padarn (Contributor, Author) commented Mar 19, 2022 via email

Padarn (Contributor, Author) commented Mar 20, 2022

Closing this one as we've agreed not to do anything about it in pytorch_sparse. Thanks @rusty1s!

Padarn closed this as completed on Mar 20, 2022.