triage: DiscV5 does not work across multiple nodes in k8s #9736

Closed
Tracked by #9712
just-mitch opened this issue Nov 4, 2024 · 2 comments
Labels: C-p2p (Component: peer to peer)

Comments

just-mitch (Collaborator) commented Nov 4, 2024

I think this would be fixed if each validator had a publicly accessible IP that we used for p2p traffic.

The symptom was that validators were not receiving all attestations in time, likely because we were relying too heavily on the boot node for gossiping while DiscV5 peer discovery was broken.

This also explains why the 48-validator setup worked on mainframe at the London offsite.

Repro

Try running a 3-validator setup in AWS with "discv5:*" enabled in the debug string.

The logs will contain entries like the following:

2024-11-04T22:43:44.768Z discv5:service Lookup Id: 1 finished, 0 total found
2024-11-04T22:43:44.772Z discv5:sessionService Request timed out with { socketAddr: Multiaddr(/ip4/10.1.2.225/udp/40400), nodeId: '5e8fbc8ff6188366b6ffe2013596780961109cbf9c28ee3902a3d30952016cf6' }
2024-11-04T22:43:44.772Z discv5:service RPC error, removing request. Reason: Timeout, id 5943808880661031700n
2024-11-04T22:43:44.773Z discv5:service Failed FINDNODE request: { type: 3, id: 5943808880661031700n, distances: [ 255, 256, 254 ] } for node: 5e8fbc8ff6188366b6ffe2013596780961109cbf9c28ee3902a3d30952016cf6

Increasing the timeouts to 5 seconds did not help.
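
For a longer log capture, it can help to tally which peers are timing out. The following is a minimal sketch that assumes the log format shown above; `timed_out_peers` and `TIMEOUT_RE` are hypothetical names for illustration and are not part of any discv5 library.

```python
import re
from collections import Counter

# Matches the "Request timed out" lines as they appear in the logs above.
TIMEOUT_RE = re.compile(
    r"discv5:sessionService Request timed out with \{ socketAddr: "
    r"Multiaddr\(/ip4/(?P<ip>[\d.]+)/udp/(?P<port>\d+)\), "
    r"nodeId: '(?P<node_id>[0-9a-f]+)' \}"
)


def timed_out_peers(log_text: str) -> Counter:
    """Count timeouts per (ip, port, node_id) in discv5 debug output."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            key = (match["ip"], int(match["port"]), match["node_id"])
            counts[key] += 1
    return counts


# Two lines copied from the repro logs above; only the second is a timeout.
SAMPLE = (
    "2024-11-04T22:43:44.768Z discv5:service Lookup Id: 1 finished, 0 total found\n"
    "2024-11-04T22:43:44.772Z discv5:sessionService Request timed out with "
    "{ socketAddr: Multiaddr(/ip4/10.1.2.225/udp/40400), nodeId: "
    "'5e8fbc8ff6188366b6ffe2013596780961109cbf9c28ee3902a3d30952016cf6' }\n"
)
```

Calling `timed_out_peers(SAMPLE)` yields a single entry for 10.1.2.225 on UDP port 40400. If every timed-out address sits on a different k8s node than the requester, that points at the network path rather than at discv5 itself.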

just-mitch added the T-bug (Type: Bug. Something is broken.) and C-p2p (Component: peer to peer) labels Nov 4, 2024
just-mitch added this to the Sequencer & Prover Testnet milestone Nov 4, 2024
just-mitch changed the title from "DiscV5 does not work across multiple nodes in k8s" to "triage: DiscV5 does not work across multiple nodes in k8s" Nov 7, 2024
just-mitch self-assigned this Nov 7, 2024
just-mitch added the hotfix (A PR/issue that needs to be cherrypicked back to the RC) label and removed the T-bug (Type: Bug. Something is broken.) label Nov 7, 2024
just-mitch (Collaborator, Author) commented:

Can confirm that basic UDP messaging is not working across nodes in AWS.
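
A check along these lines can be reproduced without discv5 at all. This is a minimal sketch, not the exact test that was run: `echo_once` and `udp_probe` are hypothetical helpers, and the idea is to run the echo side on one node and the probe from another.

```python
import socket


def echo_once(sock: socket.socket) -> None:
    """Receive one datagram on an already-bound UDP socket and send it back."""
    data, addr = sock.recvfrom(1024)
    sock.sendto(data, addr)


def udp_probe(host: str, port: int, payload: bytes = b"ping",
              timeout: float = 2.0) -> bool:
    """Send one datagram and report whether the same bytes come back in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(payload, (host, port))
            data, _ = sock.recvfrom(1024)
        except OSError:
            # Covers timeouts as well as ICMP-driven errors such as
            # "connection refused" or "network unreachable".
            return False
        return data == payload
```

On the receiving node, bind a UDP socket to `("0.0.0.0", 40400)` and call `echo_once` on it; on the sending node, call `udp_probe` with the receiver's private IP and port 40400. A `False` result while the same probe succeeds between processes on a single node suggests that something between the nodes is dropping UDP.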

just-mitch (Collaborator, Author) commented:

Our security group in AWS did not allow inbound UDP traffic.

I manually added the ports 40400-40500 (nit: I think this should have been 40400-40499).
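
The nit is an off-by-one: security group port ranges are inclusive at both ends. A quick sketch of the arithmetic, assuming the intent was a block of exactly 100 ports starting at 40400 (`inclusive_port_count` is a hypothetical helper):

```python
def inclusive_port_count(from_port: int, to_port: int) -> int:
    """Number of ports in a range that includes both endpoints."""
    if to_port < from_port:
        raise ValueError("to_port must be >= from_port")
    return to_port - from_port + 1


# The range that was added manually versus the range suggested in the nit.
added = inclusive_port_count(40400, 40500)
suggested = inclusive_port_count(40400, 40499)
```

Here `added` is 101 and `suggested` is 100, so the manually added rule opens one more port than intended.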

Can confirm I'm seeing cross node peering now:

2024-11-07T23:57:26.813Z discv5:service Starting a new lookup. Id: 35296
2024-11-07T23:57:26.813Z discv5:service Sending FINDNODE to node: { socketAddr: Multiaddr(/ip4/10.1.1.22/udp/40400), nodeId: '0faed959a8e524a8709508f05b15f586d54dae482cff6233e022a9f3ff70ed9f' }
2024-11-07T23:57:26.813Z discv5:service Sending FINDNODE to node: { socketAddr: Multiaddr(/ip4/10.1.2.130/udp/40400), nodeId: '20383ef0999c80417942da3be1f60eec4eb0866743b24f0bcf1b5fca4c9dee6d' }
2024-11-07T23:57:26.814Z discv5:service Sending FINDNODE to node: { socketAddr: Multiaddr(/ip4/10.1.2.232/udp/40400), nodeId: 'c667dcdaaaf7ca8f1f2be6dd805944ff36912e1d0e0b9a007b80a64b60593ad0' }
2024-11-07T23:57:26.815Z discv5:sessionService Received message from: { socketAddr: Multiaddr(/ip4/10.1.1.22/udp/40400), nodeId: '0faed959a8e524a8709508f05b15f586d54dae482cff6233e022a9f3ff70ed9f' }
2024-11-07T23:57:26.816Z discv5:service Received NODES response of length: 3, total: 1, from node: { socketAddr: Multiaddr(/ip4/10.1.1.22/udp/40400), nodeId: '0faed959a8e524a8709508f05b15f586d54dae482cff6233e022a9f3ff70ed9f' }
2024-11-07T23:57:26.817Z discv5:service 2 peers found for lookup Id: 35296, Node: 0faed959a8e524a8709508f05b15f586d54dae482cff6233e022a9f3ff70ed9f

Will create a followup to add the UDP rules to our terraform.

just-mitch removed the hotfix (A PR/issue that needs to be cherrypicked back to the RC) label Nov 8, 2024