mdns/fix: Failed to register opened substream #301

Merged: 6 commits merged into master from lexnv/fix-mdns on Dec 12, 2024

lexnv (Collaborator) commented Dec 11, 2024

This PR ensures that an error in MDNS no longer terminates other litep2p components.

Previously, MDNS exited if it failed to send a query or to handle incoming packets.
The exit shows up as the following log line, observed on a Kusama validator:

tokio-runtime-worker litep2p::mdns: failed to send mdns query error=IoError(NetworkUnreachable)

When MDNS exits, the Substrate Discovery mechanism exits as well, and that in turn causes the litep2p Kademlia handler to exit. The node is then unable to discover the network or handle incoming substreams.
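
The difference between the old and new behaviour can be illustrated with a minimal sketch. This is not the code from the PR: `QueryError`, `run_until_first_error` and `run_and_keep_going` are hypothetical names, and the slice of outcomes stands in for successive MDNS query attempts. The first function returns on the first error, which models the task exiting. The second logs the error and keeps processing.

```rust
/// Hypothetical error type standing in for the I/O error in the log line above.
#[derive(Debug)]
enum QueryError {
    NetworkUnreachable,
}

/// Simplified model of the old behaviour: the first failed query ends the
/// loop, so later queries are never attempted.
/// Returns the number of queries attempted before the loop ended.
fn run_until_first_error(outcomes: &[Result<(), QueryError>]) -> usize {
    let mut attempted = 0;
    for outcome in outcomes {
        attempted += 1;
        if outcome.is_err() {
            // The task exits here; anything that depends on it is affected.
            return attempted;
        }
    }
    attempted
}

/// Simplified model of the new behaviour: a failed query is logged and
/// counted, and the loop continues with the next query.
/// Returns (queries attempted, queries failed).
fn run_and_keep_going(outcomes: &[Result<(), QueryError>]) -> (usize, usize) {
    let mut attempted = 0;
    let mut failed = 0;
    for outcome in outcomes {
        attempted += 1;
        if let Err(error) = outcome {
            failed += 1;
            eprintln!("failed to send mdns query error={error:?}");
        }
    }
    (attempted, failed)
}
```

With the outcomes `[Ok(()), Err(QueryError::NetworkUnreachable), Ok(())]`, the first function stops after two attempts, while the second attempts all three and records one failure.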

Testing Done

The issue was reproduced locally with a patch that uses a tokio interval to make the MDNS component exit after the node has established connectivity on Kusama:

2024-12-11 12:50:34.425 ERROR tokio-runtime-worker litep2p::mdns: interval tick MDNS
2024-12-11 12:50:34.425 ERROR tokio-runtime-worker litep2p::mdns: interval tick expired, closing MDNS

2024-12-11 12:50:35.111 ERROR tokio-runtime-worker litep2p::tcp::connection: failed to register opened substream to protocol protocol=Allocated("/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/kad") peer=PeerId("12D3KooWEwh9AwKFUJKPFqmJXWByH7JKYRcfAUfPvp9f3xzj3ibJ") endpoint=Dialer { address: "/ip4/3.96.91.180/tcp/30333", connection_id: ConnectionId(200) } error=ConnectionClosed
...
2024-12-11 12:50:38.753 ERROR tokio-runtime-worker litep2p::tcp::connection: failed to register opened substream to protocol protocol=Allocated("/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/kad") peer=PeerId("12D3KooWJb1W7jmqDCaU3Hsh6NRfDo12gnj8hnKfGwA77vRE4jBv") endpoint=Dialer { address: "/ip4/51.38.63.126/tcp/30333", connection_id: ConnectionId(294) } error=ConnectionClosed
2024-12-11 12:50:40.389 ERROR tokio-runtime-worker litep2p::tcp::connection: failed to register opened substream to protocol protocol=Allocated("/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/kad") peer=PeerId("12D3KooWGXXuap75AN24aA5XP9S1X3BKqdDbYyHwBTJakMyv1P5V") endpoint=Dialer { address: "/ip4/104.243.41.217/tcp/30330", connection_id: ConnectionId(29) } error=ConnectionClosed
...

2024-12-11 12:53:15.690 ERROR tokio-runtime-worker litep2p::tcp: connection exited with error connection_id=ConnectionId(29) error=EssentialTaskClosed
2024-12-11 12:53:40.071 ERROR tokio-runtime-worker litep2p::tcp::connection: failed to register opened substream to protocol protocol=Allocated("/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/kad") peer=PeerId("12D3KooWGphqiEqsfR5ZnV7R2Lgubxi7eAo6MTx3tVmso8oCkvJn") endpoint=Dialer { address: "/ip4/51.163.1.153/tcp/30003", connection_id: ConnectionId(51) } error=ConnectionClosed
2024-12-11 12:53:40.233 ERROR tokio-runtime-worker litep2p::tcp::connection: failed to register opened substream to protocol protocol=Allocated("/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/kad") peer=PeerId("12D3KooWM5mnupyiDGtdN6qm3riQDjBbAZfFqAJfMbcbPQbkEn8u") endpoint=Dialer { address: "/ip4/168.119.149.170/tcp/30333", connection_id: ConnectionId(28) } error=ConnectionClosed
2024-12-11 12:53:41.060 ERROR tokio-runtime-worker litep2p::tcp::connection: failed to register opened substream to protocol protocol=Allocated("/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/kad") peer=PeerId("12D3KooWGphqiEqsfR5ZnV7R2Lgubxi7eAo6MTx3tVmso8oCkvJn") endpoint=Dialer { address: "/ip4/51.163.1.153/tcp/30003", connection_id: ConnectionId(51) } error=ConnectionClosed
2024-12-11 12:53:42.766 ERROR tokio-runtime-worker litep2p::tcp::connection: failed to register opened substream to protocol protocol=Allocated("/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/kad") peer=PeerId("12D3KooWM5mnupyiDGtdN6qm3riQDjBbAZfFqAJfMbcbPQbkEn8u") endpoint=Dialer { address: "/ip4/168.119.149.170/tcp/30333", connection_id: ConnectionId(28) } error=ConnectionClosed

Closes: #300

Thanks @dmitry-markin for also confirming this 🙏

cc @paritytech/networking

lexnv added the bug label Dec 11, 2024
lexnv self-assigned this Dec 11, 2024
// Before starting the loop, make an initial query to the network
if let Err(error) = self.on_outbound_request().await {
    tracing::error!(target: LOG_TARGET, ?error, "Failed to send initial mdns query. MDNS entering failure mode");
    futures::future::pending::<()>().await;
}

Basically, execution would be stuck here forever, right? Is it a problem that self.socket, below, will then never be polled?

Collaborator


Would tokio::time::Interval work here? It fires immediately the first time, so there would be no need to send the first outbound request manually.

The caveat is to set MissedTickBehavior::Delay; otherwise we might end up sending a burst of packets if, for some reason, other branches of select! take too long.
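
To make the burst concern concrete, here is a small standard-library-only model of the two policies. It is a simplification for illustration, not tokio's implementation: `Missed` and `fire_times` are hypothetical names, time is counted in abstract units, and the consumer is assumed to stall once, right after the first tick. Under the burst policy the deadlines stay on the original grid, so all missed ticks fire back to back once the consumer resumes. Under the delay policy the next deadline is one period after the tick that actually fired.

```rust
/// Simplified stand-in for the two missed-tick policies discussed above.
#[derive(Clone, Copy)]
enum Missed {
    /// Keep the original schedule; late ticks fire immediately, one after another.
    Burst,
    /// Schedule the next tick one full period after the tick that just fired.
    Delay,
}

/// Returns the times at which `ticks` ticks fire, assuming the first tick
/// fires at time 0 and the consumer is then busy until `stall_until`.
fn fire_times(mode: Missed, period: u64, stall_until: u64, ticks: usize) -> Vec<u64> {
    let mut now = 0u64;
    let mut deadline = 0u64;
    let mut fired = Vec::with_capacity(ticks);
    for i in 0..ticks {
        // A tick fires at its deadline, or right away if the deadline has passed.
        now = now.max(deadline);
        fired.push(now);
        deadline = match mode {
            Missed::Burst => deadline + period,
            Missed::Delay => now + period,
        };
        if i == 0 {
            // Model another select! branch taking a long time after the first tick.
            now = now.max(stall_until);
        }
    }
    fired
}
```

With a period of 10 and a stall until time 35, `fire_times(Missed::Burst, 10, 35, 6)` yields `[0, 35, 35, 35, 40, 50]`, so three queries would go out at once, whereas `fire_times(Missed::Delay, 10, 35, 6)` yields `[0, 35, 45, 55, 65, 75]`.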

Collaborator Author


Yep indeed, have ended up using tokio::time::Interval here 🙏

lexnv merged commit ef495b8 into master Dec 12, 2024
8 checks passed
lexnv deleted the lexnv/fix-mdns branch December 12, 2024 13:41
lexnv added a commit that referenced this pull request Dec 12, 2024
## [0.8.4] - 2024-12-12

This release aims to make the MDNS component more robust by fixing a bug
that caused the MDNS service to fail to register opened substreams.
Additionally, the release includes several improvements to the
`identify` protocol, replacing `FuturesUnordered` with `FuturesStream`
for better performance.

### Fixed

- mdns/fix: Failed to register opened substream
([#301](#301))

### Changed

- identify: Replace FuturesUnordered with FuturesStream
([#302](#302))
- chore: Update hickory-resolver to version 0.24.2
([#304](#304))
- ci: Ensure cargo-machete is working with rust version from CI
([#303](#303))


cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request Dec 12, 2024
lexnv added a commit to paritytech/polkadot-sdk that referenced this pull request Dec 12, 2024
Development

Successfully merging this pull request may close these issues.

kad: Failed to register opened substream to protocol
3 participants