Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finer-grained multistream-select upgrade protocol version selection #5074

Closed
alindima opened this issue Jan 9, 2024 · 5 comments
Closed

Comments

@alindima
Copy link

alindima commented Jan 9, 2024

Description

Allow NetworkBehaviours to customise the upgrade protocol negotiation version that is to be used. Even better, allow for the request-response Behaviour to set this version value for each outgoing request type.

As an alternative, have a failing OutboundUpgrade not kill the entire connection. Considering that a connection is multiplexed for various protocols, it's not obvious to me why we do this.

Motivation

This is needed because V1Lazy will end up killing the entire connection if the protocol is not supported. We would instead want to make a higher-level decision of retrying the request on a different protocol. See paritytech/polkadot-sdk#2771.

Ideally, we shouldn't have to set this as an application-wide setting, because V1Lazy is more performant and we want to use it as the default version. We want to use V1 only for select request types.

Current Implementation

The only current option is to set this value when configuring the swarm, via SwarmBuilder::substream_upgrade_protocol_override, as an application-wide configuration.

Are you planning to do it yourself in a pull request ?

Maybe

@alindima
Copy link
Author

alindima commented Jan 9, 2024

As an alternative, have a failing OutboundUpgrade not kill the entire connection. Considering that a connection is multiplexed for various protocols, it's not obvious to me why we do this.

Another angle to view this problem is that https://github.com/libp2p/rust-libp2p/blob/3e06e0bca3c2d29bc7fb0e67f880d14a74049483/core/src/upgrade/apply.rs#L230C77-L230C77 shouldn't always return UpgradeError::Apply. It should take into account that the upgrade_outbound could also return a UpgradeError::Select error if the protocol version is V1Lazy. As far as I've seen, the UpgradeError::Select(Failed) will be gracefully handled and wouldn't kill the connection.

Maybe this should be the right way of solving this, although I think it warrants some breaking changes

@mxinden
Copy link
Member

mxinden commented Jan 9, 2024

This is needed because V1Lazy will end up killing the entire connection if the protocol is not supported.

That comes as a surprise to me. Can you share details why that is the case?

Can you still reproduce this behavior on the latest release? I would consider the above a bug and suspect it has been fixed since, e.g. via #4701.

@alindima
Copy link
Author

We're not running on the latest release yet, there's work under way for upgrading libp2p in substrate/polkadot.

I digged a bit and discovered that this PR solved the issue of killing connections for a failed outbound stream: #3913. Thanks!

However, there's still an issue. We are now no longer notified of a failed request. We want to be able to handle that (and precisely know why it failed).

Suppose we make a request with V1Lazy to a peer that doesn't support our req-response protocol. Multistream-select will not do protocol negotiation rightaway, it'll send the request along with the multistream-select packet.
Then, the request OutboundUpgrade will fail with a generic std::io::Error::custom("Failed"). This will be ignored by the req-response behaviour (thanks to #3913).

The higher-level code will not be notified of this failure.

I think the root of the problem is that protocol negotiation errors are swallowed when using V1Lazy and converted to a custom IO error that is ignored.

@alindima
Copy link
Author

We are now no longer notified of a failed request

I may be wrong, as I only cherry-picked #3913 over v0.51.3 (which we are using in substrate). Reading the code on latest master, I see we should now report an OutboundFailure.

By reading the code on master, all of this should be fixed in the latest release, but I cannot test with it, as it's a ton of in-progress work to upgrade libp2p: paritytech/polkadot-sdk#1631.

As a workaround until we manage to upgrade libp2p and test it, I'll switch to using multistream-select V1 for now.

I'll close the issue and reopen it if it's still a problem after upgrading.

It may still be an issue, since it looks like the same symptoms are reported in: paritytech/substrate#14683.

@thomaseizinger
Copy link
Contributor

Yes, @oblique did fantastic work to report these errors correctly in #4701 (which got merged as part of #3913 if I remember correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants