Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Prepare for network protocol version upgrades #5084

Merged
merged 26 commits into from
Apr 21, 2022

Conversation

rphmeier
Copy link
Contributor

@rphmeier rphmeier commented Mar 11, 2022

Closes #4964

This uses sc-network's feature for negotatiated_fallbacks to allow the node to support multiple network protocol versions.

Nodes will first try to connect on the most recent protocol (defined under PeerSet, in our case), but will then attempt to connect on the fallbacks. When a peer connects on a fallback protocol (i.e. the peer is on an older version than we are), that's noted in the NotificationStreamOpened event.

This is the only place in sc-network where our NotificationsProtocol would see the fallback protocol name. In all other cases, for both sending and receiving messages, the "main" protocol name should be used.

I also made the network/protocol crate use Versioned types. These types only support V1 for now, but can support V2 in the future. Subsystems tell the network bridge subsystem to send a VersionedValidationProtocol message or VersionedCollationProtocol message. It's the responsibility of the caller to send messages of the correct version to peers. The resulting WireMessage sent as a notification will differ based on the Versioned variant that is passed to the network bridge.

Subsystems also receive versioned types for their message. e.g. statement distribution will received net_protocol::VersionedStatementDistribution messages from the network bridge.

We have a few design routes open to us for how we'd actually implement V2. Due to the API of sc-network, there's no simple way for a peer on V2 to say "I'm sending this message as a V1 message", which means that we always have to decode the messages received from a V2 peer as V2, and from V1 peers as V1. And so on.

There might be specific cases where some message from V1 will be semantically required until some consensus protocol upgrade makes it outdated. There also might not be; it's possible that all functionality we'd add in V2 would be a superset of the possible functionality in V1. V2 might allow V1 messages as a special 'backcompat' variant. In which case V2 peers would be able to send V1 messages. Or V2 messages might just duplicate certain necessary but 'outdated' backcompat messages which eventually become illegal. These challenges will be addressed in a future PR.

Specific Changes:

  • Be explicit about request/response protocols being V1
  • Make network-protocol::peer_set version-aware (change APIs to force the user to deal mostly with (PeerSet, ProtocolVersion))
  • Update network-bridge to be explicitly V1-aware
  • Use decode_all for network messages.
  • Add a ProtocolVersion to NetworkBridgeEvent::PeerConnected
  • Replace XSubsystemMessage::NetworkBridgeUpdateV1(NetworkBridgeEvent<protocol_v1::XMessage>) with XSubsystemMessage::NetworkBridgeUpdate(NetworkBridgeEvent<VersionedXMessage>. The PeerId/View mechanism unlikely to change, but we can always introduce VersionedPeer/VersionedView if we need to.

@rphmeier rphmeier added A3-in_progress Pull request is in progress. No review needed at this stage. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D5-nicetohaveaudit ⚠️ PR contains trivial changes to logic that should be properly reviewed. labels Mar 11, 2022
@rphmeier
Copy link
Contributor Author

cc @eskimor wdyt as a first pass?

Copy link
Member

@eskimor eskimor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we don't need many of those mass renamings. E.g. with req_res_v1::ChunkFetchingRequest, I don't see why we would need to rename ChunkFetchingRequest to ChunkFetchingV1Request. Rust's module system is pretty nice here.

In general, I think we should have some discussion on how the upgrades are supposed to be executed, based on your understanding what actually is going to change.

bridges/bin/rialto/node/src/overseer.rs Outdated Show resolved Hide resolved
);

send_message(
&mut network_service,
peers,
PeerSet::Validation,
1, // TODO [now]: constant / SendValidationMessageV1?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should just wrap ValidationProtocol version variants in an enum like:

enum VersionedValidationProtocol { 
	V1(v1::ValidationProtocol),
	V2(v2::ValidationProtocol),
}

impl VersionedValidationProtocol {
	pub fn version(&self) -> Version {
		match self {
			V1(_) -> 1,
			V2(_) -> 2,
		}
	}

	/// Encode actual payload.
	///
	/// Any version information is lost here, so the used transport should already be version
	/// aware. You can determine the needed version via `version()`.
	pub fn encode(&self) -> Vec<u8>
		match self {
			V1(payload) -> payload.encode(),
			V2(payload) -> payload.encode(),
		}
	}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and probably enum VersionedBitfieldDistributionMessage where NetworkBridgeEvent::focus automatically does the transformation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I revisited this (everything is just a type alias of the same Versioned enum, which makes code a bit easier to write as you can just do Versioned::V1 everywhere).

I would rather have encoding handled within the network bridge through functions like fn send_validation_message_v1 which take an explicit type. Rather than doing encoding in a way that masks the type.

@@ -185,15 +191,15 @@ impl TestNetworkHandle {
async fn disconnect_peer(&mut self, peer: PeerId, peer_set: PeerSet) {
self.send_network_event(NetworkEvent::NotificationStreamClosed {
remote: peer,
protocol: peer_set.into_protocol_name(),
Copy link
Member

@eskimor eskimor Mar 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this defaulting is sensible. Either we should just disconnect all versions or let the caller specify the version.
Also how do you have our upgrade path in mind? I am currently thinking on having the nodes support both the old and the new version and what is actually run, depends on the currently available runtime versions?

So a subsystem will adhere to the old protocol as long as it has not seen any block with the new runtime, once it sees a block with the new runtime enacted it will start operating on the new protocol as well. Eventually all forks with the old runtime die out and only the new code is running. (And we can remove it on the next node update.)

Copy link
Contributor Author

@rphmeier rphmeier Mar 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine that enacting network protocol upgrades are going to be orthogonal to the actual enabling of a runtime-api protocol upgrade.

I'm not imagining a situation where a new runtime-api protocol upgrade gets triggered, and then nodes all disconnect from each other and reconnect on the new protocol.

I'm imagining that nodes already connect on the new protocol, which supports both runtime-api versions. then the runtime-api upgrade gets triggered, and then they stop interacting with peers on the old protocol.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also have v2 nodes communicate with other nodes over v1 and only have v1 support the old primitives. I think either one works, but we need v2 nodes to support the old way of doing things one way or another.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enacting network protocol upgrades

Just curious - how a network protocol upgrade is initiated? Via binary upgrade and letting the version negotiation do the rest or something more complicated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly, via binary upgrade. I don't know the details of the libp2p version negotiation, though.

//
// for peer-set management, the default should be used regardless of
// the negotiated version.
// TODO [now] : verify
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how this is supposed to work. Wouldn't this trigger a network split?

Copy link
Contributor Author

@rphmeier rphmeier Mar 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not according to @tomaka. In fact, many of the places I used the version explicitly are wrong and we should always be using the default protocol name string even when dealing with peers on lower-versioned negotiated fallbacks.

@rphmeier
Copy link
Contributor Author

rphmeier commented Mar 11, 2022

E.g. with req_res_v1::ChunkFetchingRequest, I don't see why we would need to rename ChunkFetchingRequest to ChunkFetchingV1Request

This was a find-and-replace accident, actually. My main intention was to update the req_res::Protocol and req_res::outgoing::Requests names to be explicitly V1

@rphmeier rphmeier marked this pull request as ready for review April 15, 2022 22:12
@rphmeier rphmeier removed the A3-in_progress Pull request is in progress. No review needed at this stage. label Apr 15, 2022
@rphmeier rphmeier added the A0-please_review Pull request needs code review. label Apr 15, 2022
@tdimitrov
Copy link
Contributor

I think it's better to extract the message handling logic in handle_network_messages and handle_subsystem_messages as separate functions in node/network/bridge/src/lib.rs. Maybe as a part of another PR.

If I understand correctly the idea of the fallback protocol is to communicate with peers not yet updated. Is it possible to have any complications if some nodes use v2 while other use v1 of the protocol at the same time?

@rphmeier
Copy link
Contributor Author

rphmeier commented Apr 19, 2022

I think it's better to extract the message handling logic in handle_network_messages and handle_subsystem_messages as separate functions in node/network/bridge/src/lib.rs. Maybe as a part of another PR.

Yes, can you file a follow-up issue labeled 'refactor'?

If I understand correctly the idea of the fallback protocol is to communicate with peers not yet updated. Is it possible to have any complications if some nodes use v2 while other use v1 of the protocol at the same time?

That's correct; the fallback is for communicating with peers on older versions of the protocol.

I think there could be complications, but it depends on how we use the v2 of the protocol. My intention is to use network protocol upgrades to navigate runtime-api upgrades: the blockchain protocol will be changed by governance, so v2 needs to support the old blockchain protocol and the new one. Presumably, the blockchain protocol wouldn't be upgraded until there is a social consensus that enough nodes have upgraded to a network-v2-supporting node version.

I'd like to do a 'vstaging' for network protocol, similar to what we've done for runtime-API (which, in my book, signifies the 'blockchain protocol' version to the node). Then we can iterate in #5022

@tdimitrov
Copy link
Contributor

Yes, can you file a follow-up issue labeled 'refactor'?

Sure: paritytech/polkadot-sdk#838

Copy link
Member

@ordian ordian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a few design routes open to us for how we'd actually implement V2. Due to the API of sc-network, there's no simple way for a peer on V2 to say "I'm sending this message as a V1 message", which means that we always have to decode the messages received from a V2 peer as V2, and from V1 peers as V1. And so on.

Does it make sense to tie message version to protocol version via the type-system somehow? Is it possible to accidentally send V1 message over V2 protocol?

node/network/bridge/src/lib.rs Outdated Show resolved Hide resolved
@rphmeier
Copy link
Contributor Author

rphmeier commented Apr 21, 2022

Is it possible to accidentally send V1 message over V2 protocol?

No, not unless we write a bug. Feel free to do a follow-up if you want to do something in the type-system. I don't think it'd be a good idea to block this PR on more research.

Basically, I don't really care. I think this code works and is reasonably error-resistant. Anything at the type-system level is likely to be extremely clunky, complex, and probably will have caveats of its own.

@rphmeier
Copy link
Contributor Author

bot merge

@paritytech-processbot
Copy link

Waiting for commit status.

@paritytech-processbot
Copy link

Merge cancelled due to error. Error: Head SHA changed from ac07199 to 3bc077e

@rphmeier
Copy link
Contributor Author

bot merge

@paritytech-processbot
Copy link

Waiting for commit status.

@paritytech-processbot paritytech-processbot bot merged commit 55bf031 into master Apr 21, 2022
@paritytech-processbot paritytech-processbot bot deleted the rh-network-protocol-upgrades branch April 21, 2022 16:35
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D5-nicetohaveaudit ⚠️ PR contains trivial changes to logic that should be properly reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Groundwork for network protocol upgrades
4 participants