Prepare for network protocol version upgrades #5084
cc @eskimor wdyt as a first pass?
I don't think we need many of those mass renamings. E.g. with req_res_v1::ChunkFetchingRequest, I don't see why we would need to rename ChunkFetchingRequest to ChunkFetchingV1Request. Rust's module system is pretty nice here.
In general, I think we should have some discussion on how the upgrades are supposed to be executed, based on your understanding of what is actually going to change.
node/network/bridge/src/lib.rs
Outdated
    );

    send_message(
        &mut network_service,
        peers,
        PeerSet::Validation,
        1, // TODO [now]: constant / SendValidationMessageV1?
I guess we should just wrap ValidationProtocol version variants in an enum like:
enum VersionedValidationProtocol {
    V1(v1::ValidationProtocol),
    V2(v2::ValidationProtocol),
}

impl VersionedValidationProtocol {
    /// The protocol version this message belongs to.
    pub fn version(&self) -> Version {
        match self {
            Self::V1(_) => 1,
            Self::V2(_) => 2,
        }
    }

    /// Encode actual payload.
    ///
    /// Any version information is lost here, so the used transport should already be version
    /// aware. You can determine the needed version via `version()`.
    pub fn encode(&self) -> Vec<u8> {
        match self {
            Self::V1(payload) => payload.encode(),
            Self::V2(payload) => payload.encode(),
        }
    }
}
Yes, and probably enum VersionedBitfieldDistributionMessage where NetworkBridgeEvent::focus automatically does the transformation.
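A rough sketch of what such a focusing step could look like. Everything here is an assumption for illustration: the message structs are byte-vector stand-ins, `focus_bitfield` and `WrongVariant` are hypothetical names, and the real `NetworkBridgeEvent::focus` may be shaped differently.

```rust
/// Generic wrapper over per-version message types (only V1 for now).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Versioned<V1> {
    V1(V1),
}

pub mod v1 {
    // Stand-in payloads; the real messages carry structured data.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct BitfieldDistributionMessage(pub Vec<u8>);
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct StatementDistributionMessage(pub Vec<u8>);

    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum ValidationProtocol {
        BitfieldDistribution(BitfieldDistributionMessage),
        StatementDistribution(StatementDistributionMessage),
    }
}

pub type VersionedValidationProtocol = Versioned<v1::ValidationProtocol>;
pub type VersionedBitfieldDistributionMessage = Versioned<v1::BitfieldDistributionMessage>;

/// Simplified event type; peers are identified by a plain u64 here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NetworkBridgeEvent<M> {
    PeerDisconnected(u64),
    PeerMessage(u64, M),
}

/// Returned when the event carries a message for a different subsystem.
#[derive(Debug, PartialEq, Eq)]
pub struct WrongVariant;

impl NetworkBridgeEvent<VersionedValidationProtocol> {
    /// Narrow a protocol-wide event to the bitfield-distribution message type,
    /// keeping the version wrapper intact.
    pub fn focus_bitfield(
        self,
    ) -> Result<NetworkBridgeEvent<VersionedBitfieldDistributionMessage>, WrongVariant> {
        match self {
            NetworkBridgeEvent::PeerDisconnected(p) => Ok(NetworkBridgeEvent::PeerDisconnected(p)),
            NetworkBridgeEvent::PeerMessage(
                p,
                Versioned::V1(v1::ValidationProtocol::BitfieldDistribution(m)),
            ) => Ok(NetworkBridgeEvent::PeerMessage(p, Versioned::V1(m))),
            NetworkBridgeEvent::PeerMessage(..) => Err(WrongVariant),
        }
    }
}
```

The subsystem then only ever matches on its own versioned message type and never sees the other variants of the protocol enum.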
I revisited this (everything is just a type alias of the same Versioned enum, which makes code a bit easier to write as you can just do Versioned::V1 everywhere).
I would rather have encoding handled within the network bridge through functions like fn send_validation_message_v1, which take an explicit type, rather than doing encoding in a way that masks the type.
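To make the preference concrete, here is a minimal sketch, assuming a single generic `Versioned` enum with per-protocol type aliases and a bridge-side send function that takes the concrete V1 type. `MockNetwork`, the `encode` stand-in, and the one-byte variant tag are hypothetical; the real code uses SCALE encoding and the sc-network service.

```rust
/// One generic enum; each protocol gets a type alias of it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Versioned<V1> {
    V1(V1),
}

pub mod v1 {
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum ValidationProtocol {
        BitfieldDistribution(Vec<u8>),
    }

    impl ValidationProtocol {
        /// Stand-in for the real encoding: a variant tag followed by the payload.
        pub fn encode(&self) -> Vec<u8> {
            match self {
                Self::BitfieldDistribution(payload) => {
                    let mut out = vec![0u8];
                    out.extend_from_slice(payload);
                    out
                }
            }
        }
    }
}

pub type VersionedValidationProtocol = Versioned<v1::ValidationProtocol>;

/// Records (protocol version, encoded bytes) instead of touching a real network.
#[derive(Default)]
pub struct MockNetwork {
    pub sent: Vec<(u32, Vec<u8>)>,
}

/// Takes the explicit V1 type, so the encoding step cannot mask which version is sent.
pub fn send_validation_message_v1(net: &mut MockNetwork, msg: v1::ValidationProtocol) {
    net.sent.push((1, msg.encode()));
}

/// The bridge dispatches on the variant and calls the version-specific function.
pub fn send_validation_message(net: &mut MockNetwork, msg: VersionedValidationProtocol) {
    match msg {
        Versioned::V1(m) => send_validation_message_v1(net, m),
    }
}
```

Adding a V2 variant later would force every `match` on `Versioned` to be revisited, which is the point of keeping the type visible up to the send call.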
@@ -185,15 +191,15 @@ impl TestNetworkHandle {
    async fn disconnect_peer(&mut self, peer: PeerId, peer_set: PeerSet) {
        self.send_network_event(NetworkEvent::NotificationStreamClosed {
            remote: peer,
            protocol: peer_set.into_protocol_name(),
Not sure this defaulting is sensible. Either we should just disconnect all versions or let the caller specify the version.
Also, what upgrade path do you have in mind? I am currently thinking of having the nodes support both the old and the new version, with what is actually run depending on the currently available runtime versions.
So a subsystem will adhere to the old protocol as long as it has not seen any block with the new runtime; once it sees a block with the new runtime enacted, it will start operating on the new protocol as well. Eventually all forks with the old runtime die out and only the new code is running. (And we can remove the old code in the next node update.)
I imagine that enacting network protocol upgrades is going to be orthogonal to the actual enabling of a runtime-api protocol upgrade.
I'm not imagining a situation where a new runtime-api protocol upgrade gets triggered, and then nodes all disconnect from each other and reconnect on the new protocol.
I'm imagining that nodes already connect on the new protocol, which supports both runtime-api versions. Then the runtime-api upgrade gets triggered, and they stop interacting with peers on the old protocol.
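As a sketch of that policy only (not code from this PR): assuming a peer's negotiated network version and the currently enacted runtime-api generation are both known, the decision could be a single function. `NetworkVersion`, `RuntimeApi`, and `should_interact` are hypothetical names.

```rust
/// Network protocol version negotiated with a peer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NetworkVersion {
    V1,
    V2,
}

/// Which runtime-api ("blockchain protocol") generation is currently enacted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RuntimeApi {
    Old,
    New,
}

/// V2 peers are always usable because V2 supports both runtime-api generations.
/// V1 peers are only usable until the runtime-api upgrade is enacted.
pub fn should_interact(peer: NetworkVersion, runtime: RuntimeApi) -> bool {
    match (peer, runtime) {
        (NetworkVersion::V2, _) => true,
        (NetworkVersion::V1, RuntimeApi::Old) => true,
        (NetworkVersion::V1, RuntimeApi::New) => false,
    }
}
```

Under this policy nothing reconnects at upgrade time; the set of peers a node is willing to talk to simply shrinks to those already on V2.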
We could also have v2 nodes communicate with other nodes over v1 and only have v1 support the old primitives. I think either one works, but we need v2 nodes to support the old way of doing things one way or another.
"enacting network protocol upgrades"
Just curious: how is a network protocol upgrade initiated? Via binary upgrade and letting the version negotiation do the rest, or something more complicated?
Yes, exactly, via binary upgrade. I don't know the details of the libp2p version negotiation, though.
//
// for peer-set management, the default should be used regardless of
// the negotiated version.
// TODO [now] : verify
Not sure how this is supposed to work. Wouldn't this trigger a network split?
Not according to @tomaka. In fact, many of the places I used the version explicitly are wrong and we should always be using the default protocol name string even when dealing with peers on lower-versioned negotiated fallbacks.
This was a find-and-replace accident, actually. My main intention was to update the
I think it's better to extract the message handling logic in
If I understand correctly the idea of the fallback protocol is to communicate with peers not yet updated. Is it possible to have any complications if some nodes use
Yes, can you file a follow-up issue labeled 'refactor'?
That's correct; the fallback is for communicating with peers on older versions of the protocol. I think there could be complications, but it depends on how we use the v2 of the protocol. My intention is to use network protocol upgrades to navigate runtime-api upgrades: the blockchain protocol will be changed by governance, so v2 needs to support the old blockchain protocol and the new one. Presumably, the blockchain protocol wouldn't be upgraded until there is a social consensus that enough nodes have upgraded to a network-v2-supporting node version. I'd like to do a 'vstaging' for network protocol, similar to what we've done for runtime-API (which, in my book, signifies the 'blockchain protocol' version to the node). Then we can iterate in #5022
We have a few design routes open to us for how we'd actually implement V2. Due to the API of sc-network, there's no simple way for a peer on V2 to say "I'm sending this message as a V1 message", which means that we always have to decode the messages received from a V2 peer as V2, and from V1 peers as V1. And so on.
Does it make sense to tie the message version to the protocol version via the type system somehow? Is it possible to accidentally send a V1 message over the V2 protocol?
No, not unless we write a bug. Feel free to do a follow-up if you want to do something in the type-system. I don't think it'd be a good idea to block this PR on more research. Basically, I don't really care. I think this code works and is reasonably error-resistant. Anything at the type-system level is likely to be extremely clunky, complex, and probably will have caveats of its own.
bot merge
Waiting for commit status.
bot merge
Waiting for commit status.
Closes #4964
This uses sc-network's feature for negotiated_fallbacks to allow the node to support multiple network protocol versions.
Nodes will first try to connect on the most recent protocol (defined under PeerSet, in our case), but will then attempt to connect on the fallbacks. When a peer connects on a fallback protocol (i.e. the peer is on an older version than we are), that's noted in the NotificationStreamOpened event.
This is the only place in sc-network where our NotificationsProtocol would see the fallback protocol name. In all other cases, for both sending and receiving messages, the "main" protocol name should be used.
I also made the network/protocol crate use Versioned types. These types only support V1 for now, but can support V2 in the future. Subsystems tell the network bridge subsystem to send a VersionedValidationProtocol message or VersionedCollationProtocol message. It's the responsibility of the caller to send messages of the correct version to peers. The resulting WireMessage sent as a notification will differ based on the Versioned variant that is passed to the network bridge.
Subsystems also receive versioned types for their messages, e.g. statement distribution will receive net_protocol::VersionedStatementDistribution messages from the network bridge.
We have a few design routes open to us for how we'd actually implement V2. Due to the API of sc-network, there's no simple way for a peer on V2 to say "I'm sending this message as a V1 message", which means that we always have to decode the messages received from a V2 peer as V2, and from V1 peers as V1. And so on.
There might be specific cases where some message from V1 will be semantically required until some consensus protocol upgrade makes it outdated. There also might not be; it's possible that all functionality we'd add in V2 would be a superset of the possible functionality in V1. V2 might allow V1 messages as a special 'backcompat' variant. In which case V2 peers would be able to send V1 messages. Or V2 messages might just duplicate certain necessary but 'outdated' backcompat messages which eventually become illegal. These challenges will be addressed in a future PR.
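The fallback handling described above can be sketched roughly as follows. This is a simplified model, not the sc-network API: `PeerSetProtocol`, `negotiated_version`, `wire_name`, and the `/example/...` protocol names are all hypothetical, and the only behaviour borrowed from the description is that the stream-opened event optionally carries the fallback name while everything else uses the main name.

```rust
pub type ProtocolVersion = u32;

/// Main protocol name plus the older names we are willing to fall back to.
pub struct PeerSetProtocol {
    pub main_name: &'static str,
    pub main_version: ProtocolVersion,
    pub fallbacks: &'static [(&'static str, ProtocolVersion)],
}

impl PeerSetProtocol {
    /// Map the optional fallback name reported when a stream opens to a version.
    /// `None` means the peer connected on the main (newest) protocol.
    /// An unrecognised fallback name yields `None` as the result.
    pub fn negotiated_version(&self, negotiated_fallback: Option<&str>) -> Option<ProtocolVersion> {
        match negotiated_fallback {
            None => Some(self.main_version),
            Some(name) => self
                .fallbacks
                .iter()
                .find(|(n, _)| *n == name)
                .map(|(_, v)| *v),
        }
    }

    /// Name used for sending, receiving, and peer-set management,
    /// regardless of which version was negotiated with a given peer.
    pub fn wire_name(&self) -> &'static str {
        self.main_name
    }
}

/// Example configuration with one fallback (names are made up).
pub fn example_validation() -> PeerSetProtocol {
    PeerSetProtocol {
        main_name: "/example/validation/2",
        main_version: 2,
        fallbacks: &[("/example/validation/1", 1)],
    }
}
```

The version obtained when the stream opens is what the bridge would remember per peer; the fallback name itself is not needed again after that point.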
Specific Changes:
- Make network-protocol::peer_set version-aware (change APIs to force the user to deal mostly with (PeerSet, ProtocolVersion)).
- Use decode_all for network messages.
- Add ProtocolVersion to NetworkBridgeEvent::PeerConnected.
- Replace XSubsystemMessage::NetworkBridgeUpdateV1(NetworkBridgeEvent<protocol_v1::XMessage>) with XSubsystemMessage::NetworkBridgeUpdate(NetworkBridgeEvent<VersionedXMessage>). The PeerId/View mechanism is unlikely to change, but we can always introduce VersionedPeer/VersionedView if we need to.
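A small sketch of how the version carried by PeerConnected could drive decoding, under the rule from the description that messages from a peer are always decoded as that peer's negotiated version. `Peers`, `DecodeError`, and the `u64` peer id are stand-ins, and the "decoding" just copies bytes; the real code would decode the wire message with decode_all.

```rust
use std::collections::HashMap;

pub type PeerId = u64; // stand-in for the real peer identifier
pub type ProtocolVersion = u32;

/// Decoded message, tagged with the version it was decoded as.
#[derive(Debug, PartialEq, Eq)]
pub enum VersionedMessage {
    V1(Vec<u8>),
}

#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    UnknownPeer,
    UnsupportedVersion(ProtocolVersion),
}

/// Remembers the protocol version each connected peer negotiated.
#[derive(Default)]
pub struct Peers {
    versions: HashMap<PeerId, ProtocolVersion>,
}

impl Peers {
    /// Called when the bridge reports a new peer together with its version.
    pub fn on_peer_connected(&mut self, peer: PeerId, version: ProtocolVersion) {
        self.versions.insert(peer, version);
    }

    /// Decode bytes according to the version recorded for the sending peer.
    pub fn decode_from(&self, peer: PeerId, bytes: &[u8]) -> Result<VersionedMessage, DecodeError> {
        match self.versions.get(&peer).copied() {
            None => Err(DecodeError::UnknownPeer),
            Some(1) => Ok(VersionedMessage::V1(bytes.to_vec())),
            Some(v) => Err(DecodeError::UnsupportedVersion(v)),
        }
    }
}
```

When V2 is introduced, the `Some(v)` arm is where a second decoding path would be added.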