-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add creation time for DHT authority discovery records #91
Add creation time for DHT authority discovery records #91
Conversation
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
|
||
## Motivation | ||
|
||
Currently, we use the Kademlia DHT for storing records regarding the p2p address of an authority discovery key, the problem is that if the nodes decide to change its PeerId/Network key it will publish a new record, however because of the distributed and replicated nature of the DHT there is no way to tell which record is newer so both old PeerId and the new PeerId will live in the network until the old one expires(36h), that creates all sort of problem and leads to the node changing its address not being properly connected for up to 36h. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem is that if the nodes decide to change its PeerId/Network key it will publish a new record, however because of the distributed and replicated nature of the DHT there is no way to tell which record is newer so both old PeerId and the new PeerId will live in the network until the old one expires(36h)
I don't think that's true? The new record is very much supposed to overwrite the old one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Putting a record" consists in finding the 20 peers whose PeerId is the closest to the relevant key (here the relevant key is the authority discovery key) and sending them the record. If later you want to modify the record, you find these 20 peers again (they are normally still the same) and send them a new record, which overwrites the old one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's true? The new record is very much supposed to overwrite the old one.
I've tested this several times on different network configurations and the old record still survives in the network until it expires, as far as I can understand this happens because you do publish the record initially to the closest 20 nodes, but then when the record reaches for different reasons other nodes will still store it locally with this formula: https://github.com/libp2p/rust-libp2p/blob/ad7ad5b3fc5b4bc9a431ece90e9a5ce8c33ca0e2/protocols/kad/src/behaviour.rs#L1796, so not just the initial 20 nodes would store a record.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when the record reaches for different reasons
Well, this isn't really supposed to happen. It can happen in some rare cases where the publisher has been lied to during its iterative query, for example.
Even if the record reaches nodes other than the initial 20 closest nodes, reading the record (i.e. querying it) is always done against the 20 closest nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, double checked again, nodes don't actually have perfect connectivity, so their view of what are peer ids are not always the same, especially after your restart, so we end up in a situation where we don't always publish to the same 20 closest nodes, so you need just one node to still have the old record and then because that will replicate the record to its view of 20 closest nodes, we end up in a situation where the old record will override the new record.
The issue can be replicated systematically, by running a network of around 30-40 nodes, you let it run for sometime then you restart one of the node with a different PeerID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the exact scenario I'm testing is like this:
- Run a stable polkadot network.
- Just restart a single node with a new network key(PeerId) which makes the node to publish a new value on the dht for the key it is responsible.
What I'm observing is that the old record of the single node is actually on more than the 20 closest nodes that get updated at step 2) and because those nodes do replication https://github.com/libp2p/rust-libp2p/blob/ad7ad5b3fc5b4bc9a431ece90e9a5ce8c33ca0e2/protocols/kad/src/behaviour.rs#L2520 regularly, it is overwriting the newer record.
I consider this to be a valid use case since it is what happens when nodes upgrade and forget to persist their network key, it happened a few times in the past on kusama.
For the tests I ran the network does not seem to be split since I see everyone communicating with everyone on the other protocols, so what do you think am I missing here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole discussion is off-topic for this RFC, but the record could be on more than 20 nodes simply because during step 1 you have temporary net splits. What's important is that the 20 nodes where the record is updated at step 2 are the same as the ones queried by the rest of the network when they want to read the record. The logs would tell you that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's important is that the 20 nodes where the record is updated at step 2 are the same as the ones queried by the rest of the network when they want to read the record. The logs would tell you that
Right, but because the one(outside of the closest 20) still has the old record and does replication on the closest 20, all will have the old record.
This whole discussion is off-topic for this RFC
It is the reason I started the RFC in the first place, but I agree it shouldn't affect the RFC, I'll go ahead and apply your suggestion about using a single signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but because the one(outside of the closest 20) still has the old record and does replication on the closest 20, all will have the old record.
As far as I can remember only the original publisher periodically re-publishes and can periodically re-publish.
When receiving a record, a node ensures that the PeerId of the sender of the record (i.e. the "live" PeerId of the connection where the record is being received) can be found in the list of PeerIds that is found in the record and that is covered by the signature.
Otherwise you would have a pretty obvious vulnerability, and we thought of these kind of things when designing the system.
It is the reason I started the RFC in the first place, but I agree it shouldn't affect the RFC, I'll go ahead and apply your suggestion about using a single signature.
I see this RFC as a hack to work around a bug somewhere, and it would be a better idea to actually fix the bug. However I'm personally not really involved in Polkadot anymore, so I'm not going to vote against.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back to this.
As far as I can remember only the original publisher periodically re-publishes and can periodically re-publish.
When receiving a record, a node ensures that the PeerId of the sender of the record (i.e. the "live" PeerId of the connection where the record is being received) can be found in the list of PeerIds that is found in the record and that is covered by the signature.
Otherwise you would have a pretty obvious vulnerability, and we thought of these kind of things when designing the system.
Yes, you are correct, re-publishing is done only by the original published, but there is also replication which is done by all peers that store the record, the integrity of the record is guaranteed by the fact that the signature is checked, so you can't replicate anything, you need to replicate the original record.
I see this RFC as a hack to work around a bug somewhere, and it would be a better idea to actually fix the bug. However I'm personally not really involved in Polkadot anymore, so I'm not going to vote against.
I've investigated this and I couldn't find anything wrong with the way it works, the scenario this fixes it is just a consequence of the fact that not all nodes have the same routing table always, so at different point in time the 20 closest 20 nodes could differ slightly.
However I'm personally not really involved in Polkadot anymore, so I'm not going to vote against.
Then if you don't strongly oppose this RFC, I intend to move forward with adding this new field(creation_time) since I see benefits in having it, so we can use it to always pick the newest record.
... to sign it only once Signed-off-by: Alexandru Gheorghe <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two minor questions, otherwise it looks good to me 👍
+ | ||
``` | ||
|
||
Each time a node wants to resolve an authorithy ID it will issue a query with a certain redundancy factor, and from all the results it receives it will decide to pick only the newest record. Additionally, the nodes that answer with old records will be updated with the newer record. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, the nodes that answer with old records will be updated with the newer record.
So, the node that send the DHT request will inform these nodes?
Strictly speaking, I don't think that this needs to be part of the RFC, because even a node not doing this would behave well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct, I was just trying to explain how a well behaving validator should/could use this field, I can explicitly state that this part is optional or remove it.
Signed-off-by: Alexandru Gheorghe <[email protected]>
/rfc propose |
Hey @bkchr, here is a link you can use to create the referendum aiming to approve this RFC number 0091. Instructions
It is based on commit hash fbd4296f2a8a14cb2a17b2b0f0260b1df6e80783. The proposed remark text is: |
Voting for this referenda is ongoing. Vote for it here |
PR can be merged. Write the following command to trigger the bot
|
/rfc process 0x997290e7221101d7fe625a2f4eb58dd73393845a696d3ac3682c9f546a25c00b |
The on-chain referendum has approved the RFC. |
This adds a new creation time for authority discovery records stored in the DHT.
This RFC has already an implementation in polkadot-sdk here: paritytech/polkadot-sdk#3786