NEP-509: Stateless validation stage 0 #509
initial draft
Your Render PR Server URL is https://nomicon-pr-509.onrender.com. Follow its progress at https://dashboard.render.com/static/srv-ck51l1o21fec73aapqgg.
Add validator role change section
Hi @walnut-the-cat – thank you for starting this proposal. As the moderator, I labeled this PR as "Needs author revision" because we assume you are still working on it since you submitted it in "Draft" mode. Please ping the @near/nep-moderators once you are ready for us to review it. We will review it again in early January, unless we hear from you sooner. We typically close NEPs that are inactive for more than two months, so please let us know if you need more time.
Documenting changes to validators and describing basics of reference implementation.
In near/nearcore#11582 we're increasing `combined_transactions_size_limit`, so let's update the NEP to match the implementation.
As SME and working group member, I lean towards approving the NEP. It is exciting to see Near continue to push towards being a completely scalable protocol. Avoiding the complexity of fraud proofs while having an eye towards using ZK technology in the future is very clever. Thanks to everyone for their hard work on designing and implementing this large protocol change.
fix lint warning and apply Michael's suggestion
@mm-near The "40 years at 90% confidence" calculation was done by me. It assumes that the attacker has just barely less than 1/3 of the total stake (so they cannot outright take over the protocol), which is about 197 million $NEAR as of today.

The calculation determines the probability of a shard assignment (recall that stake is converted to "mandates" and these are randomly assigned to shards) in which at least one shard has 2/3 of its assigned stake controlled by the attacker. In that case the attacker would be free to push an invalid state transition because it could sign the invalid state witness itself. With 68 mandates per shard and 6 shards total this probability is

Then we assume the shard assignments are independent so that we can model it as a Bernoulli process and see how many "trials" it would take before we have a "success" (i.e. how many random shard assignments there are before the attacker obtains a 2/3 majority in one shard). The probability of having

Now that we know the number of trials we can convert it into a time. At 1 trial per second that is almost 4 years, but at the time Bowen was suggesting to shuffle less often than every block. At 1 trial per 10 seconds we get almost 40 years, which is the number I reported.

We can also do this calculation the other way, though. If we take the 5 year timeline you propose, then we can convert that into a number of trials. Let's assume one trial per second, since I think the current implementation does shuffle validators every block. Then that is around 157 million trials, and we want to know what the probability is, in our Bernoulli process, of having at least 1 success within that many trials. This probability is 1 minus the probability that we have all those trials fail in a row, so

If you keep the number of mandates per shard the same, then this whole calculation does not change much as you increase the number of shards, because the theory says that the dependency on the number of shards is not very strong once you have more than a few. So the base probability of
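The reasoning above can be sketched numerically. The following is a minimal Python sketch, not the commenter's actual calculation: it assumes a binomial model in which each mandate in a shard is independently attacker-controlled with a fixed probability, and it treats shards as independent. All function names are hypothetical, and the resulting numbers may differ from the figures reported in the comment.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def shard_takeover_probability(mandates_per_shard, attacker_fraction, num_shards):
    """Chance that at least one shard is >= 2/3 attacker-controlled in one assignment.

    Simplifying assumptions (not from the source): every mandate is independently
    attacker-controlled with probability `attacker_fraction`, and the shards are
    independent of each other.
    """
    # Smallest integer number of mandates that is at least 2/3 of the shard.
    threshold = (2 * mandates_per_shard + 2) // 3
    single_shard = sum(
        math.comb(mandates_per_shard, k)
        * attacker_fraction ** k
        * (1 - attacker_fraction) ** (mandates_per_shard - k)
        for k in range(threshold, mandates_per_shard + 1)
    )
    # 1 - (1 - q)^shards, computed stably for very small q.
    return -math.expm1(num_shards * math.log1p(-single_shard))


def prob_at_least_one_success(p, trials):
    """Bernoulli process: 1 minus the probability that all trials fail."""
    return -math.expm1(trials * math.log1p(-p))


def trials_for_confidence(p, confidence):
    """Smallest number of trials after which P(at least one success) >= confidence."""
    return math.ceil(math.log1p(-confidence) / math.log1p(-p))


def trials_in_years(years, seconds_per_trial):
    """Number of shard assignments (trials) that fit into the given time span."""
    return years * SECONDS_PER_YEAR // seconds_per_trial
```

For example, `trials_in_years(5, 1)` gives the "around 157 million" trials mentioned for a 5 year timeline at one shuffle per second, and `prob_at_least_one_success(p, trials_in_years(5, 1))` gives the corresponding attack probability for whatever per-assignment probability `p` is plugged in.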
NEP Status (Updated by NEP Moderators)

Status: VOTING

SME reviews:

Protocol Work Group voting indications (❔ | 👍 | 👎 ):
# Feature to stabilize

This PR stabilizes the Congestion Control and Stateless Validation protocol features. They are assigned separate protocol features and the protocol upgrades should be scheduled separately.

# Context

* near/NEPs#539
* near/NEPs#509

# Testing and QA

Those features are well covered in unit, integration and end to end tests and were extensively tested in forknet and statelessnet.

# Checklist

- [x] Link to nightly nayduck run (`./scripts/nayduck.py`, [docs](https://github.com/near/nearcore/blob/master/nightly/README.md#scheduling-a-run)): https://nayduck.nearone.org/
- [x] Update CHANGELOG.md to include this protocol feature in the `Unreleased` section.
@mm-near the latency you mentioned matches the existing one.

Before: the BP sends a block quickly on receiving chunks, but the block is validated only after the other block producers apply all its chunks, which was their only way to validate the chunks in a block. So the next block production happens only after the previous chunks were applied.

Also, BPs and CPs are also CVs, so the stake on chunk validation remains big. Memtrie is much faster than disk trie, which compensates for the network latencies of sending state witnesses and endorsements.

UPD: the actual additional latency is introduced on the chunk producer side: near/nearcore#10584. In short: to produce chunk N, the CP must apply chunk N-1, for which the BP must produce block N-1, for which the CVs must validate (= apply) chunk N-1. So applying chunk N-1 appears twice.

Side notes:

Let's say only one shard is touched by the transaction. To get the outcome, we query the RPC node that tracks the touched shard.
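The dependency chain described in the UPD above can be written out as an ordered list of steps. This is a simplified Python sketch: the step list, the `Step` type, and all durations are hypothetical illustrations rather than measurements or nearcore code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    actor: str
    action: str
    applies_chunk: bool  # True when the step executes chunk N-1
    duration_ms: int     # hypothetical duration, not a measured value


def critical_path_to_chunk_n(apply_ms, network_ms, block_production_ms):
    """Sequential steps that must finish before chunk N can be produced."""
    return [
        Step("chunk validators", "validate (apply) chunk N-1", True, apply_ms),
        Step("chunk validators", "send endorsements to the block producer", False, network_ms),
        Step("block producer", "produce block N-1", False, block_production_ms),
        Step("chunk producer", "apply chunk N-1", True, apply_ms),
        Step("chunk producer", "produce chunk N", False, 0),
    ]


def total_latency_ms(steps):
    """Sum of the durations along the sequential path."""
    return sum(step.duration_ms for step in steps)


def chunk_applications(steps):
    """How many times chunk N-1 is applied on the path."""
    return sum(1 for step in steps if step.applies_chunk)
```

With any durations, `chunk_applications(critical_path_to_chunk_n(...))` is 2, which is the "appears twice" observation: the cost of applying chunk N-1 is paid once by the chunk validators and once more by the chunk producer before chunk N exists.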
The main purpose of the Reed-Solomon erasure encoding of the state witness is to reduce the load on the chunk producer when distributing the state witness. The recipients of the state witness are all the chunk validators, and they, not the block producers, are the ones who participate in the partial witness forwarding. This way we don't put too much network load on the block producers, and the network load is localized to the chunk validators. Nodes that have a higher number of mandates are validators for multiple shards.
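The load reduction can be illustrated with some back-of-the-envelope arithmetic. The Python sketch below assumes a scheme in which the witness is encoded into one part per chunk validator, any `data_parts` of those parts are enough to reconstruct it (the standard Reed-Solomon property), the chunk producer sends each validator only its own part, and each validator forwards that part to all other validators. These parameters and function names are hypothetical and are not taken from the NEP or from nearcore.

```python
import math


def direct_upload_bytes(witness_bytes, num_validators):
    """Chunk producer upload if it sent the full witness to every validator."""
    return witness_bytes * num_validators


def encoded_distribution(witness_bytes, num_validators, data_parts):
    """Per-node upload when the witness is split into erasure-coded parts.

    Assumes one part per validator, of which any `data_parts` reconstruct the witness.
    """
    if not 0 < data_parts <= num_validators:
        raise ValueError("data_parts must be between 1 and num_validators")
    part_bytes = math.ceil(witness_bytes / data_parts)
    return {
        "part_bytes": part_bytes,
        # The producer sends exactly one part to each validator.
        "producer_upload_bytes": part_bytes * num_validators,
        # Each validator forwards its own part to every other validator.
        "validator_forward_bytes": part_bytes * (num_validators - 1),
    }
```

For a hypothetical 6 MB witness, 60 chunk validators, and 40 data parts, the chunk producer uploads 9 MB instead of 360 MB, while each validator forwards a little under 9 MB, so the bulk of the traffic moves from the single chunk producer to the validator set.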
As a Protocol WG member, I lean towards approving this proposal since it is a necessary step towards effective sharding.
My main concern is about chunk validators:
In this approach, I'm concerned about chunk validators' incentives to validate new chunks. As I understand from this document, the optimal strategy for an individual chunk validator is to accept every chunk. As long as there is one honest chunk validator, their own work is not needed, and they don't get penalized for incorrectly endorsing an invalid chunk.
### Assumptions

* Not more than 1/3 of validators (by stake) is corrupted.
* In memory trie is enabled - [REF](https://docs.google.com/document/d/1_X2z6CZbIsL68PiFvyrasjRdvKA_uucyIaDURziiH2U/edit?usp=sharing)
Should we move the content of the linked document to neps/assets in this repository, in case the current link gets broken for some reason?
As we pointed out above, current formula `chunk_validator_quality_ratio` is problematic.
Here it brings even a bigger issue: if chunk producers don't produce chunks, chunk validators will be kicked out as well, which impacts network stability.
This is another reason to come up with the better formula.
Chunk validators can collude and not endorse some chunks in a way that some chunk producers or other chunk validators get kicked out by not getting their chunks included.
But this is more relevant to the chunk endorsement process, not chunk validator kickouts/rewards.
And the base assumption of the new approach is that the event "1/3 of a chunk's validators collude" implies "1/3 of all validators collude" with high probability, so in this case the base blockchain security assumption that we rely on already fails.
Co-authored-by: Marcelo Fornet <[email protected]>
@mfornet I answered the chunk validator-related comments. Yeah, this is a known problem. We discussed it a couple of times. One idea was to introduce "honeypot state witnesses", whose goal would be to verify that state witnesses can get invalidated, and to penalise validators for blind approvals. However, the counterarguments are that

So any of these solutions would introduce additional complexity (which is already very substantial), and the benefit never became clear.
lint error
Thank you to everyone who attended the Protocol Work Group meeting! The working group members reviewed the NEP and reached the following consensus: Status: Approved (Meeting Recording: https://youtu.be/058BZEyXzgU)
@walnut-the-cat Thank you for authoring this NEP
WIP