Adding a `CoreIndex` commitment to candidate receipts #92
Why? Don't we need this for on-demand to verify that the claimant is the one that signed the collation?
You want to hack this into the protocol? I don't get why you don't introduce a new version of the networking protocol to be able to append the `CoreIndex`.
No, that was the initial idea AFAIK, but we now want to rely only on the parachain runtime to decide who builds the parachain block. Which collator provides the collation is irrelevant; it just has to be valid. There are plans to have some countermeasures against collators spamming with invalid collations (Collator Protocol Revamp), which should hopefully be implemented this year.
A new version doesn't solve the problem. The problem is that we always need to check all of the commitments when we validate a candidate. We also rely on candidate hashes to identify unique candidates; having two distinct candidate commitments for the same parachain block breaks consensus. Simply put, we need all nodes to encode/decode/reason about the same thing, regardless of whether there is a minority which doesn't understand the new field.
2 bytes should suffice.
You already answered yourself that this will not work, because old nodes will try to validate the signature. Thus, you cannot simply "reclaim" space in the candidate. Why not use the node features to make a clean cut to switch from the old to the new format?
But IIUC we need a way for the parachain runtime to specify to the relay chain validators/backing group which core index the current commitment is for. So, following on your idea, we don't really need to add it to XCM; what you really want to use is the XCM transport protocol, not the language and executor. So you could still use UMP/MQ to transport this extra data, but do it outside of the XCM transport flow. Disclaimer: I'm not familiar with the details of parachain consensus and elastic scaling, so there might be better solutions than "abusing" the UMP queue (implementation abuse), but I wanted to offer a slight alteration to your idea so we don't abuse the XCM language (spec abuse).
That is a minor setback; forcing validators to upgrade is tractable. Forcing all collators to switch to a new format is a hard breaking change that would put pressure on parachain teams and delay the enablement of open collator sets for elastic scaling. My plan is to only require an upgrade for collators if they want to use elastic scaling; otherwise they can keep signing their collations and still not commit to a core index, just like they do now, at least for a good period of time. Possibly until Omni-node becomes ubiquitous.
I don't see how this forces collators to upgrade. I mean, basically you just enable support for the new format on the validators. The collators can send whatever they support, but the validators will reject the new format until it is enabled.
Thanks @acatangiu for the comment and discussion last week. Indeed, it makes no sense to add a new XCM instruction.
How I would see this working is altering the UMP transport layer definition so it can be used to support both XCM messages and commitments, which seems to be a small surface area for changes in the Parachains Runtime. One way I am thinking this can be implemented is to have a terminator for XCM messages (something like an empty Vec) followed by the core index commitment: in `receive_upward_messages` we would stop sending messages to the `MessageQueue` pallet once we hit the terminator.
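A minimal sketch of that separator idea, assuming `upward_messages` is a `Vec<Vec<u8>>` as in the Parachains Runtime; the function name and exact layout here are hypothetical, not the actual implementation:

```rust
/// Splits the raw upward messages at the first empty message (the
/// terminator). Everything before it is a regular XCM message destined for
/// the `MessageQueue` pallet; everything after it is commitment data (e.g.
/// the encoded core index) that is never executed as XCM.
fn split_upward_messages(upward_messages: &[Vec<u8>]) -> (&[Vec<u8>], &[Vec<u8>]) {
    match upward_messages.iter().position(|m| m.is_empty()) {
        Some(sep) => (&upward_messages[..sep], &upward_messages[sep + 1..]),
        // No terminator: everything is a plain XCM message (legacy behaviour).
        None => (upward_messages, &[]),
    }
}
```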
Yeah, sure, we can also do that; this sounds like an optimisation of what I just said, assuming we are talking about no longer checking collator signatures and using the bytes for more useful things.
No. I'm speaking about creating a new version of the format.
I agree that a clean cut is generally better than hacking, but we have to take other concerns into consideration before we decide Yay/Nay on this topic.

Development cost

The clean way involves the following big changes on top of what is needed for my proposal:
Set of changes required for my proposal:
Timeline

The PVF changes and networking protocol changes take a lot of time to test and deploy in production, and will surface additional bugs that will need fixing, further delaying the launch. Also, audits will take much longer since there would be more code to cover. On top of that, parachain teams already expect to use Elastic Scaling (see the Moonbeam roadmap, for example). The feature as it stands now is unlikely to be usable by most teams, since their collator sets are open.

Why we should postpone the clean cut

IMO the cost of not delivering Elastic Scaling without limitations this year outweighs the concerns about doing my isolated hack. Time to market matters. Launching Elastic Scaling but not lifting the constraints on the MVP as soon as possible, because we want to make things perfect, doesn't have any value for parachain teams.
The reclaimed space in the descriptor and the possibility to commit to additional things via the UMP transport layer provide enough room that we don't need to change `CandidateDescriptor`/`CandidateCommitments` in the near/medium-term future. IMO the best time to do it would be as part of the migration to JAM.
On top of that, while this is admittedly not the cleanest solution, it is also not that hacky*) and:
So while not the prettiest solution:
*) In the end, why not think of this as a message being sent to the relay chain? The only real hacky part is that we did not properly account for messages that are not XCM, but the delimiter solution sounds good enough. The only code needing to look at this is candidate validation. Code impacted:
(3) is the most annoying, as it is user facing, but the amount of breakage should stay the same: if we ever end up doing the properly versioned thing (before JAM), then the reason will be that we introduced something else on top, hence it would be another breaking change regardless. I agree that something we want to be mandatory eventually does not feel right as an extra message after a delimiter after the XCM messages. 😬 But OK, if that contract is well documented, it is not that bad.
This is like a max 1-day change in Cumulus and the validation modules. We are just exposing a new function and, on validation, checking which version exists.
Why do all protocols change, when one of them changes?
It was clear since the beginning of the year, or at least quite early on, that this change would be required. I still remember talking to you about this.
If we open this door, I don't see us ever closing it again, because the same arguments will be brought up again, etc.
Sounds very optimistic. I think we also need to consider the RFC, the impl, writing the unit and zombienet tests as well as deployment.
We have all the validation protocols under a single version.
Yes, I remember discussing it in the context of extracting a `CoreIndex` from statements based on validator indices, which is what we currently have implemented for the MVP.
I've drafted a development and deployment plan with repurposing of fields: https://hackmd.io/xrzVVZ_qSZemEIVdEV8cpA
Second implementation option draft: https://hackmd.io/fWYvO8HQSFKjUlnpgwjcKw
The long route looks like it will take way too long to be feasible as a solution for elastic scaling, while the repurposing is really not that bad and should get the job done quickly enough. It also avoids needless breakage: any tooling that does not actually try to check collator signatures, for example, will continue to work even without any update.
The second document is just much more detailed than the first one...
Not sure what you mean by this. There is no real time pressure on this.
One pragmatic benefit I see in the longer approach is that we can introduce relay chain hashes to the PVF arguments while we are at it:
Currently we use the relay chain storage root to identify relay chain parents. IMO changing this is not critical; the relay chain storage root is a good enough block identifier currently. However, having the relay parent header hash as a digest would make it much more convenient to fetch the relay parent for a given parachain block.
Doing a binary compatible upgrade path for as long as we can is not a bad idea: it is much less effort, avoids breakage (and issues) and can be live much faster. We can make part of the collator signature field specify the version and keep the remaining unused fields as "reserved", allowing for more binary compatible upgrades. The commitments part is actually purely optional and can stay that way: it is only used for verification of the core index in the descriptor, and only elastic scaling parachains need to bother at all.

Going fully versioned should in my opinion be an effort on its own, when actually needed. We have a couple of things, e.g. what @skunert mentioned, which right now are nice to have but not crucial. We should go fully versioned for something like Polkadot 3.0, where we might need some actually impactful changes which cannot be introduced in a binary compatible way; then we also add all these other nice-to-haves and can also move the `CoreIndex` into its own commitment. In JAM we should also make sure to have this properly versioned from the get-go.
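As a rough illustration of what the version-gated check could look like on the validator side (all types and names here are hypothetical stand-ins, not the actual polkadot-sdk API):

```rust
/// Which layout the descriptor bytes are using.
enum DescriptorVersion {
    V1, // legacy: carries `collator` + `signature`
    V2, // repurposed: carries a version byte, `core_index` and reserved bytes
}

struct Descriptor {
    version: DescriptorVersion,
    core_index: u16,
    // ... remaining fields elided
}

/// Backing-side check, gated on the descriptor version.
fn check_descriptor(d: &Descriptor, assigned_core: u16) -> Result<(), &'static str> {
    match d.version {
        // Old collators keep signing their collations; the collator signature
        // is verified as today and no core index is enforced.
        DescriptorVersion::V1 => Ok(()),
        // New collators commit to a core; backers reject a mismatch, so the
        // same collation cannot be backed on every core assigned to the para.
        DescriptorVersion::V2 => {
            if d.core_index == assigned_core {
                Ok(())
            } else {
                Err("core index does not match the assigned core")
            }
        }
    }
}
```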
Also it is worth pointing out that with the binary compatible change, we can have this with one release rolled out:
Thus we can easily have fixed factor scaling fully delivered by EoY.
But it is being worked on. This is no argument at all.
Yeah, like not having slashing enabled for all the things, but the stuff is being worked on.
We can still achieve it and even if it is not achieved, the world will not go down.
You are basically proposing a hard fork, without calling it a hard fork.
Update: we discussed the two approaches offline and decided the implementation will be based on the initial proposal.
Following the discussion on #92, this is a proposal to introduce the required core index commitments to make elastic scaling work securely with open collator sets.
Starting this pre-RFC conversation to gather some feedback on some changes to the `CandidateReceipt` and `CommittedCandidateReceipt` primitives.
The necessary changes are:

- a `CoreIndex` commitment in `CandidateCommitments`
- a `core_index: CoreIndex` field in `CandidateDescriptor`

These are needed to remove the limitation of only using a trusted collator set for elastic scaling parachains. Without a `CoreIndex` commitment in the candidate receipt it is possible for malicious collators to spam the relay chain by sending the same collation to all backing groups assigned to a parachain at an RCB. In such a scenario, elasticity is effectively disabled, as all backing groups would back the same candidate in parallel instead of multiple chained candidates.

The candidate receipt primitive is used across networking protocols, the Parachains Runtime, the node implementation, collators and even tooling. Any breaking change here is very hard to deploy in practice without upgrading everything at the same time or breaking someone. So, the following is an approach that avoids breaking things but which might be considered hacky.
Please keep in mind that this is very important in the short/medium term in the context of the Elastic Scaling feature. As such, a proposal for a more flexible, backwards compatible and future-proof format (allowing for more dramatic changes) is out of scope here, but is otherwise something that I am already working on.

Proposed approach
Changes in `CandidateDescriptor`:

- repurpose the `collator: CollatorId` field and 64 bytes from the `signature: CollatorSignature` field as `reserved` fields
- add a `core_index: u32` field

A sketch of the resulting layout is shown below.
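A hedged sketch of that repurposed layout; field names mirror the current `CandidateDescriptor`, while the exact partitioning of the reserved bytes is illustrative, not final:

```rust
// Same total size and field offsets as today's descriptor (`CollatorId` is
// 32 bytes, `CollatorSignature` is 64 bytes), so non-upgraded nodes decode
// it unchanged while upgraded nodes reinterpret the repurposed bytes.
struct CandidateDescriptorV2 {
    para_id: u32,
    relay_parent: [u8; 32],
    // Former `collator: CollatorId` (32 bytes):
    version: u8,
    core_index: u16,
    reserved1: [u8; 29],
    persisted_validation_data_hash: [u8; 32],
    pov_hash: [u8; 32],
    erasure_root: [u8; 32],
    // Former `signature: CollatorSignature` (64 bytes):
    reserved2: [u8; 64],
    para_head: [u8; 32],
    validation_code_hash: [u8; 32],
}
```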
`CandidateCommitments` doesn't really need to be changed, but one idea is to define a special XCM instruction like `CommitCoreIndex(u32)` that is appended as the last message in the `upward_messages` vec. The message wouldn't ever be executed, but would just serve as a commitment to a specific `CoreIndex`.

I have also thought about appending an additional `core_index: u32` field to the end of the structure, but that doesn't really seem to work because we compute the `CandidateHash` with `hash(CandidateDescriptor, hash(CandidateCommitments))`. Older non-upgraded validator nodes, for example, would get a different commitment hash if they don't decode and hash the whole thing, and this would break consensus.

Any better and less hacky ideas, especially regarding the XCM stuff, would help me a lot.