runtime upgrade: Methods to avoid ever including parachain code in critical-path data #967
Comments
I've lost our recent issue discussing this now, but.. We could handle
We could reuse this technique for MEV protections in parachains too: We split the parachain block into "run now" and "run 10 slots in the future", perhaps by pushing a bunch of transactions into the state, but preferably by splitting the availability encoding. In other words, we ideally make block n actually process the block of transactions placed into availability by block n-10. We then permute the transaction order by the relay chain randomness, so transactions could now fail but block n's backing checker marks the bad ones. This provides MEV protection.
We'll include two ephemeral decryption keys associated to sassafras slot assignment proofs, for which the upcoming block producer knows the secret key, but when the block producer makes their sassafras block they delete the first secret key, as they've already decrypted any transactions, and then publish the second in the header. We turn this into even stronger MEV protection by decrypting in block n the transactions placed into availability in block n-10 using the keys published by the intervening blocks. In other words, we'd prevent MEV by running something vaguely like mixnet-style decryption on-chain.
In both cases, we need a whole block to either hang out in state for 10 slots or else provide some means by which the block 10 slots later fetches it from availability.
I'm generally not a fan of the "treat code upgrades as a special block" approach because it's unclear how Cumulus should handle that block. As mentioned in the issue, we have the goal that the produced Cumulus chain can be synchronized entirely on its own. I don't think we could do the 'special block' thing unless we altered Substrate itself to support those types of special blocks. That sounds really difficult so it's a class of solution I would prefer to avoid.
Yes, it'd ask Substrate to treat special blocks like detached state data and alter pruning rules, so yes it touches several things and I'm unsure of the complexity. It's roughly your 2 though, no?
I should reread my own thoughts in paritytech/polkadot#3211 too. ;)
Not sure if there is a way to avoid putting 'validation_code' in memory when running the PVF? It does not seem easy without specific validation of block data in Polkadot (or some mechanism involving a specific host function that would build some specific hashing with the external 'validation_code', and thus a validation function a bit different from the runtime, or an overload of a host function for it, as currently done for diverging code).
@cheme We don't care (that much) about memory usage. This is about PoV size. I am not sure you have understood the issue well enough. The only change this needs on the trie side is to make sure that when overwriting but not reading |
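When performing parachain code upgrades, we currently include the new parachain code in two places on the critical path: in the candidate receipt, and in the PoV of the parablock that applies the upgrade.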
The code we have empirically for parachains is quite large, typically in the 500K to 800K range (Sergei: I observed PVFs up to a couple of megabytes).
Avoiding code in the critical path is important because it reduces friction at runtime upgrade points, if backing groups have relatively low bandwidth. It's not unreasonable for upgrade blocks to take a few minutes to get backed in the status quo. It also makes the code size more independent from the PoV size, opening up the opportunity for parachain developers to build more complex runtimes without being affected by restrictions targeting critical-path bandwidth.
This issue will be split into two sections, one for each of these points.
Solving Code in Candidate Receipts: Hash-based announcements
At the moment, PVFs announce code upgrades by returning the full code when it's allowed, according to the state root of the relay chain. This code then appears, in full, in the candidate commitments. These candidate commitments are, in turn, gossiped among all validators so they can be included into the relay chain by the block author, who is most likely not a backer of the parachain doing the code upgrade.
An improvement to this situation would be for the PVF to only output the hash and size of the code, for inclusion in the candidate commitments.
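A minimal sketch of what such a hash-and-size announcement could look like, and how an uploaded blob might later be checked against it. The names `CodeAnnouncement`, `for_code`, and `matches` are hypothetical and not part of any Polkadot API, and `toy_hash` is a non-cryptographic placeholder standing in for whatever hash the relay chain would actually use.

```rust
/// Placeholder 64-bit FNV-1a hash. This is NOT cryptographic; it only keeps
/// the sketch free of dependencies. A real chain would use a proper hash.
fn toy_hash(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in data {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical announcement placed in the candidate commitments
/// instead of the full code.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CodeAnnouncement {
    code_hash: u64,
    code_size: u32,
}

impl CodeAnnouncement {
    /// Built by the PVF from the code it intends to upgrade to.
    fn for_code(code: &[u8]) -> Self {
        CodeAnnouncement {
            code_hash: toy_hash(code),
            code_size: code.len() as u32,
        }
    }

    /// Relay-chain side check of an uploaded blob. The cheap size
    /// comparison runs first so oversized uploads are never hashed.
    fn matches(&self, uploaded: &[u8]) -> bool {
        uploaded.len() as u64 == self.code_size as u64
            && toy_hash(uploaded) == self.code_hash
    }
}
```

Announcing the size alongside the hash lets the relay chain bound the cost of an upload before doing any hashing.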
Upon reaching the relay chain, the future code announcement creates a grace period where any user of the relay chain can upload the code using an `UnsignedTransaction`. These uploads are not on the critical path of parachain execution, and parachain code upgrades need to be delayed anyway for other reasons (see paritytech/polkadot#3211). Once the code is actually uploaded to the relay chain, the relay chain is ready for the parachain to upgrade its code, and after the `code_upgrade_delay`, as specified in the `HostConfiguration`, the code can be upgraded at any time.
Solving Code in upgrade parablock PoVs: Move code to the PVF parameters and the `AvailableData`
When a parachain actually triggers its code upgrade, in practice, it involves the PVF moving the new code from one section of the trie to another. Although this is not strictly necessary within the parachain execution model, Cumulus-based parachains store their code in the state trie.
There are two approaches I considered to solve this problem:
1. Making `:code` less special, or giving a way for `:code` to specify some other trie node which actually holds the real code.
2. Moving the code to the PVF parameters and the `AvailableData`.
The problem with approach 1 is that although we avoid including the code in the PoV at the point of the upgrade, we still have to include the code in the PoV in some other block, where the code was moved into the storage of the parachain. This makes it a non-solution, so we'll ignore it and look at approach 2.
The idea of approach 2 is to make 3 alterations to parachain primitives:
With these changes, we continue to make the code available in the erasure-coding of the `AvailableData` that is kept by the entire validator-set, but it no longer needs to be sent explicitly between the collator and the backers or between the backers. Instead, if `applies_upgrade` is `true`, the backers can draw the code from other sources. At the moment, scheduled validation code is stored on-chain, but even in the future, when validation code is stored off-chain, the backing validators will have it to pass into the PVF.
Since the backing pipeline is the critical path, reducing the bandwidth between these actors will have a huge beneficial effect on the performance of the blocks applying runtime upgrades.
It is illegal for the `CandidateDescriptor` to contain `applies_upgrade == true` if the context it is executed in does not have a scheduled code upgrade for the parachain. Honest backers will place `Some` into `AvailableData::code` only if `applies_upgrade == true`. The runtime of the relay chain will reject every candidate that sets `applies_upgrade` without a scheduled upgrade, so it's known that every candidate receipt that appears on-chain, pre-availability, correctly indicates whether the `AvailableData::code` should contain `Some`.
As an approval checker or a dispute participant, if `applies_upgrade == false` and the `AvailableData::code` is `Some`, the candidate is invalid, and vice versa. This means that any malicious backers which have managed to include a false `AvailableData` are slashed, and also that the candidate won't be finalized. This check is safe, because anything that has been included has already passed the runtime check in the past. If these checks pass, and `AvailableData::code` is `Some`, then it should be passed into the PVF during the approval check.
Lastly on the core protocol side, the only thing that the parachain storage needs to store is the hash of the upcoming code. When the PVF accepts the new code, it can check that the code passed in hashes to the correct value, and then write it to its state. Writes to state don't affect the PoV size significantly as a general rule, and especially not when a trie node with the given key (`:code` in practice) is already present.
Implementing this new PVF for Cumulus nodes poses a small additional challenge because of its requirements:
From these requirements, it's clear that the actual blocks that Cumulus nodes synchronize, store, and execute need to contain the new code at some point in the chain. So the challenge is to do this in such a way that the PoV never contains the code.
The solution that I propose is to have a special inherent, something like this:
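A rough sketch of what such an inherent and its handling might look like, assuming the parachain state holds only the hash of the upcoming code beforehand. Everything here is hypothetical: `SetValidationCode`, `ParaState`, and `apply_inherent` are illustrative names, the state is a flat map rather than a trie, and `toy_hash` is a non-cryptographic placeholder.

```rust
use std::collections::HashMap;

/// Placeholder 64-bit FNV-1a hash standing in for a cryptographic hash.
fn toy_hash(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in data {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical inherent as it appears in the full Cumulus block.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SetValidationCode {
    validation_code: Vec<u8>,
}

/// Toy parachain state: a flat key-value map instead of a trie.
#[derive(Debug, Default)]
struct ParaState {
    storage: HashMap<Vec<u8>, Vec<u8>>,
    /// Hash of the upcoming code, the only upgrade data stored in advance.
    pending_code_hash: Option<u64>,
}

#[derive(Debug, PartialEq, Eq)]
enum UpgradeError {
    NoPendingUpgrade,
    HashMismatch,
}

/// Checks the supplied code against the stored hash, then overwrites
/// `:code` without ever reading the old value.
fn apply_inherent(state: &mut ParaState, inherent: &SetValidationCode) -> Result<(), UpgradeError> {
    let expected = state
        .pending_code_hash
        .ok_or(UpgradeError::NoPendingUpgrade)?;
    if toy_hash(&inherent.validation_code) != expected {
        return Err(UpgradeError::HashMismatch);
    }
    state
        .storage
        .insert(b":code".to_vec(), inherent.validation_code.clone());
    state.pending_code_hash = None;
    Ok(())
}
```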
This is what appears in the full block outside of the PVF. However, what appears in the PVF is a slightly modified version
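One way to picture the modified version is as a stub that carries no code at all, produced by the collator when it builds the PoV. The sketch below assumes a hypothetical `Inherent` enum with a full and a stub variant; the variant names and the `strip_for_pov` helper are illustrative only.

```rust
/// Hypothetical inherent type; names and shapes are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Inherent {
    /// Full form: what Cumulus nodes store, synchronize, and execute.
    SetValidationCode { validation_code: Vec<u8> },
    /// Stub form: what is placed in the PoV. It carries no code.
    SetValidationCodeStub,
    /// Stand-in for every unrelated inherent.
    Other(Vec<u8>),
}

/// Collator-side step: swap the full inherent for the stub and hand the
/// extracted code back separately, so it never travels inside the PoV.
fn strip_for_pov(inherents: Vec<Inherent>) -> (Vec<Inherent>, Option<Vec<u8>>) {
    let mut code = None;
    let stripped: Vec<Inherent> = inherents
        .into_iter()
        .map(|inh| match inh {
            Inherent::SetValidationCode { validation_code } => {
                code = Some(validation_code);
                Inherent::SetValidationCodeStub
            }
            other => other,
        })
        .collect();
    (stripped, code)
}
```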
In the initial stages of PVF execution, if this inherent is found, then the PVF must have accepted `Some(validation_code)` as its argument or the inputs are invalid. It can replace the stub inherent with the full version, and this achieves the 3 goals: that the produced Cumulus blockchain contains the new code, that the PVF produces head data that matches the Cumulus blockchain, and that the PoV doesn't contain any code ever.
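The replacement step inside the PVF could look something like the following sketch. It assumes a hypothetical `Inherent` enum with a code-free stub variant, and it treats a code argument without a matching stub as invalid too; that second rule is an assumption mirroring the approval-check rule, not something stated for the PVF itself.

```rust
/// Hypothetical inherent type; names and shapes are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Inherent {
    /// Full form, as found in the Cumulus block.
    SetValidationCode { validation_code: Vec<u8> },
    /// Stub form, as found in the PoV.
    SetValidationCodeStub,
    /// Stand-in for every unrelated inherent.
    Other(Vec<u8>),
}

#[derive(Debug, PartialEq, Eq)]
enum InputError {
    /// A stub was found but the PVF was not given any code.
    MissingCode,
    /// Code was given but no stub consumed it (assumed to be invalid).
    UnexpectedCode,
}

/// Early PVF step: rebuild the full inherent list from the PoV's list
/// plus the code argument, so execution matches the Cumulus block.
fn expand_in_pvf(
    pov_inherents: Vec<Inherent>,
    code_arg: Option<Vec<u8>>,
) -> Result<Vec<Inherent>, InputError> {
    let mut code_arg = code_arg;
    let mut out = Vec::with_capacity(pov_inherents.len());
    for inh in pov_inherents {
        match inh {
            Inherent::SetValidationCodeStub => {
                // `take` ensures the code can satisfy at most one stub.
                let validation_code = code_arg.take().ok_or(InputError::MissingCode)?;
                out.push(Inherent::SetValidationCode { validation_code });
            }
            // Everything else is passed through unchanged in this sketch.
            other => out.push(other),
        }
    }
    if code_arg.is_some() {
        return Err(InputError::UnexpectedCode);
    }
    Ok(out)
}
```

Because the expanded list is identical to what the full block contains, the head data computed in the PVF can match what Cumulus nodes compute when importing the block.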