Re-Genesis #7458
Comments
cc @andresilva |
What happens to past era extrinsics / events for the purpose of auditing (tax etc)? Can people still rebuild the previous era with archive nodes? |
@Swader They should always be able to do that. Right now I'm thinking about each era using different networking identifier and storage location for simplicity (that is, if we indeed decided to go towards the Re-Genesis direction), but the UX definitely can be improved. |
The thing I'm wondering is, right now a full node can become an archive node without needing any communication from other nodes, just based on its extrinsics which it keeps no matter the pruning mode. A full node of era 1 will not be able to do that, presumably. Would this potentially cause an availability rift if no one were to be running a full node of era 0 any more? |
@Swader Yeah indeed. But the chance that not a single person runs era 0 full node is quite slim, IMO. |
Agreed, just putting it out there as there is a chance. I think this functionality is interesting, and I'd like to see it in Substrate. I don't think Polkadot would use this (because of the slight chance of missing past era availability), but I could definitely see Kusama undergo a new era launch every 5 million blocks or so 👍 |
@Swader That is the same problem we will have when we implement warp syncing since nodes will stop downloading the history from before the snapshot point (or at least that was the case with our implementation in parity-ethereum). Normal node operation would still be to sync through all eras and import everything (potentially to different database locations on-disk but that's an implementation detail), so all the data would have the same availability guarantees it has today. The main driving point of this feature is as a potential implementation for swappable consensus, which we'd want to use in the future in Polkadot (e.g. for migrating from BABE to SASSAFRAS).
I think the light client would just have to start syncing from the latest era. I think this is OK since on PoS chains the light clients already cannot be trusted from genesis due to weak subjectivity.
This might make it harder to allow serving clients on all eras, but didn't check what changes would be needed on networking.
I think ideally we'd want to avoid resetting the block numbers and just keep incrementing them across eras. From the client-side this might be doable just by maintaining an offset. For the runtime though not sure if that is enough since we might have state entries referencing block numbers from previous eras. I think we might need to remove the assumption that the genesis block is #0, and instead pickup the block number from the last era. |
Ultimately the networking should be capable of "connecting" to multiple different chains (#3310), in other words to support multiple different chains/eras at the same time, provided each chain/era has a different If however we don't reset the block number to 0, there's no change required on the networking. |
Is the name "era" intentionally similar to staking eras? If not I would suggest different naming to avoid confusion. |
An eon is a unit that's bigger than era and is composed of eras, so that sounds appropriate. |
We're in the same place now with HydraDX. We selected the default epoch length of 10 minutes from the Substrate repo, not realizing that it can have implications for network stability (i.e. no blocks for 10 minutes means a stalled network) and also for validator UX: getting kicked out of the set and losing nominations if offline for 10 minutes. As the epoch length cannot be changed after the chain has started, we're now forced either to restart from #0 with the old state, or to risk the stalling for now, prepare for this migration and restart later. The UX is not ideal, however. This looks like a simple property change, but we're forced to upgrade all 200+ waiting validators, plus even more nodes, make sure they purge their state, and then either wait for them to re-indicate validation/nomination by purging the validator set state from storage, or risk starting the chain and trust that they have done everything right and on time. Also, going back to 0 doesn't look good from a UX standpoint, since we're in fact continuing the chain. |
FWIW CENNZnet is in the same boat. We set up a system to move session keys to hot standby nodes in case a validator is detected restarting or stalled, etc. With changes like this it seems possible to increase the epoch duration; maybe some one-off hack like setting a specific epoch will be required: #8072 |
That is actually very good to hear that it's working for you, and there is a light at the end of the tunnel. I guess we could try to live with it at least during the first part of the incentivized testnet. We've already postponed slashing during this phase to 27 days and plan to revert slashes automatically, so I guess we'll have larger validator turnout since they'll need to get re-elected often, but that's actually not bad for testing phase. |
So we've come quite far with our re-genesis galacticcouncil/hydration-node#191 but are now stuck at a chicken and egg problem here. polkadot-js/extension#687 (comment) TLDR; We either need to stop the chain until extension is updated and then re-start (still could have problems), or deal with two separate instances of one chain which is kind of PITA since we already have quite a lot of users. Anybody has any better idea how to tackle this problem? |
Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions. |
Issue still relevant and important. |
Looking forward to this. |
Has anyone tried this concept out yet? We're facing an issue where the Substrate era does not end. |
You could take a look at https://github.com/darwinia-network/fork-off-substrate |
I tried out the |
Thanks @AurevoirXavier, we tried once but didn't work. Thanks so much for the help anyway. |
Weird, our state is more than 1g. |
It works for you? In our case, we suspect that our hardware specs are a bit low, and we only have around 20 nodes + 13 validators. But we're still figuring out the root causes to prevent the next issue. |
This documents some notes and designs for a Re-Genesis process. Re-Genesis is basically the process of exporting the current chain state and creating a new chain that builds on it.
Rationale
The discussions started as an alternative method to Swappable Consensus (#1304). Many consensus engines we have right now (like BABE) make assumptions about the chain state, block numbers, among other things, so a direct consensus swapping will require some heavy modification of the consensus engines themselves. In addition, custom migration code must be written individually for each possible swapping.
Re-Genesis, on the contrary, is much simpler. If implemented with care, it can accomplish the same thing as Swappable Consensus. We do not need to modify existing consensus engines to remove their assumptions, but just need to make switching and restarting a runtime plus consensus engine combination fast.
Re-Genesis can also be used for other purposes that Swappable Consensus is not able to cover:
Design
Choosing the Re-Genesis block
A Re-Genesis process divides a blockchain into eras. If a blockchain is in era N prior to Re-Genesis, it is in era N + 1 after Re-Genesis. In each era, the block number starts from 0, so we can refer to blocks as "era N block M".
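The "era N block M" naming can be sketched as a small data structure. This is only an illustration, not Substrate code: `BlockRef` and `EraIndex` are hypothetical names. It also shows the client-side offset idea from the discussion, under the assumption that the genesis of era N + 1 takes over the continuous height of the Re-Genesis block that ended era N.

```python
from bisect import bisect_right
from dataclasses import dataclass
from itertools import accumulate


@dataclass(frozen=True)
class BlockRef:
    """An era-relative block reference: "era N block M"."""
    era: int
    number: int


class EraIndex:
    """Maps era-relative references to a continuous height and back.

    regenesis_heights[i] is the era-local number of the Re-Genesis block
    that ended era i. Assumption: block 0 of era i + 1 shares that block's
    continuous height instead of restarting the count.
    """

    def __init__(self, regenesis_heights):
        # offsets[i] is the continuous height of era i's block 0.
        self.offsets = [0, *accumulate(regenesis_heights)]

    def to_continuous(self, ref):
        if not 0 <= ref.era < len(self.offsets):
            raise ValueError(f"unknown era {ref.era}")
        return self.offsets[ref.era] + ref.number

    def to_ref(self, height):
        if height < 0:
            raise ValueError("height must be non-negative")
        # The last era whose offset is <= height owns this height.
        era = bisect_right(self.offsets, height) - 1
        return BlockRef(era, height - self.offsets[era])
```

With a single Re-Genesis at era 0 block 320,000 (the Kulupu case mentioned under prior usages), `EraIndex([320_000]).to_continuous(BlockRef(1, 5))` gives 320005. Note that under this assumption the Re-Genesis block and the new genesis share one height, and `to_ref` resolves that height to the new era.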
The first question is how we choose the Re-Genesis block.
We can always choose the head block at a particular height, but that would not be reliable. There can be multiple such blocks at the same time, and if the state rebuilding process is heavy, allowing it to be switched around is an attack vector.
Instead, we define the Re-Genesis block as a finalized block at a particular height (for chains with finalization), or a block at a particular height with at least D descendant blocks built on top of it (for chains with probabilistic finalization). This means that when switching from era N to era N + 1, the old era N chain will continue to build blocks and states past the Re-Genesis block, but those blocks and states will not be accounted for in the new era. They are only there to make the possibility of having multiple Re-Genesis blocks low.
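A minimal sketch of this selection rule, assuming blocks are kept in a plain dictionary keyed by hash and that "depth at least D" means at least D blocks built on top of the candidate. `Block`, `ancestor_at` and both `regenesis_block_*` functions are hypothetical names for illustration, not part of any Substrate API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    hash: str
    number: int
    parent: Optional[str]


def ancestor_at(blocks, head_hash, height):
    """Walk parent links from `head_hash` down to `height`.

    Returns None if the head is still below `height`.
    """
    block = blocks[head_hash]
    while block.number > height:
        if block.parent is None:
            return None
        block = blocks[block.parent]
    return block if block.number == height else None


def regenesis_block_finalized(blocks, finalized_head, height):
    """Chains with finality: the finalized block at `height`, once it exists."""
    return ancestor_at(blocks, finalized_head, height)


def regenesis_block_probabilistic(blocks, best_head, height, depth):
    """Chains with probabilistic finality: the block at `height` on the
    best chain, but only once at least `depth` blocks sit on top of it."""
    if blocks[best_head].number - height < depth:
        return None
    return ancestor_at(blocks, best_head, height)
```

Both functions return `None` while the Re-Genesis block is not yet known, so a node would keep following the old era chain and ask again after each import or finality notification.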
Stopping the old era chain
Having the old era N chain continuing to build blocks and states is definitely not ideal. So we can work on additional support for the runtime to stop the old era chain. The chain stopping process consists of two steps:
setCode command with an empty code, to permanently shut down the code chain.
Starting the new era chain
Substrate users define their own migration script. The migration will define the initial parameters of the new consensus engine. For the rest of the state, Substrate users can cherry-pick what they want and discard the rest -- either taking the full state over, or just taking the balances and other essential things.
After migration, this new state is then set as the genesis block state for era N + 1, and a new chain continues to function beyond this point.
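As a rough illustration of such a migration script, the sketch below treats the exported state as a flat key-value mapping and keeps only the entries under chosen prefixes. The readable keys such as `balances/alice` are made up for the example (real Substrate storage keys are hashed), and `migrate` is a hypothetical helper, not an existing tool.

```python
def migrate(old_state, keep_prefixes, consensus_params):
    """Build the era N + 1 genesis state from the exported era N state.

    old_state: mapping of storage key -> value at the Re-Genesis block.
    keep_prefixes: key prefixes to carry over; everything else is discarded.
    consensus_params: initial entries for the new consensus engine, which
    override any carried-over entry with the same key.
    """
    new_state = {
        key: value
        for key, value in old_state.items()
        if any(key.startswith(prefix) for prefix in keep_prefixes)
    }
    new_state.update(consensus_params)
    return new_state
```

Taking the full state over corresponds to `keep_prefixes=[""]`, since every key starts with the empty prefix. Taking just the balances corresponds to something like `keep_prefixes=["balances/"]` in this made-up key scheme.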
We note that the difference between a Re-Genesis process and a completely new blockchain is that the genesis state for a Re-Genesis process is not known until the Re-Genesis block is identified.
Discussions
Light client
Light client implementations differ by consensus engine. As a result, whether using Swappable Consensus or Re-Genesis, they may not work across the border. Substrate users may have to ask node users to manually switch light clients upon Re-Genesis.
Missed time
During the Re-Genesis process, we note there's a stop-the-world migration. Even if that is fast, time has to be spent on the old era chain to finalize the Re-Genesis block before it can be identified. This results in a period of time when no actual blocks with state are being built for the blockchain.
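A back-of-the-envelope estimate of this gap, for a chain with probabilistic finality, is the time to build the confirming blocks plus the migration itself. The function and the numbers in the usage note are illustrative assumptions, not measurements from any chain.

```python
def missed_time_seconds(confirmation_depth, block_time_seconds, migration_seconds):
    """Rough length of the period in which no new-era blocks are built.

    Assumes the Re-Genesis block counts as identified once
    `confirmation_depth` blocks sit on top of it, and that the
    stop-the-world migration only starts after that point.
    """
    if min(confirmation_depth, block_time_seconds, migration_seconds) < 0:
        raise ValueError("all inputs must be non-negative")
    return confirmation_depth * block_time_seconds + migration_seconds
```

For example, with a hypothetical depth of 10 blocks, a 6 second block time and a 2 minute migration, `missed_time_seconds(10, 6, 120)` gives 180 seconds.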
UX issues
Re-Genesis introduces a new concept called "era", and compared with Swappable Consensus, blocks in the new era start their block numbers from 0 again. This can be a UX issue that we should take care of.
Prior usages
The only real-world usage right now (relying on an ad-hoc Re-Genesis process) was Kulupu's era switch at era 0 block 320,000. The process was almost like above, but everything was done manually (with a new node released after Re-Genesis block).
Edgeware also considered Re-Genesis for its first runtime upgrade, but decided against it due to UX concerns.