Slot based collations #3168
Comments
This is exactly the same problem as on the relay chain, where we use client-side back-off instead of the proper solution of using finality proofs and letting the runtime decide when to back off. For the short term, do we want to implement something similar, or do we just go for the proper solution in Cumulus?
This makes a lot of sense, even more so since advertising collations is not tied to block production. While there is consensus on block authorship, it is not clear whether only specific collators or all collators should advertise collations.
Also, the slowdown is mandatory to avoid OOM.
Could collators prefer to use relay parents that do not have any siblings rather than the latest head? This should help, assuming the forks are not long, but we need some more lenient parameters around the depth of the relay parent.
I mean this isn't entirely related. The parachain doesn't "care" about finality. The problem is also actually already solved in the parachain runtime: polkadot-sdk/cumulus/pallets/parachain-system/src/lib.rs Lines 1335 to 1339 in 2b2d406
However, the problem here is also different to the one at the relay chain. Here, if a malicious node would continue to produce blocks, they wouldn't get anything out of it. Because the relay chain would reject these parachain blocks if the context is too old. So, a malicious node would not really start getting a bigger share in the block production.
Yeah a good point. I actually thought about this. I think for the beginning, we should just let the author send the collation to the relay chain. However, you could also "offload" this process to some random node in the network (depending on your needs and whatever).
Not sure what should OOM. Yes, maybe the thing that caches the storage proofs, but any proper solution would work with a bounded cache anyway.
I mean there is no guarantee that there isn't a sibling that you have not seen yet. Picking, for example, the block with the primary BABE slot would also be a way to ensure you are on the best chain. But yeah, multiple ways are possible here. Someone should do some calculations on the average fork length and then set up the lenient parameters around this average fork length to ensure parachains do not run into the fork problem.
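The relay parent preference discussed in these comments could be sketched roughly as follows. This is only an illustration: `RelayBlock`, `known_siblings` and `max_depth` are hypothetical names, not Cumulus APIs, and as noted above a sibling may exist that the node has not seen yet.

```rust
/// A relay chain block as seen by the collator (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RelayBlock {
    pub number: u32,
    /// Number of other blocks known locally that share this block's parent.
    pub known_siblings: u32,
}

/// Pick a relay parent from `ancestry`, which is ordered from the best head
/// towards older blocks.
///
/// Prefers the newest block without known siblings that is at most
/// `max_depth` blocks behind the head, and falls back to the head itself.
pub fn pick_relay_parent(ancestry: &[RelayBlock], max_depth: usize) -> Option<&RelayBlock> {
    ancestry
        .iter()
        .take(max_depth + 1)
        .find(|block| block.known_siblings == 0)
        .or_else(|| ancestry.first())
}
```

The `max_depth` parameter plays the role of the "lenient parameters" mentioned above; a sensible value would have to come from measurements of the average fork length.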
Cumulus test-parachain node and test runtime were still using relay chain consensus and 12s blocktimes. With async backing around the corner on the major chains we should switch our tests too. Also needed to nicely test the changes coming to collators in #3168.

### Changes Overview
- Followed the [migration guide](https://wiki.polkadot.network/docs/maintain-guides-async-backing) for async backing for the cumulus-test-runtime
- Adjusted the cumulus-test-service to use the correct import-queue, lookahead collator etc.
- The block validation function now uses the Aura Ext Executor so that the seal of the block is validated
- Previous point requires that we seal block before calling into `validate_block`, I introduced a helper function for that
- Test client adjusted to provide a slot to the relay chain proof and the aura pre-digest
Part of #3168
On top of #3568

### Changes Overview
- Introduces a new collator variant in `cumulus/client/consensus/aura/src/collators/slot_based/mod.rs`
- Two tasks are part of that module, one for block building and one for collation building and submission.
- Introduces a new variant of `cumulus-test-runtime` which has 2s slot duration, used for zombienet testing
- Zombienet tests for the new collator

**Note:** This collator is considered experimental and should only be used for testing and exploration for now.

### Comparison with `lookahead` collator
- The new variant is slot based, meaning it waits for the next slot of the parachain, then starts authoring
- The search for potential parents remains mostly unchanged from lookahead
- As anchor, we use the current best relay parent
- In general, the new collator tends to be anchored to one relay parent earlier. `lookahead` generally waits for a new relay block to arrive before it attempts to build a block. This means the actual timing of parachain blocks depends on when the relay block has been authored and imported. With the slot-triggered approach we are authoring directly on the slot boundary, where a new relay chain block has probably not yet arrived.

### Limitations
- Overall, the current implementation focuses on the "happy path"
- We assume that we want to collate close to the tip of the relay chain. It would be useful however to have some kind of configurable drift, so that we could lag behind a bit. #3965
- The collation task is pretty dumb currently. It checks if we have cores scheduled and if yes, submits all the messages we have received from the block builder until we have something submitted for every core. Ideally we should do some extra checks, i.e. we do not need to submit if the built block is already too old (built on an out-of-range relay parent) or was authored with a relay parent that is not an ancestor of the relay block we are submitting at. #3966
- There is no throttling, we assume that we can submit _velocity_ blocks every relay chain block. There should be communication between the collator task and block-builder task.
- The parent search and ConsensusHook are not yet properly adjusted. The parent search makes assumptions about the pending candidate which no longer hold. #3967
- Custom triggers for block building not implemented.

---------

Co-authored-by: Davide Galassi
Co-authored-by: Andrei Sandu
Co-authored-by: Bastian Köcher
Co-authored-by: Javier Viola
Co-authored-by: command-bot <>
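The hand-off between the two tasks described in this PR might look roughly like the sketch below. It is an assumption-laden illustration using a plain `std::sync::mpsc` channel: `BuiltBlock` and `collect_for_cores` are hypothetical names, and the real collator in `slot_based/mod.rs` may be structured quite differently.

```rust
use std::sync::mpsc::Receiver;

/// Message sent from the block-building task to the collation task
/// (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BuiltBlock {
    pub number: u32,
    pub relay_parent_number: u32,
}

/// Drain already-built blocks from the channel without blocking, taking at
/// most one block per scheduled core. Blocks left in the channel stay queued
/// for the next relay chain block.
pub fn collect_for_cores(rx: &Receiver<BuiltBlock>, scheduled_cores: usize) -> Vec<BuiltBlock> {
    let mut to_submit = Vec::with_capacity(scheduled_cores);
    while to_submit.len() < scheduled_cores {
        match rx.try_recv() {
            Ok(block) => to_submit.push(block),
            // Channel is empty or the block builder has shut down.
            Err(_) => break,
        }
    }
    to_submit
}
```

Like the implementation described above, this does no filtering: the extra checks for blocks that are too old or built on a non-ancestor relay parent would have to be added before submission.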
The current collator implementations use the relay chain block import as their "clock". This means that for every imported relay chain block, they check the relay chain state to see whether the parachain is allowed to build a block and, if so, build one. The relay chain itself, by contrast, has a task that fires every 6 seconds and then builds a block. Parachains still need to get their blocks included in the relay chain, but with async backing they get more freedom when it comes to block production. So, we should split up the current collator implementation into two tasks: block production and collation.
The block production would run the same way as it is done on the relay chain. This means we have a fixed slot and a timer that fires every X seconds to build a block. This mechanism needs to be implemented in a flexible way to support the following cases:
This logic for block production should be implemented in a fairly generic way, meaning that the actual block production is hidden behind some generic type, as the logic here is really just the trigger to build a block. The actual block production is then something like Aura, which gets the key, builds and signs the block, and returns it. Then we almost have a normal chain that has its block production separated. However, we should consider slowing down the block production if our parachain is running too far ahead of what is already enacted in the relay chain. This means that the collation task tells the block production to slow down. As each parachain block is built on a certain relay chain block that provides some context, we would use the best block of the relay chain at the time block production starts.
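A minimal sketch of such a slot trigger, assuming a fixed slot duration and a simple counter of not-yet-included blocks as the slow-down signal. All names here (`slot_at`, `time_until_next_slot`, `on_slot`, `max_unincluded`) are hypothetical, and the closure stands in for the generic block producer such as Aura:

```rust
use std::time::Duration;

/// Slot number that contains `now` (time since the Unix epoch).
/// Panics if `slot_duration` is zero.
pub fn slot_at(now: Duration, slot_duration: Duration) -> u64 {
    (now.as_millis() / slot_duration.as_millis()) as u64
}

/// Time left until the next slot boundary. Exactly on a boundary this
/// returns a full slot duration.
pub fn time_until_next_slot(now: Duration, slot_duration: Duration) -> Duration {
    let slot_ms = slot_duration.as_millis();
    let remaining_ms = slot_ms - now.as_millis() % slot_ms;
    Duration::from_millis(remaining_ms as u64)
}

/// Called once per slot. Skips building when the parachain is too far ahead
/// of what the relay chain has included; otherwise hands the slot and the
/// current best relay block to the actual block producer.
pub fn on_slot<B>(
    now: Duration,
    slot_duration: Duration,
    best_relay_block: u32,
    unincluded_blocks: u32,
    max_unincluded: u32,
    build: impl FnOnce(u64, u32) -> Option<B>,
) -> Option<B> {
    if unincluded_blocks >= max_unincluded {
        // Slow down: the collation side reports too many pending blocks.
        return None;
    }
    build(slot_at(now, slot_duration), best_relay_block)
}
```

A driver loop would sleep for `time_until_next_slot(..)` and then call `on_slot(..)`; how the unincluded count is obtained from the collation task is left open here.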
The collation task would work the same way as the current collator task. This means it listens for relay chain block imports, checks if the parachain has a slot and, if yes, creates the collation. After #3167 is finished, it would also be possible to build the collation early enough to send it on time to the relay chain. We could probably think about certain kinds of optimizations, like keeping the storage proof from an imported/built block around so that the block does not need to be re-run to create the collation.
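The storage proof optimization could be backed by a small bounded cache, for example along these lines. This is a sketch under assumptions: `ProofCache` is a hypothetical type, block hashes are plain 32-byte arrays, proofs are opaque bytes, and eviction is simple first-in-first-out.

```rust
use std::collections::{HashMap, VecDeque};

/// Bounded FIFO cache from block hash to the storage proof recorded while
/// importing or building that block (hypothetical type for illustration).
pub struct ProofCache {
    capacity: usize,
    order: VecDeque<[u8; 32]>,
    proofs: HashMap<[u8; 32], Vec<u8>>,
}

impl ProofCache {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, order: VecDeque::new(), proofs: HashMap::new() }
    }

    /// Store a proof, evicting the oldest entries once the cache is full.
    pub fn insert(&mut self, block_hash: [u8; 32], proof: Vec<u8>) {
        if self.capacity == 0 {
            return;
        }
        if self.proofs.insert(block_hash, proof).is_none() {
            self.order.push_back(block_hash);
        }
        while self.order.len() > self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.proofs.remove(&oldest);
            }
        }
    }

    /// Take the proof for a block. `None` means the block has to be
    /// re-executed to create the collation.
    pub fn take(&mut self, block_hash: &[u8; 32]) -> Option<Vec<u8>> {
        let proof = self.proofs.remove(block_hash)?;
        self.order.retain(|hash| hash != block_hash);
        Some(proof)
    }

    pub fn len(&self) -> usize {
        self.proofs.len()
    }
}
```

Because the cache is bounded, a miss is always possible, so re-running the block remains the fallback path.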
Parachain blocks are built in the context of a certain relay chain block, and with async backing the relay chain allows this context block to lag behind the relay chain block at which the parachain block is validated. However, the distance between the context and the point of validation is limited. Collation and block production need to ensure that the blocks stay valid, or we may need to build a new block with a new context. This is basically the main reason for slowing down the block production, as already mentioned above.
Forks on the relay chain also need to be considered. A parachain block cannot be validated if its context is on a different fork. There are forks with BABE on the relay chain, but they are not that long. One simple way to improve the situation is to always build parachain blocks on a context that is at least one relay chain block before the relay chain block they will be validated for.
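The validity condition can be illustrated with a small check. This is a sketch only: `max_relay_parent_age` is a hypothetical name for whatever limit the relay chain configuration enforces, and block numbers stand in for the full ancestry check that fork handling would need.

```rust
/// Outcome of checking a built parachain block against the relay chain tip.
#[derive(Debug, PartialEq, Eq)]
pub enum ContextCheck {
    /// The context is recent enough; the block can still be submitted.
    StillValid,
    /// The context is too old (or ahead of the validation block);
    /// a new block with a new context is needed.
    Rebuild,
}

/// Compare the relay parent (context) of a parachain block with the relay
/// chain block at which it would be validated.
pub fn check_context(
    relay_parent_number: u32,
    validation_relay_number: u32,
    max_relay_parent_age: u32,
) -> ContextCheck {
    match validation_relay_number.checked_sub(relay_parent_number) {
        Some(age) if age <= max_relay_parent_age => ContextCheck::StillValid,
        _ => ContextCheck::Rebuild,
    }
}
```

Block production could run the same check ahead of time and slow down once the remaining headroom gets small.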
Another thing we should think about is to have some kind of slot offset. Let's assume we have a parachain running at 6 seconds block time. We want to ensure that we are able to include a block every 6 seconds on the relay chain. We could for example run with a slot offset of 2 seconds to always have the block produced before the relay chain block. However, maybe that doesn't make that much sense and we just always run behind in the relay chain block context. This should probably achieve the same. This means we build on context X, let it validate on X + 1 and include in X + 2.
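Both variants can be written down as simple arithmetic. The following is a sketch with hypothetical names (`next_offset_trigger`, `Pipeline`), using millisecond timestamps and relay chain block numbers:

```rust
/// Next point in time strictly after `now_ms` at which block production
/// should fire so that it starts `offset_ms` before a slot boundary.
/// Assumes `slot_duration_ms > 0`.
pub fn next_offset_trigger(now_ms: u64, slot_duration_ms: u64, offset_ms: u64) -> u64 {
    let shifted = now_ms + offset_ms;
    let next_boundary = (shifted / slot_duration_ms + 1) * slot_duration_ms;
    next_boundary - offset_ms
}

/// Relay chain block numbers involved when always running one block behind.
#[derive(Debug, PartialEq, Eq)]
pub struct Pipeline {
    /// Relay chain block used as context when building (X).
    pub context: u32,
    /// Relay chain block at which the parachain block is validated (X + 1).
    pub validated_at: u32,
    /// Relay chain block in which the parachain block is included (X + 2).
    pub included_at: u32,
}

pub fn pipeline(context: u32) -> Pipeline {
    Pipeline { context, validated_at: context + 1, included_at: context + 2 }
}
```

With a 6 second slot and a 2 second offset, `next_offset_trigger` fires 4 seconds into each slot, leaving 2 seconds before the next boundary.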
Tasks