Increase para block inclusion reliability #5544
This is a good solution if at some point we discover that 1.5 seconds is not enough time for availability. I'd expect 10MB PoVs could add some pressure here. Running the subsystem benchmarks with realistic latencies should give a hint.
Finality can slow down and then this strategy doesn't work.
I'd expect
I think this is what we were doing until the slot-based collator. Beefier collators (with more cores) should make this solution lower-hanging fruit.
We are planning to use the same value for the max ancestry and the claim queue length. I don't really see a point in allowing RPs to survive longer. If we do that, why not also use the same scheduling lookahead?
💯
I think this will have the most impact on block times in general.
Another one that makes sense to have on this list, and lower-hanging fruit: currently we actually have more time for availability, but we start the bitfield signing task and timer as soon as we import a relay chain block. If we import that block very early, we have more than 1.5s to fetch chunks and also the
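A rough sketch of the time budget this comment is about. All of it is an assumption for illustration, not taken from the polkadot-sdk code: times are in milliseconds relative to the start of the current slot, the current behaviour is modelled as signing a fixed delay after import, and the hypothetical alternative signs at the next slot start minus a safety margin for gossiping the bitfield.

```rust
/// Chunk-fetching budget (ms) when the bitfield is signed a fixed delay after
/// the relay block is imported, regardless of when the import happened.
fn budget_fixed_delay(signing_delay_ms: u64) -> u64 {
    signing_delay_ms
}

/// Hypothetical alternative: sign as late as is still useful, i.e. at the start
/// of the next slot minus a safety margin for gossiping the bitfield. An early
/// import then turns directly into extra time for fetching chunks.
fn budget_slot_anchored(import_ms: u64, slot_ms: u64, margin_ms: u64) -> u64 {
    // `import_ms` is measured from the start of the current slot.
    let deadline_ms = slot_ms.saturating_sub(margin_ms);
    // A very late import leaves no budget at all (saturates at zero).
    deadline_ms.saturating_sub(import_ms)
}
```

With a 6s slot and a hypothetical 1s margin, a block imported 200ms into the slot would leave 4.8s instead of 1.5s; a late import shrinks the budget accordingly.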
If the underlying validator set doesn't change, we should completely stop invalidating candidates on a session change. Or is there a good reason not to?
Mostly implementation complexity. @rphmeier decided back then that it was not worth it for now. Worth checking again, though; things have changed a lot.
My reasoning at the time was that session changes affect only a tiny proportion of blocks. Session changes happen only once every several hours, and each session spans thousands of blocks. So we'd be chasing something like a 0.1% efficiency gain.
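A back-of-the-envelope version of that estimate. The 4-hour session length, 6s relay block time and the number of affected blocks per session are assumptions chosen for illustration, not figures from this thread.

```rust
/// Share (in percent) of relay blocks affected by a session change, assuming
/// `affected_blocks` blocks are impacted once per session.
fn affected_share_percent(session_hours: f64, block_time_s: f64, affected_blocks: f64) -> f64 {
    let blocks_per_session = session_hours * 3600.0 / block_time_s;
    affected_blocks / blocks_per_session * 100.0
}
```

With 4-hour sessions and 6s blocks there are 2400 blocks per session, so one affected block is about 0.04%; even a couple of affected blocks stay in the 0.1% range quoted above.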
Worth noting that collation is bottlenecked on IOPS, not CPU, so building on all forks might work until parachains are actually under load and then stop working altogether. Maybe things have changed, but AFAIK slow availability shouldn't cause a parachain block to get retracted; it should just become available more slowly. Is the 1 minute availability timeout still a thing?
The issue is that it delays follow-up blocks, possibly up to the point where their relay parent goes out of scope. (Fixable by being more lenient with accepted relay parents.)
Having produced parachain blocks retracted is at the very least detrimental to the throughput of the chain, but it also harms user and developer experience:
assumption that we trust collators.
higher level of security.
In general the block will move up in assurance over time, but if it happens
frequently that a block gets discarded after all, the benefit of this property
vanishes and one actually has to wait for definite finality, which takes the
longest.
The following is an unordered list of things that can cause a parachain block not to make it, together with solutions to them.
Speculative Availability
Give availability more time, to increase the likelihood of cores getting freed on
time:
cores and then backable candidates from prospective parachains
Immunity to relay chain forks
Either:
Build on slightly older relay parents
finalized already.
Build on all forks
single core, hence additional cores are free)
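One possible shape for the "build on slightly older relay parents" option, as a sketch only: `lag` and `max_ancestry` are hypothetical parameters, block numbers stand in for block hashes, and the finalized block is preferred only while it is still within the allowed ancestry.

```rust
/// Hypothetical relay-parent choice for a collator. Block numbers stand in for
/// hashes. Prefer the finalized block (immune to relay chain forks) while it is
/// still within the allowed ancestry; otherwise fall back to a block `lag`
/// blocks behind the best block, never older than the allowed ancestry.
fn pick_relay_parent(best: u32, finalized: u32, lag: u32, max_ancestry: u32) -> u32 {
    let oldest_allowed = best.saturating_sub(max_ancestry);
    if finalized >= oldest_allowed {
        finalized
    } else {
        // Finality lags too far behind: this choice is no longer fork-immune.
        best.saturating_sub(lag).max(oldest_allowed)
    }
}
```

The fallback branch is exactly where the "finality can slow down" objection applies: once finality lags beyond the allowed ancestry, the collator is back to building on an unfinalized block.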
Avoid relay parents becoming obsolete
Allow relay parents to survive longer than the claim queue length. The runtime
would then still accept those candidates as long as the parachain still has
assignments on the core in the current claim queue. This way, if e.g. a block
producer does not produce a block, the parachain would merely slow down a bit
instead of getting its blocks discarded.
Update: We will punt on this for now.
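Even though this is punted on for now, the relaxed acceptance rule might look roughly like the sketch below. `ClaimQueue`, `accept_candidate` and `max_rp_age` are hypothetical names, para ids are plain `u32`s, and block numbers stand in for relay parent hashes; this is not the runtime's actual inclusion logic.

```rust
/// Hypothetical claim queue: for each core index, the para ids scheduled on it.
struct ClaimQueue {
    by_core: Vec<Vec<u32>>,
}

/// Sketch of the relaxed acceptance rule: the relay parent may be up to
/// `max_rp_age` blocks old (possibly more than the claim queue length), as long
/// as the para is still assigned to the same core in the current claim queue.
fn accept_candidate(
    para: u32,
    core: usize,
    relay_parent: u32,
    current_block: u32,
    max_rp_age: u32,
    claim_queue: &ClaimQueue,
) -> bool {
    let recent_enough =
        relay_parent <= current_block && current_block - relay_parent <= max_rp_age;
    // An unknown core index counts as "not assigned".
    let still_assigned = claim_queue
        .by_core
        .get(core)
        .map_or(false, |paras| paras.contains(&para));
    recent_enough && still_assigned
}
```

A para that keeps a core assignment but moves to a different core fails `still_assigned` here, which is the limitation described under Core Changes.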
Session boundaries
Even with the above optimizations, session boundaries would still make relay parents
obsolete. A simple fix would be for collators to anticipate the session change
and stop producing candidates that would end up getting backed in the last block of the session.
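A minimal sketch of that collator-side check, assuming the collator knows the block number of the last block of the current session in advance and can estimate how many relay blocks pass between its relay parent and backing. Both `session_last_block` and `backing_delay` are hypothetical inputs, not existing collator parameters.

```rust
/// Hypothetical collator-side guard for session boundaries: only produce a
/// candidate if it is expected to be backed before the last block of the
/// current session, since a session change makes its relay parent obsolete.
fn should_produce(relay_parent: u32, backing_delay: u32, session_last_block: u32) -> bool {
    // Expected backing block; saturating to avoid overflow near u32::MAX.
    let expected_backing = relay_parent.saturating_add(backing_delay);
    expected_backing < session_last_block
}
```

With a last session block of 100 and a one-block backing delay, building on block 98 is fine, while building on block 99 would be backed in block 100 and is skipped.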
Core Changes
With the above "Avoid relay parents becoming obsolete", this would still not work if the
parachain has a core assigned, but a different one than before. This is not easy
to fix in the current design; luckily it should also have very little impact:
mappings stable.
on that rare occasion you changed your core mapping. Given that we have
pretty solid 6s block times, the chances of this happening seem acceptable.
Reliable Collator Protocol
We want to make validator-collator connections as reliable as possible to
ensure produced blocks also end up getting validated in a timely manner.