Validators entering the active set are slow on validation because PVF artifacts are not compiled #4324
Comments
Good idea, nodes should be able to know that already by checking the next session keys.
I had another idea lost somewhere in discussions: instead of invalidating the cache after 24h, make it size-bounded and only remove the stalest artifact if the cache size overflows. Not 100% sure, but it sounds somewhat easier to implement than the lookahead compilation.
That would work for nodes that enter and leave the active set; however, it won't work when a fresh node joins the active set for the first time.
Fair enough, but that's just a single no-show. Running a validator from scratch is not something that happens very often, I believe.
Yes, implementing both makes sense.
Part of #4324

We don't change but extend the existing cleanup strategy.
- We still don't touch artifacts that have been stale for less than 24h.
- We first attempt pruning only when we hit the cache limit (10 GB).
- If, after hitting 10 GB, the least used artifact has been stale for less than 24h, we don't remove it.

Co-authored-by: s0me0ne-unkn0wn <[email protected]>
Co-authored-by: Andrei Sandu <[email protected]>
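The extended cleanup strategy described above could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `Artifact` struct, `select_for_pruning`, and the constant names are hypothetical, and treating "10 GB" as 10 GiB is an assumption.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical stand-in for a cached PVF artifact (not the real polkadot-sdk type).
#[derive(Debug, Clone, PartialEq)]
pub struct Artifact {
    pub id: u32,
    pub size: u64,
    pub last_time_needed: SystemTime,
}

/// Artifacts stale for less than this are never removed (24h, per the description).
pub const MIN_STALE_TIME: Duration = Duration::from_secs(24 * 60 * 60);
/// Cache limit; "10 GB" is interpreted as 10 GiB here, which is an assumption.
pub const MAX_CACHE_SIZE: u64 = 10 * 1024 * 1024 * 1024;

/// Returns the ids of artifacts to remove, least recently needed first.
/// Nothing is selected while the cache is within `max_size`, and an artifact
/// that has been stale for less than `min_stale` is never selected.
pub fn select_for_pruning(
    artifacts: &[Artifact],
    now: SystemTime,
    max_size: u64,
    min_stale: Duration,
) -> Vec<u32> {
    let mut total: u64 = artifacts.iter().map(|a| a.size).sum();
    if total <= max_size {
        return Vec::new();
    }

    // Order candidates from least recently needed to most recently needed.
    let mut sorted: Vec<&Artifact> = artifacts.iter().collect();
    sorted.sort_by_key(|a| a.last_time_needed);

    let mut to_remove = Vec::new();
    for artifact in sorted {
        if total <= max_size {
            break;
        }
        let stale_for = now
            .duration_since(artifact.last_time_needed)
            .unwrap_or(Duration::ZERO);
        if stale_for < min_stale {
            // Every remaining artifact is fresher, so stop even if still over the limit.
            break;
        }
        total -= artifact.size;
        to_remove.push(artifact.id);
    }
    to_remove
}
```

With this shape, a cache that is over the limit but holds only recently used artifacts is left untouched, which matches the third bullet above.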
Closes #4324
- On every active leaf, the candidate-validation subsystem checks whether the node is an authority in the next session.
- If it is, it fetches backed candidates and prepares unknown PVFs.
- We limit the number of PVFs per block so as not to overload the subsystem.
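The per-leaf selection logic listed above might look something like the sketch below. It is only an illustration under stated assumptions: `pvfs_to_prepare` is a hypothetical helper, code hashes are simplified to `u64`, and the authority check and candidate fetching (which the real subsystem would do via runtime queries) are reduced to plain function arguments.

```rust
use std::collections::HashSet;

/// Picks the PVFs to prepare ahead of time for one active leaf.
///
/// `is_next_session_authority`: result of the "is this node an authority in
/// the next session" check (how it is obtained is outside this sketch).
/// `backed_code_hashes`: code hashes of the candidates backed in the block
/// (hypothetical `u64` stand-ins for real validation code hashes).
/// `known`: hashes already prepared or already scheduled for preparation;
/// newly selected hashes are added to it.
/// `max_per_block`: upper bound on PVFs selected per block.
pub fn pvfs_to_prepare(
    is_next_session_authority: bool,
    backed_code_hashes: &[u64],
    known: &mut HashSet<u64>,
    max_per_block: usize,
) -> Vec<u64> {
    if !is_next_session_authority {
        return Vec::new();
    }

    let mut selected = Vec::new();
    for &hash in backed_code_hashes {
        if selected.len() >= max_per_block {
            break;
        }
        // `insert` returns true only when the hash was not known before.
        if known.insert(hash) {
            selected.push(hash);
        }
    }
    selected
}
```

PVFs skipped because of the per-block limit are not marked as known, so they remain eligible the next time they appear in a backed candidate.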
PVF artifacts are cleaned up if they have been unused for 24h:
polkadot-sdk/polkadot/node/core/pvf/src/host.rs
Line 296 in f34d8e3
So, when nodes join the active set for the first time, or after they have been out of the active set for 24 hours, they won't have the PVF artifacts ready for approving or backing blocks, so they will have to compile each PVF the first time they need to execute a block from that parachain.
Each PVF compilation takes around 3 seconds or more, and compiling all Polkadot PVFs takes around 3 minutes. As a result, the validator will cause no-shows in approval-voting and will probably miss some backing points when the parachain's PVF is not compiled yet.
This is a transient problem, since all the PVFs should be compiled after around 3-4 minutes; however, it could probably be fixed relatively easily by proactively compiling the PVFs before the node becomes active.