change(state): Write non-finalized blocks to the state in a separate thread, to avoid network and RPC hangs #5257
I'm still looking into why
This looks great, it's a good design, and it seems like it will work.
I've marked a few reviews as a "Required Fix". They are probably the bugs causing the test failures. It might help to focus on them first.
I also made some comments about refactors that would simplify the code or improve cleanup. But they aren't as important.
Feel free to ignore the nitpicks until the sync and tests are working; some of them might get deleted or moved.
Let me know if you need help with any of these changes, or if you'd like me to push a PR for any change (or all of them).
(Just marking this so people know it has been reviewed.)
I compared the full sync before my sync tweaks with the full sync after the tweaks. The sync after is about 25 minutes faster, and the jobs for recent blocks are faster than they used to be. So I think I'll keep most of the changes. (They should also reduce RAM usage.)
* Remove some verbose block write channel logs
* Only warn about tracing endpoint if the address is actually set
* Use CloneError instead of formatting a non-cloneable error

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
acd61c5 to 74b29be
I'm running a full sync with the new checkpoints. If any of the sync jobs time out, we can fix the workflows, but that doesn't need to block this PR merging. I'm also running a full mainnet and testnet sync locally.
I think the code is complete here, and we can do any other fixes in separate PRs.
Failed to build
@Mergifyio update
✅ Branch has been successfully updated
Motivation
This PR sends queued non-finalized blocks down a channel to the writer task, which makes state block writes concurrent.
Part of #4937.
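The channel-to-writer design described in the Motivation can be sketched as follows. This is a minimal illustration, not the code in this PR: it assumes a standard-library `std::sync::mpsc` channel and an OS thread, and `QueuedBlock` and `spawn_block_writer` are hypothetical names invented for the example.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a queued non-finalized block.
#[derive(Debug, Clone, PartialEq)]
pub struct QueuedBlock {
    pub height: u32,
}

/// Spawns a writer thread that owns the (simulated) state and commits
/// blocks in the order they arrive on the channel.
///
/// Returns the sending half of the channel, and a join handle that yields
/// the committed heights once every sender has been dropped.
pub fn spawn_block_writer() -> (mpsc::Sender<QueuedBlock>, thread::JoinHandle<Vec<u32>>) {
    let (sender, receiver) = mpsc::channel::<QueuedBlock>();

    let handle = thread::spawn(move || {
        let mut committed = Vec::new();

        // Iterating a receiver blocks until a block arrives, and ends when
        // all senders are dropped. A real writer would do disk I/O here,
        // away from the threads that serve network and RPC requests.
        for block in receiver {
            committed.push(block.height);
        }

        committed
    });

    (sender, handle)
}
```

A caller sends blocks with `sender.send(QueuedBlock { height })` and keeps handling other requests while the writer thread works. Dropping the sender shuts the writer down cleanly.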
Designs
Solution
Use block write task to:
MAX_REORG_HEIGHT.
Related cleanups:
chain_tip_sender and non_finalized_state_sender fields of StateService.
Review
This PR is part of regular scheduled work.
Reviewer Checklist
Follow Up Work
db.utxo(&outpoint).map(|utxo| utxo.utxo) instead of routing it to the ReadRequest after checking the queue and sent hashes on StateService.
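For readers unfamiliar with the pattern, the `db.utxo(&outpoint).map(|utxo| utxo.utxo)` expression in the follow-up item above looks up a wrapped UTXO and unwraps its inner value, passing `None` through unchanged. The sketch below shows the shape of that call. `Db`, `OutPoint`, `OrderedUtxo`, `Utxo`, and `lookup_utxo` are simplified hypothetical types and names made up for this example, not the project's real definitions.

```rust
use std::collections::HashMap;

/// Hypothetical simplified UTXO.
#[derive(Debug, Clone, PartialEq)]
pub struct Utxo {
    pub value: u64,
}

/// Hypothetical wrapper that stores a UTXO with extra ordering data.
#[derive(Debug, Clone, PartialEq)]
pub struct OrderedUtxo {
    pub utxo: Utxo,
    pub tx_index_in_block: usize,
}

/// Hypothetical simplified outpoint, used as the lookup key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct OutPoint {
    pub index: u32,
}

/// Hypothetical in-memory stand-in for the finalized state database.
pub struct Db {
    utxos: HashMap<OutPoint, OrderedUtxo>,
}

impl Db {
    pub fn new(utxos: HashMap<OutPoint, OrderedUtxo>) -> Self {
        Db { utxos }
    }

    /// Returns the wrapped UTXO for `outpoint`, if it exists.
    pub fn utxo(&self, outpoint: &OutPoint) -> Option<OrderedUtxo> {
        self.utxos.get(outpoint).cloned()
    }
}

/// Looks up a UTXO directly in the database and strips the wrapper.
pub fn lookup_utxo(db: &Db, outpoint: OutPoint) -> Option<Utxo> {
    // `Option::map` only runs the closure when the lookup succeeded.
    db.utxo(&outpoint).map(|utxo| utxo.utxo)
}
```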