Tracking: Commit blocks to state using a separate task #4937
Comments
I've run through all this height range, and I can't reproduce this bug locally. On my machine, all blocks commit in under 10 minutes. I'm guessing that it's caused by running out of disk space, and PR #4945 will fix it. |
I'm still seeing some of these warnings after increasing the disk size to 200 GB, so that wasn't the complete fix:
https://github.com/ZcashFoundation/zebra/runs/8016091142?check_suite_focus=true#step:6:888 |
We might want to try the performance recommendations here: Or: But we need to check what's slow after merging PR #4952. |
It appears that committing blocks hangs because we sometimes commit over 1000 blocks at the same time, then return the result to the state caller (checkpointer or non-finalized state block committer):
https://github.com/ZcashFoundation/zebra/runs/8147047219?check_suite_focus=true#step:6:165 I'll open some new tickets on Monday with alternative fixes, and add them to a list in this ticket. |
I got these when I was syncing beta 14:
And these when I was syncing Zebra with #4721:
EDIT: all of these entries occurred after the 96th percent of a synced chain. I just searched for the string |
I also got these for beta 14:
And these for #4721:
The heights seem unrelated, so the problem is likely to be in Zebra itself. |
Waiting for the last finalized block is currently handled by zebra/zebra-state/src/service.rs Lines 378 to 381 in 093d503
But it has a bug: if the first non-finalized block arrives before the last finalized block, it will time out and fail verification, because the fork point is only checked once per block. But it will verify correctly when it gets retried. |
We haven't fixed the bug in this ticket yet. |
removing epic from sprint, individual issues in epic should be added instead |
@mpguerra I think we can close this now, there's only one PR left. |
Do we want to do anything else here? I have converted this to a tracking issue, as all of the issues added to the epic were closed, and I have removed it from the release candidate epic. |
We've achieved the goals of this ticket within the release candidate scope. |
…to state using a separate task ZcashFoundation#4937), added HeightDiff and height ops fixed, several read requests forwarded to ReadStateService
Motivation
Zebra takes 10-15 minutes to commit some blocks to the state while checkpointing, around blocks 1,718,000 to 1,772,000.
The slow blocks are different on different runs.
This is unacceptable performance, because:
zcashd
Diagnosis
Zebra queues up to 1200 blocks, then commits them all in the same state request, after the missing block arrives. This can take up to 10 seconds per block.
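To make the diagnosis concrete, here is a minimal sketch of how a height-ordered queue releases its whole backlog at once when the missing block arrives. `QueuedBlocks`, `Height`, and the `String` block payload are hypothetical stand-ins, not Zebra's actual types. By the figures above, a full batch of 1200 blocks at up to 10 seconds each could in the worst case take 12,000 seconds (200 minutes) inside a single state request.

```rust
use std::collections::BTreeMap;

/// Hypothetical height type; only the ordering matters for this sketch.
type Height = u32;

/// Blocks waiting for an earlier block (an illustration, not Zebra's real queue).
struct QueuedBlocks {
    /// The next height that can be committed.
    next_height: Height,
    /// Out-of-order blocks, keyed by height.
    pending: BTreeMap<Height, String>,
}

impl QueuedBlocks {
    fn new(next_height: Height) -> Self {
        Self { next_height, pending: BTreeMap::new() }
    }

    /// Queue a block, then return every block that is now committable, in height order.
    /// If `height` fills the gap, the entire contiguous backlog is returned at once.
    fn insert(&mut self, height: Height, block: String) -> Vec<(Height, String)> {
        self.pending.insert(height, block);

        let mut ready = Vec::new();
        while let Some(block) = self.pending.remove(&self.next_height) {
            ready.push((self.next_height, block));
            self.next_height += 1;
        }
        ready
    }
}
```

Inserting heights 2 to 1200 returns nothing each time, and inserting height 1 afterwards returns all 1200 blocks in one call, which is the batch the caller then has to wait for.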
Design
Add a block commit task to the state, which runs in a separate thread. The task should be between the block queue and the block verifier.
We'll need to move the shared mutable chain state into the block commit task, so we will also need to redirect StateService read requests to the concurrent ReadStateService.
Here is a diagram of the new state design:
https://docs.google.com/drawings/d/1FXpAUlenDAjl8nkftrypdAPsj0jr-Ut9gZlSP57nuyc/edit
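The design could be sketched roughly as follows: a dedicated thread owns the mutable chain state, receives blocks over a channel, and sends each result back on a per-request response channel. This is only an illustration under stated assumptions. `Block`, `CommitRequest`, `spawn_commit_task`, and `commit` are hypothetical names, the "chain state" is just a list of heights, and standard-library channels and threads stand in for whatever async primitives the real implementation uses.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical block type; a real block carries far more than a height.
struct Block {
    height: u32,
}

/// A request to the commit task: the block, plus a channel for the result.
struct CommitRequest {
    block: Block,
    rsp_tx: mpsc::Sender<Result<u32, String>>,
}

/// Spawn the commit task. It is the only owner of the mutable chain state.
/// The thread returns the committed heights when every request sender is dropped.
fn spawn_commit_task() -> (mpsc::Sender<CommitRequest>, thread::JoinHandle<Vec<u32>>) {
    let (tx, rx) = mpsc::channel::<CommitRequest>();

    let handle = thread::spawn(move || {
        // Assumption: the chain state is modelled as a list of committed heights.
        let mut committed: Vec<u32> = Vec::new();

        // This loop ends once all senders have been dropped.
        for req in rx {
            let expected = committed.last().map_or(0, |h| h + 1);
            let result = if req.block.height == expected {
                committed.push(req.block.height);
                Ok(req.block.height)
            } else {
                Err(format!("expected height {}, got {}", expected, req.block.height))
            };
            // The caller may have given up waiting; ignore send errors.
            let _ = req.rsp_tx.send(result);
        }

        committed
    });

    (tx, handle)
}

/// Send one block to the commit task and wait for its individual result.
fn commit(tx: &mpsc::Sender<CommitRequest>, height: u32) -> Result<u32, String> {
    let (rsp_tx, rsp_rx) = mpsc::channel();
    tx.send(CommitRequest { block: Block { height }, rsp_tx })
        .map_err(|_| "commit task has shut down".to_string())?;
    rsp_rx
        .recv()
        .map_err(|_| "commit task dropped the response".to_string())?
}
```

Because each request carries its own response channel, a caller gets its result as soon as its block is committed, rather than waiting for a whole batch. Readers never touch the task's state directly, which is why read requests need a separate concurrent path.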
Implementation Plan
Stop Accessing Mutable Chain State
Set Up Channels
Set up Block Commit task
Add channels to send blocks to the task
CommitFinalizedBlock requests
CommitBlock requests
Error Handling & Testing
Optional tasks:
Optional Cleanup Tasks
Bug fixes:
Refactors:
ReadRequest::ChainUtxo in AwaitUtxo
Renames & Formatting:
* or transparent_* to address_*
Request and Response enums in a consistent order
In Scope
Out of Scope
We don't think we'll need to make these changes as part of this change:
(this reduces the number of state requests from the syncer)
These are definitely out of scope: