Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change(state): Write non-finalized blocks to the state in a separate thread, to avoid network and RPC hangs #5257

Merged
merged 120 commits into from
Oct 11, 2022

Conversation

arya2
Copy link
Contributor

@arya2 arya2 commented Sep 26, 2022

Motivation

This PR sends queued non-finalized blocks down a channel to the writer task, which makes state block writes concurrent

Part of #4937.

Designs

Solution

Use block write task to:

  • Commit non-finalized blocks.
  • Propagate errors from queued parent blocks to queued children and skip validate_and_commit calls.
  • Finalize blocks on best chain past MAX_REORG_HEIGHT.

Related cleanups:

  • Removes unused chain_tip_sender and non_finalized_state_sender fields of StateService.

Review

This PR is part of regular scheduled work.

Reviewer Checklist

  • Will the PR name make sense to users?
    • Does it need extra CHANGELOG info? (new features, breaking changes, large changes)
  • Are the PR labels correct?
  • Does the code do what the ticket and PR says?
  • How do you know it works? Does it have tests?

Follow Up Work

  • Avoid removing and re-adding known_utxos by nesting the SentHashes and QueuedBlocks inside another struct that holds that tracks the known_utxos
  • Avoid now-redundant checks against the chains in the Request::AwaitUtxo call by skipping to db.utxo(&outpoint).map(|utxo| utxo.utxo) instead of routing it to the ReadRequest after checking the queue and sent hashes on StateService.
  • Organize block write task into an object.

@arya2 arya2 requested review from teor2345 and upbqdn September 26, 2022 18:05
@arya2 arya2 added C-bug Category: This is a bug P-Medium ⚡ I-slow Problems with performance or responsiveness A-state Area: State / database changes labels Sep 26, 2022
@arya2
Copy link
Contributor Author

arya2 commented Sep 26, 2022

I'm still looking into why value_pool_is_updated fails and chain_tip_sender_is_updated hangs

Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, it's a good design, and it seems like it will work.

I've marked a few reviews as a "Required Fix". They are probably the bugs causing the test failures. It might help to focus on them first.

I also made some comments about refactors that would simplify the code or improve cleanup. But they aren't as important.

Feel free to ignore the nitpicks until the sync and tests are working, some of them might get deleted or moved.

Let me know if you need help with any of these changes, or if you'd like me to push a PR for any change (or all of them).

zebra-state/src/service/write.rs Outdated Show resolved Hide resolved
zebra-state/src/service/write.rs Outdated Show resolved Hide resolved
zebra-state/src/service/write.rs Outdated Show resolved Hide resolved
zebra-state/src/service/write.rs Outdated Show resolved Hide resolved
zebra-state/src/service/write.rs Outdated Show resolved Hide resolved
zebra-state/src/service.rs Outdated Show resolved Hide resolved
zebra-state/src/service.rs Outdated Show resolved Hide resolved
zebra-state/src/service.rs Outdated Show resolved Hide resolved
zebra-state/src/service.rs Outdated Show resolved Hide resolved
zebra-state/src/service.rs Show resolved Hide resolved
Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Just marking this so people know it has been reviewed.)

@teor2345
Copy link
Contributor

I compared the full sync before my sync tweaks:
https://github.com/ZcashFoundation/zebra/actions/runs/3190092327

With the full sync after the tweaks:
https://github.com/ZcashFoundation/zebra/actions/runs/3190092327

The sync after is about 25 minutes faster, and the jobs for recent blocks are faster than they used to be. So I think I'll keep most of the changes. (They should also reduce RAM usage.)

@teor2345 teor2345 force-pushed the non-finalized-block-commit-channel branch from acd61c5 to 74b29be Compare October 11, 2022 01:05
@teor2345
Copy link
Contributor

I added a workaround for #5125 in commit 74b29be.

That should be it for this PR, we just need to merge the checkpoint PR and get the full sync working.

@teor2345
Copy link
Contributor

teor2345 commented Oct 11, 2022

I'm running a full sync with the new checkpoints here:
https://github.com/ZcashFoundation/zebra/actions/runs/3223410431

If any of the sync jobs time out, we can fix the workflows, but that doesn't need to block this PR merging.
(The timeouts could be from PR #5360 or this PR, but we want to wait to fix them until we have the final times from both.)

I'm also running a full mainnet and testnet sync locally.

Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code is complete here, and we can do any other fixes in separate PRs.

zebrad/src/components/sync/downloads.rs Show resolved Hide resolved
mergify bot added a commit that referenced this pull request Oct 11, 2022
@arya2
Copy link
Contributor Author

arya2 commented Oct 11, 2022

Failed to build zebra-test in the merge queue: https://github.com/ZcashFoundation/zebra/actions/runs/3228683588/jobs/5285115640

@arya2
Copy link
Contributor Author

arya2 commented Oct 11, 2022

@Mergifyio update

@mergify
Copy link
Contributor

mergify bot commented Oct 11, 2022

update

✅ Branch has been successfully updated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-state Area: State / database changes C-bug Category: This is a bug C-enhancement Category: This is an improvement I-slow Problems with performance or responsiveness
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants