Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change(state): Write finalized blocks to the state in a separate thread, to avoid network and RPC hangs #5134

Merged
merged 47 commits into from
Sep 28, 2022

Conversation

teor2345
Copy link
Contributor

@teor2345 teor2345 commented Sep 12, 2022

Motivation

This PR starts moving state block write requests into their own task.
When it is finished, it will make state block writes fully concurrent.

Part of #4937.
Closes #5125.

Designs

See #4937 and https://docs.google.com/drawings/d/1FXpAUlenDAjl8nkftrypdAPsj0jr-Ut9gZlSP57nuyc/edit

Solution

Add a block write task:

  • Commit finalized blocks to the state in a separate task
  • Add last_block_hash_sent to the state service, to avoid database accesses when draining the queue
  • Add task handles to check for task exits and panics

Add block write channels:

  • Finalized state
  • Non-finalized state (not used in this PR)
  • Close the finalized block channel when we're finished with it
  • Remove a redundant block_in_place()

Deal with errors & shutdowns:

  • Update last_block_hash_sent regardless of commit errors
  • Check for panics in the block write task
  • Close state channels and tasks on drop
  • Drop channels and check for closed channels in the block commit task
  • Work around a RocksDB shutdown bug in the tests, by removing previous code that cancelled RocksDB background work on shutdown
  • Drop any leftover queued blocks when finished with the finalized state, and return an error on their channels
  • Return an error on the channels of replaced duplicate blocks

Related fixes:

  • Wait for the block commit task in tests, and check for errors
  • Wait for chain tip updates in the RPC tests
  • Improve RPC error logging
  • When running a proptest that sleeps a lot, only run it once
  • Explain finalized block duplicate handling and consensus rules

Related cleanups:

  • Remove some duplicate fields across StateService and ReadStateService
  • Rename a field to StateService.max_queued_finalized_height

Testing

We should run a full sync on this before we merge, and make sure it is not significantly slower than the main branch.

Review

This PR is part of regular scheduled work.

Reviewer Checklist

  • Code implements Specs and Designs
  • Tests for Expected Behaviour
  • Tests for Errors

Follow Up Work

Handle non-finalized blocks.
Handle task exits and panics correctly.
Testing.

@teor2345 teor2345 added C-enhancement Category: This is an improvement P-Medium ⚡ I-slow Problems with performance or responsiveness A-state Area: State / database changes labels Sep 12, 2022
@teor2345 teor2345 self-assigned this Sep 12, 2022
@teor2345 teor2345 changed the title change(state): Add a block write task and channels 3. change(state): Add a block write task and channels Sep 12, 2022
@teor2345 teor2345 force-pushed the state-commit-task branch 2 times, most recently from 1cd837d to 2a044fc Compare September 13, 2022 18:46
@teor2345 teor2345 changed the base branch from state-commit-task to move-finalized-queue September 13, 2022 19:47
@teor2345 teor2345 changed the title 3. change(state): Add a block write task and channels 4. change(state): Add a block write task and channels Sep 13, 2022
@teor2345 teor2345 force-pushed the block-commit-channel branch from 6ba58f8 to e7b070d Compare September 14, 2022 04:41
@teor2345 teor2345 force-pushed the move-finalized-queue branch from 0082a2b to 9fe233a Compare September 14, 2022 04:43
@teor2345 teor2345 force-pushed the block-commit-channel branch from e7b070d to 5e7943a Compare September 14, 2022 04:43
@teor2345

This comment was marked as outdated.

@teor2345

This comment was marked as resolved.

@teor2345

This comment was marked as resolved.

@teor2345 teor2345 changed the title 4. change(state): Add a block write task and channels 4. change(state): Write finalized blocks to the state in a separate thread Sep 14, 2022
@teor2345 teor2345 force-pushed the move-finalized-queue branch from 9fe233a to 56e36f4 Compare September 15, 2022 00:41
@teor2345 teor2345 force-pushed the block-commit-channel branch 2 times, most recently from a7c50ad to fc476f3 Compare September 15, 2022 04:37
@teor2345

This comment was marked as resolved.

@teor2345

This comment was marked as resolved.

@teor2345 teor2345 marked this pull request as ready for review September 16, 2022 03:33
@teor2345 teor2345 requested a review from a team as a code owner September 16, 2022 03:33
@teor2345 teor2345 removed the request for review from a team September 16, 2022 03:33
@teor2345 teor2345 force-pushed the block-commit-channel branch from 5544cfa to 96d3966 Compare September 27, 2022 03:13
@teor2345
Copy link
Contributor Author

I'd also like to remove Clone from NonFinalizedState, because it doesn't work the same way as FinalizedState.clone().

That's not possible, so I just updated the docs.

arya2
arya2 previously approved these changes Sep 27, 2022
Copy link
Contributor

@arya2 arya2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent.

upbqdn
upbqdn previously approved these changes Sep 27, 2022
Copy link
Member

@upbqdn upbqdn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I left one minor q.

zebra-state/src/service/finalized_state.rs Show resolved Hide resolved
@teor2345 teor2345 dismissed stale reviews from upbqdn and arya2 via b150e3f September 27, 2022 23:42
@teor2345 teor2345 requested a review from upbqdn September 27, 2022 23:42
@github-actions github-actions bot added the C-enhancement Category: This is an improvement label Sep 27, 2022
@teor2345 teor2345 requested a review from arya2 September 27, 2022 23:42
@teor2345
Copy link
Contributor Author

One last fix to the block height check in b150e3f, and I think we're done.

mergify bot added a commit that referenced this pull request Sep 28, 2022
@mergify mergify bot merged commit 343c5e6 into main Sep 28, 2022
@mergify mergify bot deleted the block-commit-channel branch September 28, 2022 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-state Area: State / database changes C-bug Category: This is a bug C-enhancement Category: This is an improvement I-slow Problems with performance or responsiveness
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid temporary failures verifying the first non-finalized block
3 participants