Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hold HeadTracker lock until persisting to disk #5084

Merged
merged 2 commits into from
Jan 22, 2024

Conversation

dapplion
Copy link
Collaborator

@dapplion dapplion commented Jan 18, 2024

Issue Addressed

Closes #4773
Implements the fix from #5073

Proposed Changes

When persisting the head tracker in the context of the BeaconChain, hold its read lock until it has been written to disk. Otherwise there's a race condition with the pruning thread that can cause it to persist stale data.

Additional Info

Linking another line of concern for this issue, pruning routine dropping the lock before writing the serialised head_tracker data to disk.

drop(head_tracker_lock);
batch.push(StoreOp::KeyValueOp(
persisted_head.as_kv_store_op(BEACON_CHAIN_DB_KEY),
));
// Persist the new finalized checkpoint as the pruning checkpoint.
batch.push(StoreOp::KeyValueOp(
store.pruning_checkpoint_store_op(new_finalized_checkpoint),
));
store.do_atomically_with_block_and_blobs_cache(batch)?;

As long as pruning does not run twice concurrently the only other possible mutation is the BeaconNode importing a block. We have observed that happening during a unwind sequence, so it's definitely possible. In that case we can end-up with a block that exists in the DB but not in the HeadTracker. This can slightly pollute the DB but not break pruning.

@dapplion dapplion changed the title Fix head tracker drop order on un-ordered shutdown Hold HeadTracker lock until persisting to disk Jan 18, 2024
@michaelsproul
Copy link
Member

There's one smol clippy failure.

Once that's fixed I think we can merge this for v4.6.0

@michaelsproul michaelsproul added waiting-on-author The reviewer has suggested changes and awaits thier implementation. database v4.6.0 ETA Q1 2024 labels Jan 19, 2024

// Hold a lock to head_tracker until it has been persisted to disk. Otherwise there's a race
// condition with the pruning thread which can result in a block present in the head tracker
// but absent in the DB. This inconsistency halts pruning and dramastically increases disk
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yesssss for dramastically

paulhauner added a commit that referenced this pull request Jan 19, 2024
Squashed commit of the following:

commit d73a2b3
Author: dapplion <[email protected]>
Date:   Fri Jan 19 09:12:48 2024 +0800

    lint

commit 5b5a304
Author: Michael Sproul <[email protected]>
Date:   Tue Jan 16 16:10:28 2024 +1100

    Fix head tracker drop order on un-ordered shutdown
@paulhauner paulhauner mentioned this pull request Jan 19, 2024
2 tasks
@dapplion dapplion added ready-for-review The code is ready for review and removed waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Jan 19, 2024
@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Jan 22, 2024
@paulhauner paulhauner merged commit 585124f into sigp:unstable Jan 22, 2024
27 of 28 checks passed
@dapplion dapplion deleted the head-tracker-bug-fix branch January 23, 2024 14:18
danielrachi1 pushed a commit to danielrachi1/lighthouse that referenced this pull request Feb 14, 2024
* Fix head tracker drop order on un-ordered shutdown

* lint

---------

Co-authored-by: Michael Sproul <[email protected]>
@dapplion dapplion restored the head-tracker-bug-fix branch March 5, 2024 01:29
@dapplion dapplion mentioned this pull request Jun 10, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database ready-for-merge This PR is ready to merge. v4.6.0 ETA Q1 2024
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants