
[Merged by Bors] - Fix bug in database pruning #1564


Conversation

@michaelsproul (Member)

Issue Addressed

Closes #1488

Proposed Changes

  • Prevent the pruning algorithm from over-eagerly deleting states at skipped slots when they are shared with the canonical chain.
  • Add `debug` logging to the pruning algorithm so we have a better chance of debugging future issues from logs.
  • Modify the handling of the "finalized state" in the beacon chain so that it's always the state at the first slot of the finalized epoch (previously it was the state at the finalized block). This gives database pruning a clearer and cleaner view of things, and will marginally impact the pruning of the op pool, observed proposers, etc. (in ways that are safe as far as I can tell).
  • Remove the duplicated `RevertedFinalizedEpoch` check from `after_finalization`.
  • Delete the useless and unused `max_finality_distance`.
  • Add tests that exercise pruning with shared states at skip slots.
  • Delete the unnecessary `block_strategy` argument from `add_blocks` and friends in the test harness (will likely conflict slightly with #1380, sorry @adaszko -- but we can fix that).
  • Bonus: add a `BeaconChain::with_head` method. I didn't end up needing it, but it turned out quite nice, so I figured we could keep it?
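The first bullet is the heart of the fix. As a minimal, self-contained sketch (all names and types here are illustrative stand-ins, not Lighthouse's actual API): before pruning an abandoned fork, collect the state roots on the canonical chain back to the finalized checkpoint, and skip any abandoned-fork state whose root is shared with that set, as happens at skip slots:

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a 32-byte state root.
type StateRoot = u64;

/// Return the abandoned-fork states that are safe to delete: everything
/// except states shared with the canonical chain (e.g. at skip slots).
fn states_to_prune(
    canonical_states: &[StateRoot], // one root per slot, head back to finalized
    abandoned_states: &[StateRoot], // states reachable only via an abandoned head
) -> Vec<StateRoot> {
    let canonical: HashSet<StateRoot> = canonical_states.iter().copied().collect();
    abandoned_states
        .iter()
        .copied()
        // Keep (do not prune) any state the canonical chain still references.
        .filter(|root| !canonical.contains(root))
        .collect()
}

fn main() {
    // State 30 sits at a skip slot shared between the abandoned fork and the
    // canonical chain; the over-eager bug deleted it, the fix keeps it.
    let canonical = vec![10, 20, 30, 40];
    let abandoned = vec![30, 31, 32];
    assert_eq!(states_to_prune(&canonical, &abandoned), vec![31, 32]);
}
```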

Additional Info

Any users who have experienced pruning errors on Medalla will need to resync after upgrading to a release including this change. This should end unbounded `chain_db` growth! 🎉

@michaelsproul michaelsproul added the ready-for-review The code is ready for review label Aug 25, 2020
@paulhauner (Member) left a comment

Looks good to me! I couldn't fault it.

Bonus: add a BeaconChain::with_head method. I didn't end up needing it, but it turned out quite nice, so I figured we could keep it?

I'm happy to keep it :) I had to add something similar for the HTTP API, so it's clearly necessary!

@paulhauner (Member)

I will run this on our test nodes over night and report back!

@paulhauner (Member)

I re-synced four nodes and I'm seeing `~/.lighthouse/beacon` sizes of:

  • 8.4gb
  • 8.5gb
  • 8.5gb
  • 8.5gb

Interestingly, the two nodes for which I had metrics showed a little over an hour of <1gb usage during syncing, then a fairly rapid growth to ~8gb over an hour, then stability.

The nodes are following head and aren't exhibiting any faults, from what I can see. I can't see any error logs, but the logging channel keeps overflowing so I can't be certain (I will fix this).

```rust
let iter = std::iter::once(Ok((head_hash, head_state_hash, head_slot)))
    .chain(RootsIterator::from_block(store.clone(), head_hash)?);
```
@paulhauner (Member)

It seems to me that this iterator doesn't iterate across state roots from skipped slots. I would think we need to do a state root iter across (Option<SignedBeaconBlockHash>, BeaconStateHash) to clean up those skip-slot states. Perhaps I'm missing something though :)

@michaelsproul (Member, Author)

Nah, the `RootsIterator` uses the block roots and state roots from a beacon state, so it returns something for every slot (including the roots of skipped states).

```rust
fn do_next(&mut self) -> Result<Option<(Hash256, Hash256, Slot)>, Error> {
    if self.slot == 0 || self.slot > self.beacon_state.slot {
        return Ok(None);
    }

    self.slot -= 1;

    match (
        self.beacon_state.get_block_root(self.slot),
        self.beacon_state.get_state_root(self.slot),
    ) {
        (Ok(block_root), Ok(state_root)) => Ok(Some((*block_root, *state_root, self.slot))),
        (Err(BeaconStateError::SlotOutOfBounds), Err(BeaconStateError::SlotOutOfBounds)) => {
            // Read a `BeaconState` from the store that has access to prior historical roots.
            let beacon_state =
                next_historical_root_backtrack_state(&*self.store, &self.beacon_state)?;

            self.beacon_state = Cow::Owned(beacon_state);

            let block_root = *self.beacon_state.get_block_root(self.slot)?;
            let state_root = *self.beacon_state.get_state_root(self.slot)?;

            Ok(Some((block_root, state_root, self.slot)))
        }
        (Err(e), _) => Err(e.into()),
        (Ok(_), Err(e)) => Err(e.into()),
    }
}
```
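A toy model of the behaviour described above (illustrative names, not Lighthouse's API): a beacon state's `block_roots` array repeats the previous root at skip slots, so walking it backwards produces one `(slot, root)` pair for every slot, skipped slots included:

```rust
/// Walk a block-roots array from the highest slot down to slot 0,
/// yielding one (slot, root) pair per slot.
fn roots_by_slot(block_roots: &[u64]) -> Vec<(usize, u64)> {
    (0..block_roots.len())
        .rev()
        .map(|slot| (slot, block_roots[slot]))
        .collect()
}

fn main() {
    // Slot 2 was skipped, so its entry repeats slot 1's block root (0xb);
    // the iteration still produces an entry for it.
    let block_roots = [0xa, 0xb, 0xb, 0xc];
    let out = roots_by_slot(&block_roots);
    assert_eq!(out, vec![(3, 0xc), (2, 0xb), (1, 0xb), (0, 0xa)]);
}
```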

@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Aug 25, 2020
@michaelsproul
Copy link
Member Author

Yay!

bors r+

bors bot pushed a commit that referenced this pull request Aug 26, 2020
@bors bors bot changed the title Fix bug in database pruning [Merged by Bors] - Fix bug in database pruning Aug 26, 2020
@bors bors bot closed this Aug 26, 2020
@michaelsproul michaelsproul deleted the pruning-fix branch August 26, 2020 09:50
Labels
ready-for-merge This PR is ready to merge.
Development

Successfully merging this pull request may close these issues.

DB inconsistent after restart
2 participants