-
Notifications
You must be signed in to change notification settings - Fork 784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Merged by Bors] - Fix bug in database pruning #1564
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! I couldn't fault it.
Bonus: add a BeaconChain::with_head method. I didn't end up needing it, but it turned out quite nice, so I figured we could keep it?
I'm happy to keep it :) I had to add something similar for the HTTP API, so it's clearly necessary!
I will run this on our test nodes over night and report back! |
I re-synced four nodes and I'm seeing nodes with
Interestingly, the two nodes for which I had metrics showed a little over an hour of <1gb usage during syncing, then a fairly rapid growth to ~8gb over an hour, then stability. The nodes are following head and aren't exhibiting any faults, from what I can see. I can't see any error logs, but the logging channel keeps overflowing so I can't be certain (I will fix this). |
let iter = std::iter::once(Ok((head_hash, head_state_hash, head_slot))) | ||
.chain(RootsIterator::from_block(Arc::clone(&store), head_hash)?); | ||
.chain(RootsIterator::from_block(store.clone(), head_hash)?); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that this iterator doesn't iterate across state roots from skipped slots. I would think we need to do a state root iter across (Option<SignedBeaconBlockHash>, BeaconStateHash)
to clean up those skip-slot states. Perhaps I'm missing something though :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah the RootsIterator
uses the block roots and state roots from a beacon state, so it returns something for every slot (including the roots of skipped states).
lighthouse/beacon_node/store/src/iter.rs
Lines 190 to 216 in 175471a
fn do_next(&mut self) -> Result<Option<(Hash256, Hash256, Slot)>, Error> { | |
if self.slot == 0 || self.slot > self.beacon_state.slot { | |
return Ok(None); | |
} | |
self.slot -= 1; | |
match ( | |
self.beacon_state.get_block_root(self.slot), | |
self.beacon_state.get_state_root(self.slot), | |
) { | |
(Ok(block_root), Ok(state_root)) => Ok(Some((*block_root, *state_root, self.slot))), | |
(Err(BeaconStateError::SlotOutOfBounds), Err(BeaconStateError::SlotOutOfBounds)) => { | |
// Read a `BeaconState` from the store that has access to prior historical roots. | |
let beacon_state = | |
next_historical_root_backtrack_state(&*self.store, &self.beacon_state)?; | |
self.beacon_state = Cow::Owned(beacon_state); | |
let block_root = *self.beacon_state.get_block_root(self.slot)?; | |
let state_root = *self.beacon_state.get_state_root(self.slot)?; | |
Ok(Some((block_root, state_root, self.slot))) | |
} | |
(Err(e), _) => Err(e.into()), | |
(Ok(_), Err(e)) => Err(e.into()), | |
} |
Yay! bors r+ |
## Issue Addressed Closes #1488 ## Proposed Changes * Prevent the pruning algorithm from over-eagerly deleting states at skipped slots when they are shared with the canonical chain. * Add `debug` logging to the pruning algorithm so we have so better chance of debugging future issues from logs. * Modify the handling of the "finalized state" in the beacon chain, so that it's always the state at the first slot of the finalized epoch (previously it was the state at the finalized block). This gives database pruning a clearer and cleaner view of things, and will marginally impact the pruning of the op pool, observed proposers, etc (in ways that are safe as far as I can tell). * Remove duplicated `RevertedFinalizedEpoch` check from `after_finalization` * Delete useless and unused `max_finality_distance` * Add tests that exercise pruning with shared states at skip slots * Delete unnecessary `block_strategy` argument from `add_blocks` and friends in the test harness (will likely conflict with #1380 slightly, sorry @adaszko -- but we can fix that) * Bonus: add a `BeaconChain::with_head` method. I didn't end up needing it, but it turned out quite nice, so I figured we could keep it? ## Additional Info Any users who have experienced pruning errors on Medalla will need to resync after upgrading to a release including this change. This should end unbounded `chain_db` growth! 🎉
Pull request successfully merged into master. Build succeeded: |
Issue Addressed
Closes #1488
Proposed Changes
debug
logging to the pruning algorithm so we have so better chance of debugging future issues from logs.RevertedFinalizedEpoch
check fromafter_finalization
max_finality_distance
block_strategy
argument fromadd_blocks
and friends in the test harness (will likely conflict with [Merged by Bors] - Alternative (to BeaconChainHarness) BeaconChain testing API #1380 slightly, sorry @adaszko -- but we can fix that)BeaconChain::with_head
method. I didn't end up needing it, but it turned out quite nice, so I figured we could keep it?Additional Info
Any users who have experienced pruning errors on Medalla will need to resync after upgrading to a release including this change. This should end unbounded
chain_db
growth! 🎉