Remove alignment limitations from checkpoint sync #3210

michaelsproul · 2022-05-24T01:07:45Z

Description

Currently Lighthouse's checkpoint sync requires that the checkpoint state is not from a skipped slot, see: https://lighthouse-book.sigmaprime.io/checkpoint-sync.html#alignment-requirements

This limitation is somewhat artificial, and was adopted to simplify the initial implementation. I've forgotten exactly why, but I'll post more info on this issue when I dive back into it.

michaelsproul · 2022-07-25T07:37:32Z

Another desirable property: checkpoint sync from a state alone, without having to load the block.

michaelsproul · 2022-09-13T00:43:15Z

I think the simplest way to solve this would be to advance the provided state to the nearest epoch boundary, although I might wait for the discussion here to be resolved before implementing that change.
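As a rough illustration of "advance to the nearest epoch boundary", the target slot is plain modular arithmetic. This is a minimal sketch, not Lighthouse code: `SLOTS_PER_EPOCH` is hard-coded to the mainnet value as an assumption (Lighthouse reads it from its `EthSpec`), and both function names are hypothetical.

```rust
/// Slots per epoch on Ethereum mainnet. Hard-coded here as an assumption;
/// Lighthouse obtains this value from its `EthSpec` trait.
const SLOTS_PER_EPOCH: u64 = 32;

/// Return the first epoch-boundary slot at or after `slot`.
/// (Hypothetical helper for illustration only.)
fn next_epoch_boundary(slot: u64, slots_per_epoch: u64) -> u64 {
    let remainder = slot % slots_per_epoch;
    if remainder == 0 {
        // Already aligned: nothing to advance.
        slot
    } else {
        slot + (slots_per_epoch - remainder)
    }
}

/// Number of empty slots the state would need to be processed through
/// to reach that boundary.
fn slots_to_advance(slot: u64, slots_per_epoch: u64) -> u64 {
    next_epoch_boundary(slot, slots_per_epoch) - slot
}
```

For a checkpoint state at slot 30, `next_epoch_boundary(30, SLOTS_PER_EPOCH)` gives 32, so the state would be advanced through two slots before being used as the anchor.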

michaelsproul · 2023-06-13T00:41:06Z

Regarding the use of unaligned states, I went looking for some places where we look up states for block roots, which might fail if the only state stored is the advanced version of the state (e.g. block was at slot 30 and we've stored its state skipped through to 32).

The main place where this is relevant is in the on-finalization database migration here:

lighthouse/beacon_node/beacon_chain/src/canonical_head.rs

Lines 1019 to 1041 in c547a11

    
    // The store migration task requires the *state at the slot of the finalized epoch*,
    // rather than the state of the latest finalized block. These two values will only
    // differ when the first slot of the finalized epoch is a skip slot.
    //
    // Use the `StateRootsIterator` directly rather than `BeaconChain::state_root_at_slot`
    // to ensure we use the same state that we just set as the head.
    let new_finalized_slot = new_view
        .finalized_checkpoint
        .epoch
        .start_slot(T::EthSpec::slots_per_epoch());
    let new_finalized_state_root = process_results(
        StateRootsIterator::new(&self.store, &new_snapshot.beacon_state),
        |mut iter| {
            iter.find_map(|(state_root, slot)| {
                if slot == new_finalized_slot {
                    Some(state_root)
                } else {
                    None
                }
            })
        },
    )?
    .ok_or(Error::MissingFinalizedStateRoot(new_finalized_slot))?;

That code takes care to use the advanced form of the finalized block's state, which is exactly what we want.

This usage in the committee cache lookup could be problematic if the fork choice node stores the unadvanced state root:

lighthouse/beacon_node/beacon_chain/src/beacon_chain.rs

Lines 5551 to 5559 in c547a11

    
    let state_root = head_block.state_root;
    let state = self
        .store
        .get_inconsistent_state_for_attestation_verification_only(
            &state_root,
            Some(head_block.slot),
        )?
        .ok_or(Error::MissingBeaconState(head_block.state_root))?;
    (state, state_root)

Fork choice itself could also be affected, at the point where we load the balances from the justified state. When we checkpoint sync we artificially set the justified and finalized blocks equal to each other, so this could result in a lookup of the finalized block's state:

lighthouse/beacon_node/beacon_chain/src/beacon_fork_choice_store.rs

Lines 324 to 328 in c547a11

    
    let state = self
        .store
        .get_state(&justified_block.state_root(), Some(justified_block.slot()))
        .map_err(Error::FailedToReadState)?
        .ok_or_else(|| Error::MissingState(justified_block.state_root()))?;

We load the state for the parent block's state root in block verification:

lighthouse/beacon_node/beacon_chain/src/block_verification.rs

Lines 1771 to 1778 in c547a11

    
    // Load the parent blocks state from the database, returning an error if it is not found.
    // It is an error because if we know the parent block we should also know the parent state.
    let parent_state_root = parent_block.state_root();
    let parent_state = chain
        .get_state(&parent_state_root, Some(parent_block.slot()))?
        .ok_or_else(|| {
            BeaconChainError::DBInconsistent(format!("Missing state {:?}", parent_state_root))
        })?;

This would cause blocks that are descended from the finalized block (with skips) to error incorrectly. We should definitely fix this.
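The failure mode can be reproduced in a toy model. The sketch below is an assumption-laden illustration, not Lighthouse code: `Root`, `State`, `Block`, `Store`, `LookupError` and both functions are hypothetical stand-ins, with roots shrunk to `u64` and the database to a `HashMap` keyed by state root. It shows that once only the advanced state is stored, a lookup by the finalized block's own `state_root` finds nothing.

```rust
use std::collections::HashMap;

/// Toy stand-in for a 32-byte root (assumption for illustration).
type Root = u64;

/// Toy stand-in for a full beacon state.
#[derive(Debug, Clone, PartialEq)]
struct State {
    slot: u64,
}

/// Toy stand-in for a block: only the fields the lookup needs.
#[derive(Debug)]
struct Block {
    slot: u64,
    state_root: Root,
}

#[derive(Debug, PartialEq)]
enum LookupError {
    MissingState(Root),
}

/// Hypothetical store keyed by state root, like the `get_state` calls quoted above.
struct Store {
    states: HashMap<Root, State>,
}

impl Store {
    fn get_state(&self, state_root: Root) -> Option<&State> {
        self.states.get(&state_root)
    }
}

/// Mirrors the shape of the parent-state lookup in block verification.
fn load_parent_state<'a>(store: &'a Store, parent: &Block) -> Result<&'a State, LookupError> {
    store
        .get_state(parent.state_root)
        .ok_or(LookupError::MissingState(parent.state_root))
}

/// Build a store that holds only the advanced (slot 32) state of a block at slot 30.
fn store_after_unaligned_checkpoint_sync() -> (Store, Block) {
    let finalized_block = Block { slot: 30, state_root: 0xA };
    let mut states = HashMap::new();
    // Advancing through skipped slots changes the state root,
    // so the stored key is 0xB rather than the block's 0xA.
    states.insert(0xB, State { slot: 32 });
    (Store { states }, finalized_block)
}
```

Calling `load_parent_state` with the block returned by `store_after_unaligned_checkpoint_sync` yields `Err(LookupError::MissingState(0xA))`, even though the store does contain a usable state for that block under a different root.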

We load the state for the head block's state root on start-up:

lighthouse/beacon_node/beacon_chain/src/canonical_head.rs

Lines 298 to 304 in c547a11

    
    let beacon_block = store
        .get_full_block(&beacon_block_root)?
        .ok_or(Error::MissingBeaconBlock(beacon_block_root))?;
    let beacon_state_root = beacon_block.state_root();
    let beacon_state = store
        .get_state(&beacon_state_root, Some(beacon_block.slot()))?
        .ok_or(Error::MissingBeaconState(beacon_state_root))?;

This could error if the finalized block is the head (e.g. restarting immediately after checkpoint sync).

There are probably other places too.

Some ideas for fixes:

A. Handle each case independently using bespoke logic. E.g. for fork choice we could make sure the `state_root` stored in the nodes is the advanced state's state root (when initialising from a checkpoint). This might work, but might also break other things.
B. Store the unaligned state in the database, and keep track of this. @paulhauner suggested a separate column, but I think the `BeaconState` column would be fine (using `store_full_state`). We'd just need to keep track of the state root for this unaligned finalized state in the store, and have a special case for loading the unaligned state when `get_state` requests a state root that matches.
C. Alternatively we could do a variant of B where we don't actually store the unaligned state, but we do store its state root and redirect requests for this state root to the aligned/advanced state. I think this is strictly worse, aside from the disk usage, as it breaks a pretty fundamental property of `get_state`.

So far I'm in favour of B, with no new column, and a DB schema migration to add the `unaligned_finalized_state_root` to the DB. I think it could reasonably live in the `AnchorInfo`, as it is only relevant when the anchor is non-null (i.e. for non-archive nodes).
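A minimal sketch of what option B might look like, under loud assumptions: everything here is a hypothetical in-memory model, not the real Lighthouse store. The `AnchorInfo` struct only borrows the name and carries just the proposed `unaligned_finalized_state_root` field, and the unaligned state is kept in a separate field purely for clarity (the proposal above would keep it in the existing `BeaconState` column).

```rust
use std::collections::HashMap;

/// Toy stand-in for a 32-byte state root (assumption for illustration).
type Root = u64;

/// Toy stand-in for a full beacon state.
#[derive(Debug, Clone, PartialEq)]
struct State {
    slot: u64,
}

/// Hypothetical analogue of the anchor metadata, reduced to the proposed field.
#[derive(Debug, Default)]
struct AnchorInfo {
    unaligned_finalized_state_root: Option<Root>,
}

/// Hypothetical in-memory store.
#[derive(Default)]
struct Store {
    /// Regular states, keyed by state root.
    states: HashMap<Root, State>,
    /// The unaligned finalized state, stored in full.
    unaligned_state: Option<State>,
    anchor: AnchorInfo,
}

impl Store {
    /// Record the unaligned finalized state and remember its root.
    fn store_unaligned_finalized_state(&mut self, root: Root, state: State) {
        self.unaligned_state = Some(state);
        self.anchor.unaligned_finalized_state_root = Some(root);
    }

    /// Look up a state by root, with a special case for the tracked unaligned root.
    fn get_state(&self, state_root: Root) -> Option<&State> {
        if self.anchor.unaligned_finalized_state_root == Some(state_root) {
            return self.unaligned_state.as_ref();
        }
        self.states.get(&state_root)
    }
}
```

In this model `get_state` keeps its basic property: the state returned for a root is the state that hashes to that root, which is the property option C would give up.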

michaelsproul · 2023-07-07T23:18:06Z

Adding the tree-states tag, because sorting out the alignment issue is required for #4481, which is required for a tree-states database migration.

…uning (#4610)

## Issue Addressed

Closes #3210
Closes #3211

## Proposed Changes

- Checkpoint sync from the latest finalized state regardless of its alignment.
- Add the `block_root` to the database's split point. This is _only_ added to the in-memory split in order to avoid a schema migration. See `load_split`.
- Add a new method to the DB called `get_advanced_state`, which looks up a state _by block root_, with a `state_root` as fallback. Using this method prevents accidental accesses of the split's unadvanced state, which does not exist in the hot DB and is not guaranteed to exist in the freezer DB at all. Previously Lighthouse would look up this state _from the freezer DB_, even if it was required for block/attestation processing, which was suboptimal.
- Replace several state look-ups in block and attestation processing with `get_advanced_state` so that they can't hit the split block's unadvanced state.
- Do not store any states in the freezer database by default. All states will be deleted upon being evicted from the hot database unless `--reconstruct-historic-states` is set. The anchor info which was previously used for checkpoint sync is used to implement this, including when syncing from genesis.

## Additional Info

Needs further testing. I want to stress-test the pruned database under Hydra.

The `get_advanced_state` method is intended to become more relevant over time: `tree-states` includes an identically named method that returns advanced states from its in-memory cache.

Co-authored-by: realbigsean <[email protected]>
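The lookup "by block root, with a `state_root` as fallback" described in that commit message can be sketched roughly as follows. This is a toy model only: the `Split` and `Store` types, the `advanced_state_root` field and the method signature are assumptions made for illustration, and the real `get_advanced_state` in Lighthouse may differ in every detail.

```rust
use std::collections::HashMap;

/// Toy stand-in for a 32-byte root (assumption for illustration).
type Root = u64;

/// Toy stand-in for a full beacon state.
#[derive(Debug, Clone, PartialEq)]
struct State {
    slot: u64,
}

/// Hypothetical split point: the split block's root plus the root of its advanced state.
struct Split {
    block_root: Root,
    advanced_state_root: Root,
}

/// Hypothetical in-memory store.
struct Store {
    split: Split,
    states: HashMap<Root, State>,
}

impl Store {
    /// Look up a state by block root first, falling back to the given state root.
    fn get_advanced_state(&self, block_root: Root, state_root: Root) -> Option<&State> {
        if block_root == self.split.block_root {
            // Requests for the split block are redirected to its advanced state,
            // so the unadvanced state is never looked up.
            return self.states.get(&self.split.advanced_state_root);
        }
        self.states.get(&state_root)
    }
}
```

With a split block `0x1` whose advanced state is stored under `0xB`, a call such as `get_advanced_state(0x1, 0xA)` returns the slot-32 state even though nothing is stored under the block's own state root `0xA`. Lookups for any other block fall through to the ordinary state-root path.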


michaelsproul added the enhancement New feature or request label May 24, 2022

This was referenced May 24, 2022

Add a mode to run without freezer states #3211

Closed

Beacon data directory grows indefinitely #3207

Closed

michaelsproul mentioned this issue Sep 9, 2022

Checkpoint Sync API ethereum/beacon-APIs#226

Closed

michaelsproul self-assigned this Sep 13, 2022

realbigsean mentioned this issue Jun 9, 2023

Checkpoint sync without alignment #4389

Closed

michaelsproul mentioned this issue Jul 7, 2023

Add logic to prune all historic states #4481

Closed

michaelsproul added database tree-states Upcoming state and database overhaul labels Jul 7, 2023

michaelsproul mentioned this issue Aug 11, 2023

[Merged by Bors] - Remove checkpoint alignment requirements and enable historic state pruning #4610

Closed

paulhauner closed this as completed in 20067b9 Aug 31, 2023
