
Perform State Caching With Merkle Structure #2011

Closed
nisdas opened this issue Mar 17, 2019 · 4 comments · Fixed by #4602
Labels: Blocked (blocked by research or external factors)

Comments

nisdas commented Mar 17, 2019

The spec now includes a section on state caching. This issue tracks the state caching introduced in the specs repo by ethereum/consensus-specs#732, and is also part of #1999.

def cache_state(state: BeaconState) -> None:
    previous_slot_state_root = hash_tree_root(state)

    # store the previous slot's post state transition root
    state.latest_state_roots[state.slot % SLOTS_PER_HISTORICAL_ROOT] = previous_slot_state_root

    # cache state root in stored latest_block_header if empty
    if state.latest_block_header.state_root == ZERO_HASH:
        state.latest_block_header.state_root = previous_slot_state_root

    # store latest known block for previous slot
    state.latest_block_roots[state.slot % SLOTS_PER_HISTORICAL_ROOT] = hash_tree_root(state.latest_block_header)
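The `slot % SLOTS_PER_HISTORICAL_ROOT` indexing above makes `latest_state_roots` and `latest_block_roots` fixed-size ring buffers: each slot overwrites exactly one entry, and after `SLOTS_PER_HISTORICAL_ROOT` slots the oldest entry is recycled. A minimal sketch, with the constant shrunk for illustration:

```python
# Sketch of the ring-buffer indexing used by latest_state_roots.
# SLOTS_PER_HISTORICAL_ROOT is shrunk from the spec value to 8 here.
SLOTS_PER_HISTORICAL_ROOT = 8

state_roots = [b"\x00" * 32] * SLOTS_PER_HISTORICAL_ROOT

def cache_root(slot: int, root: bytes) -> None:
    # Each slot overwrites exactly one entry; after
    # SLOTS_PER_HISTORICAL_ROOT slots the oldest entry is recycled.
    state_roots[slot % SLOTS_PER_HISTORICAL_ROOT] = root

for slot in range(10):
    cache_root(slot, slot.to_bytes(32, "little"))

# Slots 8 and 9 have wrapped around and replaced slots 0 and 1.
assert state_roots[0] == (8).to_bytes(32, "little")
assert state_roots[1] == (9).to_bytes(32, "little")
```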
@prestonvanloon added the Blocked label Mar 18, 2019
prestonvanloon commented:

Blocked by testnet feature freeze, but design can start now.

@prestonvanloon added this to the Diamond milestone Mar 27, 2019
rauljordan commented:

A design that would have excellent side effects here would be to store the state itself in a Merkle DB structure.

From @protolambda's design:

State:

# Versioning
genesis_time: uint64                                              -- once
slot: Slot                                                        -- every slot (overwrite)
fork: Fork                                                        -- every fork (overwrite)
# History                                                         
latest_block_header: BeaconBlockHeader                            
block_roots: Vector[Hash, SLOTS_PER_HISTORICAL_ROOT]              -- every slot (overwrite 1)
state_roots: Vector[Hash, SLOTS_PER_HISTORICAL_ROOT]              -- every slot (overwrite 1)
historical_roots: List[Hash, HISTORICAL_ROOTS_LIMIT]              -- every 2**13 slots (13 hours) (add 1)
# Eth1            
eth1_data: Eth1Data                                               -- any block, assume 1/1024 slots. (overwrite 1)
eth1_data_votes: List[Eth1Data, SLOTS_PER_ETH1_VOTING_PERIOD]     -- every block (add 1)
eth1_deposit_index: uint64                                        -- any block (overwrite)
# Registry            
validators: List[Validator, VALIDATOR_REGISTRY_LIMIT]             -- every epoch (overwrite any), less likely every block 
balances: List[Gwei, VALIDATOR_REGISTRY_LIMIT]                    -- every block (overwrite any)
# Shuffling            
start_shard: Shard                                                -- every epoch (overwrite)
randao_mixes: Vector[Hash, EPOCHS_PER_HISTORICAL_VECTOR]          -- every block (overwrite)
active_index_roots: Vector[Hash, EPOCHS_PER_HISTORICAL_VECTOR]        -- every epoch (overwrite 1)
compact_committees_roots: Vector[Hash, EPOCHS_PER_HISTORICAL_VECTOR]  -- every epoch (overwrite 1)
# Slashings
slashings: Vector[Gwei, EPOCHS_PER_SLASHINGS_VECTOR]              -- every epoch (overwrite 1)
# Attestations
previous_epoch_attestations: List[PendingAttestation, MAX_ATTESTATIONS * SLOTS_PER_EPOCH]  -- every block (add a few), every epoch reset
current_epoch_attestations: List[PendingAttestation, MAX_ATTESTATIONS * SLOTS_PER_EPOCH]   -- every block (add a few), every epoch reset
# Crosslinks
previous_crosslinks: Vector[Crosslink, SHARD_COUNT]               -- every epoch (overwrite all, potentially a subset)
current_crosslinks: Vector[Crosslink, SHARD_COUNT]                -- every epoch (overwrite all, potentially a subset)
# Finality
justification_bits: Bitvector[JUSTIFICATION_BITS_LENGTH]          -- every epoch
previous_justified_checkpoint: Checkpoint                         -- every epoch (overwrite potentially)
current_justified_checkpoint: Checkpoint                          -- every epoch (overwrite potentially)
finalized_checkpoint: Checkpoint                                  -- every epoch (overwrite potentially)

The data that is overwritten more often than every epoch can be restored by re-applying the blocks on top of the epoch-start state. However, storing the full epoch-start state may still include duplicate data, as a few big arrays only change a little every epoch. This data (block_roots, state_roots, historical_roots, active_index_roots, compact_committees_roots, slashings) could be separated out.
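The restore-by-replay idea can be sketched as follows: persist full states only at epoch boundaries, keep every block, and rebuild any intermediate slot by re-applying the blocks since the last stored boundary. This is a hypothetical illustration, not Prysm code; `State`, `Store`, and `process_block` stand in for the real types and state transition:

```python
# Hypothetical sketch: persist full states only at epoch boundaries
# and regenerate intermediate states by replaying blocks.
from dataclasses import dataclass
from typing import Dict

SLOTS_PER_EPOCH = 8  # shrunk for illustration

@dataclass(frozen=True)
class State:
    slot: int
    counter: int  # stand-in for the fields that blocks mutate

def process_block(state: State, block_delta: int) -> State:
    # Stand-in for the real per-block state transition.
    return State(slot=state.slot + 1, counter=state.counter + block_delta)

class Store:
    def __init__(self) -> None:
        self.epoch_states: Dict[int, State] = {}  # epoch-start slot -> full state
        self.blocks: Dict[int, int] = {}          # slot -> block payload

    def save(self, state: State, block_delta: int) -> None:
        self.blocks[state.slot] = block_delta
        if state.slot % SLOTS_PER_EPOCH == 0:
            self.epoch_states[state.slot] = state  # only boundary states persisted

    def state_at(self, slot: int) -> State:
        # Load the last epoch-boundary state, then replay blocks up to `slot`.
        boundary = slot - slot % SLOTS_PER_EPOCH
        state = self.epoch_states[boundary]
        for s in range(boundary, slot):
            state = process_block(state, self.blocks[s])
        return state

store = Store()
state = State(slot=0, counter=0)
for slot in range(20):
    store.save(state, slot)           # block at `slot` carries payload `slot`
    state = process_block(state, slot)

# Regenerating slot 13 from the boundary state at slot 8 matches full replay.
assert store.state_at(13) == State(slot=13, counter=sum(range(13)))
```

The trade-off is classic space vs. time: only one full state per epoch hits disk, at the cost of up to `SLOTS_PER_EPOCH - 1` block replays per lookup.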

The more advanced alternative would be to create a merkle-DB for binary trees, and overlay a custom index for each type of content you need. If unified with hash-tree-root, this may be a good way to re-hash a state efficiently (i.e. use unchanged tree nodes).

Doing this would allow us to easily fetch the data we need at runtime without needing the entire state and would allow for lightning fast fetching of its hash tree root.
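The node-reuse point above can be made concrete with a toy incremental Merkle tree: internal node hashes are cached, so changing one leaf rehashes only the O(log n) nodes on its path to the root while every unchanged sibling subtree keeps its cached hash. This is a generic sketch, not the SSZ `hash_tree_root` layout or any Prysm structure:

```python
# Toy incremental merkleization: cache internal node hashes so that
# updating one leaf rehashes only the leaf-to-root path.
from hashlib import sha256

class MerkleTree:
    def __init__(self, leaves):
        assert len(leaves) & (len(leaves) - 1) == 0, "leaf count must be a power of two"
        self.n = len(leaves)
        # Heap layout: nodes[1] is the root; nodes[self.n + i] is leaf i.
        self.nodes = [b"\x00" * 32] * (2 * self.n)
        for i, leaf in enumerate(leaves):
            self.nodes[self.n + i] = leaf
        for i in range(self.n - 1, 0, -1):
            self.nodes[i] = sha256(self.nodes[2 * i] + self.nodes[2 * i + 1]).digest()

    def update(self, index, leaf):
        # Rehash only the path from the leaf to the root; all sibling
        # subtrees keep their cached hashes (the node reuse described above).
        i = self.n + index
        self.nodes[i] = leaf
        i //= 2
        while i >= 1:
            self.nodes[i] = sha256(self.nodes[2 * i] + self.nodes[2 * i + 1]).digest()
            i //= 2

    def root(self):
        return self.nodes[1]

tree = MerkleTree([bytes([i]) * 32 for i in range(8)])
before = tree.root()
tree.update(5, b"\xff" * 32)
assert tree.root() != before
```

An incremental update touches 3 hashes here (log2(8) levels) instead of the 7 a full rebuild recomputes; for a state with millions of leaves the savings dominate.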


rauljordan commented Oct 21, 2019

@nisdas @prestonvanloon @terencechain @0xKiwi @shayzluf what do you all think? What we have to carefully consider here is how much of this we will keep in memory vs. on disk and how we can prevent data corruption or accidental mutation.


nisdas commented Oct 21, 2019

@rauljordan this looks good to me. Optimizing for something like this would definitely be very helpful. But as you stated, we need to be extra careful about data corruption. Since the state has so many pieces, it would be easy for a minor bug to slip in and cause an unintended mutation. If we store this as a binary DB, what would the state look like in Bolt?

Right now we just place the whole SSZ blob into the state key. But with something like this we would need to break it up further and have some sort of schema to iterate over the keys for each state field. E.g. state.Slot could be stored in a nested bucket under the state key, and the same would apply for the other fields.
You could also potentially share fields between states.
E.g. if balances are the same for the states with roots 0x123 and 0x456, we would only need to store them once and just update the reference for each state.

This all would need a detailed design.
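The field-sharing idea above is essentially content addressing: key each serialized field by its hash, so two states with identical balances point at the same stored record. A toy sketch, not the actual Bolt schema:

```python
# Toy content-addressed field storage: identical field values are
# stored once and shared between states via their hash.
from hashlib import sha256

field_store = {}   # field hash -> serialized field value
state_index = {}   # state root -> {field name -> field hash}

def put_field(value: bytes) -> bytes:
    key = sha256(value).digest()
    field_store[key] = value   # idempotent: identical values dedup to one record
    return key

def save_state(state_root: bytes, fields: dict) -> None:
    state_index[state_root] = {name: put_field(v) for name, v in fields.items()}

balances = b"serialized-balances"
save_state(b"\x12", {"balances": balances, "slot": b"\x01"})
save_state(b"\x45", {"balances": balances, "slot": b"\x02"})

# Both states reference the same stored balances record.
assert state_index[b"\x12"]["balances"] == state_index[b"\x45"]["balances"]
assert len(field_store) == 3  # balances stored once, plus two distinct slots
```

In a real design the reference counts would also need tracking so a shared record is only deleted once no state points at it, which is part of why this needs the detailed design mentioned above.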

@rauljordan changed the title from "Perform State Caching" to "Perform State Caching With Merkle Structure" Dec 5, 2019