Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

accelerate getShufflingRef #4911

Merged
merged 12 commits into from
May 12, 2023
Merged

accelerate getShufflingRef #4911

merged 12 commits into from
May 12, 2023

Conversation

etan-status
Copy link
Contributor

When an uncached ShufflingRef is requested, we currently replay state which can take several seconds. Acceleration is possible by:

  1. Start from any state with locked-in get_active_validator_indices. Any blocks / slots applied to such a state can only affect that result for future epochs, so are viable for querying target epoch. compute_activation_exit_epoch(state.slot.epoch) > target.epoch

  2. Determine highest common ancestor among state and target.blck. At the ancestor slot, same rules re get_active_validator_indices. compute_activation_exit_epoch(ancestorSlot.epoch) > target.epoch

  3. We now have a state that shares history with target.blck up through a common ancestor slot. Any blocks / slots that the state contains, which are not part of the target.blck history, affect get_active_validator_indices at epochs after target.epoch.

  4. Select state.randao_mixes[N] that is closest to common ancestor. Either direction is fine (above / below ancestor).

  5. From that RANDAO mix, mix in / out all RANDAO reveals from blocks in-between. This is just an XOR operation, so fully reversible. mix = mix xor SHA256(blck.message.body.randao_reveal)

  6. Compute the attester dependent slot from target.epoch. if epoch >= 2: (target.epoch - 1).start_slot - 1 else: GENESIS_SLOT

  7. Trace back from target.blck to the attester dependent slot. We now have the destination for which we want to obtain RANDAO.

  8. Mix in all RANDAO reveals from blocks up through the dependentBlck. Same method, no special handling necessary for epoch transitions.

  9. Combine get_active_validator_indices from state at target.epoch with the recovered RANDAO value at dependentBlck to obtain the requested shuffling, and construct the ShufflingRef without replay.

When an uncached `ShufflingRef` is requested, we currently replay state
which can take several seconds. Acceleration is possible by:

1. Start from any state with locked-in `get_active_validator_indices`.
   Any blocks / slots applied to such a state can only affect that
   result for future epochs, so are viable for querying target epoch.
   `compute_activation_exit_epoch(state.slot.epoch) > target.epoch`

2. Determine highest common ancestor among `state` and `target.blck`.
   At the ancestor slot, same rules re `get_active_validator_indices`.
   `compute_activation_exit_epoch(ancestorSlot.epoch) > target.epoch`

3. We now have a `state` that shares history with `target.blck` up
   through a common ancestor slot. Any blocks / slots that the `state`
   contains, which are not part of the `target.blck` history, affect
   `get_active_validator_indices` at epochs _after_ `target.epoch`.

4. Select `state.randao_mixes[N]` that is closest to common ancestor.
   Either direction is fine (above / below ancestor).

5. From that RANDAO mix, mix in / out all RANDAO reveals from blocks
   in-between. This is just an XOR operation, so fully reversible.
   `mix = mix xor SHA256(blck.message.body.randao_reveal)`

6. Compute the attester dependent slot from `target.epoch`.
   `if epoch >= 2: (target.epoch - 1).start_slot - 1 else: GENESIS_SLOT`

7. Trace back from `target.blck` to the attester dependent slot.
   We now have the destination for which we want to obtain RANDAO.

8. Mix in all RANDAO reveals from blocks up through the `dependentBlck`.
   Same method, no special handling necessary for epoch transitions.

9. Combine `get_active_validator_indices` from `state` at `target.epoch`
   with the recovered RANDAO value at `dependentBlck` to obtain the
   requested shuffling, and construct the `ShufflingRef` without replay.
@etan-status
Copy link
Contributor Author

Measurement on 2019 MacBook Pro (Mainnet):

ulimit -n 1024 && make update && make -j nimbus_beacon_node && build/nimbus_beacon_node --data-dir="$HOME/Downloads/nimbus/data/mainnet" --rest --tcp-port=9010 --udp-port=9010 --history=prune --no-el

State replayed topics="chaindag" blocks=30 slots=64 current=1819b537:6397279@6397280 ancestor=c8043e44:6397215@6397216 target=73bd8e90:6397245@6397280 ancestorStateRoot=2c2f3a62 targetStateRoot=e9a50a6d found=false assignDur=126ms487us120ns replayDur=3s464ms169us744ns

  • Old logic: 3s464ms169us744ns (state replay)
  • New logic:
    • from dag.headState: 629us890ns
    • from dag.epochRefState: 570us984ns
    • from dag.clearanceState: 551us462ns

@etan-status
Copy link
Contributor Author

Another one:

INF 2023-05-09 09:44:02.112+02:00 State replayed topics="chaindag" blocks=28 slots=64 current=2a896cd9:6399487@6399488 ancestor=c88d045f:6399423@6399424 target=82287782:6399451@6399488 ancestorStateRoot=f3a91737 targetStateRoot=96923c73 found=false assignDur=126ms544us527ns replayDur=3s427ms537us618ns

  • Old logic: 3s427ms537us618ns (state replay)
  • New logic:
    • from dag.headState: 1ms819us443ns
    • from dag.epochRefState: 1ms440us682ns
    • from dag.clearanceState: 2ms143us15ns

@etan-status etan-status marked this pull request as ready for review May 9, 2023 13:11
@etan-status
Copy link
Contributor Author

Depends on #4910

@github-actions
Copy link

github-actions bot commented May 9, 2023

Unit Test Results

         9 files  ±0    1 077 suites  +3   42m 8s ⏱️ + 7m 37s
  3 677 tests +2    3 398 ✔️ +2  279 💤 ±0  0 ±0 
15 674 runs  +6  15 369 ✔️ +6  305 💤 ±0  0 ±0 

Results for commit 040f233. ± Comparison against base commit cc341e0.

♻️ This comment has been updated with latest results.

@etan-status
Copy link
Contributor Author

Depends on #4932

@etan-status
Copy link
Contributor Author

Tests passing locally, also should be green on GH after the two dependencies are in.


# Check that state is related to the information stored in the DAG,
# and determine the corresponding `BlockRef`, or `finalizedHead` if finalized
let
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a clarifying comment explaining why the finalized block in particular is a good starting point for re-calculating the RANDAO.

If a shuffling is requested for a very old state, wouldn't it still be better to find the nearest state snapshot in the database and start the optimised RANDAO computation from there? Is there an assumption that we are never computing the shuffling for such old epochs?

Copy link
Contributor Author

@etan-status etan-status May 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shuffling consists of two parts:

  1. get_active_validator_indices at target epoch
  2. RANDAO at dependent slot

--

  1. can be queried from any state that includes history up to ~5 epochs before target. If it's finalized, that means, any state can be used. you just go through validators and check which ones were active at that particular epoch.

  2. can be recomputed from any history that has a common ancestor. if it is a finalized part, just start at the historic randao_mixes for the requested epoch (or the one before / after, if closer) and apply blocks from there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I just want a BlockRef, and they don't exist for the finalized portion of the chain (that part only has BlockId)

Copy link
Contributor Author

@etan-status etan-status May 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are compatible, the work to recover RANDAO is ~constant, as there is a checkpoint stored in BeaconState each epoch.

You don't have to replay all the epochs to recover it, can start from closest checkpoint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an assumption that we are never computing the shuffling for such old epochs?

Yes, gossip validation ignores very old attestations before loading the shuffling.

@etan-status
Copy link
Contributor Author

@arnetheduck wanted to do a more thorough review on this as well, so please wait with merge until that's done.

@etan-status etan-status merged commit ea97e93 into unstable May 12, 2023
@etan-status etan-status deleted the dev/etan/bd-accelshuffling branch May 12, 2023 17:37
aa = aa.parent
doAssert aa != nil, "All `BlockRef` lead to `finalizedHead`"
if aa.slot < lowSlot:
return err()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opt.none() looks better with Opt

let
stateBid = state.latest_block_id
stateBlck =
if dag.finalizedHead.blck == nil:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finalizedHead is always not nil

elif stateBid.slot > dag.finalizedHead.blck.slot:
? dag.getBlockRef(stateBid.root)
elif stateBid.slot == dag.finalizedHead.blck.slot:
if stateBid.root != dag.finalizedHead.blck.root:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be a bug

Copy link
Member

@arnetheduck arnetheduck May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact, all of this can be replaced byt stateBlck = getBlockRef(stateBit.root) or dag.finalizedHead.blck (getBlockRef returns the finalized blck in applicable cases)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minus the unnecessary getBlockRef scan if we already know that it is finalized. Good to know that this doesn't need as much defense as have been put here. So, current logic should be fine, it can just be rewritten to be more concise, as I understand.

blck: BlockRef, epoch: Epoch
): Opt[tuple[dependentBid: BlockId, mix: Eth2Digest]] =
## Compute the requested RANDAO mix for `blck@epoch` based on `state`.
## `state` must have the correct `get_active_validator_indices` for `epoch`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like an unnecessary constraint for this function - ie the requirement here is that the state shares an ancestor within EPOCHS_PER_HISTORICAL_VECTOR slots (in a non-finalizing history, this will not always be true, even for a blckref).

ie randao recovery / mixing is orthogonal to shuffling and the logic would ideally reflect this (the shuffling is more strongly constrained)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that could be slow to process though, if we relax the precondition, and start from a very old state, it would mean adding many thousands of blocks. Agree though, the result should still be correct, but the function is not intended to be used that way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, performance is not really the question here, but rather clarity of the implementation: mixing orthogonal constraints creates a false dependency which makes the code confusing.

The randao mixing time is essentially linear in the number of blocks that need to be replayed no matter the depth, including the special case of shuffling.

In particular, this function could be used in the REST interface that returns randao: that API would be a lot more efficient if it didn't do a full state replay and it would benefit from the full range.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other reason why this is important is to keep the constraints in tune with respect to the limits that the state itself provides: EPOCHS_PER_HISTORICAL_VECTOR is also tied to the distance at which get_block_root_at_slot return correct results etc, so in analyzing the correctness of this function, it's better that its constraints are directly expressed in the concepts that cause the constraint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per discussion, this algorithm could be simplified in the following manner:

  • because xor is commutative, we can apply the commonAncestor logic directly by walking blocks via the parent root if the block data loaded together with randao instead of going via blockref/getblockatslot/etc - this reduces the components involved in computing the randao and thus increases its robustness
    • key insight here is that we can walk state and desired mixes in any order, rather than first "undoing" one then "doing" the other
    • this should reduce the number of moving parts in the code, reduce off-by-one in slot logic, handle empty slots more gracefully etc
  • minimal pre-validation still needs to be done so that we don't end up with overlong walks that end up failing in case of long periods of non-finality
    • the code could further be generalised by taking a BlockId as input, though this would complicate pre-checking slightly
  • it "should" be possible to load the randao/parent from the block without decoding all of the block - this is completely orthogonal to this PR but an interesting idea for the future

let bsi = ? dag.getBlockIdAtSlot(highSlot)
doAssert bsi.bid.root == highRoot
bsi.bid
while bid.slot >= slotsToMix.a:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a test for when slotsToMix.a is an empty slot?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the way how it is called, don't think that's possible, as slotsToMix is based on the common ancestor block. But sure, can add such a test as well.

etan-status added a commit that referenced this pull request May 15, 2023
etan-status added a commit that referenced this pull request May 15, 2023
etan-status added a commit that referenced this pull request May 15, 2023
etan-status added a commit that referenced this pull request May 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants