accelerate `getShufflingRef` #4911

etan-status · 2023-05-08T22:04:59Z

When an uncached `ShufflingRef` is requested, we currently replay state which can take several seconds. Acceleration is possible by:

1. Start from any state with locked-in `get_active_validator_indices`. Any blocks / slots applied to such a state can only affect that result for future epochs, so are viable for querying target epoch. `compute_activation_exit_epoch(state.slot.epoch) > target.epoch`
2. Determine highest common ancestor among `state` and `target.blck`. At the ancestor slot, same rules re `get_active_validator_indices`. `compute_activation_exit_epoch(ancestorSlot.epoch) > target.epoch`
3. We now have a `state` that shares history with `target.blck` up through a common ancestor slot. Any blocks / slots that the `state` contains, which are not part of the `target.blck` history, affect `get_active_validator_indices` at epochs _after_ `target.epoch`.
4. Select `state.randao_mixes[N]` that is closest to common ancestor. Either direction is fine (above / below ancestor).
5. From that RANDAO mix, mix in / out all RANDAO reveals from blocks in-between. This is just an XOR operation, so fully reversible. `mix = mix xor SHA256(blck.message.body.randao_reveal)`
6. Compute the attester dependent slot from `target.epoch`. `if epoch >= 2: (target.epoch - 1).start_slot - 1 else: GENESIS_SLOT`
7. Trace back from `target.blck` to the attester dependent slot. We now have the destination for which we want to obtain RANDAO.
8. Mix in all RANDAO reveals from blocks up through the `dependentBlck`. Same method, no special handling necessary for epoch transitions.
9. Combine `get_active_validator_indices` from `state` at `target.epoch` with the recovered RANDAO value at `dependentBlck` to obtain the requested shuffling, and construct the `ShufflingRef` without replay.
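The two slot/epoch conditions in steps 1 and 6 can be sketched in a few lines of Python, following the consensus-spec pseudocode rather than the Nim code in this PR. The constants are assumed to be the mainnet preset values; `start_slot`, `attester_dependent_slot` and `active_indices_locked_in` are illustrative names, not functions from the codebase.

```python
# Assumed mainnet preset values (not taken from this PR).
SLOTS_PER_EPOCH = 32
MAX_SEED_LOOKAHEAD = 4
GENESIS_SLOT = 0


def compute_activation_exit_epoch(epoch: int) -> int:
    """Earliest epoch at which an activation/exit initiated in `epoch` takes effect."""
    return epoch + 1 + MAX_SEED_LOOKAHEAD


def start_slot(epoch: int) -> int:
    """First slot of `epoch` (hypothetical helper)."""
    return epoch * SLOTS_PER_EPOCH


def attester_dependent_slot(target_epoch: int) -> int:
    """Step 6: last slot whose RANDAO mix influences attester duties of `target_epoch`."""
    if target_epoch >= 2:
        return start_slot(target_epoch - 1) - 1
    return GENESIS_SLOT


def active_indices_locked_in(state_slot: int, target_epoch: int) -> bool:
    """Steps 1 and 2: can a state (or ancestor) at `state_slot` answer for `target_epoch`?"""
    state_epoch = state_slot // SLOTS_PER_EPOCH
    return compute_activation_exit_epoch(state_epoch) > target_epoch
```

With these values, a state only needs to be within `MAX_SEED_LOOKAHEAD + 1` epochs of the target for its validator set at the target epoch to be final.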

etan-status · 2023-05-09T06:56:52Z

Measurement on 2019 MacBook Pro (Mainnet):

ulimit -n 1024 && make update && make -j nimbus_beacon_node && build/nimbus_beacon_node --data-dir="$HOME/Downloads/nimbus/data/mainnet" --rest --tcp-port=9010 --udp-port=9010 --history=prune --no-el

State replayed topics="chaindag" blocks=30 slots=64 current=1819b537:6397279@6397280 ancestor=c8043e44:6397215@6397216 target=73bd8e90:6397245@6397280 ancestorStateRoot=2c2f3a62 targetStateRoot=e9a50a6d found=false assignDur=126ms487us120ns replayDur=3s464ms169us744ns

Old logic: 3s464ms169us744ns (state replay)
New logic:
- from dag.headState: 629us890ns
- from dag.epochRefState: 570us984ns
- from dag.clearanceState: 551us462ns

etan-status · 2023-05-09T08:42:03Z

Another one:

INF 2023-05-09 09:44:02.112+02:00 State replayed topics="chaindag" blocks=28 slots=64 current=2a896cd9:6399487@6399488 ancestor=c88d045f:6399423@6399424 target=82287782:6399451@6399488 ancestorStateRoot=f3a91737 targetStateRoot=96923c73 found=false assignDur=126ms544us527ns replayDur=3s427ms537us618ns

Old logic: 3s427ms537us618ns (state replay)
New logic:
- from dag.headState: 1ms819us443ns
- from dag.epochRefState: 1ms440us682ns
- from dag.clearanceState: 2ms143us15ns
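To put the two measurements in proportion, the durations can be converted to nanoseconds and divided. The sketch below assumes the `3s464ms169us744ns` style seen in the log lines; `parse_duration_ns` is a hypothetical helper, not part of the codebase.

```python
import re

# Nanoseconds per unit for the duration format used in the log output above.
_UNITS_NS = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}


def parse_duration_ns(text: str) -> int:
    """Parse a duration such as '3s464ms169us744ns' into nanoseconds."""
    total = 0
    # 'ms', 'us' and 'ns' are listed before 's' so that the two-letter units win.
    for value, unit in re.findall(r"(\d+)(ms|us|ns|s)", text):
        total += int(value) * _UNITS_NS[unit]
    return total


def speedup(old: str, new: str) -> float:
    """Ratio of the old duration to the new one."""
    return parse_duration_ns(old) / parse_duration_ns(new)


first = speedup("3s464ms169us744ns", "629us890ns")       # replay vs. dag.headState
second = speedup("3s427ms537us618ns", "1ms819us443ns")   # replay vs. dag.headState
```

Both ratios come out at roughly three orders of magnitude, on a single machine and with a single sample each.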

etan-status · 2023-05-09T13:17:17Z

Depends on #4910

github-actions · 2023-05-09T16:39:11Z

Unit Test Results

     9 files ±0     1 077 suites +3     42m 8s ⏱️ +7m 37s
 3 677 tests +2     3 398 ✔️ +2     279 💤 ±0     0 ❌ ±0
15 674 runs  +6    15 369 ✔️ +6     305 💤 ±0     0 ❌ ±0

Results for commit 040f233. ± Comparison against base commit cc341e0.

♻️ This comment has been updated with latest results.

etan-status · 2023-05-10T14:58:53Z

Depends on #4932

etan-status · 2023-05-10T15:33:12Z

Tests passing locally, also should be green on GH after the two dependencies are in.

zah · 2023-05-11T10:09:38Z

beacon_chain/consensus_object_pools/blockchain_dag.nim

+
+  # Check that state is related to the information stored in the DAG,
+  # and determine the corresponding `BlockRef`, or `finalizedHead` if finalized
+  let


Can you add a clarifying comment explaining why the finalized block in particular is a good starting point for re-calculating the RANDAO?

If a shuffling is requested for a very old state, wouldn't it still be better to find the nearest state snapshot in the database and start the optimised RANDAO computation from there? Is there an assumption that we are never computing the shuffling for such old epochs?

etan-status · 2023-05-11

Shuffling consists of two parts:

1. `get_active_validator_indices` at target epoch
2. RANDAO at dependent slot

--

1. Can be queried from any state that includes history up to ~5 epochs before target. If it's finalized, that means any state can be used; you just go through the validators and check which ones were active at that particular epoch.
2. Can be recomputed from any history that has a common ancestor. If it is a finalized part, just start at the historic `randao_mixes` for the requested epoch (or the one before / after, if closer) and apply blocks from there.

Here, I just want a `BlockRef`, and they don't exist for the finalized portion of the chain (that part only has `BlockId`).

If you are compatible, the work to recover RANDAO is ~constant, as there is a checkpoint stored in `BeaconState` each epoch. You don't have to replay all the epochs to recover it; you can start from the closest checkpoint.

> Is there an assumption that we are never computing the shuffling for such old epochs?

Yes, gossip validation ignores very old attestations before loading the shuffling.
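The first of the two parts is cheap to evaluate on any compatible state, because each validator record carries its own activation and exit epochs. A minimal Python sketch in the style of the consensus spec, with a reduced `Validator` type that keeps only the two fields needed here (the real record has more fields):

```python
from dataclasses import dataclass
from typing import List

FAR_FUTURE_EPOCH = 2**64 - 1  # spec constant for "not scheduled yet"


@dataclass
class Validator:
    # Reduced record for illustration only.
    activation_epoch: int
    exit_epoch: int = FAR_FUTURE_EPOCH


def is_active_validator(validator: Validator, epoch: int) -> bool:
    """A validator is active from its activation epoch up to, but excluding, its exit epoch."""
    return validator.activation_epoch <= epoch < validator.exit_epoch


def get_active_validator_indices(validators: List[Validator], epoch: int) -> List[int]:
    """Scan the validator list once and collect the indices active at `epoch`."""
    return [i for i, v in enumerate(validators) if is_active_validator(v, epoch)]
```

No block or slot processing is involved. The scan only reads fields that, per the description above, are already locked in for the target epoch.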

etan-status · 2023-05-11T10:48:09Z

@arnetheduck wanted to do a more thorough review on this as well, so please wait with merge until that's done.

arnetheduck · 2023-05-15T06:51:25Z

beacon_chain/consensus_object_pools/block_dag.nim

+      aa = aa.parent
+      doAssert aa != nil, "All `BlockRef` lead to `finalizedHead`"
+      if aa.slot < lowSlot:
+        return err()


Opt.none() looks better with Opt
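For readers unfamiliar with the parent walk in the hunk above, the general pattern for finding a common ancestor in a block tree can be sketched as follows. This is a simplified Python illustration, assuming a `BlockRef` with only `slot` and `parent`. It returns `None` where the Nim code returns an empty `Opt`, and it is not a translation of the actual `commonAncestor` implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # identity comparison: two refs are equal only if they are the same block
class BlockRef:
    slot: int
    parent: Optional["BlockRef"] = None


def common_ancestor(a: BlockRef, b: BlockRef, low_slot: int) -> Optional[BlockRef]:
    """Walk both branches towards the root until they meet.

    Returns None if the walk would go below `low_slot` before a shared block is found.
    """
    aa: Optional[BlockRef] = a
    bb: Optional[BlockRef] = b
    while aa is not None and bb is not None:
        if aa.slot < low_slot or bb.slot < low_slot:
            return None
        if aa is bb:
            return aa
        # Always step the branch that is at the higher (or equal) slot.
        if aa.slot >= bb.slot:
            aa = aa.parent
        else:
            bb = bb.parent
    return None
```

Bounding the walk with `low_slot` keeps the cost predictable when the two branches diverged long ago.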

arnetheduck · 2023-05-15T08:31:02Z

beacon_chain/consensus_object_pools/blockchain_dag.nim

+  let
+    stateBid = state.latest_block_id
+    stateBlck =
+      if dag.finalizedHead.blck == nil:


`finalizedHead` is never nil

arnetheduck · 2023-05-15T08:36:36Z

beacon_chain/consensus_object_pools/blockchain_dag.nim

+      elif stateBid.slot > dag.finalizedHead.blck.slot:
+        ? dag.getBlockRef(stateBid.root)
+      elif stateBid.slot == dag.finalizedHead.blck.slot:
+        if stateBid.root != dag.finalizedHead.blck.root:


this would be a bug

in fact, all of this can be replaced by `stateBlck = getBlockRef(stateBid.root) or dag.finalizedHead.blck` (`getBlockRef` returns the finalized blck in applicable cases)

etan-status · 2023-05-15

Minus the unnecessary `getBlockRef` scan if we already know that it is finalized. Good to know that this doesn't need as much defense as has been put here. So the current logic should be fine; it can just be rewritten to be more concise, as I understand.

arnetheduck · 2023-05-15T09:01:21Z

beacon_chain/consensus_object_pools/blockchain_dag.nim

+    blck: BlockRef, epoch: Epoch
+): Opt[tuple[dependentBid: BlockId, mix: Eth2Digest]] =
+  ## Compute the requested RANDAO mix for `blck@epoch` based on `state`.
+  ## `state` must have the correct `get_active_validator_indices` for `epoch`.


this seems like an unnecessary constraint for this function - ie the requirement here is that the state shares an ancestor within EPOCHS_PER_HISTORICAL_VECTOR slots (in a non-finalizing history, this will not always be true, even for a blckref).

ie randao recovery / mixing is orthogonal to shuffling and the logic would ideally reflect this (the shuffling is more strongly constrained)

etan-status · 2023-05-15

That could be slow to process though: if we relax the precondition and start from a very old state, it would mean adding many thousands of blocks. Agreed though, the result should still be correct, but the function is not intended to be used that way.

arnetheduck · 2023-05-15

so, performance is not really the question here, but rather clarity of the implementation: mixing orthogonal constraints creates a false dependency which makes the code confusing.

The randao mixing time is essentially linear in the number of blocks that need to be replayed no matter the depth, including the special case of shuffling.

In particular, this function could be used in the REST interface that returns randao: that API would be a lot more efficient if it didn't do a full state replay and it would benefit from the full range.

The other reason why this is important is to keep the constraints in tune with respect to the limits that the state itself provides: EPOCHS_PER_HISTORICAL_VECTOR is also tied to the distance at which get_block_root_at_slot return correct results etc, so in analyzing the correctness of this function, it's better that its constraints are directly expressed in the concepts that cause the constraint.
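As background for the `EPOCHS_PER_HISTORICAL_VECTOR` limit: in the consensus spec, `randao_mixes` is a ring buffer indexed by `epoch % EPOCHS_PER_HISTORICAL_VECTOR`. The sketch below assumes the mainnet preset value; `mix_is_retained` is a hypothetical helper that expresses which epochs a given state can still answer for.

```python
EPOCHS_PER_HISTORICAL_VECTOR = 2**16  # assumed mainnet preset value


def randao_mix_index(epoch: int) -> int:
    """Position of the mix for `epoch` inside `state.randao_mixes`."""
    return epoch % EPOCHS_PER_HISTORICAL_VECTOR


def mix_is_retained(state_epoch: int, epoch: int) -> bool:
    """True if a state at `state_epoch` still holds the mix for `epoch`.

    Older entries have been overwritten, and future epochs have no mix yet.
    """
    return epoch <= state_epoch and state_epoch - epoch < EPOCHS_PER_HISTORICAL_VECTOR
```

Expressing the precondition of a RANDAO helper in these terms keeps it tied to the data the state actually retains, which is the point made in the comment above.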

per discussion, this algorithm could be simplified in the following manner:

- because xor is commutative, we can apply the commonAncestor logic directly by walking blocks via the parent root if the block data is loaded together with the randao, instead of going via blockref/getblockatslot/etc. This reduces the components involved in computing the randao and thus increases its robustness
- key insight here is that we can walk state and desired mixes in any order, rather than first "undoing" one then "doing" the other
- this should reduce the number of moving parts in the code, reduce off-by-one errors in slot logic, handle empty slots more gracefully etc
- minimal pre-validation still needs to be done so that we don't end up with overlong walks that end up failing in case of long periods of non-finality
- the code could further be generalised by taking a `BlockId` as input, though this would complicate pre-checking slightly
- it "should" be possible to load the randao/parent from the block without decoding all of the block. This is completely orthogonal to this PR but an interesting idea for the future
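The "any order" observation follows directly from XOR being commutative, associative and self-inverse. A small Python sketch, using made-up byte strings in place of real BLS `randao_reveal` signatures and an arbitrary starting mix (both are assumptions for illustration only):

```python
import hashlib
import random


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def mix_reveal(mix: bytes, randao_reveal: bytes) -> bytes:
    """mix = mix xor SHA256(randao_reveal); applying it twice restores the input."""
    return xor_bytes(mix, hashlib.sha256(randao_reveal).digest())


def apply_reveals(mix: bytes, reveals) -> bytes:
    for reveal in reveals:
        mix = mix_reveal(mix, reveal)
    return mix


# Hypothetical fork: both branches share the mix at the common ancestor.
ancestor_mix = hashlib.sha256(b"mix at common ancestor").digest()
state_only = [b"reveal-s1", b"reveal-s2"]                # blocks only in the state's history
target_only = [b"reveal-t1", b"reveal-t2", b"reveal-t3"] # blocks only in the target's history

state_mix = apply_reveals(ancestor_mix, state_only)
target_mix = apply_reveals(ancestor_mix, target_only)

# Walk both sets of blocks in an arbitrary interleaving, starting from the state's mix.
walk = state_only + target_only
random.Random(0).shuffle(walk)
recovered = apply_reveals(state_mix, walk)
assert recovered == target_mix
```

Reveals from the state's branch cancel themselves out and reveals from the target's branch are added, so the walk does not need a separate "undo" phase followed by a "redo" phase.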

arnetheduck · 2023-05-15T10:00:01Z

beacon_chain/consensus_object_pools/blockchain_dag.nim

+        let bsi = ? dag.getBlockIdAtSlot(highSlot)
+        doAssert bsi.bid.root == highRoot
+        bsi.bid
+  while bid.slot >= slotsToMix.a:


do we have a test for when slotsToMix.a is an empty slot?

etan-status · 2023-05-15

Given how it is called, I don't think that's possible, as `slotsToMix` is based on the common ancestor block. But sure, I can add such a test as well.

etan-status added 2 commits May 9, 2023 13:19

more tests and simplify logic

f862140

Merge branch 'unstable' into dev/etan/bd-accelshuffling

59c7872

etan-status marked this pull request as ready for review May 9, 2023 13:11

etan-status added 2 commits May 10, 2023 17:19

test with different number of deposits per branch

5623e81

Merge branch 'unstable' into dev/etan/bd-accelshuffling

97d3139

arnetheduck reviewed May 11, 2023

beacon_chain/consensus_object_pools/block_dag.nim

arnetheduck reviewed May 11, 2023

beacon_chain/consensus_object_pools/blockchain_dag.nim (outdated)

etan-status and others added 5 commits May 11, 2023 10:49

Merge branch 'unstable' into dev/etan/bd-accelshuffling

e5ce13d

Update beacon_chain/consensus_object_pools/blockchain_dag.nim

32dc1d5

Co-authored-by: Jacek Sieka

commonAncestor tests

29ad048

lint

485a071

Merge branch 'unstable' into dev/etan/bd-accelshuffling

4861ec4

zah reviewed May 11, 2023

zah approved these changes May 11, 2023

etan-status added 2 commits May 12, 2023 01:01

Merge branch 'unstable' into dev/etan/bd-accelshuffling

040f233

Merge branch 'unstable' into dev/etan/bd-accelshuffling

569f9a2

etan-status merged commit ea97e93 into unstable May 12, 2023

etan-status deleted the dev/etan/bd-accelshuffling branch May 12, 2023 17:37

etan-status mentioned this pull request May 13, 2023

ignore attestations that miss shuffling cache #4947

Closed

arnetheduck reviewed May 15, 2023

etan-status added a commit that referenced this pull request May 15, 2023

Revert "accelerate getShufflingRef (#4911)"

e43946b

This reverts commit ea97e93.

etan-status mentioned this pull request May 15, 2023

Revert "accelerate getShufflingRef (#4911)" #4958

Merged

etan-status added a commit that referenced this pull request May 15, 2023

Revert "Revert "accelerate getShufflingRef (#4911)""

b967e7b

This reverts commit e43946b.

etan-status added a commit that referenced this pull request May 15, 2023

Revert "accelerate getShufflingRef (#4911)" (#4958)

748be8b

This reverts commit ea97e93.

etan-status added a commit that referenced this pull request May 15, 2023

Revert "Revert "accelerate getShufflingRef (#4911)" (#4958)"

dbba003

This reverts commit 748be8b.
