# Simplify sync protocol and update to calculate optimistic heads #2746
@@ -13,12 +13,14 @@

- [Preset](#preset)
  - [Misc](#misc)
- [Containers](#containers)
  - [`LightClientSnapshot`](#lightclientsnapshot)
  - [`LightClientUpdate`](#lightclientupdate)
  - [`LightClientStore`](#lightclientstore)
- [Helper functions](#helper-functions)
  - [`get_subtree_index`](#get_subtree_index)
  - [`get_active_header`](#get_active_header)
  - [`get_safety_threshold`](#get_safety_threshold)
- [Light client state updates](#light-client-state-updates)
  - [`process_slot`](#process_slot)
  - [`validate_light_client_update`](#validate_light_client_update)
  - [`apply_light_client_update`](#apply_light_client_update)
  - [`process_light_client_update`](#process_light_client_update)
@@ -47,34 +49,24 @@ uses sync committees introduced in [this beacon chain extension](./beacon-chain.

### Misc

| Name | Value |
| - | - |
| `MIN_SYNC_COMMITTEE_PARTICIPANTS` | `1` |

| Name | Value | Notes |
| - | - | - |
| `MIN_SYNC_COMMITTEE_PARTICIPANTS` | `1` | |
| `SAFETY_THRESHOLD_CALCULATION_PERIOD` | `4096` | ~13.6 hours |
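For context on the new constant's "~13.6 hours" note, a quick back-of-the-envelope check (assuming mainnet's 12-second slot time, which is defined in the phase 0 spec rather than in this table):

```python
# Duration covered by one safety-threshold calculation period, assuming SECONDS_PER_SLOT = 12
SAFETY_THRESHOLD_CALCULATION_PERIOD = 4096
SECONDS_PER_SLOT = 12
hours = SAFETY_THRESHOLD_CALCULATION_PERIOD * SECONDS_PER_SLOT / 3600
assert abs(hours - 13.6) < 0.1  # 4096 * 12 s = 49152 s, roughly 13.65 hours
```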
## Containers

### `LightClientSnapshot`

```python
class LightClientSnapshot(Container):
    # Beacon block header
    header: BeaconBlockHeader
    # Sync committees corresponding to the header
    current_sync_committee: SyncCommittee
    next_sync_committee: SyncCommittee
```

### `LightClientUpdate`

```python
class LightClientUpdate(Container):
    # Update beacon block header
    header: BeaconBlockHeader
    # The beacon block header that is attested to by the sync committee
    attested_header: BeaconBlockHeader
    # Next sync committee corresponding to the header
    next_sync_committee: SyncCommittee
    next_sync_committee_branch: Vector[Bytes32, floorlog2(NEXT_SYNC_COMMITTEE_INDEX)]
    # Finality proof for the update header
    finality_header: BeaconBlockHeader
    # The finalized beacon block header attested to by Merkle branch
    finalized_header: BeaconBlockHeader
    finality_branch: Vector[Bytes32, floorlog2(FINALIZED_ROOT_INDEX)]
    # Sync committee aggregate signature
    sync_committee_bits: Bitvector[SYNC_COMMITTEE_SIZE]
@@ -86,10 +78,19 @@ class LightClientUpdate(Container):

### `LightClientStore`

```python
@dataclass
class LightClientStore(object):
    snapshot: LightClientSnapshot
    valid_updates: Set[LightClientUpdate]
    # Beacon block header that is finalized
    finalized_header: BeaconBlockHeader
    # Sync committees corresponding to the header
    current_sync_committee: SyncCommittee
    next_sync_committee: SyncCommittee
    # Best available header to switch finalized head to if we see nothing else
    best_valid_update: Optional[LightClientUpdate]
    # Most recent available reasonably-safe header
    optimistic_header: BeaconBlockHeader
    # Max number of participants in a sync committee (used to calculate safety threshold)
    previous_period_max_attendance: uint64
    current_period_max_attendance: uint64
```
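As a non-normative illustration of how the reworked store might be seeded, a sketch is shown below; `initialize_store` and the idea of obtaining a trusted header plus its two sync committees out of band are assumptions for the example, not part of this diff:

```python
def initialize_store(trusted_header: BeaconBlockHeader,
                     current_sync_committee: SyncCommittee,
                     next_sync_committee: SyncCommittee) -> LightClientStore:
    # The trusted header serves as both the finalized and the optimistic head until
    # updates start arriving; the attendance counters start at zero.
    return LightClientStore(
        finalized_header=trusted_header,
        current_sync_committee=current_sync_committee,
        next_sync_committee=next_sync_committee,
        best_valid_update=None,
        optimistic_header=trusted_header,
        previous_period_max_attendance=0,
        current_period_max_attendance=0,
    )
```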
## Helper functions

@@ -101,50 +102,80 @@ def get_subtree_index(generalized_index: GeneralizedIndex) -> uint64:
    return uint64(generalized_index % 2**(floorlog2(generalized_index)))
```
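As a usage example (assuming the Altair generalized indices `FINALIZED_ROOT_INDEX = 105` and `NEXT_SYNC_COMMITTEE_INDEX = 55`), the helper simply strips the leading bit of a generalized index, leaving the leaf's position within its subtree:

```python
assert get_subtree_index(GeneralizedIndex(105)) == 41  # floorlog2(105) = 6, 105 - 2**6 = 41
assert get_subtree_index(GeneralizedIndex(55)) == 23   # floorlog2(55) = 5, 55 - 2**5 = 23
```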
### `get_active_header`

```python
def get_active_header(update: LightClientUpdate) -> BeaconBlockHeader:
    # Is the update trying to convince us of a finalized header or an optimistic header?
    if update.finalized_header != BeaconBlockHeader():
        return update.finalized_header
    else:
        return update.attested_header
```

### `get_safety_threshold`

```python
def get_safety_threshold(store: LightClientStore):
    return max(
        store.previous_period_max_attendance,
        store.current_period_max_attendance
    ) // 2
```
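For example, if attendance peaked at 300 participants during the previous calculation period and at 260 so far during the current one, the threshold is `max(300, 260) // 2 = 150`, so an update needs more than 150 participating sync committee members before it can advance the optimistic head.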
## Light client state updates

A light client maintains its state in a `store` object of type `LightClientStore` and receives `update` objects of type `LightClientUpdate`. Every `update` triggers `process_light_client_update(store, update, current_slot)` where `current_slot` is the current slot based on some local clock. `process_slot` is processed every time the current slot increments.
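As a rough, non-normative illustration of how a client might drive these two entry points; the event loop, `get_current_slot`, and `receive_updates` are assumed plumbing, not defined by the spec:

```python
def run_light_client(store: LightClientStore, genesis_validators_root: Root) -> None:
    last_slot = get_current_slot()  # assumed local-clock helper
    while True:
        current_slot = get_current_slot()
        if current_slot > last_slot:
            # Tick once per slot (including empty ones) so the attendance counters roll over on schedule
            for slot in range(last_slot + 1, current_slot + 1):
                process_slot(store, Slot(slot))
            last_slot = current_slot
        for update in receive_updates():  # assumed network source of LightClientUpdate objects
            process_light_client_update(store, update, current_slot, genesis_validators_root)
```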
### `process_slot`

```python
def process_slot(store: LightClientStore, current_slot: Slot):
    if current_slot % SAFETY_THRESHOLD_CALCULATION_PERIOD == 0:
        store.previous_period_max_attendance = store.current_period_max_attendance
        store.current_period_max_attendance = 0
```
#### `validate_light_client_update`

```python
def validate_light_client_update(snapshot: LightClientSnapshot,
def validate_light_client_update(store: LightClientStore,
                                 update: LightClientUpdate,
                                 genesis_validators_root: Root) -> None:
    # Verify update slot is larger than snapshot slot
    assert update.header.slot > snapshot.header.slot

    # Verify update slot is larger than slot of current best finalized header
    active_header = get_active_header(update)
    assert active_header.slot > store.finalized_header.slot
    # Verify update does not skip a sync committee period
    snapshot_period = compute_epoch_at_slot(snapshot.header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    update_period = compute_epoch_at_slot(update.header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    assert update_period in (snapshot_period, snapshot_period + 1)
    finalized_period = compute_epoch_at_slot(store.finalized_header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
Review discussion:

Does that mean the light client should fail if there is a skip period? This seems to be a fairly normal path when a client stops running for a few days.

The light client can request historic … However, what is still suboptimal is the case where …

Yes, agree that the updating path will trace the sync-committee linkage. Also agree that this is not the issue raised by this PR. With regard to the edge case, it could cause some weird behaviors temporarily. For example, … Again, this behavior could be better handled if we assume that the light client can make requests for specific `LightClientUpdate`s when it times out or falls out of sync with the current stream of updates. The sync logic would become a lot cleaner if separated into two sync modes: skip-sync mode and normal sync mode.
    update_period = compute_epoch_at_slot(active_header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    assert update_period in (finalized_period, finalized_period + 1)

    # Verify update header root is the finalized root of the finality header, if specified
    if update.finality_header == BeaconBlockHeader():
        signed_header = update.header
    if update.finalized_header == BeaconBlockHeader():
        assert update.finality_branch == [Bytes32() for _ in range(floorlog2(FINALIZED_ROOT_INDEX))]
    else:
        signed_header = update.finality_header
        assert is_valid_merkle_branch(
            leaf=hash_tree_root(update.header),
            leaf=hash_tree_root(update.finalized_header),
            branch=update.finality_branch,
            depth=floorlog2(FINALIZED_ROOT_INDEX),
            index=get_subtree_index(FINALIZED_ROOT_INDEX),
            root=update.finality_header.state_root,
            root=update.attested_header.state_root,
        )
    # Verify update next sync committee if the update period incremented
    if update_period == snapshot_period:
        sync_committee = snapshot.current_sync_committee
    if update_period == finalized_period:
        sync_committee = store.current_sync_committee
        assert update.next_sync_committee_branch == [Bytes32() for _ in range(floorlog2(NEXT_SYNC_COMMITTEE_INDEX))]
    else:
        sync_committee = snapshot.next_sync_committee
        sync_committee = store.next_sync_committee
        assert is_valid_merkle_branch(
            leaf=hash_tree_root(update.next_sync_committee),
            branch=update.next_sync_committee_branch,
            depth=floorlog2(NEXT_SYNC_COMMITTEE_INDEX),
            index=get_subtree_index(NEXT_SYNC_COMMITTEE_INDEX),
            root=update.header.state_root,
            root=active_header.state_root,
        )

    # Verify sync committee has sufficient participants
@@ -153,43 +184,60 @@ def validate_light_client_update(snapshot: LightClientSnapshot,

    # Verify sync committee aggregate signature
    participant_pubkeys = [pubkey for (bit, pubkey) in zip(update.sync_committee_bits, sync_committee.pubkeys) if bit]
    domain = compute_domain(DOMAIN_SYNC_COMMITTEE, update.fork_version, genesis_validators_root)
    signing_root = compute_signing_root(signed_header, domain)
    signing_root = compute_signing_root(update.attested_header, domain)
    assert bls.FastAggregateVerify(participant_pubkeys, signing_root, update.sync_committee_signature)
```
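To make the sync-committee-period check above concrete, a worked example (assuming the mainnet preset values `SLOTS_PER_EPOCH = 32` and `EPOCHS_PER_SYNC_COMMITTEE_PERIOD = 256`, i.e. 8192 slots per period):

```python
SLOTS_PER_EPOCH = 32
EPOCHS_PER_SYNC_COMMITTEE_PERIOD = 256
# A finalized header at slot 10000 is in period 1 (slots 8192-16383), so an acceptable
# update's active header must be in period 1 or 2, i.e. no later than slot 24575.
finalized_period = (10000 // SLOTS_PER_EPOCH) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
assert finalized_period == 1
assert (24575 // SLOTS_PER_EPOCH) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD == 2  # accepted
assert (24576 // SLOTS_PER_EPOCH) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD == 3  # would skip a period, rejected
```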
#### `apply_light_client_update`

```python
def apply_light_client_update(snapshot: LightClientSnapshot, update: LightClientUpdate) -> None:
    snapshot_period = compute_epoch_at_slot(snapshot.header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    update_period = compute_epoch_at_slot(update.header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    if update_period == snapshot_period + 1:
        snapshot.current_sync_committee = snapshot.next_sync_committee
        snapshot.next_sync_committee = update.next_sync_committee
    snapshot.header = update.header
def apply_light_client_update(store: LightClientStore, update: LightClientUpdate) -> None:
    active_header = get_active_header(update)
    finalized_period = compute_epoch_at_slot(store.finalized_header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    update_period = compute_epoch_at_slot(active_header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    if update_period == finalized_period + 1:
        store.current_sync_committee = store.next_sync_committee
        store.next_sync_committee = update.next_sync_committee
    store.finalized_header = active_header
Review discussion:

Does this mean that it could be the case that `store.finalized_header` is not actually a finalized header, when `apply_light_client_update` is called through the update timeout?

This was the case in the old version as well, but it was called just …

Hmm, if this is the intended "finalization" for the light client, that is not great. In the case of a timeout, why not just go to the network and ask for a committee-changing update? I know that in this spec we have not specified how to get that information. In any implementation, the light client is going to have to be able to ask for historic updates corresponding to some sync committee. If that is available, the finalization of just taking the …

If sync committee participation is low, and none of the blocks exceeds the 2/3 majority for a day, there still needs to be a way to proceed though. Not sure how realistic that is for mainnet.

I think that is fine. If that indeed happens once in a blue moon, the light client would stop syncing. The manual fix for the light client operator is to use a newly acquired, trusted starting point. The code owner could also update their client's hard-coded starting point. In a way, these manual interventions should be considered desirable because we have an unexpected level of participation. However, if that happens a lot, I think that is more of an incentive design issue. We should consider how to fix that at the protocol level.

Light clients are intended to be able to follow the chain in as similar a way to regular clients as possible. And one of the Ethereum staking protocol's core design goals has all along been to have some path to continue making progress under >1/3 offline conditions. So the light client protocol should include some way to do that. (I'm assuming light clients are going to be used in a lot of contexts, including automated ones, where manual intervention is hard and should be left to resolving 51% attacks.) What is a better alternative to taking …

I would suggest that the light client's ability to continue could just rely on the "data source", which is invariably backed by full nodes. The exact meaning of "data source" is not well defined yet, because the networking layer could be the portal network, a LES-like p2p network, or a server-client RPC pairing. When the light client experiences a timeout or falls behind the current sync committee, i.e. the incoming updates are not good enough to advance its `finalized_header`, the client would revert to a skip-sync mode. In skip-sync mode, the client asks the data source for an update that would advance its sync committee. A light client does not advance until it somehow finds a way to access finality. Because finality is guaranteed to be found in some data source, a light client that is stuck is stuck only because it could not access the correct data sources (i.e. the correct updates). The guarantee that a light client will find a way to advance therefore depends on it having a way to find the right updates. Again, networking is not defined yet; once it is defined, we can evaluate under what conditions the light client might not be able to find the appropriate updates.
```
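The skip-sync mode floated in the discussion above is not part of this PR, but a rough sketch of the idea might look as follows; this is purely illustrative, and `request_best_update_for_period` stands in for whatever the not-yet-specified networking layer ends up providing:

```python
def skip_sync(store: LightClientStore, current_slot: Slot, genesis_validators_root: Root) -> None:
    # Walk forward one sync committee period at a time, asking the data source for the best
    # available update in each period, until the finalized header reaches the current period.
    finalized_period = compute_epoch_at_slot(store.finalized_header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    current_period = compute_epoch_at_slot(current_slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    for period in range(finalized_period + 1, current_period + 1):
        update = request_best_update_for_period(period)  # assumed network call, not specified here
        process_light_client_update(store, update, current_slot, genesis_validators_root)
```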
#### `process_light_client_update`
```python
def process_light_client_update(store: LightClientStore, update: LightClientUpdate, current_slot: Slot,
def process_light_client_update(store: LightClientStore,
                                update: LightClientUpdate,
                                current_slot: Slot,
                                genesis_validators_root: Root) -> None:
    validate_light_client_update(store.snapshot, update, genesis_validators_root)
    store.valid_updates.add(update)

    update_timeout = SLOTS_PER_EPOCH * EPOCHS_PER_SYNC_COMMITTEE_PERIOD

    validate_light_client_update(store, update, genesis_validators_root)
    # Update the best update in case we have to force-update to it if the timeout elapses
    if store.best_valid_update is None or sum(update.sync_committee_bits) > sum(store.best_valid_update.sync_committee_bits):
        store.best_valid_update = update
    # Track the maximum attendance in the committee signatures
    store.current_period_max_attendance = max(
        store.current_period_max_attendance,
        update.sync_committee_bits.count(1)
    )

    # Update the optimistic header
    if (
        sum(update.sync_committee_bits) > get_safety_threshold(store) and
        update.attested_header.slot > store.optimistic_header.slot
    ):
        store.optimistic_header = update.attested_header

    # Update finalized header
    if (
        sum(update.sync_committee_bits) * 3 >= len(update.sync_committee_bits) * 2
        and update.finality_header != BeaconBlockHeader()
        and update.finalized_header != BeaconBlockHeader()
    ):
        # Apply update if (1) 2/3 quorum is reached and (2) we have a finality proof.
        # Note that (2) means that the current light client design needs finality.
        # It may be changed to re-organizable light client design. See the on-going issue consensus-specs#2182.
        apply_light_client_update(store.snapshot, update)
        store.valid_updates = set()
    elif current_slot > store.snapshot.header.slot + update_timeout:
        # Normal update through 2/3 threshold
        apply_light_client_update(store, update)
        store.best_valid_update = None
    elif current_slot > store.finalized_header.slot + update_timeout:
        # Forced best update when the update timeout has elapsed
        apply_light_client_update(store.snapshot,
                                  max(store.valid_updates, key=lambda update: sum(update.sync_committee_bits)))
        store.valid_updates = set()
        apply_light_client_update(store, store.best_valid_update)
        store.best_valid_update = None
```
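For the finalization branch above, the 2/3 quorum check works out as follows (assuming `SYNC_COMMITTEE_SIZE = 512`, the mainnet preset):

```python
SYNC_COMMITTEE_SIZE = 512
# sum(bits) * 3 >= len(bits) * 2 requires at least 342 of the 512 members to have signed
assert 342 * 3 >= SYNC_COMMITTEE_SIZE * 2
assert 341 * 3 < SYNC_COMMITTEE_SIZE * 2
```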
Review discussion:

A full sync committee period is 256 epochs * 32 slots/epoch = 8192 slots. To reliably keep track of `*_period_max_attendance`, the client needs to receive multiple updates during each period. If a client fetches an update early in sync committee period `N`, and then fetches another update late in the next sync committee period `N + 1`, it may even end up in a situation where both `*_period_max_attendance` values are 0. How was `4096` determined?

I don't really have a very principled way to choose the `SAFETY_THRESHOLD_CALCULATION_PERIOD` yet. As far as I can tell, it's a responsiveness/vulnerability tradeoff. A `SAFETY_THRESHOLD_CALCULATION_PERIOD` of e.g. 1 epoch would mean that if the chain suddenly loses >50% of participants, light clients will only experience a 2 epoch delay, but this means that an attacker need only eclipse a client for 2 epochs to convince them of anything. Setting `SAFETY_THRESHOLD_CALCULATION_PERIOD = UPDATE_TIMEOUT` (~1 day) pushes safety to the maximum, but at the cost of minimum adaptability.

Though one path we could take is to set `SAFETY_THRESHOLD_CALCULATION_PERIOD = UPDATE_TIMEOUT` and then just assert that any desired faster responsiveness should come from clients implementing custom logic in the safety factor function (e.g. `max // 2` normally but `max // 4` after two epochs of the optimistic head not updating). I'm open to any option here.
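A client-side variant along the lines suggested above might look like the sketch below; the two-epoch staleness trigger and the `// 4` divisor are taken from the example in the comment, not from the spec, and `current_slot` would have to be threaded in by the caller:

```python
def get_adaptive_safety_threshold(store: LightClientStore, current_slot: Slot) -> uint64:
    max_attendance = max(
        store.previous_period_max_attendance,
        store.current_period_max_attendance,
    )
    # Loosen the threshold if the optimistic head has not moved for two epochs
    if current_slot > store.optimistic_header.slot + 2 * SLOTS_PER_EPOCH:
        return max_attendance // 4
    return max_attendance // 2
```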