Simplify sync protocol and update to calculate optimistic heads #2746
Conversation
1. Simplify `valid_updates` to `best_valid_update` so the `LightClientStore` only needs to store O(1) data
2. Track an optimistic head, by looking for the highest-slot header which passes a safety threshold
Looks great 👍
1. Replace `header` and `finality_header` with `attested_header` (always the header signed by the committee) and `finalized_header` (always the header verified by the Merkle branch)
2. Remove `LightClientSnapshot`, fold its fields into `LightClientStore` for simplicity (a sketch of the resulting store follows below)
Co-authored-by: terence tsao <[email protected]>
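For context, a rough sketch of what the folded-together store could look like after this change (the field list is inferred from this PR's discussion; the exact layout in the spec may differ):

```python
class LightClientStore(object):
    # Header that is "finalized" from the light client's perspective (never reverted)
    finalized_header: BeaconBlockHeader
    # Sync committees for finalized_header's period and the next period
    current_sync_committee: SyncCommittee
    next_sync_committee: SyncCommittee
    # Best update seen so far, applied by force if the update timeout elapses
    best_valid_update: Optional[LightClientUpdate]
    # Highest-slot header passing the safety threshold
    optimistic_header: BeaconBlockHeader
    # Max sync-committee participation seen in the previous/current calculation periods
    previous_max_active_participants: uint64
    current_max_active_participants: uint64
```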
specs/altair/sync-protocol.md
Outdated
| Name | Value | Notes |
| - | - | - |
| `MIN_SYNC_COMMITTEE_PARTICIPANTS` | `1` | |
| `SAFETY_THRESHOLD_CALCULATION_PERIOD` | `4096` | ~13.6 hours |
A full sync committee period is 256 epochs * 32 slots/epoch = 8192 slots. To reliably keep track of `*_period_max_attendance`, the client needs to receive multiple updates during each period. If a client fetches an update early in sync committee period `N`, and then fetches another update late in the next sync committee period `N + 1`, it may even end up in a situation where both `*_period_max_attendance` values are 0. How was `4096` determined?
I don't really have a very principled way to choose the `SAFETY_THRESHOLD_CALCULATION_PERIOD` yet. As far as I can tell, it's a responsiveness/vulnerability tradeoff. A `SAFETY_THRESHOLD_CALCULATION_PERIOD` of e.g. 1 epoch would mean that if the chain suddenly loses >50% of participants, light clients would only experience a 2-epoch delay, but it also means that an attacker need only eclipse a client for 2 epochs to convince it of anything. Setting `SAFETY_THRESHOLD_CALCULATION_PERIOD = UPDATE_TIMEOUT` (~1 day) pushes safety to the maximum, but at the cost of minimum adaptability.

Though one path we could take is to set `SAFETY_THRESHOLD_CALCULATION_PERIOD = UPDATE_TIMEOUT` and then just assert that any desired faster responsiveness should come from clients implementing custom logic in the safety factor function (e.g. `max // 2` normally but `max // 4` after two epochs of the optimistic head not updating). I'm open to any option here.
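For illustration, a client-side variant along those lines might look like the sketch below. This is purely a sketch: the function name and the staleness heuristic are assumptions, not spec text (the spec's `get_safety_threshold` is a plain `max(...) // 2`).

```python
def get_adaptive_safety_threshold(store: LightClientStore, current_slot: Slot) -> uint64:
    max_active_participants = max(
        store.previous_max_active_participants,
        store.current_max_active_participants,
    )
    # How long the optimistic head has been stuck, measured in epochs
    stale_epochs = (current_slot - store.optimistic_header.slot) // SLOTS_PER_EPOCH
    # max // 2 normally, relaxed to max // 4 after ~2 epochs without progress
    divisor = 2 if stale_epochs < 2 else 4
    return max_active_participants // divisor
```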
specs/altair/sync-protocol.md
Outdated
if current_slot % SAFETY_THRESHOLD_PERIOD == 0:
    store.previous_max_active_participants = store.current_max_active_participants
    store.current_max_active_participants = 0
Should `apply_light_client_update` also be triggered here, in case `current_slot > store.finalized_header.slot + UPDATE_TIMEOUT` gets fulfilled?
Also, the condition to update `optimistic_header` may be fulfilled after changes to `*_max_active_participants`.
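A minimal sketch of what the per-slot routine could look like with the forced update folded in (a paraphrase of the suggestion using the names from the quoted diff, not spec text; re-checking the `optimistic_header` condition after the counter reset would additionally require keeping the latest seen sync aggregate around):

```python
def process_slot_for_light_client_store(store: LightClientStore, current_slot: Slot) -> None:
    # Roll the participation counters over once per safety-threshold period
    if current_slot % SAFETY_THRESHOLD_PERIOD == 0:
        store.previous_max_active_participants = store.current_max_active_participants
        store.current_max_active_participants = 0
    # If no finalized update arrived within the timeout, force-apply the best one seen
    if (
        current_slot > store.finalized_header.slot + UPDATE_TIMEOUT
        and store.best_valid_update is not None
    ):
        apply_light_client_update(store, store.best_valid_update)
        store.best_valid_update = None
```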
if update_period == finalized_period + 1:
    store.current_sync_committee = store.next_sync_committee
    store.next_sync_committee = update.next_sync_committee
store.finalized_header = active_header
If the `optimistic_header` was older, I guess it should also be updated here (to `finalized_header`).
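Roughly, the quoted function could pick that up at its tail; a sketch of the suggestion, with `get_active_header` and the period computation taken from the surrounding spec:

```python
def apply_light_client_update(store: LightClientStore, update: LightClientUpdate) -> None:
    active_header = get_active_header(update)
    finalized_period = compute_epoch_at_slot(store.finalized_header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    update_period = compute_epoch_at_slot(active_header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    if update_period == finalized_period + 1:
        store.current_sync_committee = store.next_sync_committee
        store.next_sync_committee = update.next_sync_committee
    store.finalized_header = active_header
    # Suggested addition: keep the optimistic head from lagging behind the finalized head
    if store.finalized_header.slot > store.optimistic_header.slot:
        store.optimistic_header = store.finalized_header
```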
if update_period == finalized_period + 1:
    store.current_sync_committee = store.next_sync_committee
    store.next_sync_committee = update.next_sync_committee
store.finalized_header = active_header
Does this mean that it could be the case that `store.finalized_header` is not actually a finalized header, when `apply_light_client_update` is called through the update timeout?
This was the case in the old version as well, but it was just called `header` there. `finalized_header` here seems to have a different meaning than in other contexts: it is only finalized for the light client (it won't be reverted anymore). Agree that the naming is suboptimal. Likewise, the `optimistic_header` also seems to have a different meaning from the one discussed as part of the merge effort.
Hmm... if this is the intended "finalization" for the light client, that is not great.

In the case of a timeout, why not just go to the network and ask for a committee-changing update? I know that in this spec we have not specified how to get that information, but in any implementation the light client is going to have to be able to ask for historic updates corresponding to some sync committee. If that is available, "finalizing" by just taking `store.best_valid_update` is not great. I doubt that a real client implementation is going to take this route.
If sync committee participation is low, and none of the blocks exceeds the 2/3 majority for a day, there still needs to be a way to proceed though. Not sure how realistic that is for mainnet.
I think that is fine. If that indeed happens once in a blue moon, the light client would simply stop syncing. The manual fix for the light client operator is to use a newly acquired, trusted starting point. The code owner could also update their client's hard-coded starting point. In a way, these manual interventions should be considered desirable, because they signal an unexpected level of participation.
However, if that happens a lot, I think that is more of an incentive design issue. We should consider how to fix that at the protocol level.
Light clients are intended to be able to follow the chain in as similar a way to regular clients as possible. And one of the Ethereum staking protocol's core design goals has all along been to have some path to continue making progress under >1/3-offline conditions. So the light client protocol should include some way to do that.

(I'm assuming light clients are going to be used in a lot of contexts, including automated ones, where manual intervention is hard and should be reserved for resolving 51% attacks.)

What is a better alternative to taking `store.best_valid_update`? The regular Ethereum protocol advances in the non-finalization case by using the LMD GHOST fork choice rule, which follows the chain that has the most validators supporting it. `store.best_valid_update` approximates that. Is there a better approximation?
I would suggest that the light client's ability to continue could just rely on the "data source", which is invariably backed by full nodes. The exact meaning of "data source" is not well defined yet, because the networking layer could be the portal network, a LES-like p2p network, or a server-client RPC pairing.

When the light client experiences a timeout or falls behind the current sync committee, i.e. the incoming updates are not good enough to advance its `finalized_header`, the client would revert to a skip-sync mode. In skip-sync mode, the client asks the "data source" for an update that would advance its sync committee. A light client does not advance until it somehow finds a way to access "finality". Because finality is guaranteed to be found in some data sources, a light client only gets stuck if it cannot access the correct data sources (i.e. the correct updates).

The guarantee that a light client will find a way to advance thus depends on it having a way to find the right updates. Again, networking is not defined yet; once it is defined, we can evaluate under what conditions the light client might not be able to find the appropriate updates.
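As a rough illustration of that skip-sync mode (the `data_source` object and its `get_best_update(period)` call are hypothetical, since the networking layer is not specified):

```python
def skip_sync(store: LightClientStore,
              data_source,  # hypothetical update provider backed by full nodes
              current_slot: Slot,
              genesis_validators_root: Root) -> None:
    # Request one committee-changing update per skipped period until caught up.
    # Assumes each returned update carries a finality proof for its period.
    current_period = compute_epoch_at_slot(current_slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    finalized_period = compute_epoch_at_slot(store.finalized_header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    for period in range(finalized_period, current_period):
        update = data_source.get_best_update(period)  # hypothetical call
        process_light_client_update(store, update, current_slot, genesis_validators_root)
```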
snapshot_period = compute_epoch_at_slot(snapshot.header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
update_period = compute_epoch_at_slot(update.header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
assert update_period in (snapshot_period, snapshot_period + 1)
finalized_period = compute_epoch_at_slot(store.finalized_header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
Does that mean the light client should fail if there is a skipped period? This seems to be a fairly normal path when a client stops running for a few days.
The light client can request historic `LightClientUpdate`s from the network. It needs at least one update per period to follow along, as it only knows `current_sync_committee` and `next_sync_committee` and can only verify `LightClientUpdate`s from those periods.

However, what is still suboptimal is the case where `finalized_update` is in a different period than `attested_update`, but this is not a problem introduced by this PR. Another tricky case to tackle is the one where `attested_update` is in a different period than the committee which signed it, which probably even requires some heuristics to figure out (as this case depends on there being missed slots at the start of an epoch). For now, these edge cases are all ignored, and updates are only accepted if `finalized_update`, `attested_update`, and the sync committee signing them all come from the same sync committee period.
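In code, that "everything from one period" acceptance rule boils down to roughly the following check (a hypothetical helper, not spec text; the signing committee is assumed here to match the attested header's period):

```python
def is_same_period_update(update: LightClientUpdate) -> bool:
    finalized_period = compute_epoch_at_slot(update.finalized_header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    attested_period = compute_epoch_at_slot(update.attested_header.slot) // EPOCHS_PER_SYNC_COMMITTEE_PERIOD
    # Accept only if the finalized header, the attested header, and (by assumption)
    # the signing sync committee all fall within the same sync committee period
    return finalized_period == attested_period
```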
Yes, agreed that the updating path will trace the sync-committee linkage. Also agreed that this is not an issue raised by this PR.

With regard to the edge case... it could cause some weird behavior temporarily. For example, `apply_light_client_update` is called due to a timeout; then, when a valid `update.finalized_header` arrives, it will get rejected.

Again, this behavior could be handled better if we assume that the light client can request specific `LightClientUpdate`s when it times out or falls out of sync with the current stream of updates. The sync logic would become a lot cleaner if separated into two modes: skip-sync mode and normal sync mode.
I will make one big-picture comment that is slightly out of scope for this PR. It is relevant because this PR attempts to fix the sync logic when there is a client timeout, and it is even more relevant if a light client has to get updates that skip a period. As it stands right now, the spec does not provide a mechanism for a light client with an outdated view of the sync committees to skip-sync. This is somewhat addressed by the timeout mechanism, but not fully. Furthermore, without an explicit skip-sync mechanism, it is hard to address the cold-start problem.
specs/altair/sync-protocol.md
Outdated
def get_safety_threshold(store: LightClientStore) -> uint64:
    return max(
        store.previous_max_active_participants,
        store.current_max_active_participants,
    ) // 2
Is there a reason why the threshold is half of the max(previous, current)? This is just a heuristic check, correct? Can we add a note stating as much?
Related comment:
#2746 (comment)
got it.
This PR accomplishes the scope described in its body successfully 👍
Other topics brought up in this PR description should be tackled in new PRs:
- p2p networking
- improved data structures
Well done on the simplification. 👍
Agreed with @dapplion. FYI I'd like to move the file paths in other PRs. Let's merge this PR now and then propose suggestions & add other designs in other PRs.