
Added standalone light client patch #2130

Merged · 34 commits · Dec 17, 2020

Conversation

@vbuterin (Contributor)

This adds the standalone beacon chain changes for light client support as a separate spec patch, independent of phase 1, so that it can be tested and potentially rolled out sooner than the rest of phase 1.

@JustinDrake (Collaborator)

Two main substantive suggestions:

  1. Increase LIGHT_CLIENT_COMMITTEE_SIZE to 256 to address the RANDAOtage attack vector.
  2. Add a pubkeys_sum field to CompactCommittee which sums all the pubkeys. This unlocks an optimisation whereby light clients only have to download pubkeys of validators that do not vote. For a 2/3 threshold this is at least a 2x optimisation. In the expected/optimistic scenario where 95% of the light client committee votes this is a ~20x optimisation.
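
A rough sketch of how a light client could use the suggested `pubkeys_sum`; integers stand in for BLS G1 points here, and the function and parameter names are illustrative, not spec names:

```python
from typing import Sequence

def aggregate_voting_pubkey(pubkeys_sum: int, nonvoter_pubkeys: Sequence[int]) -> int:
    # Start from the precomputed sum of all committee pubkeys and subtract the
    # (typically few) non-voters, instead of downloading and adding every voter's
    # pubkey. With a 2/3 participation threshold at most 1/3 of the pubkeys are
    # needed; with ~95% participation, roughly 1/20.
    aggregate = pubkeys_sum
    for pubkey in nonvoter_pubkeys:
        aggregate -= pubkey  # point negation + addition on G1 in a real implementation
    return aggregate
```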

@JustinDrake (Collaborator) commented Nov 15, 2020

Fine-combed the document :)

  1. (bug fix) Called compute_shuffled_index with active_validator_count (as opposed to active_validator_indices).
  2. Used a Bitlist (as opposed to a Bitvector) for BeaconBlockBody.sync_committee_bits for consistency with SyncCommittee.pubkeys and SyncCommittee.compact_validators.
  3. Added assert len(body.sync_committee_bits) == len(committee_indices).
  4. Should current_sync_committee and next_sync_committee be initialised, e.g. in initialize_beacon_state_from_eth1?
  5. Should we sign over both previous_block_root and previous_slot? This may facilitate gossipping sync committee signatures as well as simplify and optimise light client implementations.
  6. Should we rotate current_sync_committee and next_sync_committee only if the previous rotation falls under finality? If not, how are long-range (greater than EPOCHS_PER_SYNC_COMMITTEE_PERIOD) forks handled?
  7. Should sync committees explicitly sign over state.finalized_checkpoint? This would simplify light client implementations and save them from downloading and verifying unnecessary Merkle branches.
  8. What do you suggest as the precise honest behaviour of a sync committee member? Explicitly spelling it out would help me go through edge cases.
  9. Should we consider adding sync committee slashing conditions? For example, we don't want sync committee members equivocating over state.finalized_checkpoint.
  10. In the exceptional case where active_validator_count < MAX_SYNC_COMMITTEE_SIZE, what do you think of "padding" the sync committee with duplicate validators? This handling of the exceptional case (which really ought to never happen on mainnet) would simplify light client implementations because the variable-size Bitlist and List would be fixed-sized Bitvector and Vector. The state transition function would also be simplified a bit.
  11. What do you think of increasing MAX_SYNC_COMMITTEE_SIZE further, e.g. to 512 or 1024? The rationale is that the marginal on-chain overhead is negligible (especially with fancy batched point addition) and light clients could choose to increase their committee size to improve robustness against adaptive attacks (as well as improve liveness in the worst case by shielding themselves against sampling variance). It also allows fancier light clients that sample random subsets of the sync committee, again for robustness. This does go against the pubkeys_aggregate optimisation.
  12. Does the micro-incentivisation break down when the networking latency is poor enough that it takes more than one slot for the block proposer to gather a good sync committee signature? What about relaxing the latency for the purpose of micro-incentivisation? (Note that sync committee signatures that don't make it onchain would still be valuable offchain for light clients.)
  13. Do the cosmetic changes look good to you? (I may have introduced a bug or two.)

@vbuterin (Contributor, Author)

Should we rotate current_sync_committee and next_sync_committee only if the previous rotation falls under finality? If not, how are long-range (greater than EPOCHS_PER_SYNC_COMMITTEE_PERIOD) forks handled?

I'm inclined to say no need for special cases here: if none of the committees in a given period get up to 2/3, the client should just pick the block that has the most light client participants. Obviously in such a case the client would wait for some time to verify that no better block is available.

Should we sign over....

What's wrong with just having the light client proof package that gets sent over the wire contain SSZ branches? Seems more general-purpose...

What do you suggest as the precise honest behaviour of a sync committee member?

Halfway through each slot, publish an attestation to what you think is the head of the chain.

Should we consider adding sync committee slashing conditions?

Hmm... I'm inclined to say don't bother, because if the committee is faulty then nothing stops them from feeding clients the wrong chain consistently, and there's no way for the sync committee to have that high a level of security anyway.

In the exceptional case where active_validator_count < MAX_SYNC_COMMITTEE_SIZE, what do you think of "padding" the sync committee with duplicate validators?

Hmm.... instinctively it seems wasteful and unnatural. Is it really that complicated to have variable size lists? I guess another option would be to have them be vectors and pad the pubkeys and bits with zeroes to represent that there's nothing there.

What do you think of increasing MAX_SYNC_COMMITTEE_SIZE further, e.g. to 512 or 1024?

Seems okay as long as light clients have a choice to only use part of the committee if they so desire, so eg. sum_of_pubkeys would need to be extended into an array that contains the sum of the first 2**k pubkeys for each k. That said, are we that worried about a committee of size 256 being attacked?
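
A minimal sketch of that prefix-sum structure, again with integers standing in for BLS point sums (the function name is illustrative):

```python
from typing import List

def build_pubkey_prefix_sums(pubkeys: List[int]) -> List[int]:
    # prefix_sums[k] holds the sum of the first 2**k committee pubkeys, so a light
    # client that only wants to track the first 2**k members can use the matching
    # precomputed aggregate instead of the full-committee sum.
    prefix_sums = []
    k = 0
    while 2 ** k <= len(pubkeys):
        prefix_sums.append(sum(pubkeys[: 2 ** k]))
        k += 1
    return prefix_sums
```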

Does the micro-incentivisation break down when the networking latency is poor enough that it takes more than one slot for the block proposer to gather a good sync committee signature?

Remember that the signatures would still be floating around the gossip net even if only a few of them make it onto the chain, so technically as long as there's even a small percent chance of being included the incentives to publish should work fine. But if we want to make it easier to read the signatures from just the chain, I guess one route would be to keep retrying a slot if it gets <50% participation (eg. for a maximum of 4 retries).

@vbuterin (Contributor, Author)

Cosmetic changes look good to me!

@hwwhww (Contributor) left a comment


@vbuterin

I made the linter pass: #2133

@JustinDrake (Collaborator)

I'm inclined to say no need for special cases here: if none of the committees in a given period get up to 2/3, the client should just pick the block that has the most light client participants. Obviously in such a case the client would wait for some time to verify that no better block is available.

What about safety?

Let's say there's a 1/3 attacker, 1/3 honest attesting (and light-client voting) for fork A, 1/3 honest attesting (and light-client voting) for fork B.

Now the attacker attests for fork A and light-client votes for fork B. Fork A finalises but light clients are stuck on fork B.

What's wrong with just having the light client proof package that gets sent over the wire contain SSZ branches?

Light client implementations will often be hyper-secure and/or hyper-optimised. For example, consider a light client implementation in a zkSNARK circuit. Or a light client implementation in a metered virtual machine like the EVM. Or a light client implementation in a secure enclave. Or a light client implementation in a formal verification framework. The implementation costs are orders of magnitude higher than Python.

Avoiding unnecessary data like Merkle branches, as well as unnecessary complexities like variable-sized Merkle branches, may pay off big time.

Seems more general-purpose

Having just the previous root is definitely more minimalist :)

instinctively it seems wasteful

Do you mean wasteful when there are less than 256 active validators? Optimising for that edge case which should never happen in practice feels like a micro-optimisation.

are we that worried about a committee of size 256 being attacked?

I'm worried! A single committee break simultaneously breaks every Eth2-to-X bridge (where X = Cosmos, Near, Polkadot, etc.) and there could easily be $1B+ (or $10B+) in extractable value for an attacker.

Because EPOCHS_PER_SYNC_COMMITTEE_PERIOD is on the order of 1 day, it has "human scale". By that I mean that 1 day is sufficient for an attacker to 1) publicise a $1m-per-validator bribing attack (e.g. on Twitter and Reddit) and 2) have corrupted validators manually perform an action to claim their $1m bribe. How many rational validators would say no to a $1m bribe? Also note that the attack cost is zero because there is no enshrined slashing.

@vbuterin (Contributor, Author)

Avoiding unnecessary data like Merkle branches, as well as unnecessary complexities like variable-sized Merkle branches, may pay off big time.

We don't have any variable-sized Merkle branches though; SSZ was deliberately designed to make any access path neatly correspond to a single generalized index. I'd also add that if you want to know the state root, usually it's not because you care about the root, it's because you care about something in the state, so you would need Merkle paths anyway.

Let's say there's a 1/3 attacker, 1/3 honest attesting (and light-client voting) for fork A, 1/3 honest attesting (and light-client voting) for fork B.

Why would this situation persist for an entire day? Wouldn't the fork resolve and there be a block that has 2/3 voting?

To be clear, the fork choice rule I am proposing is not slot-by-slot. It's "start from the last block you are confident about, take the validator set from it, then find the descendant of that block that was signed by the most participants of that validator set, and repeat".
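
A rough sketch of that loop, with the chain-access functions passed in as parameters (they are assumptions, not spec helpers):

```python
from typing import Callable, Sequence, TypeVar

Block = TypeVar("Block")

def light_client_head(
    trusted_block: Block,
    get_descendants: Callable[[Block], Sequence[Block]],
    participation: Callable[[Block, Block], int],  # (committee-source block, candidate) -> signer count
) -> Block:
    # Start from the last block the client is confident about, take the sync
    # committee defined by that block, move to the descendant signed by the most
    # participants of that committee, and repeat until no descendants are known.
    head = trusted_block
    while True:
        candidates = get_descendants(head)
        if not candidates:
            return head
        head = max(candidates, key=lambda child: participation(head, child))
```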

Do you mean wasteful when there are less than 256 active validators? Optimising for that edge case which should never happen in practice feels like a micro-optimisation.

I mean wasteful code-wise. It feels like it would still lead to annoying complexity (eg. where else in the code can the same validator be part of a committee twice??!). If we want fixed size the better path does seem to be to pad the committee with fake pubkeys that no one can sign with.

Because EPOCHS_PER_SYNC_COMMITTEE_PERIOD is on the order of 1 day, it has "human scale". By that I mean that 1 day is sufficient for an attacker to 1) publicise a $1m-per-validator bribing attack (e.g. on Twitter and Reddit) and 2) have corrupted validators manually perform an action to claim their $1m bribe. How many rational validators would say no to a $1m bribe? Also note that the attack cost is zero because there is no enshrined slashing.

OK fair! I'm okay with pushing the committee size up as long as the pubkeys_aggregate is either removed or turns into a list of prefix-sums of powers of two.

On second thought, I'd be ok with some kind of slashing if we can figure out the right conditions; I guess something like "if you vote for block X but in the same epoch you used a block that conflicts with X as a last-justified-block you get slashed" could work. Main challenge is that we'd like to make sure that it's easy for anti-slash DBs to verify that a validator isn't slashing themselves.


## Light client state updates

The state of a light client is stored in a `memory` object of type `LightClientMemory`. To advance its state a light client requests an `update` object of type `LightClientUpdate` from the network by sending a request containing `(memory.shard, memory.header.slot, slot_range_end)`. It calls `validate_update(memory, update)` on each update that it receives in response. If `sum(update.aggregate_bits) * 3 > len(update.aggregate_bits) * 2` for any valid update, it accepts that update immediately; otherwise, it waits around for some time and then finally calls `update_memory(memory, update)` on the valid update with the highest `sum(update.aggregate_bits)`.
@ralexstokes (Member) commented Nov 16, 2020


If sum(update.aggregate_bits) * 3 > len(update.aggregate_bits) * 2 for any valid update

Is there a reason we are not summing effective stake (via the effective_balance found in the appropriate SyncCommittee in memory)?

Relying on the bits is a good approximation... until it is not, and the edge case results in committee corruption.

@vbuterin (Contributor, Author) commented Nov 17, 2020


It vastly simplifies the light client spec to rely only on the bits. We can make up for it by making membership in the sync committee itself probabilistic and balance-dependent.
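
For reference, a sketch of the bits-only acceptance logic from the quoted sync-protocol text; the networking and waiting hooks are passed in as parameters and are assumptions, and `validate_update` is assumed here to return True/False rather than assert:

```python
def advance_light_client(memory, updates, validate_update, update_memory, wait):
    # `updates` are the LightClientUpdate objects received for the requested slot range;
    # validate_update / update_memory are the spec functions named in the quoted text.
    valid_updates = [u for u in updates if validate_update(memory, u)]
    for update in valid_updates:
        # A 2/3 supermajority by bit count is accepted immediately.
        if sum(update.aggregate_bits) * 3 > len(update.aggregate_bits) * 2:
            update_memory(memory, update)
            return
    # Otherwise wait for a while, then take the valid update with the most participation.
    wait()
    if valid_updates:
        update_memory(memory, max(valid_updates, key=lambda u: sum(u.aggregate_bits)))
```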


```python
# Verify signature
active_pubkeys = [p for (bit, p) in zip(update.aggregation_bits, committee.pubkeys) if bit]
domain = compute_domain(DOMAIN_SYNC_COMMITTEE, memory.version)
```
@ralexstokes (Member) commented Nov 16, 2020


I think it is worth considering adding the genesis_validators_root as input to compute_domain, for the same reasons we have it on the beacon chain.

It only adds a constant 32-byte overhead for the light client (which can even be hard-coded into client code) and reduces the scope of admissible updates a light client would even bother with.

Happy to make a PR on top of this one adding the changes if there is support for the change.
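
A sketch of the suggested change, assuming the phase 0 `compute_domain` signature (which accepts an optional `genesis_validators_root`) and a hypothetical `memory.genesis_validators_root` field:

```python
# Sketch of the suggestion: mix the genesis validators root into the domain so that
# signatures from other chains/testnets are never admissible for this light client.
domain = compute_domain(DOMAIN_SYNC_COMMITTEE, memory.version, memory.genesis_validators_root)
```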

Comment on lines +39 to +40
| `FINALIZED_ROOT_INDEX` | `Index(BeaconState, 'finalized_checkpoint', 'root')` |
| `NEXT_SYNC_COMMITTEE_INDEX` | `Index(BeaconState, 'next_sync_committee')` |

Do these Index helpers equal get_generalized_index in https://github.com/ethereum/eth2.0-specs/blob/5f9112ad4227d12ea03c001517d53518e6e355f0/ssz/merkle-proofs.md?

If so, we may need to either make merkle-proofs.md executable again or implement it in remerkleable.


Do these Index helpers equal get_generalized_index

Yes :)

we may need to either make merkle-proofs.md executable again or implement it in remerkleable

When we do so I suggest shortening "generalized_index" to something like "node_index", "tree_index", "Merkle_index", or just "index" for short.

```python
# Verify update header root is the finalized root of the finality header, if specified
if update.finality_header == BeaconBlockHeader():
    signed_header = update.header
    assert update.finality_branch == [ZERO_HASH for _ in range(log2(FINALIZED_ROOT_INDEX))]
```

It seems that we need to add a uint64 casting to all these log2, or reuse get_generalized_index_length.
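
One possible shape for that cast, wrapped in a small helper (the name `floorlog2` and the use of remerkleable's `uint64` are assumptions, not necessarily what the PR adopts):

```python
from remerkleable.basic import uint64  # the SSZ uint type used by the pyspec

def floorlog2(x: int) -> uint64:
    # floor(log2(x)) as a uint64, e.g. the Merkle branch length for a generalized
    # index such as FINALIZED_ROOT_INDEX, usable directly in range(...).
    return uint64(int(x).bit_length() - 1)
```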

@dankrad (Contributor) commented Nov 26, 2020

My main concern here is what happens when a committee does not achieve a 2/3 vote during EPOCHS_PER_SYNC_COMMITTEE_PERIOD. Realistically, blockchain/smart contract light clients will only use 2/3 links because everything else is too dangerous. But when a sync committee's tenure ends, there is no incentive to keep trying to get one of the votes over the threshold.

Intuitively I would have said sync committees should only be switched when the next sync committee has been confirmed by a 2/3 vote. But that of course can lead to terrible situations where a sync committee intentionally withholds confirming the next one.

One idea would be that there is a mechanism to record the "success" of each sync committee to achieve 2/3, and if it never reached it, a final vote on the last block can be included at any time for some reward.

@dankrad (Contributor) commented Nov 26, 2020

Further remark in a similar direction: Intuitively, the previous sync committee has almost the same "validity" in epoch EPOCHS_PER_SYNC_COMMITTEE_PERIOD and EPOCHS_PER_SYNC_COMMITTEE_PERIOD+1 (after committees have switched). What about giving the old committee the opportunity for one, and only one, more vote? The nice thing is that they could vote for the "next next" sync committee, effectively allowing one committee at a time to be skipped. Also, in case the new committee is poor and doesn't come to a consensus, it gives the old committee one opportunity to create a 2/3+1 link.

@djrtwo (Contributor) left a comment


Did a quick pass.

```python
class BeaconBlockBody(phase0.BeaconBlockBody):
    # Sync committee aggregate signature
    sync_committee_bits: Bitvector[SYNC_COMMITTEE_SIZE]
```

We might consider putting this in the outer BeaconBlock/BeaconBlockHeader container(s). This would allow for sync to occur with just the BeaconBlockHeader and no additional proofs.

Seems reasonable to elevate this to baseline verification (similar to the proposer signature) rather than in the payload of the block. Otherwise, sync via this light mechanism will always require a BeaconBlockHeader plus a small sync_signature proof into the body.
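
A sketch of the alternative being described, in the same spec style as the quoted hunk (not what the PR currently does; the field placement is exactly what is under discussion):

```python
class BeaconBlockHeader(phase0.BeaconBlockHeader):
    # Lifting the sync aggregate out of the body would let a light client verify it
    # from the header alone, without an extra proof into BeaconBlockBody.
    sync_committee_bits: Bitvector[SYNC_COMMITTEE_SIZE]
    sync_committee_signature: BLSSignature
```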

```python
active_validator_count = uint64(len(active_validator_indices))
seed = get_seed(state, base_epoch, DOMAIN_SYNC_COMMITTEE)
i, sync_committee_indices = 0, []
while len(sync_committee_indices) < SYNC_COMMITTEE_SIZE:
```

I suggest we create a helper compute_weighted_committee or something, to (1) make it clear why we aren't using the previous committee shuffling algorithms, and (2) allow for better code reuse in the event we want to reuse or just directly test this functionality.
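
A hedged sketch of what such a helper could look like, following the quoted loop and the balance-weighted sampling pattern of phase 0's `compute_proposer_index`; the exact body and the helper's name are assumptions, not the PR's final code:

```python
def compute_weighted_committee(state: BeaconState, seed: Bytes32, size: uint64) -> Sequence[ValidatorIndex]:
    """
    Sample ``size`` validator indices, shuffled by ``seed`` and accepted with
    probability proportional to effective balance, so that sync committee
    membership is stake-weighted (unlike the phase 0 committee shuffling).
    """
    MAX_RANDOM_BYTE = 2**8 - 1
    active_validator_indices = get_active_validator_indices(state, get_current_epoch(state))
    active_validator_count = uint64(len(active_validator_indices))
    i, committee = 0, []
    while len(committee) < size:
        shuffled_index = compute_shuffled_index(uint64(i % active_validator_count), active_validator_count, seed)
        candidate_index = active_validator_indices[shuffled_index]
        random_byte = hash(seed + uint_to_bytes(uint64(i // 32)))[i % 32]
        effective_balance = state.validators[candidate_index].effective_balance
        if effective_balance * MAX_RANDOM_BYTE >= MAX_EFFECTIVE_BALANCE * random_byte:
            committee.append(candidate_index)
        i += 1
    return committee
```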

```python
def process_sync_committee(state: BeaconState, body: BeaconBlockBody) -> None:
    # Verify sync committee aggregate signature signing over the previous slot block root
    previous_slot = max(state.slot, Slot(1)) - Slot(1)
```

Maybe a get_previous_slot accessor like get_previous_epoch
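
A sketch of such an accessor; the name mirrors `get_previous_epoch`, and the clamping matches the inline expression quoted above:

```python
def get_previous_slot(slot: Slot) -> Slot:
    # Previous slot, clamped so that slot 0 maps to slot 0, matching
    # ``max(state.slot, Slot(1)) - Slot(1)`` in the quoted code.
    return Slot(max(int(slot), 1) - 1)
```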

```python
# Reward sync committee participants
participant_rewards = Gwei(0)
active_validator_count = uint64(len(get_active_validator_indices(state, get_current_epoch(state))))
for participant_index in participant_indices:
```

Should we exclude slashed validators from rewards?


```python
def process_block(state: BeaconState, block: BeaconBlock) -> None:
    phase0.process_block(state, block)
```

Note that by calling phase0.process_block, this uses the phase 0 package.

  1. The constants and the functions called inside process_block would be the phase 0 versions. This may cause side effects that we could easily overlook.
  2. You are passing (lightclient_patch.state, lightclient_patch.block) to process_block(state: phase0.BeaconState, block: phase0.BeaconBlock).

Saving 3 lines probably is not worth it. 😅
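
A sketch of the spelled-out alternative being alluded to; the phase 0 sub-functions exist with these names, but whether the patch ultimately duplicates `process_block` this way is the question under discussion:

```python
def process_block(state: BeaconState, block: BeaconBlock) -> None:
    # Define the patch's own process_block instead of delegating to the phase 0
    # package, so every sub-call and constant resolves to the light-client-patch
    # versions. Ordering follows phase 0's process_block, plus the new sync step.
    process_block_header(state, block)
    process_randao(state, block.body)
    process_eth1_data(state, block.body)
    process_operations(state, block.body)
    process_sync_committee(state, block.body)
```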

@djrtwo (Contributor) commented Dec 7, 2020

Saving 3 lines probably is not worth it. 😅

Yes! I would tend to agree

* Bump remerkleable to 0.1.18

* Disable `sync-protocol.md` for now. Make linter pass

* Enable lightclient tests

* Use *new* `optional_fast_aggregate_verify`

* Fix ToC and codespell

* Do not run phase1 tests with Lightclient patch

* Fix the Eth1Data casting bug. Add a workaround.

* Fix `run_on_attestation` testing helper

* Revert

* Rename `optional_fast_aggregate_verify` to `eth2_fast_aggregate_verify`

* Apply Proto's suggestion

* Apply Danny's suggestion

* Fixing tests

* Fix after rebasing

* Rename `LIGHTCLIENT` -> `LIGHTCLIENT_PATCH`

* New doctoc

* Add lightclient patch configs

* fix gitignore light client patch generator output

* Upgrade state for light client patch

* Add `lightclient-fork.md` to deal with the fork boundary and fix `process_block_header`

* Misc cleanups

1) Add a summary note for every function that is changed.
2) Avoid changing `process_block` (instead only change `process_block_header`).
3) Rename `G2_INFINITY_POINT_SIG` to `G2_POINT_AT_INFINITY` to avoid `SIG` contraction.
4) Misc cleanups

* Update block.py

* Update beacon-chain.md

* Fix typo "minimal" -> "mainnet"

Co-authored-by: Marin Petrunić <[email protected]>

* Use the new `BeaconBlockHeader` instead of phase 0 version

* Update config files

* Move `sync_committee_bits` and `sync_committee_signature` back to `BeaconBlockBody`

Co-authored-by: protolambda <[email protected]>
Co-authored-by: Justin <[email protected]>
Co-authored-by: Marin Petrunić <[email protected]>
@hwwhww (Contributor) left a comment


Giving the initial version a green light. The tests & iterations will be added with other PRs.

@djrtwo (Contributor) commented Dec 17, 2020

Added some sanity tests. Merging!
