Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up light client spec #895

Merged
merged 9 commits into from
Apr 18, 2019
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
61 changes: 27 additions & 34 deletions specs/light_client/merkle_proofs.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,10 +20,10 @@ In a binary Merkle tree, we define a "generalized index" of a node as `2**depth
Note that the generalized index has the convenient property that the two children of node `k` are `2k` and `2k+1`, and also that it equals the position of a node in the linear representation of the Merkle tree that's computed by this function:

```python
def merkle_tree(leaves):
def merkle_tree(leaves: List[Bytes32]) -> List[Bytes32]:
o = [0] * len(leaves) + leaves
for i in range(len(leaves)-1, 0, -1):
o[i] = hash(o[i*2] + o[i*2+1])
for i in range(len(leaves) - 1, 0, -1):
o[i] = hash(o[i * 2] + o[i * 2 + 1])
return o
```

Expand All @@ -47,33 +47,33 @@ y_data_root len(y)
We can now define a concept of a "path", a way of describing a function that takes as input an SSZ object and outputs some specific (possibly deeply nested) member. For example, `foo -> foo.x` is a path, as are `foo -> len(foo.y)` and `foo -> foo.y[5].w`. We'll describe paths as lists, which can have two representations. In "human-readable form", they are `["x"]`, `["y", "__len__"]` and `["y", 5, "w"]` respectively. In "encoded form", they are lists of `uint64` values, in these cases (assuming the fields of `foo` in order are `x` then `y`, and `w` is the first field of `y[i]`) `[0]`, `[1, 2**64-1]`, `[1, 5, 0]`.

```python
def path_to_encoded_form(obj: Any, path: List[str or int]) -> List[int]:
def path_to_encoded_form(obj: Any, path: List[Union[str, int]]) -> List[int]:
if len(path) == 0:
return []
if isinstance(path[0], "__len__"):
elif isinstance(path[0], "__len__"):
assert len(path) == 1
return [LENGTH_FLAG]
elif isinstance(path[0], str) and hasattr(obj, "fields"):
return [list(obj.fields.keys()).index(path[0])] + path_to_encoded_form(getattr(obj, path[0]), path[1:])
elif isinstance(obj, (StaticList, DynamicList)):
elif isinstance(obj, (Vector, List)):
return [path[0]] + path_to_encoded_form(obj[path[0]], path[1:])
else:
raise Exception("Unknown type / path")
```

We can now define a function `get_generalized_indices(object: Any, path: List[int], root=1: int) -> int` that converts an object and a path to a set of generalized indices (note that for constant-sized objects, there is only one generalized index and it only depends on the path, but for dynamically sized objects the indices may depend on the object itself too). For dynamically-sized objects, the set of indices will have more than one member because of the need to access an array's length to determine the correct generalized index for some array access.
We can now define a function `get_generalized_indices(object: Any, path: List[int], root: int=1) -> List[int]` that converts an object and a path to a set of generalized indices (note that for constant-sized objects, there is only one generalized index and it only depends on the path, but for dynamically sized objects the indices may depend on the object itself too). For dynamically-sized objects, the set of indices will have more than one member because of the need to access an array's length to determine the correct generalized index for some array access.

```python
def get_generalized_indices(obj: Any, path: List[int], root=1) -> List[int]:
def get_generalized_indices(obj: Any, path: List[int], root: int=1) -> List[int]:
if len(path) == 0:
return [root]
elif isinstance(obj, StaticList):
elif isinstance(obj, Vector):
items_per_chunk = (32 // len(serialize(x))) if isinstance(x, int) else 1
new_root = root * next_power_of_2(len(obj) // items_per_chunk) + path[0] // items_per_chunk
return get_generalized_indices(obj[path[0]], path[1:], new_root)
elif isinstance(obj, DynamicList) and path[0] == LENGTH_FLAG:
elif isinstance(obj, List) and path[0] == LENGTH_FLAG:
return [root * 2 + 1]
elif isinstance(obj, DynamicList) and isinstance(path[0], int):
elif isinstance(obj, List) and isinstance(path[0], int):
assert path[0] < len(obj)
items_per_chunk = (32 // len(serialize(x))) if isinstance(x, int) else 1
new_root = root * 2 * next_power_of_2(len(obj) // items_per_chunk) + path[0] // items_per_chunk
Expand All @@ -99,27 +99,20 @@ x x . . . . x *

. are unused nodes, * are used nodes, x are the values we are trying to prove. Notice how despite being a multiproof for 3 values, it requires only 3 auxiliary nodes, only one node more than would be required to prove a single value. Normally the efficiency gains are not quite that extreme, but the savings relative to individual Merkle proofs are still significant. As a rule of thumb, a multiproof for k nodes at the same level of an n-node tree has size `k * (n/k + log(n/k))`.

Here is code for creating and verifying a multiproof. First a helper:

```python
def log2(x):
return 0 if x == 1 else 1 + log2(x//2)
```

First, a method for computing the generalized indices of the auxiliary tree nodes that a proof of a given set of generalized indices will require:
Here is code for creating and verifying a multiproof. First, a method for computing the generalized indices of the auxiliary tree nodes that a proof of a given set of generalized indices will require:

```python
def get_proof_indices(tree_indices: List[int]) -> List[int]:
# Get all indices touched by the proof
maximal_indices = set({})
maximal_indices = set()
for i in tree_indices:
x = i
while x > 1:
maximal_indices.add(x ^ 1)
x //= 2
maximal_indices = tree_indices + sorted(list(maximal_indices))[::-1]
# Get indices that cannot be recalculated from earlier indices
redundant_indices = set({})
redundant_indices = set()
proof = []
for index in maximal_indices:
if index not in redundant_indices:
Expand All @@ -130,26 +123,26 @@ def get_proof_indices(tree_indices: List[int]) -> List[int]:
break
index //= 2
return [i for i in proof if i not in tree_indices]
````
```

Generating a proof is simply a matter of taking the node of the SSZ hash tree with the union of the given generalized indices for each index given by `get_proof_indices`, and outputting the list of nodes in the same order.

Here is the verification function:

```python
def verify_multi_proof(root, indices, leaves, proof):
def verify_multi_proof(root: Bytes32, indices: List[int], leaves: List[Bytes32], proof: List[bytes]) -> bool:
hwwhww marked this conversation as resolved.
Show resolved Hide resolved
tree = {}
for index, leaf in zip(indices, leaves):
tree[index] = leaf
for index, proofitem in zip(get_proof_indices(indices), proof):
tree[index] = proofitem
indexqueue = sorted(tree.keys())[:-1]
for index, proof_item in zip(get_proof_indices(indices), proof):
tree[index] = proof_item
index_queue = sorted(tree.keys())[:-1]
i = 0
while i < len(indexqueue):
index = indexqueue[i]
if index >= 2 and index^1 in tree:
tree[index//2] = hash(tree[index - index%2] + tree[index - index%2 + 1])
indexqueue.append(index//2)
while i < len(index_queue):
index = index_queue[i]
if index >= 2 and index ^ 1 in tree:
tree[index // 2] = hash(tree[index - index % 2] + tree[index - index % 2 + 1])
index_queue.append(index // 2)
i += 1
return (indices == []) or (1 in tree and tree[1] == root)
```
Expand All @@ -158,7 +151,7 @@ def verify_multi_proof(root, indices, leaves, proof):

We define:

#### `MerklePartial`
#### `SSZMerklePartial`


```python
Expand All @@ -172,6 +165,6 @@ We define:

#### Proofs for execution

We define `MerklePartial(f, arg1, arg2..., focus=0)` as being a `MerklePartial` object wrapping a Merkle multiproof of the set of nodes in the hash tree of the SSZ object `arg[focus]` that is needed to authenticate the parts of the object needed to compute `f(arg1, arg2...)`.
We define `MerklePartial(f, arg1, arg2..., focus=0)` as being a `SSZMerklePartial` object wrapping a Merkle multiproof of the set of nodes in the hash tree of the SSZ object `arg[focus]` that is needed to authenticate the parts of the object needed to compute `f(arg1, arg2...)`.

Ideally, any function which accepts an SSZ object should also be able to accept a `MerklePartial` object as a substitute.
Ideally, any function which accepts an SSZ object should also be able to accept a `SSZMerklePartial` object as a substitute.
39 changes: 21 additions & 18 deletions specs/light_client/sync_protocol.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,16 +5,19 @@ __NOTICE__: This document is a work-in-progress for researchers and implementers
## Table of Contents

<!-- TOC -->

- [Beacon Chain Light Client Syncing](#beacon-chain-light-client-syncing)
- [Table of Contents](#table-of-contents)
- [Light client state](#light-client-state)
- [Updating the shuffled committee](#updating-the-shuffled-committee)
- [Computing the current committee](#computing-the-current-committee)
- [Verifying blocks](#verifying-blocks)
- [Preliminaries](#preliminaries)
- [Light client state](#light-client-state)
- [Updating the shuffled committee](#updating-the-shuffled-committee)
- [Computing the current committee](#computing-the-current-committee)
- [Verifying blocks](#verifying-blocks)

<!-- /TOC -->


### Preliminaries
## Preliminaries

We define an "expansion" of an object as an object where a field in an object that is meant to represent the `hash_tree_root` of another object is replaced by the object. Note that defining expansions is not a consensus-layer-change; it is merely a "re-interpretation" of the object. Particularly, the `hash_tree_root` of an expansion of an object is identical to that of the original object, and we can define expansions where, given a complete history, it is always possible to compute the expansion of any object in the history. The opposite of an expansion is a "summary" (eg. `BeaconBlockHeader` is a summary of `BeaconBlock`).

Expand All @@ -40,7 +43,7 @@ We add a data type `PeriodData` and four helpers:
{
'validator_count': 'uint64',
'seed': 'bytes32',
'committee': [Validator]
'committee': [Validator],
}
```

Expand Down Expand Up @@ -85,14 +88,13 @@ later_period_data = get_period_data(new_committee_proof, finalized_header, shard

The maximum size of a proof is `128 * ((22-7) * 32 + 110) = 75520` bytes for validator records and `(22-7) * 32 + 128 * 8 = 1504` for the active index proof (much smaller because the relevant active indices are all beside each other in the Merkle tree). This needs to be done once per `PERSISTENT_COMMITTEE_PERIOD` epochs (2048 epochs / 9 days), or ~38 bytes per epoch.

### Computing the current committee
## Computing the current committee

Here is a helper to compute the committee at a slot given the maximal earlier and later committees:

```python
def compute_committee(header: BeaconBlockHeader,
validator_memory: ValidatorMemory):

validator_memory: ValidatorMemory) -> List[ValidatorIndex]:
earlier_validator_count = validator_memory.earlier_period_data.validator_count
later_validator_count = validator_memory.later_period_data.validator_count
maximal_earlier_committee = validator_memory.earlier_period_data.committee
Expand All @@ -105,11 +107,13 @@ def compute_committee(header: BeaconBlockHeader,
earlier_validator_count // (SHARD_COUNT * TARGET_COMMITTEE_SIZE),
later_validator_count // (SHARD_COUNT * TARGET_COMMITTEE_SIZE),
) + 1

def get_offset(count, end:bool):
return get_split_offset(count,
SHARD_COUNT * committee_count,
validator_memory.shard_id * committee_count + (1 if end else 0))

def get_offset(count: int, end: bool) -> int:
return get_split_offset(
count,
SHARD_COUNT * committee_count,
validator_memory.shard_id * committee_count + (1 if end else 0),
)

actual_earlier_committee = maximal_earlier_committee[
0:get_offset(earlier_validator_count, True) - get_offset(earlier_validator_count, False)
Expand All @@ -129,21 +133,20 @@ def compute_committee(header: BeaconBlockHeader,
[i for i in earlier_committee if epoch % PERSISTENT_COMMITTEE_PERIOD < get_switchover_epoch(i)] +
[i for i in later_committee if epoch % PERSISTENT_COMMITTEE_PERIOD >= get_switchover_epoch(i)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vbuterin
I think they should be:

        [i for i in actual_earlier_committee if epoch % PERSISTENT_COMMITTEE_PERIOD < get_switchover_epoch(i)] +
        [i for i in actual_later_committee if epoch % PERSISTENT_COMMITTEE_PERIOD >= get_switchover_epoch(i)]

?

)))

```

Note that this method makes use of the fact that the committee for any given shard always starts and ends at the same validator index independently of the committee count (this is because the validator set is split into `SHARD_COUNT * committee_count` slices but the first slice of a shard is a multiple `committee_count * i`, so the start of the slice is `n * committee_count * i // (SHARD_COUNT * committee_count) = n * i // SHARD_COUNT`, using the slightly nontrivial algebraic identity `(x * a) // ab == x // b`).

### Verifying blocks
## Verifying blocks

If a client wants to update its `finalized_header` it asks the network for a `BlockValidityProof`, which is simply:

```python
{
'header': BlockHeader,
'header': BeaconBlockHeader,
'shard_aggregate_signature': 'bytes96',
'shard_bitfield': 'bytes',
'shard_parent_block': ShardBlock
'shard_parent_block': ShardBlock,
}
```

Expand Down