From 4431c08e8bcf82ec78c146b1237292841b09506e Mon Sep 17 00:00:00 2001
From: Etan Kissling
Date: Thu, 20 Apr 2023 16:23:43 +0200
Subject: [PATCH 1/5] EIP-6475: Add SSZ `Optional[T]` type

Add specification for EIP-6475 support for SSZ.

Remerkleable impl: https://eips.ethereum.org/assets/eip-6475/tests.py

We could possibly change all planned usage of `Union` with `Optional`,
and introduce the conceptually more complex `Union` once needed.
`Optional` serialization can be more optimized than `Union`.

Discussion: https://ethereum-magicians.org/t/eip-6475-ssz-optional/12891

This PR builds on prior work from:
- @zah at https://github.com/ethereum/consensus-specs/issues/1916
---
 ssz/simple-serialize.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/ssz/simple-serialize.md b/ssz/simple-serialize.md
index 4ef64f2f28..15ee037be2 100644
--- a/ssz/simple-serialize.md
+++ b/ssz/simple-serialize.md
@@ -60,6 +60,8 @@
     * notation `Bitvector[N]`
 * **bitlist**: ordered variable-length collection of `boolean` values, limited to `N` bits
     * notation `Bitlist[N]`
+* **optional**: either a wrapped value of the given subtype, or `None`
+    * notation `Optional[type]`, e.g. `Optional[uint64]`
 * **union**: union type containing one of the given subtypes
     * notation `Union[type_0, type_1, ...]`, e.g. `union[None, uint64, uint32]`

@@ -90,6 +92,7 @@ Assuming a helper function `default(type)` which returns the default value for `
 | `Bitvector[N]` | `[False] * N` |
 | `List[type, N]` | `[]` |
 | `Bitlist[N]` | `[]` |
+| `Optional[type]` | `None` |
 | `Union[type_0, type_1, ...]` | `default(type_0)` |

 #### `is_zero`
@@ -100,6 +103,7 @@ An SSZ object is called zeroed (and thus, `is_zero(object)` returns true) if it

 - Empty vector types (`Vector[type, 0]`, `Bitvector[0]`) are illegal.
 - Containers with no fields are illegal.
+- Optionals wrapping types that may serialize to `[]` (`List[type, N]`, nested `Optional`) are illegal.
 - The `None` type option in a `Union` type is only legal as the first option (i.e. with index zero).

 ## Serialization
@@ -163,6 +167,15 @@ fixed_parts = [part if part != None else variable_offsets[i] for i, part in enum
 return b"".join(fixed_parts + variable_parts)
 ```

+### Optional
+
+```python
+if value is None:
+    return b""
+else:
+    return serialize(value)
+```
+
 ### Union

 A `value` as `Union[T...]` type has properties `value.value` with the contained value, and `value.selector` which indexes the selected `Union` type option `T`.
@@ -196,6 +209,7 @@ Deserialization can be implemented using a recursive algorithm. The deserializat
 * The size of each object in the vector/list can be inferred from the difference of two offsets. To get the size of the last object, the total number of bytes has to be known (it is not generally possible to deserialize an SSZ object of unknown length)
 * Containers follow the same principles as vectors, with the difference that there may be fixed-size objects in a container as well. This means the `fixed_parts` data will contain offsets as well as fixed-size objects.
 * In the case of bitlists, the length in bits cannot be uniquely inferred from the number of bytes in the object. Because of this, they have a bit at the end that is always set. This bit has to be used to infer the size of the bitlist in bits.
+* In the case of optional, if the serialized data has length 0, it represents `None`. Otherwise, deserialize same as the underlying type.
 * In the case of unions, the first byte of the deserialization scope is deserialized as type selector, the remainder of the scope is deserialized as the selected type.

 Note that deserialization requires hardening against invalid inputs. A non-exhaustive list:
@@ -244,6 +258,8 @@ We now define Merkleization `hash_tree_root(value)` of an object `value` recursi
 * `mix_in_length(merkleize(pack_bits(value), limit=chunk_count(type)), len(value))` if `value` is a bitlist.
 * `merkleize([hash_tree_root(element) for element in value])` if `value` is a vector of composite objects or a container.
 * `mix_in_length(merkleize([hash_tree_root(element) for element in value], limit=chunk_count(type)), len(value))` if `value` is a list of composite objects.
+* `mix_in_length(hash_tree_root(value), 1)` if `value` is of optional type, and `value` is not `None`
+* `mix_in_length(Bytes32(), 0)` if `value` is of optional type, and `value` is `None`
 * `mix_in_selector(hash_tree_root(value.value), value.selector)` if `value` is of union type, and `value.value` is not `None`
 * `mix_in_selector(Bytes32(), 0)` if `value` is of union type, and `value.value` is `None`


From 4eaf27ba5d2e9d9009677a7eb5a413149be8c33b Mon Sep 17 00:00:00 2001
From: Etan Kissling
Date: Thu, 20 Apr 2023 16:35:59 +0200
Subject: [PATCH 2/5] Fix table of contents

---
 ssz/simple-serialize.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ssz/simple-serialize.md b/ssz/simple-serialize.md
index 15ee037be2..0b81ad140a 100644
--- a/ssz/simple-serialize.md
+++ b/ssz/simple-serialize.md
@@ -20,6 +20,7 @@
     - [`Bitvector[N]`](#bitvectorn)
     - [`Bitlist[N]`](#bitlistn)
     - [Vectors, containers, lists](#vectors-containers-lists)
+    - [Optional](#optional)
     - [Union](#union)
   - [Deserialization](#deserialization)
   - [Merkleization](#merkleization)

From e92ef6c3cb4011bc2ba108a3242abb3e6ea7ccdb Mon Sep 17 00:00:00 2001
From: Etan Kissling
Date: Thu, 27 Apr 2023 22:17:42 +0200
Subject: [PATCH 3/5] Update for latest `Optional` spec (`0x01` prefix in `Some` case)

https://eips.ethereum.org/EIPS/eip-6475
https://github.com/ethereum/EIPs/pull/6945
---
 ssz/simple-serialize.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/ssz/simple-serialize.md b/ssz/simple-serialize.md
index 0b81ad140a..cef8f6d8c5 100644
--- a/ssz/simple-serialize.md
+++ b/ssz/simple-serialize.md
@@ -104,7 +104,6 @@ An SSZ object is called zeroed (and thus, `is_zero(object)` returns true) if it

 - Empty vector types (`Vector[type, 0]`, `Bitvector[0]`) are illegal.
 - Containers with no fields are illegal.
-- Optionals wrapping types that may serialize to `[]` (`List[type, N]`, nested `Optional`) are illegal.
 - The `None` type option in a `Union` type is only legal as the first option (i.e. with index zero).

 ## Serialization
@@ -174,7 +173,7 @@ return b"".join(fixed_parts + variable_parts)
 if value is None:
     return b""
 else:
-    return serialize(value)
+    return b"\x01" + serialize(value)
 ```

 ### Union
@@ -210,7 +209,7 @@ Deserialization can be implemented using a recursive algorithm. The deserializat
 * The size of each object in the vector/list can be inferred from the difference of two offsets. To get the size of the last object, the total number of bytes has to be known (it is not generally possible to deserialize an SSZ object of unknown length)
 * Containers follow the same principles as vectors, with the difference that there may be fixed-size objects in a container as well. This means the `fixed_parts` data will contain offsets as well as fixed-size objects.
 * In the case of bitlists, the length in bits cannot be uniquely inferred from the number of bytes in the object. Because of this, they have a bit at the end that is always set. This bit has to be used to infer the size of the bitlist in bits.
-* In the case of optional, if the serialized data has length 0, it represents `None`. Otherwise, deserialize same as the underlying type.
+* In the case of optional, if the serialized data has length 0, it represents `None`. Otherwise, the first byte of the deserialization scope must be checked to be `0x01`, the remainder of the scope is deserialized same as `T`.
 * In the case of unions, the first byte of the deserialization scope is deserialized as type selector, the remainder of the scope is deserialized as the selected type.

 Note that deserialization requires hardening against invalid inputs. A non-exhaustive list:

From d8c20746a9851869878c225fceb2d2262b76886d Mon Sep 17 00:00:00 2001
From: Etan Kissling
Date: Thu, 27 Apr 2023 22:20:45 +0200
Subject: [PATCH 4/5] Align union `None` serialization with `Optional`

https://ethereum-magicians.org/t/eip-6475-ssz-optional/12891/17
---
 ssz/simple-serialize.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ssz/simple-serialize.md b/ssz/simple-serialize.md
index cef8f6d8c5..f15625114a 100644
--- a/ssz/simple-serialize.md
+++ b/ssz/simple-serialize.md
@@ -191,7 +191,7 @@ A `Union`:
 ```python
 if value.value is None:
     assert value.selector == 0
-    return b"\x00"
+    return b""
 else:
     serialized_bytes = serialize(value.value)
     serialized_selector_index = value.selector.to_bytes(1, "little")
@@ -210,14 +210,14 @@ Deserialization can be implemented using a recursive algorithm. The deserializat
 * Containers follow the same principles as vectors, with the difference that there may be fixed-size objects in a container as well. This means the `fixed_parts` data will contain offsets as well as fixed-size objects.
 * In the case of bitlists, the length in bits cannot be uniquely inferred from the number of bytes in the object. Because of this, they have a bit at the end that is always set. This bit has to be used to infer the size of the bitlist in bits.
 * In the case of optional, if the serialized data has length 0, it represents `None`. Otherwise, the first byte of the deserialization scope must be checked to be `0x01`, the remainder of the scope is deserialized same as `T`.
-* In the case of unions, the first byte of the deserialization scope is deserialized as type selector, the remainder of the scope is deserialized as the selected type.
+* In the case of unions, if the serialized data has length 0, it represents `None`. Otherwise, the first byte of the deserialization scope is deserialized as type selector, the remainder of the scope is deserialized as the selected type (cannot refer to `None`).

 Note that deserialization requires hardening against invalid inputs. A non-exhaustive list:

 - Offsets: out of order, out of range, mismatching minimum element size.
 - Scope: Extra unused bytes, not aligned with element size.
 - More elements than a list limit allows. Part of enforcing consensus.
-- An out-of-bounds selected index in an `Union`
+- An out-of-bounds selected index in an `Union`, or a `None` value for a type that doesn't support it.

 Efficient algorithms for computing this object can be found in [the implementations](#implementations).


From 83936710d1bd2cf050e01ad7df50de4a27ef268f Mon Sep 17 00:00:00 2001
From: Hsiao-Wei Wang
Date: Mon, 1 May 2023 22:50:35 +0800
Subject: [PATCH 5/5] Rename Python typing `Optional` to `PyOptional`

---
 setup.py                                   | 9 +++++----
 specs/_features/das/das-core.md            | 4 ++--
 specs/altair/light-client/full-node.md     | 2 +-
 specs/altair/light-client/sync-protocol.md | 2 +-
 specs/bellatrix/fork-choice.md             | 4 ++--
 specs/bellatrix/validator.md               | 8 ++++----
 specs/capella/fork-choice.md               | 2 +-
 specs/capella/validator.md                 | 2 +-
 specs/phase0/validator.md                  | 2 +-
 9 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/setup.py b/setup.py
index fc3acb8062..e176b08e1d 100644
--- a/setup.py
+++ b/setup.py
@@ -378,7 +378,8 @@ def imports(cls, preset_name: str) -> str:
     field,
 )
 from typing import (
-    Any, Callable, Dict, Set, Sequence, Tuple, Optional, TypeVar, NamedTuple, Final
+    Any, Callable, Dict, Set, Sequence, Tuple, TypeVar, NamedTuple, Final,
+    Optional as PyOptional
 )

 from eth2spec.utils.ssz.ssz_impl import hash_tree_root, copy, uint_to_bytes
@@ -564,7 +565,7 @@ def sundry_functions(cls) -> str:
 ExecutionState = Any


-def get_pow_block(hash: Bytes32) -> Optional[PowBlock]:
+def get_pow_block(hash: Bytes32) -> PyOptional[PowBlock]:
     return PowBlock(block_hash=hash, parent_hash=Bytes32(), total_difficulty=uint256(0))


@@ -585,7 +586,7 @@ def notify_forkchoice_updated(self: ExecutionEngine,
                                   head_block_hash: Hash32,
                                   safe_block_hash: Hash32,
                                   finalized_block_hash: Hash32,
-                                  payload_attributes: Optional[PayloadAttributes]) -> Optional[PayloadId]:
+                                  payload_attributes: PyOptional[PayloadAttributes]) -> PyOptional[PayloadId]:
         pass

     def get_payload(self: ExecutionEngine, payload_id: PayloadId) -> ExecutionPayload:
@@ -815,7 +816,7 @@ def combine_dicts(old_dict: Dict[str, T], new_dict: Dict[str, T]) -> Dict[str, T
     'uint8', 'uint16', 'uint32', 'uint64', 'uint128', 'uint256',
     'bytes', 'byte', 'ByteList', 'ByteVector',
     'Dict', 'dict', 'field', 'ceillog2', 'floorlog2', 'Set',
-    'Optional', 'Sequence',
+    'PyOptional', 'Sequence',
 ]


diff --git a/specs/_features/das/das-core.md b/specs/_features/das/das-core.md
index f683cbbe13..fcd3ae83ba 100644
--- a/specs/_features/das/das-core.md
+++ b/specs/_features/das/das-core.md
@@ -105,7 +105,7 @@ Implementations:
 - [Old approach in Go](https://github.com/protolambda/go-kate/blob/master/recovery.go)

 ```python
-def recover_data(data: Sequence[Optional[Sequence[Point]]]) -> Sequence[Point]:
+def recover_data(data: Sequence[PyOptional[Sequence[Point]]]) -> Sequence[Point]:
     """Given an a subset of half or more of subgroup-aligned ranges of values, recover the None values."""
     ...
 ```
@@ -183,7 +183,7 @@ def verify_sample(sample: DASSample, sample_count: uint64, commitment: BLSCommit
 ```

 ```python
-def reconstruct_extended_data(samples: Sequence[Optional[DASSample]]) -> Sequence[Point]:
+def reconstruct_extended_data(samples: Sequence[PyOptional[DASSample]]) -> Sequence[Point]:
     # Instead of recovering with a point-by-point approach, recover the samples by recovering missing subgroups.
     subgroups = [None if sample is None else reverse_bit_order_list(sample.data) for sample in samples]
     return recover_data(subgroups)
diff --git a/specs/altair/light-client/full-node.md b/specs/altair/light-client/full-node.md
index 7dc25448c1..fb465b1248 100644
--- a/specs/altair/light-client/full-node.md
+++ b/specs/altair/light-client/full-node.md
@@ -97,7 +97,7 @@ def create_light_client_update(state: BeaconState,
                                block: SignedBeaconBlock,
                                attested_state: BeaconState,
                                attested_block: SignedBeaconBlock,
-                               finalized_block: Optional[SignedBeaconBlock]) -> LightClientUpdate:
+                               finalized_block: PyOptional[SignedBeaconBlock]) -> LightClientUpdate:
     assert compute_epoch_at_slot(attested_state.slot) >= ALTAIR_FORK_EPOCH
     assert sum(block.message.body.sync_aggregate.sync_committee_bits) >= MIN_SYNC_COMMITTEE_PARTICIPANTS

diff --git a/specs/altair/light-client/sync-protocol.md b/specs/altair/light-client/sync-protocol.md
index baef684c62..e82a9035d0 100644
--- a/specs/altair/light-client/sync-protocol.md
+++ b/specs/altair/light-client/sync-protocol.md
@@ -152,7 +152,7 @@ class LightClientStore(object):
     current_sync_committee: SyncCommittee
     next_sync_committee: SyncCommittee
     # Best available header to switch finalized head to if we see nothing else
-    best_valid_update: Optional[LightClientUpdate]
+    best_valid_update: PyOptional[LightClientUpdate]
     # Most recent available reasonably-safe header
     optimistic_header: LightClientHeader
     # Max number of active participants in a sync committee (used to calculate safety threshold)
diff --git a/specs/bellatrix/fork-choice.md b/specs/bellatrix/fork-choice.md
index ed7d60a932..fff4dcced1 100644
--- a/specs/bellatrix/fork-choice.md
+++ b/specs/bellatrix/fork-choice.md
@@ -60,7 +60,7 @@ def notify_forkchoice_updated(self: ExecutionEngine,
                               head_block_hash: Hash32,
                               safe_block_hash: Hash32,
                               finalized_block_hash: Hash32,
-                              payload_attributes: Optional[PayloadAttributes]) -> Optional[PayloadId]:
+                              payload_attributes: PyOptional[PayloadAttributes]) -> PyOptional[PayloadId]:
     ...
 ```

@@ -101,7 +101,7 @@ class PowBlock(Container):

 ### `get_pow_block`

-Let `get_pow_block(block_hash: Hash32) -> Optional[PowBlock]` be the function that given the hash of the PoW block returns its data.
+Let `get_pow_block(block_hash: Hash32) -> PyOptional[PowBlock]` be the function that given the hash of the PoW block returns its data.
 It may result in `None` if the requested block is not yet available.

 *Note*: The `eth_getBlockByHash` JSON-RPC method may be used to pull this information from an execution client.
diff --git a/specs/bellatrix/validator.md b/specs/bellatrix/validator.md
index a176d7534e..889cd7e593 100644
--- a/specs/bellatrix/validator.md
+++ b/specs/bellatrix/validator.md
@@ -39,7 +39,7 @@ Please see related Beacon Chain doc before continuing and use them as a referenc
 ### `get_pow_block_at_terminal_total_difficulty`

 ```python
-def get_pow_block_at_terminal_total_difficulty(pow_chain: Dict[Hash32, PowBlock]) -> Optional[PowBlock]:
+def get_pow_block_at_terminal_total_difficulty(pow_chain: Dict[Hash32, PowBlock]) -> PyOptional[PowBlock]:
     # `pow_chain` abstractly represents all blocks in the PoW chain
     for block in pow_chain.values():
         block_reached_ttd = block.total_difficulty >= TERMINAL_TOTAL_DIFFICULTY
@@ -58,7 +58,7 @@ def get_pow_block_at_terminal_total_difficulty(pow_chain: Dict[Hash32, PowBlock]
 ### `get_terminal_pow_block`

 ```python
-def get_terminal_pow_block(pow_chain: Dict[Hash32, PowBlock]) -> Optional[PowBlock]:
+def get_terminal_pow_block(pow_chain: Dict[Hash32, PowBlock]) -> PyOptional[PowBlock]:
     if TERMINAL_BLOCK_HASH != Hash32():
         # Terminal block hash override takes precedence over terminal total difficulty
         if TERMINAL_BLOCK_HASH in pow_chain:
@@ -122,7 +122,7 @@ def prepare_execution_payload(state: BeaconState,
                               safe_block_hash: Hash32,
                               finalized_block_hash: Hash32,
                               suggested_fee_recipient: ExecutionAddress,
-                              execution_engine: ExecutionEngine) -> Optional[PayloadId]:
+                              execution_engine: ExecutionEngine) -> PyOptional[PayloadId]:
     if not is_merge_transition_complete(state):
         is_terminal_block_hash_set = TERMINAL_BLOCK_HASH != Hash32()
         is_activation_epoch_reached = get_current_epoch(state) >= TERMINAL_BLOCK_HASH_ACTIVATION_EPOCH
@@ -157,7 +157,7 @@ def prepare_execution_payload(state: BeaconState,
 2. Set `block.body.execution_payload = get_execution_payload(payload_id, execution_engine)`, where:

 ```python
-def get_execution_payload(payload_id: Optional[PayloadId], execution_engine: ExecutionEngine) -> ExecutionPayload:
+def get_execution_payload(payload_id: PyOptional[PayloadId], execution_engine: ExecutionEngine) -> ExecutionPayload:
     if payload_id is None:
         # Pre-merge, empty payload
         return ExecutionPayload()
diff --git a/specs/capella/fork-choice.md b/specs/capella/fork-choice.md
index 0e0a393c34..9e8cb83dc3 100644
--- a/specs/capella/fork-choice.md
+++ b/specs/capella/fork-choice.md
@@ -42,7 +42,7 @@ def notify_forkchoice_updated(self: ExecutionEngine,
                               head_block_hash: Hash32,
                               safe_block_hash: Hash32,
                               finalized_block_hash: Hash32,
-                              payload_attributes: Optional[PayloadAttributes]) -> Optional[PayloadId]:
+                              payload_attributes: PyOptional[PayloadAttributes]) -> PyOptional[PayloadId]:
     ...
 ```

diff --git a/specs/capella/validator.md b/specs/capella/validator.md
index 644ee476f9..f6421c3aef 100644
--- a/specs/capella/validator.md
+++ b/specs/capella/validator.md
@@ -73,7 +73,7 @@ def prepare_execution_payload(state: BeaconState,
                               safe_block_hash: Hash32,
                               finalized_block_hash: Hash32,
                               suggested_fee_recipient: ExecutionAddress,
-                              execution_engine: ExecutionEngine) -> Optional[PayloadId]:
+                              execution_engine: ExecutionEngine) -> PyOptional[PayloadId]:
     if not is_merge_transition_complete(state):
         is_terminal_block_hash_set = TERMINAL_BLOCK_HASH != Hash32()
         is_activation_epoch_reached = get_current_epoch(state) >= TERMINAL_BLOCK_HASH_ACTIVATION_EPOCH
diff --git a/specs/phase0/validator.md b/specs/phase0/validator.md
index 2a4d5b920e..8ef064374f 100644
--- a/specs/phase0/validator.md
+++ b/specs/phase0/validator.md
@@ -215,7 +215,7 @@ A validator can get committee assignments for a given epoch using the following
 def get_committee_assignment(state: BeaconState,
                              epoch: Epoch,
                              validator_index: ValidatorIndex
-                             ) -> Optional[Tuple[Sequence[ValidatorIndex], CommitteeIndex, Slot]]:
+                             ) -> PyOptional[Tuple[Sequence[ValidatorIndex], CommitteeIndex, Slot]]:
     """
     Return the committee assignment in the ``epoch`` for ``validator_index``.
     ``assignment`` returned is a tuple of the following form: