cip	title	author	discussions-to	status	type	category	created	license
22	Upgrade epoch SNARK data for flexibility, slashability and security	Kobi Gurkan <me@kobi.one>	#72	Final	Standards	Ring 0	2020-10-30	Apache 2.0

Terminology

The key words "MUST", "MUST NOT", "SHOULD", and "MAY" in this document are to be interpreted as described in RFC 2119.

We mention little-endian and big-endian both in the context of bytes and bits. In both contexts we mean for little-endian that it's encoded from the least-significant element to the most-significant one, and similarly for big-endian.

The symbol || means array concatenation.

Simple Summary

This CIP upgrades the epoch SNARK data structure, which is signed by validators during consensus:

To be correct when there are less than a specified maximum amount of validators elected.
To be slashable with reasonable costs on-chain.

Abstract

Celo supports the Plumo ultra-light client protocol, where, instead of downloading and verifying many block headers, light clients can download SNARK proofs that update their known validator set to a more recent one. This is facilitated by an "epoch SNARK data" structure which is a SNARK-friendly encoding of validator election results. This structure is signed by validators during consensus and a resulting aggregated BLS signature is stored in blocks following elections.

The currently deployed version of the encoding can be improved on several fronts:

The encoding only works correctly when the validator set is at its maximum size. Otherwise a shorter encoding is output and we can't create a SNARK proof about it.
As it's designed to be SNARK-friendly, the encoding uses the Bowe-Hopwood SNARK-friendly hash function on the entire elected validator set. As a result, showing a double-signing proof for slashing currently requires hashing the entire elected validator set with a non EVM-friendly hash, leading to gas usage that is too high.
It's vulnerable to a "future subcommittee" attack discovered by @psivesely, where an attacker that compromises enough validators can pre-sign encodings with future epoch numbers and use them when their compromised subcommittee is elected in the future.

This CIP upgrades the encoding to address all three topics.

Motivation

Support validator sets which are not full up to a specified maximum.
Create the necessary infrastructure for on-chain slashing in the case of double-signing.
Prevent the "future subcommittee" attack described above

Specification

We define a few helper functions:

encode_integer_X_le - encodes an unsigned_integer having X bits to bits in a little-endian order.
encode_public_key_le - encodes a BLS12-377 point to a compressed form in bits - the little-endian bits of the x coordinate and a single bit whether y is the greatest of the two possibilities.
bits_be_to_bytes_le - converts big-endian bits to little-endian bytes.
bytes_le_to_bits_be - converts little-endian bytes to big-endian bits.
bytes_le_to_bits_le - converts little-endian bytes to little-endian bits.
BHHash - Bowe-Hopwood hash, as specified in the Zcash protocol specification, on the Edwards curve defined over the same base field as BLS12-377, as specified in the Zexe paper.
Blake2XsOuter and Blake2XsPlumo - the outer hashes in Blake2s-based variant of Blake2X, defined in the Blake2X paper. By outer hashes we mean all the calls to Blake2s beyond the root hash. Specifically, we define Blake2XsOuter(i, msg) to be

(Blake2s {
    hash_length = 32, 
    max_leaf_length = 32, 
    inner_hash_length = 32, 
    fanout = 0, 
    max_depth = 0, 
    personalization = "ULforxof", 
    node_offset=i,
    xof_digest_length = 64,
})(msg)

and Blake2XsPlumo(msg) to be

Blake2XsOuter(0, msg) || Blake2XsOuter(1, msg)

pad(array, n, padding_element) - enlarges the array to size n, adding padding_element in the new places.
truncate(array, n) - truncates the array to the first n elements.
encode_to_point(bytes) - attempt to parse the bytes as a BLS12-377 point as follows. Take the first 377 bits and attempt to build a valid base-field element as the x coordinate. If the resulting number is larger or equal to the modulus, fail. Then use the curve equation to derive one of the possible y coordinates for x as y = ±sqrt(x^3+1). If the intermediate value x^3+1 doesn't have a square root, fail. Then use the next bit to determine whether to take the greatest y - if the bit is 1, take the greatest, otherwise take the other option. Denote the point as P. Then multiply P by the cofactor 30631250834960419227450344600217059328 to get Q and check whether you get zero. If yes, fail. Finally, return the point Q.

The encoding is currently implemented as:

encode_to_point(
    Blake2XsPlumo(
        BHHash(
            encode_integer_8_le(counter) || 
            encode_integer_16_le(epoch number) || 
            encode_integer_32_le(maximum_non_signers) || 
            validator set
        )
    )
)

where counter is the first number where encode_to_point succeeds starting from 0 and at most 255, validator_set is the concatenation of the results of encode_public_key_le on each validator public key.

We propose to change it to the following. We first define the parameters MAX_VALIDATORS = 150 and SEC_PARAM = 128. We then specify the new encoding.

encode_to_point(
    Blake2XsPlumo(
        encode_integer_8_le(counter) || 
        encode_integer_16_le(epoch number) || 
        encode_integer_8_le(round number) || 
        encode_integer_32_le(maximum_non_signers) || 
        BHHash(
            bytes_le_to_bits_le(truncate(epoch_hash, SEC_PARAM/8)) ||
            bytes_le_to_bits_le(truncate(parent_hash, SEC_PARAM/8)) ||
            pad(validator set, MAX_VALIDATORS, G2_GENERATOR)
        )
    )
)

where counter is the first number where encode_to_point succeeds starting from 0 and at most 255, round_number is the round that produced this signature during consensus, validator_set is the concatenation of the results of encode_public_key_le on each validator public key, epoch_hash is the hash of the current epoch block header, parent_hash is the hash of the previous epoch block header (17280 blocks in the past) and

G2_GENERATOR = ((233578398248691099356572568220835526895379068987715365179118596935057653620464273615301663571204657964920925606294, 140913150380207355837477652521042157274541796891053068589147167627541651775299824604154852141315666357241556069118), (63160294768292073209381361943935198908131692476676907196754037919244929611450776219210369229519898517858833747423, 149157405641012693445398062341192467754805999074082136895788947234480009303640899064710353187729182149407503257491))

Rationale

A single Plumo proof can support a fixed amount of validators and epochs. We believe that 150 validators and 130 epochs (~= 130 days) will satisfy Celo's needs for the next 2 years.

Moving the epoch number outside the BHHash allows for an efficient on-chain double-signing proof - this proof doesn't need the information inside the BHHash, it's just required to show there are two aggregate signatures by a super majority on the same epoch number and distinct opaque outputs of BHHash. We additionally include the round number from consensus to allow slashing double-signing on a single double signature on (block number, round number).

Adding the current and parent block header hashes prevents the "future subcommittee" attack since it adds an unpredictable value that the attacker can't predict in advance and therefore can't pre-sign, and establishes a chain of epoch headers. Specifically, the header contains the state hash, which in turn contains the result of the Celo randomness beacon. We take only SEC_PARAM/8 bytes, since, assuming the current hash bytes are sufficiently unpredictable, an attacker would need to pre-sign around 2^128 different messages in order to succeed. Since we use the parent hash to establish the chain, we have it to be the same length for similar reasons - if it was, e.g., 64 bits, then an attacker could pre-sign 2^64 messages with all possible values, and show that it's part of a chain of epoch block headers, even though this check wouldn't have passed if we used more bits. In other words, the current hash value is arbitrary and doesn't assist with security, only the parent hash does. That's because an attacker can arbitrarily choose the current hash when pre-signing the block, and can't choose the parent hash.

Test Cases

We provide the plumo-prover codebase, which consumes data from a Celo deployment and generates Plumo proofs. It will be run on private testnets before the test period and on Alfajores and Baklava during the test period.
We provide a proof-of-concept implementation of a slashing contract that utilizes CIP20 for Blake2Xs and the equivalent CIP of EIP2539 for BLS12-377 operations.

Implementation

The changes are implemented as PRs on celo-blockchain (#1231), celo-bls-snark-rs (#194) and celo-bls-go (#15).

Test plan

Ensure syncing with Alfajores, Baklava and Mainnet works up until the fork point.
Ensure the chain makes correct progress with a private chain, Alfajores and Baklava after the fork point. We will see that by all the validators successfully validating through epoch transitions.
Produce SNARK proofs using data from Alfajores and Baklava using plumo-prover with locally generated SNARK keys.
Craft slashable epoch data using a private chain and use the slashing contract proof-of-concept implementation.

Deployment Considerations

This CIP can in fact be deployed as a soft-fork, in the sense that only validators MUST upgrade. Full nodes and light-clients do not verify the correctness of this field. Nevertheless, we suggest to deploy it in a hard-fork to minimize the amount of upgrades validators have to perform.
The CIP puts a hard upper-bound on the number of validators to be 150.

Security Considerations

A general security issue in Plumo - a soundness issue in the circuit or the SNARK setup could cause light client to be on the wrong chain. This is why we've invested in internal and external audits.
This is a change to the operations occuring in the COMMIT phase of the consensus protocol. This means that a bug in the implementation can hurt the liveness of the chain. The risk is somewhat mitigated by the fact that the inputs to encoding methods are self-generated by each validator and so a malicious proposer can't inject arbitrary inputs. Additionally, the signing portions of the code are largely unchanged and have been audited and running in production since Celo launched its mainnet.

License

This work is licensed under the Apache License, Version 2.0.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cip-0022.md

cip-0022.md

Terminology

Simple Summary

Abstract

Motivation

Specification

Rationale

Test Cases

Implementation

Test plan

Deployment Considerations

Security Considerations

License

Files

cip-0022.md

Latest commit

History

cip-0022.md

File metadata and controls

Terminology

Simple Summary

Abstract

Motivation

Specification

Rationale

Test Cases

Implementation

Test plan

Deployment Considerations

Security Considerations

License