This document defines a set of encoding rules for Protocol Buffers 3 (proto3) that serialize messages deterministically, so that the serialized form is suitable for signing and for encoding in cryptographic attestations (e.g., Merkle trees). Similar to ASN.1 and Cap'n Proto, a set of "canonical encoding rules" (CER) defines a canonical encoding where the basic proto3 specification does not. In this sense, the default Protocol Buffers specification provides a set of "basic encoding rules" that are not deterministic, and we extend that specification to support deterministic encoding for cryptographic use cases.
This is the most intuitive order in which to serialize fields.
Requiring default values to be serialized would prevent clients using an older version of a protocol from sending messages to transaction processors that use a later version. Also, proto3 makes no semantic distinction between empty and default fields, so serializing a default value communicates no information. The canonical behavior is therefore to always omit fields with empty or default values from serialization.
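As a rough illustration of omitting default values, here is a minimal sketch of a toy binary encoder. It is not a real protobuf library: `encode_fields` and its dictionary input are hypothetical, only unsigned varint and length-delimited fields are handled, and writing fields in ascending field-number order is an assumption about the ordering rule.

```python
def encode_varint(n):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_fields(fields):
    """Encode {field_number: value} deterministically (hypothetical helper).

    Fields are written in ascending field-number order (an assumption),
    and fields holding a default value (0, False, "", b"") are omitted.
    """
    out = bytearray()
    for number in sorted(fields):
        value = fields[number]
        if isinstance(value, bool):
            value = int(value)
        if isinstance(value, int):
            if value < 0:
                raise ValueError("this sketch only handles unsigned integers")
            if value == 0:
                continue  # default value: omit the field entirely
            out += encode_varint((number << 3) | 0)  # wire type 0: varint
            out += encode_varint(value)
        elif isinstance(value, (str, bytes)):
            data = value.encode("utf-8") if isinstance(value, str) else value
            if not data:
                continue  # empty string or bytes: omit the field entirely
            out += encode_varint((number << 3) | 2)  # wire type 2: length-delimited
            out += encode_varint(len(data))
            out += data
        else:
            raise TypeError("unsupported value type in this sketch")
    return bytes(out)
```

With this sketch, `encode_fields({2: "hi", 1: 150, 3: 0})` and `encode_fields({1: 150, 2: "hi"})` produce the same bytes, because the zero-valued field 3 is never written and insertion order does not matter.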
While maps could have a canonical encoding, they are too problematic for cryptographically sensitive use cases and thus excluded for now.
A recipient cannot determine whether a message with unknown fields is canonical. Therefore, all transaction processors that receive messages with unknown fields should treat them as not canonical. In spite of this limitation, clients using an early version of a protocol can send messages to transaction processors that understand a later version of the protocol without causing a problem. Transaction processors would also reject messages intended for a later version of the protocol that they do not understand, which is likely the safest and most correct behavior in most cases.
In addition to the rules below, signable canonical protobuf JSON must follow https://gibson042.github.io/canonicaljson-spec/.
Remove all fields whose value is 0, false, "", null, [], or {}.
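The removal rule above can be sketched as a recursive pass over a parsed JSON document. This is a minimal sketch, not a reference implementation: `is_empty` and `strip_empty_fields` are hypothetical names, only the literal values listed in the rule are handled, and the choice to keep elements inside arrays (so a repeated field like `[0, 1]` stays intact) is an assumption.

```python
def is_empty(value):
    """Return True for 0, false, "", null, [], and {} as parsed from JSON."""
    if value is None:
        return True
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return value is False
    if isinstance(value, (int, float)):
        return value == 0
    if isinstance(value, (str, list, dict)):
        return len(value) == 0
    return False


def strip_empty_fields(value):
    """Recursively drop object fields whose value is empty.

    Nested objects are cleaned first, so an object that becomes {} after
    cleaning is itself removed from its parent. Array elements are kept
    (assumption), but objects inside arrays are still cleaned.
    """
    if isinstance(value, dict):
        cleaned = {}
        for key, field_value in value.items():
            field_value = strip_empty_fields(field_value)
            if not is_empty(field_value):
                cleaned[key] = field_value
        return cleaned
    if isinstance(value, list):
        return [strip_empty_fields(item) for item in value]
    return value
```

The result would still need to be serialized according to the canonical JSON specification linked above; this sketch only covers field removal.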
The proto3 JSON specification states that these types can use 0, 3, 6, or 9 fractional digits in JSON output. For a simple deterministic encoding, we specify the most precise of these: 9 digits.
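A minimal sketch of fixed 9-digit output follows, assuming "these types" refers to `google.protobuf.Timestamp` and `google.protobuf.Duration`, each represented by a `seconds` and `nanos` pair. The function names are hypothetical and are not part of any protobuf library.

```python
from datetime import datetime, timedelta

NANOS_PER_SECOND = 1_000_000_000


def format_timestamp(seconds, nanos):
    """Format a (seconds, nanos) pair as an RFC 3339 UTC string with 9 fractional digits."""
    if not 0 <= nanos < NANOS_PER_SECOND:
        raise ValueError("nanos must be in [0, 999999999]")
    # Compute the calendar part from the Unix epoch; the fraction is appended by hand
    # because datetime only carries microsecond precision.
    moment = datetime(1970, 1, 1) + timedelta(seconds=seconds)
    return moment.isoformat(timespec="seconds") + f".{nanos:09d}Z"


def format_duration(seconds, nanos):
    """Format a (seconds, nanos) duration as '<seconds>.<9 digits>s'."""
    if abs(nanos) >= NANOS_PER_SECOND:
        raise ValueError("nanos must be in [-999999999, 999999999]")
    if (seconds > 0 and nanos < 0) or (seconds < 0 and nanos > 0):
        raise ValueError("seconds and nanos must not have opposite signs")
    sign = "-" if seconds < 0 or nanos < 0 else ""
    return f"{sign}{abs(seconds)}.{abs(nanos):09d}s"
```

Always emitting all nine digits means that a value with zero nanoseconds is still written with a full `.000000000` fraction, so two encoders cannot disagree on how many digits to print.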
This creates unnecessary discrepancies between proto field names and their JSON representation and could lead to confusing conflicts (for example, if someone defined both myField and my_field).
Please submit a PR if you have implementation details to add to this list.
Implementations should specify one of the following levels of alignment:
- Level 1: there are clear rules to follow in order to make this implementation follow CER
- Level 2: this implementation has explicit code generation flags or static linting tools for safely supporting CER
- Level 3: this implementation provides a zero-allocation "is_canonical" or "unmarshal_canonical" method for checking if a message is canonical
Note that Level 1 and Level 2 implementations can still verify that a message is canonical by re-encoding it canonically and comparing the result with the original bytes.
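The re-encode-and-compare check can be sketched generically. In this minimal sketch, `is_canonical` is a hypothetical helper, and the `decode` and `canonical_encode` functions are stand-ins built on JSON so the example is self-contained; a real implementation would pass its protobuf unmarshal function and its canonical marshal function instead.

```python
import json


def canonical_encode(message):
    """Stand-in canonical encoder: sorted keys and no insignificant whitespace."""
    return json.dumps(message, sort_keys=True, separators=(",", ":")).encode("utf-8")


def decode(data):
    """Stand-in decoder for the toy format used in this sketch."""
    return json.loads(data.decode("utf-8"))


def is_canonical(data, decode=decode, encode=canonical_encode):
    """Return True if re-encoding the decoded message reproduces the input bytes exactly."""
    try:
        message = decode(data)
    except ValueError:  # covers JSON and UTF-8 decoding errors
        return False
    return encode(message) == data
```

For example, `is_canonical(b'{"a":1,"b":2}')` is true, while `is_canonical(b'{"b":2,"a":1}')` is false because the re-encoded bytes differ from the input. This approach allocates a full copy of the message, which is the cost that a Level 3 implementation avoids.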
gogo protobuf - Level 1
gogo proto mostly follows canonical encoding rules with the caveats listed below.
This causes default/empty fields to be emitted in binary and JSON encodings.