Propose ADR 027: Deterministic Protobuf Serialization #6979
ping @ethanfrey
Thanks for working on this @webmaster128 !
This is a really good write up.
I do want to include an explanation for why omitting default values is logical. It is not at all arbitrary and I suggested some text to explain.
Can we please find some other name besides Regencode? It's interesting but also a little awkward 🤪 . Maybe canonical proto3 or proto3 CER?
Omitting empty fields is a valid option because the parser must assign the
default value to fields missing in the serialization<sup>2</sup>. Requiring to
serialize them would be equally valid. No good reason is known to the author for
preferring one of those options over the other.
Omitting empty fields is the logical choice for a canonical protobuf serialization because it allows for some amount of forward compatibility. That means that users of newer versions of a protobuf schema will produce the same serialization as users of older versions if they do not set newer fields. Effectively omitting newer fields causes the newer clients to fall back to an older version of the protocol. This would not be possible if empty fields were serialized by default.
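The forward-compatibility argument can be illustrated with a minimal sketch. The schema below (a message with `amount = 1`, later extended with `tip = 2`) and all function names are hypothetical, and the encoder is hand-rolled for illustration only; it is not the SDK's implementation.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_uint_field(field_number: int, value: int, omit_default: bool = True) -> bytes:
    """Encode a varint field (wire type 0), optionally omitting the default value 0."""
    if omit_default and value == 0:
        return b""
    key = (field_number << 3) | 0  # wire type 0 = varint
    return encode_varint(key) + encode_varint(value)


# Hypothetical schema v1: message Msg { uint64 amount = 1; }
def encode_v1(amount: int) -> bytes:
    return encode_uint_field(1, amount)


# Hypothetical schema v2 adds: uint64 tip = 2;
def encode_v2(amount: int, tip: int = 0, omit_default: bool = True) -> bytes:
    return encode_uint_field(1, amount) + encode_uint_field(2, tip, omit_default)


old_bytes = encode_v1(5)
new_bytes = encode_v2(5)                               # tip left unset and omitted
new_bytes_explicit = encode_v2(5, omit_default=False)  # tip serialized as 0

print(old_bytes.hex(), new_bytes.hex(), new_bytes_explicit.hex())
```

With defaults omitted, the v2 client produces the same bytes as the v1 client, so both sign the same document. If the unset `tip` were serialized explicitly, the bytes (and therefore any signature over them) would differ.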
Thank you @aaronc!
Thanks for the suggested explanation. I agree that you have a point there, which was not explained well in anything I read before. I don't fully agree that this path is more "logical" or natural than the other one. The feature it brings was never a requirement. But I agree this is a plus. Will merge our texts accordingly.
haha, yes it is. See it as a way to express attribution. But let's find something better. I'll think about it …
I would like to have something a little bit context-specific. We cannot encode all proto3 documents (and don't need to). Also the requirements are very specific to our space.
Let's just forget about a name and call it ADR26, just like we have bip39 and slip0010.
@aaronc the last two commits address the reasoning behind omitting empty values and the name
ACK
Thanks @webmaster128 🎉 !
Actually one more small issue. There's an existing PR for ADR 026 (#6922). Let's make this 027.
I like it. Explicit with test vector and clear reasoning.
It codifies the same behavior we have in Go currently, so no SDK code changes are needed.
One question: it only states that the SignDoc must be encoded via ADR026, but it leaves the encoding of the rest of the tx unclear. I believe this MAY be ADR026, but also MAY be any other valid proto3 encoding. Is that correct, @aaronc?
Exactly. We have made efforts to make the surface area where determinism is needed as small as possible. Everywhere else, the only requirement is a valid proto3 encoding. That has the side effect of reduced malleability. I'll also note that unknown fields pretty much anywhere will cause the tx to get rejected. Unknown "non-critical" fields are allowed on
Thanks for pointing that out @aaronc. ADR 027 it is. Thank you for the review, @ethanfrey. Typo fixed.
Right, all documents lower than SignDoc are embedded as raw bytes. Any valid protobuf serialization is fine for those.
Looks good.
I'm reading this https://gist.github.com/kchristidis/39c8b310fd9da43d515c4394c3cd9510, and I found 2 additions that would make this ADR more exhaustive.
- For packed repeated, we should only allow one "pack" (i.e. 1 length prefix, followed by all the elements in the array).
- it goes without saying, but it will go even better if we say it: we don't allow encoding the same field twice. (iiuc, proto3 does allow that).
Without these 2 rules, I can produce 2 different serializations of the same message that both adhere to all rules described in this ADR.
(edit: there might be other edge cases which produce non-deterministic serializations, but I guess we can amend this ADR as we discover them)
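The two gaps described above can be made concrete with a small sketch. The message layout (`uint64 id = 1; repeated uint64 values = 2;`) and the tiny decoder are hypothetical and written only for this illustration; the decoder follows the standard proto3 merge behavior (last value wins for a scalar field, packed chunks of a repeated field are concatenated).

```python
def decode_varint(data: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode(data: bytes) -> dict:
    """Decode the hypothetical message Msg { uint64 id = 1; repeated uint64 values = 2; }."""
    msg = {"id": 0, "values": []}
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field, wire_type = key >> 3, key & 7
        if field == 1 and wire_type == 0:
            # A scalar field seen more than once: the last occurrence wins.
            msg["id"], pos = decode_varint(data, pos)
        elif field == 2 and wire_type == 2:
            # A packed chunk: its elements are appended to whatever was read so far.
            length, pos = decode_varint(data, pos)
            end = pos + length
            while pos < end:
                value, pos = decode_varint(data, pos)
                msg["values"].append(value)
        else:
            raise ValueError("unexpected field or wire type")
    return msg


# id = 7, values = [1, 2, 3] in a single pack
single_pack = bytes([0x08, 0x07, 0x12, 0x03, 0x01, 0x02, 0x03])
# Same message, but the packed field is split into two packs
split_packs = bytes([0x08, 0x07, 0x12, 0x02, 0x01, 0x02, 0x12, 0x01, 0x03])
# Same message, but field 1 is encoded twice (9, then 7)
duplicate_field = bytes([0x08, 0x09, 0x08, 0x07, 0x12, 0x03, 0x01, 0x02, 0x03])

for raw in (single_pack, split_packs, duplicate_field):
    print(raw.hex(), decode(raw))
```

All three byte strings decode to the same message, which is why a deterministic encoding has to rule out the second and third forms explicitly.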
Is this still pending? Would be good to wrap this up and get it merged. Can any other maintainers take a quick look at this @alexanderbez @alessio @jgimeno ?
Yes it is. I did not yet get a chance to look into this, especially since I don't know what a packed repeated field is (yet). I'll try to get it done this week. Thanks @amaurymartiny by the way for the valuable input.
Packed repeated is a compressed form of handling repeated fields when the data is scalar (int, float), not structs.
I think we can just refer to the default behavior in the spec if we need to mention it
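For readers unfamiliar with packed repeated fields, here is a minimal sketch of the two wire layouts for a repeated varint field. The helper names are made up for this illustration; field number 4 and the sample values are arbitrary.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_unpacked(field_number: int, values: list) -> bytes:
    """Unpacked: every element carries its own key (wire type 0)."""
    key = encode_varint((field_number << 3) | 0)
    return b"".join(key + encode_varint(v) for v in values)


def encode_packed(field_number: int, values: list) -> bytes:
    """Packed: one key (wire type 2), one length prefix, then all elements back to back."""
    payload = b"".join(encode_varint(v) for v in values)
    key = encode_varint((field_number << 3) | 2)
    return key + encode_varint(len(payload)) + payload


values = [3, 270, 86942]
print(encode_unpacked(4, values).hex())  # 2003208e02209ea705
print(encode_packed(4, values).hex())    # 2206038e029ea705
```

The packed form saves one key byte per element after the first, at the cost of a single length prefix. In proto3, packed is the default for repeated scalar numeric types.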
I thought that was mentioned in the spec, but if not, definitely should be added
@amaurymartiny could you suggest some changes for @webmaster128 to address the packed repeated issue? Once we have that taken care of, I think we can move this forward.
Co-authored-by: Aaron Craelius
Co-authored-by: Amaury Martiny
Okay, updated. There are 3 new commits:
ACK
ACK, this looks good.
Description
During the development of ADR-020 it became clear that we need a deterministic encoding of a SignDoc. The canonicalization from https://github.com/regen-network/canonical-proto3 was explicitly dropped in the process for various reasons. This is an attempt to get the necessary specification from those canonicalization rules while removing all decoding or JSON aspects. Motivation, implementation and a test vector were added.
I want to use this as a discussion base, which is probably more productive than long GitHub threads.
Also feel free to propose a better name if you don't like it.
If this spec is accepted, the change from #6949 is not necessary anymore. However, it would still be possible to do it and increase the chance that developers get something working quicker. So this is not an attempt to close PR #6949. As Aaron mentioned, it would be nice to get more people's view on the 0/1 question.
Kudos to @aaronc who brought big chunks of this document to the table.