Recommend proto3 #465

MarcoPolo · 2022-10-03T20:31:44Z

There seems to have been a misunderstanding in the past around proto2 vs proto3. My attempt here is to clear up the confusion, recommend proto3 in general, and explain why proto3 should be preferred.

Our main confusion is about field presence: if a field is omitted from the serialized wire format, can the user of the decoded message tell whether the field was unset or set to the default value? This document has a lot of good information and is worth the read: https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md

Origins of the confusion

Proto2 would always serialize an explicitly set field, even if it was set to the default. This meant that you could know on the decoding side whether the field was set or not. This is called Explicit Presence. For example, prost (a Rust protobuf implementation) wraps these fields in Option<T>: https://github.com/tokio-rs/prost#field-modifiers.

The confusing thing is that the language guide for proto2 states:

> A well-formed message may or may not contain an optional element. When a message is parsed, if it does not contain an optional element, accessing the corresponding field in the parsed object returns the default value for that field.

The subtlety here is that this doesn't say anything about "hasField" accessors, which may be provided by the implementation to check whether the field was set or not. This is essentially what prost is doing with Option<T> types.

Another confusing thing is that this language guide doesn't mention "presence" a single time, even though presence is exactly what we're talking about here.

In proto3, if a field was set to its default value it would not be serialized. This meant that the decoding side wouldn't know if the field was omitted because it was unset or because it was the default value. This is called No Presence.
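
The two serialization rules can be sketched with a hand-rolled encoder for a single varint field. This is only an illustration under stated assumptions: `encode_varint` and `encode_uint_field` are hypothetical helpers, not part of any protobuf library, and `None` stands in for "never set".

```python
from typing import Optional


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_uint_field(field_number: int, value: Optional[int],
                      explicit_presence: bool) -> bytes:
    """Encode one unsigned integer field (wire type 0).

    value=None models a field that was never set.
    """
    if value is None:
        return b""  # unset fields are never written
    if not explicit_presence and value == 0:
        return b""  # no presence: the default value is omitted
    key = (field_number << 3) | 0  # field number plus wire type 0 (varint)
    return encode_varint(key) + encode_varint(value)


# With no presence, "unset" and "set to 0" produce the same bytes.
no_presence_unset = encode_uint_field(1, None, explicit_presence=False)
no_presence_zero = encode_uint_field(1, 0, explicit_presence=False)

# With explicit presence, "set to 0" is written and stays distinguishable.
explicit_zero = encode_uint_field(1, 0, explicit_presence=True)
```

Under these assumptions the first two results are both empty, while the third is the two bytes `08 00`, so only the explicit-presence encoding lets the receiver tell the cases apart.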

Field Presence Proto2 vs Proto3

To clarify field presence in proto2 vs proto3:

From https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md#presence-in-proto2-apis

Proto2

| Field type | Explicit Presence |
| --- | --- |
| Singular numeric (integer or floating point) | ✔️ |
| Singular enum | ✔️ |
| Singular string or bytes | ✔️ |
| Singular message | ✔️ |
| Repeated | |
| Oneofs | ✔️ |
| Maps | |

Proto3

| Field type | `optional` | Explicit Presence |
| --- | --- | --- |
| Singular numeric (integer or floating point) | No | |
| Singular enum | No | |
| Singular string or bytes | No | |
| Singular numeric (integer or floating point) | Yes | ✔️ |
| Singular enum | Yes | ✔️ |
| Singular string or bytes | Yes | ✔️ |
| Singular message | Yes | ✔️ |
| Singular message | No | ✔️ |
| Repeated | N/A | |
| Oneofs | N/A | ✔️ |
| Maps | N/A | |

Advantages in Proto3 compared to Proto2

- No required modifier
  - This is generally considered an anti-pattern, since all future versions of this message will need to contain this field. Generally, users should prefer custom validation.
- Opt-in explicit presence
  - It's good to be able to get the space advantages of no-presence while still being able to opt in to explicit presence. If we pass in an empty byte array, most of the time this is semantically the same as passing no byte array, so it's nice to avoid paying the byte-cost for this.
  - But if we do want explicit presence we can opt in to it. This is useful in case we do semantically care about knowing if there was nothing set.
- Simple feature set: "The reason for removing these features is to make API designs simpler, more stable, and more performant." from https://cloud.google.com/apis/design/proto3
- Better ecosystem support. As libraries develop, it's likely they will support the latest protobuf spec rather than continue supporting proto2. This is already the case with protons, the compiler that JS-IP uses (see this bug).
- User-defined default values for fields are no longer available. These were somewhat tricky to get right.

Next steps

Come to mutual understanding around proto2 vs proto3.
Come to consensus around recommending proto3.
Make the change to README.md


marten-seemann · 2022-10-04T05:36:10Z

> In proto3, if a field was set to its default value it would not be serialized. This meant that the decoding side wouldn't know if the field was omitted because it was unset or because it was the default value. This is called No Presence.

Isn't that potentially a problem? If I explicitly set a field, and the value just happens to coincide with the default value, I'd want it to be serialized, no?

EDIT: I might be misunderstanding, does this only apply to non-optional fields? That would make sense then, since if a field is non-optional (= required), there's no way it could have been left empty.

achingbrain · 2022-10-04T06:50:06Z

> does this only apply to non-optional fields?

Yes - more correctly it applies to singular fields, the default type of field - see the Specifying Field Rules of the proto3 spec.

On deserialization to their object form, both singular and optional fields are set to their default values if no value was present on the wire.

On serialization, optional fields write any value that was set, even if it was the default (and no value if one was not set); singular fields only write out a value if the value set was not the default.

Consequently optional fields let you know if the field was set, singular fields do not.
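
Seen from the receiving side, the difference can be sketched as follows. This is an illustrative hand-written decoder that assumes a message containing only length-delimited (wire type 2) fields; `decode_varint` and `decode_bytes_field` are hypothetical helpers, not the API of any real library.

```python
from typing import Tuple


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def decode_bytes_field(buf: bytes, field_number: int) -> Tuple[bytes, bool]:
    """Return (value, seen_on_wire) for one bytes field.

    The value falls back to the type default (empty bytes) when the field
    is absent. Only an optional field's API would expose seen_on_wire;
    a singular field's API would expose the value alone.
    """
    value, seen = b"", False
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type != 2:
            raise ValueError("this sketch only handles length-delimited fields")
        length, pos = decode_varint(buf, pos)
        payload = buf[pos:pos + length]
        pos += length
        if number == field_number:
            value, seen = payload, True
    return value, seen


# Field 1 absent: the default value, and nothing was on the wire.
absent = decode_bytes_field(b"", 1)

# Field 1 written as empty bytes (key 0x0a, length 0): same value, but seen.
empty_but_present = decode_bytes_field(b"\x0a\x00", 1)
```

Both calls yield the same value, an empty byte string. Only the second element of the tuple differs, and that is the information a singular field cannot provide.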

MarcoPolo · 2022-10-04T17:21:35Z

> On serialization, optional fields write any value that was set, even if it was the default (and no value if one was not set); singular fields only write out a value if the value set was not the default.

yup: here are some example tests: https://github.com/MarcoPolo/proto2and3-playground/blob/main/src/main.rs#L122. proto3 optional bytes behave exactly like proto2 optional bytes.

MarcoPolo · 2022-10-04T17:29:53Z

> > In proto3, if a field was set to its default value it would not be serialized. This meant that the decoding side wouldn't know if the field was omitted because it was unset or because it was the default value. This is called No Presence.
>
> Isn't that potentially a problem? If I explicitly set a field, and the value just happens to coincide with the default value, I'd want it to be serialized, no?
>
> EDIT: I might be misunderstanding, does this only apply to non-optional fields? That would make sense then, since if a field is non-optional (= required), there's no way it could have been left empty.

A couple of things that may help clarify (I'm talking about proto3 here):

- The default value is type defined, not user defined. For example, an empty array of bytes is the same as not setting the bytes. I believe in most cases this is semantically the same. But in the cases where you do need to differentiate this, you can use the optional modifier.
- Fields without the optional modifier are not required. This is a little confusing, so I'll elaborate. Proto2 had a notion of "required" fields. This meant that if this field wasn't present, decoding would fail. This is not the case with Proto3. Proto3 doesn't have a notion of "required" fields that will cause decoding to fail. This is on purpose and a good thing. If a non-optional field is missing in the message, proto3 assumes it's the default value for that type. So for a byte array the default value would be an empty byte array, and for a string it would be an empty string. For a nested message we get "explicit presence" regardless of whether the field has an optional modifier (see the table above).

> if a field is non-optional (= required), there's no way it could have been left empty.

With the above, hopefully it's clear there is no "required" notion in proto3. A field that is not marked optional could have been left empty. This is usually fine, but if your program depends on knowing if the field was set vs unset it should mark the field as optional in the protobuf.

vyzo · 2022-10-04T18:19:44Z

Not failing decode on missing required fields is not a good thing, now you need extra logic in the client for this and a giant footgun.

MarcoPolo · 2022-10-04T19:43:11Z

> Not failing decode on missing required fields is not a good thing, now you need extra logic in the client for this and a giant footgun.

You probably already have some logic in the client side checking if the value is even appropriate. Having required fields makes backwards/forwards compatibility hard because a required field is forever. This isn't just my opinion, there's a lot written about this:

> Required Is Forever You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead.
>
> A second issue with required fields appears when someone adds a value to an enum. In this case, the unrecognized enum value is treated as if it were missing, which also causes the required value check to fail.

from: https://developers.google.com/protocol-buffers/docs/proto#specifying-rules
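
The "application-specific custom validation" that the guide recommends could look roughly like the sketch below. The `Request` message and its fields are hypothetical, made up for illustration. `None` models an absent proto3 optional field, and the plain `int` models a singular field where 0 may mean either "unset" or "set to the default".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    """Hypothetical decoded message, not taken from any libp2p spec."""
    peer_id: Optional[bytes] = None  # optional bytes: None means absent
    ttl_seconds: int = 0             # singular uint32: 0 is default or unset


def validate(req: Request) -> List[str]:
    """Return a list of problems; an empty list means the message is usable.

    The rules live in the application, so a later protocol version can
    relax them without breaking the wire format.
    """
    errors = []
    if req.peer_id is None:
        errors.append("peer_id is missing")
    elif len(req.peer_id) == 0:
        errors.append("peer_id is empty")
    if req.ttl_seconds <= 0:
        errors.append("ttl_seconds must be positive")
    return errors


problems = validate(Request())  # nothing set, so two problems are reported
ok = validate(Request(peer_id=b"abc", ttl_seconds=30))  # no problems
```

Decoding never fails in this model. The receiver decides what to reject, which is the trade-off being argued for here.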

And more:

https://stackoverflow.com/questions/31801257/why-required-and-optional-is-removed-in-protocol-buffers-3
why messge type remove 'required,optional'? protocolbuffers/protobuf#2497 (comment) (note this is from 2016, so it's a bit out of date since proto3 now has optional).

mxinden · 2022-10-08T10:50:29Z

Thanks @MarcoPolo for the research and the elaborate description.

One thing to note is that proto3 only supports optional since protoc v3.15.0. Users may use an older protoc version. E.g. Debian bullseye ships with v3.12.4. I don't think we should give this much weight. In other words, I don't think this is an argument against proto3. I am still raising it here so we can help users that run into it. (The error message with protoc <v3.15.0 does not make this obvious.)

I am fine with libp2p moving to proto3.

I am sorry for being the source of the confusion on presence in proto2 and proto3.

marten-seemann · 2022-10-08T11:08:22Z

> I am sorry for being the source of the confusion on presence in proto2 and proto3.

No need to be sorry. This is not easy to reason about.

> Debian bullseye ships with v3.12.4.

I don't think we should pay a lot of attention to what Debian does. I don't see any good justification for their focus on "stability" (aka outdated software). For example, they ship with Go 1.15, which was released in Aug 2020 and has been unmaintained for more than a year now.

I'm wondering if we should move all of our existing protobufs to proto3 as well. It would be nice to be consistent across our entire stack, and we could get rid of proto2 compiler dependencies. We'd have to check that this can be done in a backwards-compatible way in all our protocols though.

achingbrain · 2022-10-14T13:04:56Z

> This is not easy to reason about.

It certainly is not. I discovered the other day that the official protobuf.js module doesn't handle default values properly when deserializing "singular" fields so even Google don't get it right sometimes and it's their spec.

> I'm wondering if we should move all of our existing protobufs to proto3 as well

I've been doing this with the js stack as we took the decision to only support proto3 in protons and it's mostly been ok.

One oddity arises when a proto2 field has been marked as required and you are sending a message to a peer that uses a proto2 decoder: the field needs a value on the wire. The only way to ensure this happens in proto3 is to mark the field optional, because a singular field omits its value when it equals the default.
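
A toy model of that interop problem, as a sketch only: the wire is represented as a plain dict from field number to integer value instead of real bytes, and `proto3_encode`, `proto2_decode` and `REQUIRED_FIELDS` are hypothetical names, not real library calls.

```python
from typing import Dict, Optional, Set

REQUIRED_FIELDS = {1}  # the proto2 peer declares field 1 as required


def proto3_encode(fields: Dict[int, Optional[int]],
                  optional: Set[int]) -> Dict[int, int]:
    """Model a proto3 sender; None means the field was never set."""
    wire = {}
    for number, value in fields.items():
        if value is None:
            continue  # unset fields are never written
        if number in optional or value != 0:
            wire[number] = value  # singular fields drop the default (0)
    return wire


def proto2_decode(wire: Dict[int, int]) -> Dict[int, int]:
    """Model a proto2 receiver that rejects missing required fields."""
    missing = REQUIRED_FIELDS - set(wire)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return dict(wire)


# Singular in the proto3 schema: a value of 0 never reaches the wire,
# so the proto2 peer would reject the message.
singular_wire = proto3_encode({1: 0}, optional=set())

# Marked optional in the proto3 schema: the 0 is written and accepted.
optional_wire = proto3_encode({1: 0}, optional={1})
decoded = proto2_decode(optional_wire)
```

In this model `proto2_decode(singular_wire)` raises `ValueError`, while the optional variant round-trips the zero.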

🤪

thomaseizinger · 2022-10-18T23:59:57Z

Is it correct to think about it in the following way:

- Every type in proto3 has a default value. Unset fields in a message will show up as their default value upon deserialization.
- optional fields have "None" as a default value (like Rust's Option or Haskell's Maybe).

achingbrain · 2022-10-19T10:51:36Z

> Every type in proto3 has a default value. Unset fields in a message will show up as their default value upon deserialization.

Yes, but with one gotcha: the default value for Message fields is for them to be unset, the exact value of which is language-dependent.

> optional fields have "None" as a default value (like Rust's Option or Haskell's Maybe).

According to the spec all fields should be set to their default value upon deserialization (even optional fields).

If the field is marked optional you should be able to check if it was explicitly set - how you do that varies by language and even by protobuf implementation within the language.

If the field is singular (the default) you cannot check if it was explicitly set.

See Specifying Field Rules.
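
One way to picture the distinction is an accessor that always returns a value, plus a separate presence check. This is a sketch rather than any particular library's generated code, and the `OptionalUint` class is hypothetical.

```python
from typing import Optional


class OptionalUint:
    """Toy model of an optional uint32 field with a separate presence check."""

    DEFAULT = 0  # the type-defined default for numeric fields

    def __init__(self) -> None:
        self._value: Optional[int] = None  # None means not explicitly set

    def set(self, value: int) -> None:
        self._value = value

    def clear(self) -> None:
        self._value = None

    def has(self) -> bool:
        """Presence check: was the field explicitly set?"""
        return self._value is not None

    def get(self) -> int:
        """The getter always yields a value, falling back to the default."""
        return self.DEFAULT if self._value is None else self._value


field = OptionalUint()
before = (field.get(), field.has())  # default value, not set

field.set(0)
after = (field.get(), field.has())   # same value, but now explicitly set
```

A singular field would correspond to the same class with `has()` removed, leaving only `get()`. Some implementations, prost among them, fold the two calls into a single Option-like type instead.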


MarcoPolo mentioned this issue Oct 4, 2022

feat!: add Noise Extensions and update deps ChainSafe/js-libp2p-noise#215

Merged

mxinden mentioned this issue Oct 17, 2022

webrtc/: Add libp2p WebRTC browser-to-server spec #412

Merged

19 tasks

MarcoPolo mentioned this issue Dec 19, 2022

ci: install protoc from repositories where possible libp2p/rust-libp2p#3258

Merged

4 tasks

mxinden mentioned this issue Dec 19, 2022

transports/noise: Change protobuf files to proto2 libp2p/rust-libp2p#3007

Closed

4 tasks

achingbrain mentioned this issue Jan 17, 2023

feat(webrtc): add WebRTC (prev. browser-to-browser) spec #497

Merged

2 tasks

mxinden added a commit that referenced this issue Jan 18, 2023

Mark proto fields as optional

b5d82c4

Allows future versions to deprecate and remove fields in a two step process. For more details see #465 (comment)

MarcoPolo mentioned this issue Jan 21, 2023

Recommend proto3 #506

Merged

MarcoPolo closed this as completed in #506 Jan 23, 2023

MarcoPolo mentioned this issue Jan 27, 2023

circuitv2: upgrade definition to proto3 #509

Closed

3 tasks

mxinden mentioned this issue Apr 26, 2023

add autonat v2 spec #538

Merged

jxs mentioned this issue Mar 13, 2024

transports/noise: Use proto2 libp2p/rust-libp2p#2924

Closed
