Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce mechanism for specifying protobuf field numbers #367

Merged
merged 4 commits into from
Oct 20, 2023

Conversation

Ladicek
Copy link
Contributor

@Ladicek Ladicek commented Sep 12, 2023

Previously, field numbers were always assigned automatically during protobuf
schema generation, based on declaration order of properties in a data class.
It is very easy to modify the data class in a way that changes field numbers
unintentionally, which introduces a breaking change to the protobuf schema.

This commit introduces the annotation @ProtobufField as a mechanism for
explicitly providing field numbers, as well as a FieldNumberStrategy enum
(used in @ProtobufGen.fieldNumberStrategy) to select how field numbers
should be assigned. There's a manual strategy and two automatic strategies.

The fields in the protobuf schema are generated in the field number order,
not in the property declaration order. The serialization/deserialization
code in generated converters also uses field number order. This is to make
sure that a Vert.x-generated and protoc-generated serializers produce
outputs that are not just mutually compatible, but also binary identical.

Resolves #359

@Ladicek
Copy link
Contributor Author

Ladicek commented Sep 12, 2023

CC @vietj @lwlee2608

@lwlee2608
Copy link
Contributor

Thank you for the PR!

Being able to specify the fieldNumber is definitely important. However, in my opinion, the new annotation @ProtobufField should be optional. This would help prevent 'annotation hell,' especially when combining annotations from other libraries.

Currently, the fieldNumber values are automatically assigned based on alphabetical order. Ideally, we would want them to be assigned based on the position within the POJO.

This way, users can choose to 'assign' the fieldNumber by properly arranging the POJO field positions without specifying the annotation. Users can still specify a specific fieldNumber by using the @ProtobufField annotation.

@ProtobufGen
public class Pojo {
  private String field1; // auto-assign field number 1
  private String field2; // auto-assign field number 2
  @ProtobufField(fieldNumber = 10)
  private String field3; // assign field number 10
}

Lastly, I think ProtobufField.value() should be changed to ProtobufField.fieldNumber() to make it more explicit, as we will probably add more properties to this annotation in the future.

  @ProtobufField(fieldNumber = 10, futureProperty="fooBar")

@Ladicek
Copy link
Contributor Author

Ladicek commented Sep 13, 2023

I did ponder about that, and I don't see a way how to automatically assign field numbers while preventing or even minimizing unintentional breaking changes. As far as I understand Protobuf, the assignment of field numbers is absolutely crucial for schema evolution, which is guaranteed to happen for practically all message types. I believe it is best to just make field numbers mandatory, just like they are in the Protobuf language itself.

(I also just realized we could add support for reserved field numbers / names, as in https://protobuf.dev/programming-guides/proto3/#fieldreserved. Maybe as @ProtobufGen(reservedFieldNumbers = { 1, 2, 3 }, reservedFieldNames = { "foo", "bar" }), or using a special annotation on the class. We could generate that into the .proto file and also validate that in the code generator itself. But that's unrelated to the present PR.)

@Ladicek
Copy link
Contributor Author

Ladicek commented Sep 13, 2023

Actually thinking about it again, if we could assign field numbers based on declaration order, as you suggest, that would be fairly safe -- assuming that people never remove fields (even if they are obsolete and unused), and only add fields to the end (even if they logically belong elsewhere).

I'm not sure if it's safe to make those assumptions, and I'm not sure if we can get the properties in declaration order (I'm not even sure what would constitute the declaration order, as properties are based on fields, getters and setters, combined). If there's a way to make that happen, I probably could be persuaded otherwise, but as long as we get properties in alphabetical order, I believe we should just make explicit field numbers mandatory.

@vietj
Copy link
Member

vietj commented Sep 15, 2023

@Ladicek ordering of element declarations in a processed model is guaranteed.

https://docs.oracle.com/javase/8/docs/api/javax/lang/model/element/TypeElement.html#getEnclosedElements--

Note that as a particular instance of the general accuracy requirements and the ordering behavior required of this interface, the list of enclosed elements will be returned in the natural order for the originating source of information about the type. For example, if the information about the type is originating from a source file, the elements will be returned in source code order.

it seems currently the vertx codegen processor does not respect the order and I think we should address this and change this behaviour so user can reason about it.

of course there should be a way as you suggest to override this and force a field number assignment

@vietj
Copy link
Member

vietj commented Sep 15, 2023

cc #370

@Ladicek
Copy link
Contributor Author

Ladicek commented Sep 15, 2023

OK, then we can do that. I'll make the annotation optional and add a warning to the javadoc.

My personal opinion is that generating field numbers automatically is an exquisitely terrible idea and noone should ever do it, but if that's what you prefer, I'm fine with that.

@lwlee2608
Copy link
Contributor

@Ladicek What about adding a property to @ProtobufGen(autoFieldNumber = true)? The ProtobufField(fieldNumber=1) would be mandatory when this property is set to false

@Ladicek
Copy link
Contributor Author

Ladicek commented Sep 17, 2023

That would certainly be possible, yes. I'll see what I can do.

@Ladicek Ladicek force-pushed the protobuf-schema-evolution branch from cca6885 to d0770a6 Compare September 18, 2023 10:18
@Ladicek Ladicek changed the title Introduce mandatory mechanism for specifying protobuf field numbers Introduce mechanism for specifying protobuf field numbers Sep 18, 2023
@Ladicek
Copy link
Contributor Author

Ladicek commented Sep 18, 2023

Updated per the conversation above. @ProtobufGen now has a member autoFieldNumbers that defaults to true. In such case, field numbers are assigned automatically, unless there's a @ProtobufField annotation, whose value takes precedence. The autoFieldNumbers member may be set to false, in which case all properties must have the @ProtobufField annotation. This is safer, but more verbose.

I wanted to ask for your opinion about the default value of autoFieldNumbers. Maybe it shouldn't have a default value and users should always select whether they want field numbers automatically assigned or not. At least they would be making that choice, consciously. WDYT?

@Ladicek
Copy link
Contributor Author

Ladicek commented Sep 18, 2023

Also, one important notice:

This way, users can choose to 'assign' the fieldNumber by properly arranging the POJO field positions

I noticed in @vietj's commit that property declaration order is not established based on the declaration order of fields, but on the declaration order of accessor methods. They should generally be consistent, but that is not enforced anywhere and may be surprising.

Copy link
Member

@vietj vietj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion

  • consider using an enum instead of boolean for the behaviour for numerating fields
  • introduce reserved field members of @ProtobufGen to declare the reserved portion of field numbers

@Ladicek Ladicek force-pushed the protobuf-schema-evolution branch from d0770a6 to c495a8e Compare September 29, 2023 15:49
@Ladicek
Copy link
Contributor Author

Ladicek commented Sep 29, 2023

I just force-pushed a change that uses the enum for selecting field number assignment strategy. I didn't implement reserved fields yet, will follow next week.

@Ladicek Ladicek force-pushed the protobuf-schema-evolution branch from c495a8e to c7dd68a Compare October 2, 2023 15:19
@Ladicek
Copy link
Contributor Author

Ladicek commented Oct 2, 2023

Reserved field numbers / names done.

I noticed that we currently don't support specifying enum constants, those are always generated automatically. I tried to fix that, but found that it would be more complicated than I thought (EnumTypeInfo doesn't provide full access to enum values, just their names), so I figured we can leave that for later.

Other than that, I also added some tests for the field number assignment algorithms and I believe this is now ready.

* be either left in place or added to the <em>reserved</em> set), and <strong>always add
* properties at the end</strong>, even if they logically belong elsewhere.
*/
PACKED,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we rename this to "COMPACT" instead ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COMPACT sounds good to me, will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.

Copy link
Member

@vietj vietj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor change asked.

@vietj vietj added this to the 5.0.0 milestone Oct 6, 2023
Previously, field numbers were always assigned automatically during protobuf
schema generation, based on declaration order of properties in a data class.
It is very easy to modify the data class in a way that changes field numbers
unintentionally, which introduces a breaking change to the protobuf schema.

This commit introduces the annotation `@ProtobufField` as a mechanism for
explicitly providing field numbers, as well as a `FieldNumberStrategy` enum
(used in `@ProtobufGen.fieldNumberStrategy`) to select how field numbers
should be assigned. There's a manual strategy and two automatic strategies.

Further, `@ProtobufGen.reservedFieldNumbers` and `reservedFieldNames` allow
specifying reserved field numbers / names, similarly to the protobuf language
itself. Reserved field numbers and names are verified at compile time.

The fields in the protobuf schema are generated in the field number order,
not in the property declaration order. The serialization/deserialization
code in generated converters also uses field number order. This is to make
sure that a Vert.x-generated and `protoc`-generated serializers produce
outputs that are not just mutually _compatible_, but also binary _identical_.
@Ladicek Ladicek force-pushed the protobuf-schema-evolution branch from c7dd68a to df2886d Compare October 6, 2023 14:21
@vietj vietj merged commit d44a385 into eclipse-vertx:master Oct 20, 2023
3 checks passed
@vietj
Copy link
Member

vietj commented Oct 20, 2023

thanks for the contribution!

@Ladicek Ladicek deleted the protobuf-schema-evolution branch October 20, 2023 12:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Protobuf generator schema evolution
3 participants