
protoc-gen-go: determine plan for improving generated APIs #526

Open
dsnet opened this issue Feb 22, 2018 · 13 comments

@dsnet (Member) commented Feb 22, 2018

Background

The API for generated proto messages (i.e., .pb.go sources) as created by protoc-gen-go is not perfect. It was not perfect when it was created, it is not perfect if we hypothetically fix it today, and it will not be perfect tomorrow. The reality is that we will (with high probability) eventually come up with a new API that is sufficiently better than the one today that it is worth some amount of migration effort. As such, we should come up with a way for protoc-gen-go to generate cleaner, idiomatic, and nicer APIs.

We should come up with a migration strategy that has the following properties:

  1. By default, protoc-gen-go initially generates the API that is considered the "best", by some measure of good according to the standard of the day (e.g., suppose the Go language adds native support for tagged unions, it would be really unfortunate to continue generating the wrapper types for proto oneofs or using pointers to primitive types).
  2. Once I start using protoc-gen-go to generate a specific message, I should have some guarantees regarding the stability of the generated API (i.e., that the generated API will not change or remove declarations). In other words, I should not need to change my .proto file at all to maintain compatibility.
  3. There should be some way to perform a multi-stage migration to the new API. For protos that are part of the public API of some widely used open-source package, this doesn't matter. However, there are a number of closed-source environments, where the generated proto and all its dependencies are within the control of a single organization, in which a graceful migration is possible.

Goal 1 is about eagerly adopting the future, goal 2 is about maintaining the stability of the past, and goal 3 is about trying to migrate the past to the future. I understand that goal 3 is not possible for some use cases, but that doesn't mean it is not worth supporting. I suspect that goals 2 and 3 are irreconcilable; if so, I propose that we favor goal 1 and firmly document that it is the user's responsibility to vendor or version-control the exact version of protoc-gen-go they use.

How to accomplish this?

I don't know, but I have some rough ideas.

Some possible ways to achieve goals 1 and 2:

  • Every .proto file is required to specify the generated API version. This could be a single option go_api_version = "protoc-gen-go/v2" (whether to add the generator name and/or allow multiple generators is up for debate). Alternatively, instead of a monotonic number (like v2), it could be a flag of API features. If no such option is specified, you are always living on the edge (which unfortunately violates goal 2).
  • protoc-gen-go could emit an unexported constant or magic comment into the generated .pb.go file recording which API version or features it uses. When protoc-gen-go is invoked, it checks whether the output .pb.go file already exists; if so, it determines which API version was previously used and generates accordingly. In this approach, the API version is encoded in the output .pb.go file itself (see the sketch after this list).
  • Your brilliant idea here.
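
A minimal sketch of the second idea, assuming a marker comment of the form // protoc-gen-go: api=v2 (both the marker format and the helper below are invented for illustration):

    package gengo

    import (
    	"os"
    	"regexp"
    )

    // apiMarker matches a hypothetical version marker left behind by a
    // previous run of protoc-gen-go in its generated output.
    var apiMarker = regexp.MustCompile(`(?m)^// protoc-gen-go: api=(v\d+)$`)

    // previousAPIVersion reports the API version recorded in an existing
    // generated file, or "" when the file is absent or carries no marker,
    // in which case the generator is free to use the newest API.
    func previousAPIVersion(pbGoPath string) string {
    	src, err := os.ReadFile(pbGoPath)
    	if err != nil {
    		return "" // no existing output; generate with the latest API
    	}
    	if m := apiMarker.FindSubmatch(src); m != nil {
    		return string(m[1])
    	}
    	return ""
    }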

Some possible ways to achieve goal 3:

  • Add proto options to opt in to generating a duplicate API alongside the previous one. For example, protoc-gen-go: remove type name from generated enum #513 proposes to remove the type prefix from enumeration names; an option would allow the generation of both forms (see the sketch after this list). However, note that supporting both APIs is not possible if there is an identifier conflict, for example, if we're trying to change a function signature. Also, note that using proto options alone can only satisfy goal 1 or goal 2, but not both.
    • If you satisfy goal 1, then you are probably breaking goal 2, since I am required to add the option to the .proto file to generate the old API.
    • If you satisfy goal 2, then over time your .proto files become littered with option annotations opting in to the newer API. This does not scale, as it is easy for users to forget to specify that they want the new API semantics on a newly added proto field, which violates goal 1.
  • Your brilliant idea here.
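
As a concrete illustration of the enum case, here is what dual-form generated output might look like for a hypothetical enum Color { RED = 0; } (only the prefixed form is generated today; the unprefixed form is what #513 proposes):

    package examplepb

    type Color int32

    const (
    	Color_RED Color = 0 // current form: value name carries the type prefix
    	RED       Color = 0 // hypothetical unprefixed form proposed in #513
    )

Both names can coexist here, but a change such as altering a function signature has no such dual-form escape hatch, which is where the identifier-conflict caveat above bites.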

Other considerations

The issue of migration is tangentially related to the desire to add support for many Go-specific proto options. All of these need to be considered together. I believe it would be a mistake to start adding options without a cohesive idea of where we're going.

@aajtodd commented Feb 22, 2018

Couple thoughts:

In pursuit of these goals, and in light of tools like dep and the recent vgo announcement, protobuf should probably start tagging releases.

Presumably this only affects Go generated code (e.g. clients using protobufs from another language targeting a Go server would be able to interact regardless of the Go protobuf options set). If options aren't going to be added in a backwards-compatible way, my preference would be to break hard and tag a new major release. Users have a choice of which version they target and what options they set. I don't see a huge benefit in remaining backwards compatible as suggested by generating both forms.

@awalterschulze commented Feb 22, 2018

My brilliant idea

Use extension options as feature flags, like gogoprotobuf does.
This accomplishes goal 1, goal 2 and goal 3.
Technically goal 3 does not have a big use case in gogoprotobuf, but it could be accomplished using the same technique.

As for your concern that over time the proto file could become littered with extension options: this is where gogoprotobuf's vanity binaries (protoc-gen-yourvanityplatehere) come into play.

They allow users to programmatically set extension options on the FileDescriptorSet before passing it to the generator. The same technique could be used to create protoc-gen-gov2, etc.

https://github.com/gogo/protobuf/blob/master/protoc-gen-gogofaster/main.go
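
A condensed sketch of the pattern in that main.go, using gogo/protobuf's vanity and vanity/command packages (the particular feature toggles shown are illustrative):

    package main

    import (
    	"github.com/gogo/protobuf/vanity"
    	"github.com/gogo/protobuf/vanity/command"
    )

    func main() {
    	// Read the CodeGeneratorRequest that protoc pipes in on stdin.
    	req := command.Read()

    	// Flip feature extensions on every file programmatically, so the
    	// .proto sources stay free of generator-specific options.
    	for _, file := range req.GetProtoFile() {
    		vanity.TurnOnMarshalerAll(file)
    		vanity.TurnOnSizerAll(file)
    		vanity.TurnOnUnmarshalerAll(file)
    	}

    	// Generate the response and write it to stdout.
    	command.Write(command.Generate(req))
    }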

That way the user's proto file does not have to be polluted with (or even use) any extension options, yet users still get the newest generated code, retain migration options via extension options, and keep the customizability to generate the structs they want to use.

This technique is also very useful for generating time.Time instead of ptypes.Timestamp for some fields or for all fields, depending on whether you use extension options or a vanity binary, respectively.

@RangelReale commented

In light of this problem, I created a package that wraps the generated Go code in easier-to-use structs and can also convert well-known types to standard Go types, like time.Time and time.Duration. It converts between them and uses the generated Go files as if they were an implementation detail.

Using a plugin architecture, more converters can be created.

It does a lot of boilerplate conversion, but you would need to do it manually anyway. The generated import and export code is a little convoluted in order to support the plugin architecture.
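
To show the kind of boilerplate being automated, here is the hand-written equivalent for Timestamp using the golang/protobuf ptypes helpers (the package and function names are invented):

    package foowrap

    import (
    	"time"

    	"github.com/golang/protobuf/ptypes"
    	tspb "github.com/golang/protobuf/ptypes/timestamp"
    )

    // toTime converts a generated *tspb.Timestamp into a time.Time,
    // validating that the value is in the representable range.
    func toTime(ts *tspb.Timestamp) (time.Time, error) {
    	return ptypes.Timestamp(ts)
    }

    // fromTime converts a time.Time back into the generated proto type.
    func fromTime(t time.Time) (*tspb.Timestamp, error) {
    	return ptypes.TimestampProto(t)
    }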

It is here if you want to take a look:
https://github.com/RangelReale/fproto-gowrap

It is very new, so I am not sure it supports all protobuf types yet.

@myitcv (Member) commented Mar 30, 2018

@dsnet @neild - is this the correct issue under which to discuss vgo-related points? See Russ' most recent comment

@dsnet (Member, Author) commented Mar 30, 2018

Depending on what we plan to do here, it may or may not be impacted heavily by vgo.

@neild (Contributor) commented Mar 30, 2018

@myitcv Hard to say without knowing what the point is. What vgo-related issues do you see?

FWIW, I've been happily using vgo with the protobuf repo for a little while now. It needs a go.mod in the root directory, but otherwise seems to work quite well.
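
For reference, a minimal root go.mod for vgo would be just the module directive (assuming the repo's canonical import path):

    module github.com/golang/protobuf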

@myitcv (Member) commented Apr 1, 2018

@dsnet @neild

From golang/go#24301 (comment):

Will vgo address non-Go dependencies, like C or protocol buffers? Generated code?

...

That said, we certainly do understand that using protocol buffers with Go is too difficult, and we'd like to see that addressed separately.

From golang/go#24301 (comment):

I know that protocol buffers in particular have registration problems, and those problems are exacerbated by the very ad-hoc way that protocol .pb.go files are passed around and copied into projects. That's again something that's mostly independent of vgo. We should fix it, but probably by making changes to the way Go protocol buffers are used, not vgo.

I'm interested in the answers to these questions from a protobuf perspective. Per @rsc's comments, there's nothing per se about vgo to "fix" here in the protocol buffers world; rather, @rsc sees solutions to those problems as being independent of vgo.

So I was just looking to see whether there's already a place where these "issues" are being tracked, so that discussions arising out of the vgo threads can be appropriately linked.

@neild (Contributor) commented Apr 2, 2018

@myitcv

My take is that there are two broad ways in which protocol buffers are used to build programs.

One is with full support of the build system. When building a binary which uses a proto library, the build system will run protoc to produce Go source code and then compile that source code with the Go compiler. If the build system has good support for dependencies, changing the proto source file will trigger a recompilation of the binary which depends on it. This is the case for people using Bazel or other such tools. I think we have pretty good support for this case.

The other is when using the go tool (or vgo), which has no support for invoking the proto compiler. In this case, people generally manually invoke protoc to produce Go source code, check that source code into their revision control system, and proceed from there as if the generated package were any other Go package. This case is necessarily an inferior experience, since the build system won't recompile the proto files for you (when they change, or when protoc changes), but I think it also works reasonably well. Since the go tool doesn't deal with anything other than Go source code, by design, our ability to improve this case is fundamentally limited.
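
That second workflow is commonly wired up with a go:generate directive next to the checked-in output, along these lines (file and package names invented):

    // Package foopb holds the checked-in output of a manual protoc run.
    // Re-run `go generate` after editing foo.proto or upgrading protoc-gen-go.
    package foopb

    //go:generate protoc --go_out=. foo.proto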

If there are any ways in which the Go proto compiler can better support either of these cases, then it's worth filing an issue here. I've made a few changes recently to address a few problems, but there's probably more that can be done.

@myitcv (Member) commented Apr 2, 2018

Thanks @neild, certainly helps to bring these two threads together.

I think (and this is only from reading the various threads, because I've not run into it myself) that one of the main issues being discussed in golang/go#24301 (comment) is the "problem" of "singleton" registration, or, put another way, the problem that can arise from having multiple major versions of a package containing protobuf-generated code in your dependencies. Quoting from golang/go#24301 (comment):

This would also affect things like database drivers and image formats that register themselves with another package during init, since multiple major versions of the same package can end up doing this. It's unclear to me what all the repercussions of that would be.

Yes, the problem of code that assumes "there will be just one of these in a program" is real, and it's something we are all going to have to work through to establish new (better) best practices and conventions. I don't think this problem is being introduced by vgo, and vgo arguably makes the situation better than before.

It's my understanding that Go-generated code "suffers" from this issue. As @rsc says, this isn't per se a problem with vgo or its implementation, but it becomes an issue if people who previously relied on dep et al. to resolve down to a single version of a package now end up with multiple versions of the same package in their builds.

@neild (Contributor) commented Apr 2, 2018

Multiple registration is definitely a problem. I agree with @rsc that vgo should make the situation better than it was before. With vendoring, it's easy for a single generated proto package to be included multiple times in a program. With vgo, that will only happen if the package has multiple major versions. (And, I would argue, the resolution in that case is to not have multiple major versions of the same message--the proto namespace should change when the package major version does.)
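
For concreteness, the registration that today's generated code performs looks like the sketch below (the message type and proto package name are invented). If two major versions of this generated package are linked into one binary, both init functions register "example.Foo" and collide:

    package examplepb

    import "github.com/golang/protobuf/proto"

    // Foo stands in for a generated message type.
    type Foo struct{}

    func (*Foo) Reset()         {}
    func (*Foo) String() string { return "" }
    func (*Foo) ProtoMessage()  {}

    func init() {
    	// Generated .pb.go files register each message under its
    	// fully-qualified proto name at program startup.
    	proto.RegisterType((*Foo)(nil), "example.Foo")
    }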

@stevvooe commented Apr 3, 2018

The main issue with protobuf and Go right now is creating a canonical import path for consumption by the protobuf compiler. We largely map the import path to GOPATH for protobuf (using protobuild), but vgo is going to make this very hard to continue doing.

The solution here will be something that can do the following:

  1. Distribute protobuf files in a way that lets their import paths be consistent. This could be taken on by Go (or vgo), but it could also be taken on by the protobuf project.
  2. Allow mapping of protobuf imports to specific Go packages.

There may also be some space here to embrace the fact that there are different ways of generating the same protobufs. We face this problem when using gogo, and we can learn from that experience if we want to do full versioning of the generated code.

@dsnet (Member, Author) commented Apr 3, 2018

Let's keep this issue about migrating the generated API. I'll file a separate issue about what to do with Go import paths.

@spenczar (Contributor) commented Apr 5, 2018

We're talking about wanting to make backwards-incompatible changes to generated code, like removing exported symbols or changing exported function signatures, right? Then, my proposal is that backwards-incompatible changes must entail a new package main for the code generator.

If this were a normal package, we'd say "backwards-incompatible changes require a major version bump", and the import compatibility rule would say that you have to make a new import path for your incompatible change. If you wanted to remove a function from github.com/golang/protobuf/proto, for example, you'd have to make github.com/golang/protobuf/protov2, or something equivalent.

The same principle seems to apply to code generators. If you want the generator's output to change in an incompatible way, you need to release a new major version, so that the old incompatible way is still available. You would need to release protoc-gen-go_v2.

This clearly accomplishes goals 2 and 3. I am not so sure whether it accomplishes 1 - it sort of depends on what you mean. I think it clarifies the situation quite a bit, though.
