Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose vtprotobuf generation #824

Closed
howardjohn opened this issue Nov 22, 2023 · 7 comments
Closed

Expose vtprotobuf generation #824

howardjohn opened this issue Nov 22, 2023 · 7 comments
Labels

Comments

@howardjohn
Copy link
Contributor

vtprotobuf is a protoc plugin that enables faster proto operations. See blog for introduction.

The tl;dr is that it is gogo protobuf, but built on top of protobuf v2, instead of replacing it. vtprotobuf works by creating strictly new files/methods, and enhancing existing types (similar to protoc-gen-validate). Users who wish to utilize these fast functions can call them, or chose not to. Generally, the presence of vtprotobuf would not even be strictly required, and the same binary would compile fine with and without (albeit slower in the latter case), as calling pattern is typically like https://github.com/vitessio/vitess/blob/40f314c1d052db3fa5a52e5fb2ddd2bddaefde44/go/vt/servenv/grpc_codec.go#L40-L49. Additionally, its actively supported by Vitess (CNCF graduation project).

There are a few risks:

  • Binary size: importing every proto results in a 4MB increase in binary size. Not its quite uncommon to import every proto in the repo, so typically this cost is much smaller
    Mitigation, if needed: expose this only under a build tag, so a use can opt in (or opt out)
  • Increase dependency that could, in theory, start to decay.
    Mitigation: dropping vtprotobuf would be compatible, as noted above (just have a large performance impact)

However, the gains are tremendous. Below shows benchmarks from our control plan. Note these are not really synthetic microbenchmarks, but rather exercise our entire application; the CPU profiles of the benchmarks very closely match production profiles:

name                                              old time/op        new time/op        delta
RouteGeneration/gateways-shared-48                      77.9ms ± 2%        51.8ms ± 5%  -33.46%  (p=0.008 n=5+5)
RouteGeneration/knative-gateway-48                      3.94ms ± 1%        2.92ms ± 2%  -25.93%  (p=0.008 n=5+5)
ClusterGeneration/gateways-48                           10.7ms ± 1%         5.6ms ± 2%  -47.69%  (p=0.008 n=5+5)
ClusterGeneration/gateways-shared-48                    10.7ms ± 1%         5.6ms ± 1%  -47.21%  (p=0.008 n=5+5)
ClusterGeneration/knative-gateway-48                    24.0µs ± 1%        13.2µs ± 1%  -45.14%  (p=0.008 n=5+5)
ClusterGeneration/http-48                               1.57ms ± 1%        0.86ms ± 1%  -45.42%  (p=0.008 n=5+5)
ListenerGeneration/gateways-48                          15.4ms ± 2%        10.0ms ± 3%  -35.43%  (p=0.008 n=5+5)
ListenerGeneration/gateways-shared-48                   35.1µs ± 3%        24.7µs ± 1%  -29.49%  (p=0.008 n=5+5)
ListenerGeneration/knative-gateway-48                   31.2µs ± 2%        21.1µs ± 2%  -32.22%  (p=0.008 n=5+5)
ListenerGeneration/http-48                               125µs ± 1%          94µs ± 0%  -24.94%  (p=0.008 n=5+5)
EndpointGeneration/1/100-48                              958µs ± 1%         628µs ± 1%  -34.41%  (p=0.008 n=5+5)
EndpointGeneration/10/10-48                              597µs ± 2%         338µs ± 1%  -43.39%  (p=0.008 n=5+5)
EndpointGeneration/100/10-48                            5.64ms ± 2%        3.18ms ± 3%  -43.69%  (p=0.008 n=5+5)
EndpointGeneration/1000/1-48                            5.67ms ± 1%        3.21ms ± 3%  -43.34%  (p=0.008 n=5+5)

name                                              old alloc/op       new alloc/op       delta
RouteGeneration/gateways-shared-48                      37.9MB ± 0%        36.0MB ± 0%   -5.07%  (p=0.008 n=5+5)
RouteGeneration/knative-gateway-48                      1.81MB ± 0%        1.74MB ± 0%   -3.56%  (p=0.008 n=5+5)
ClusterGeneration/gateways-48                           4.94MB ± 0%        4.38MB ± 0%  -11.35%  (p=0.008 n=5+5)
ClusterGeneration/gateways-shared-48                    4.94MB ± 0%        4.38MB ± 0%  -11.35%  (p=0.008 n=5+5)
ClusterGeneration/knative-gateway-48                    11.3kB ± 0%        10.2kB ± 0%   -9.90%  (p=0.008 n=5+5)
ClusterGeneration/http-48                                700kB ± 0%         648kB ± 0%   -7.47%  (p=0.008 n=5+5)
ListenerGeneration/gateways-48                          8.53MB ± 0%        8.53MB ± 0%   -0.00%  (p=0.016 n=4+5)
ListenerGeneration/gateways-shared-48                   19.1kB ± 0%        19.0kB ± 0%   -0.67%  (p=0.008 n=5+5)
ListenerGeneration/knative-gateway-48                   17.2kB ± 0%        17.1kB ± 0%     ~     (p=0.079 n=4+5)
ListenerGeneration/http-48                              62.6kB ± 0%        62.6kB ± 0%     ~     (p=0.167 n=5+5)
EndpointGeneration/1/100-48                              397kB ± 0%         361kB ± 0%   -9.26%  (p=0.008 n=5+5)
EndpointGeneration/10/10-48                              269kB ± 0%         232kB ± 0%  -13.71%  (p=0.008 n=5+5)
EndpointGeneration/100/10-48                            2.53MB ± 0%        2.16MB ± 0%  -14.54%  (p=0.008 n=5+5)
EndpointGeneration/1000/1-48                            2.53MB ± 0%        2.16MB ± 0%  -14.55%  (p=0.008 n=5+5)

name                                              old allocs/op      new allocs/op      delta
RouteGeneration/gateways-shared-48                        490k ± 0%          394k ± 0%  -19.58%  (p=0.008 n=5+5)
RouteGeneration/knative-gateway-48                       24.9k ± 0%         21.7k ± 0%  -12.93%  (p=0.000 n=5+4)
ClusterGeneration/gateways-48                            75.2k ± 0%         53.2k ± 0%  -29.31%  (p=0.029 n=4+4)
ClusterGeneration/gateways-shared-48                     75.2k ± 0%         53.2k ± 0%  -29.31%  (p=0.008 n=5+5)
ClusterGeneration/knative-gateway-48                       166 ± 0%           122 ± 0%  -26.51%  (p=0.008 n=5+5)
ClusterGeneration/http-48                                9.23k ± 0%         6.98k ± 0%  -24.36%  (p=0.008 n=5+5)
ListenerGeneration/gateways-48                           66.9k ± 0%         66.9k ± 0%   -0.00%  (p=0.008 n=5+5)
ListenerGeneration/gateways-shared-48                      180 ± 0%           178 ± 0%   -1.11%  (p=0.008 n=5+5)
ListenerGeneration/knative-gateway-48                      163 ± 0%           161 ± 0%   -1.23%  (p=0.008 n=5+5)
ListenerGeneration/http-48                                 530 ± 0%           530 ± 0%     ~     (all equal)
EndpointGeneration/1/100-48                              7.91k ± 0%         6.31k ± 0%  -20.23%  (p=0.008 n=5+5)
EndpointGeneration/10/10-48                              5.37k ± 0%         3.77k ± 0%  -29.81%  (p=0.008 n=5+5)
EndpointGeneration/100/10-48                             49.6k ± 0%         33.6k ± 0%  -32.27%  (p=0.008 n=5+5)
EndpointGeneration/1000/1-48                             49.1k ± 0%         33.1k ± 0%  -32.60%  (p=0.008 n=5+5)

Overall we see a 40% improvement across the board, which is too good to ignore IMO

A branch of go-control-plane with these changes can be found here: https://github.com/howardjohn/go-control-plane/tree/exp/vt-2


One alternative idea I had was to simply not use go-control-plane and build our own codegen. This is not viable, however. Because go-control-plane has creeped into other core projects, such as grpc and prometheus, its ~impossible to import those packages and our own alternative generated protos. Additionally, we cannot just store the vtproto parts out of band, as methods cannot be attached to structs in other packages. As a result, our only option (AFAIK) is to include them in this repo directly.

@howardjohn
Copy link
Contributor Author

Did a few discussions with folks about various risks and tradeoffs

The biggest concerns are around a gogo 2.0 that can't be ripped out easily.
This library is fundamentally different, and intentionally avoids this issue.
Calling looks like this: https://github.com/vitessio/vitess/blob/40f314c1d052db3fa5a52e5fb2ddd2bddaefde44/go/vt/servenv/grpc_codec.go#L40-L49, so there is no code changes required if its removed later. Theoretically you could break yourself, but you would have to use the library in a poor way to do so.
The original proto Go struct is unmodified.

From the internal protobuf side, I am not sure how much of the details I can discuss publicly, but there was not concerns in adoption of vtprotobuf.

howardjohn added a commit to howardjohn/istio that referenced this issue Dec 4, 2023
Envoy tracking issue:
envoyproxy/go-control-plane#824

This will be a NOP until go-control-plane completes ^. Once that
happens, we will get the optimizations for free
istio-testing pushed a commit to istio/istio that referenced this issue Dec 5, 2023
Envoy tracking issue:
envoyproxy/go-control-plane#824

This will be a NOP until go-control-plane completes ^. Once that
happens, we will get the optimizations for free
mikemorris pushed a commit to mikemorris/istio that referenced this issue Dec 6, 2023
Envoy tracking issue:
envoyproxy/go-control-plane#824

This will be a NOP until go-control-plane completes ^. Once that
happens, we will get the optimizations for free
@dfawley
Copy link
Member

dfawley commented Dec 12, 2023

Calling looks like this: vitessio/vitess@40f314c/go/vt/servenv/grpc_codec.go#L40-L49, so there is no code changes

That's what calling code should look like. And what happens when an important project writes the equivalent of:

func (vtprotoCodec) Marshal(v any) ([]byte, error) {
	return v.(vtprotoMessage).MarshalVT() // panics if the proto doesn't support vtprotoMessage
}

Removing vtproto would be a breaking change.

@howardjohn
Copy link
Contributor Author

Calling looks like this: vitessio/vitess@40f314c/go/vt/servenv/grpc_codec.go#L40-L49, so there is no code changes

That's what calling code should look like. And what happens when an important project writes the equivalent of:

func (vtprotoCodec) Marshal(v any) ([]byte, error) {
	return v.(vtprotoMessage).MarshalVT() // panics if the proto doesn't support vtprotoMessage
}

Removing vtproto would be a breaking change.

You can go down this rabbit hole with any API really. Someone can always use reflect or unsafe to pull out data from private fields, or other crazy shenanigans.

If someone goes out of there way to shoot themselves in the foot, I don't see any issue with breaking them -- it would be far less of a breaking change than go-control-plane has made numerous times over the years as well (not reason to do it again, but still).

Also see build tag discussion in envoyproxy/envoy#31172 (comment)

@dfawley
Copy link
Member

dfawley commented Dec 12, 2023

These aren't private fields, and my example does not use reflection or unsafe. These are exported methods, and removing them is going to be seen as less "okay".

The issue with "just break people who do bad things" is: when you work for a big company (e.g. Google) and someone imports the project doing bad things into your codebase, you might not be able to upgrade your thing without fixing the people that did the bad things first.

If this can go in with a build flag that is opt-in instead of opt-out, that would probably remove all of my concerns.

@howardjohn
Copy link
Contributor Author

If this can go in with a build flag that is opt-in instead of opt-out, that would probably remove all of my concerns.

This is my plan, need to merge planetscale/vtprotobuf#122 first though

Copy link

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jan 11, 2024
howardjohn added a commit to howardjohn/envoy that referenced this issue Jan 16, 2024
htuch pushed a commit to envoyproxy/envoy that referenced this issue Jan 17, 2024
See envoyproxy/go-control-plane#824 for more information

This PR adds the vtprotobuf protoc plugin for Go. This works on top of the existing protoc-gen-go, to add optimized marshal functions that callers can opt in to using. This is not like gogo, which was a very invasive change -- everything is layered and opt-in. See issue for benchmarks, etc.

Additionally, to avoid possible binary size increase, the entire new code is protected under a go build tag. Users will need to opt-in at build time (-tags=vtprotobuf). By default, there is no impact for users at all.

Risk Level: Low - only additional opt-in code
Testing: Manually tested in Istio codebase

Signed-off-by: John Howard <[email protected]>
update-envoy bot added a commit to envoyproxy/data-plane-api that referenced this issue Jan 17, 2024
See envoyproxy/go-control-plane#824 for more information

This PR adds the vtprotobuf protoc plugin for Go. This works on top of the existing protoc-gen-go, to add optimized marshal functions that callers can opt in to using. This is not like gogo, which was a very invasive change -- everything is layered and opt-in. See issue for benchmarks, etc.

Additionally, to avoid possible binary size increase, the entire new code is protected under a go build tag. Users will need to opt-in at build time (-tags=vtprotobuf). By default, there is no impact for users at all.

Risk Level: Low - only additional opt-in code
Testing: Manually tested in Istio codebase

Signed-off-by: John Howard <[email protected]>

Mirrored from https://github.com/envoyproxy/envoy @ 21b52ba73d8ebbb51834d529a68f55ea2ec5e614
Copy link

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants