Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds gRPC endpoint /zipkin.proto3.SpanService/Report #2328

Merged
merged 2 commits into from
Apr 25, 2019

Conversation

ewhauser
Copy link
Contributor

@ewhauser ewhauser commented Dec 16, 2018

This adds an opt-in gRPC endpoint for reporting spans. A request to the
gRPC /zipkin.proto3.SpanService/Report service is the same proto used
with our POST /api/v2/spans, Content-Type: application/x-protobuf
endpoint: zipkin.proto3.ListOfSpans.

This is currently disabled by default, as the feature is experimental.
Enable it like so:

$ COLLECTOR_GRPC_ENABLED=true java -jar zipkin.jar

Under the scenes the runtime is the same as the POST endpoint (Armeria),
and uses the same port (9411).


former description:

This PR is an offshoot of a discussion in openzipkin/zipkin-api#57 . There are a number of issues that need to be addressed before consideration of merge:

  • Finalize the API and merge it into zipkin-api
  • Find a zero-duplication and zero-copy approach for the deserialization of spans on the server side
  • Add additional configuration options for configuring the server (i.e. SSL)
  • Integration test with GCP
  • Performance testing

Also, there is still some discussion about whether the project as a whole wants to support a gRPC endpoint in zipkin-server due to potential bloat. See the discussion in zipkin-api#57 for background.

@codefromthecrypt
Copy link
Member

curios... maybe @trustin or @anuraaga would know.. is there a way to run both normal http and grpc's http on the same port? Currently, we use undertow underneath, but we could consider changing that.

@anuraaga
Copy link
Contributor

If you use armeria :) Upstream gRPC doesn't support it yet I think, or even if it did it would only be HTTP/2, not HTTP/1.

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Dec 16, 2018 via email

@anuraaga
Copy link
Contributor

I guess the talking points would be that armeria still isn't GA and has more dependencies. But agree that more ports is annoying and a big reason I love armeria :)

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Dec 16, 2018 via email

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Dec 16, 2018 via email

@ewhauser
Copy link
Contributor Author

I can look into armeria as an alternative.

The biggest trick here is going to be in the serializer though. We have to use Zipkin’s Proto3Codec instead of relying on the server implementation to deserialize the span payload. I’m not sue if any of the server implementations support the concept of lazy deserialization. If they don’t, then I’ll have to plug into the serialization pipeline.

@anuraaga
Copy link
Contributor

For the zero copy aspect, it might be simplest to define the client API in zipkin-api as is but use an internal proto here that replaces Span with bytes. Embedded messages and strings are encoded exactly the same in proto.

https://developers.google.com/protocol-buffers/docs/encoding

@anuraaga
Copy link
Contributor

Also I guess you would want to use the new support for plugging in as a spring webflux server - I don't know what that means since I haven't used spring recently but probably relevant.

line/armeria@70785f8

And if you use this unsafe option you'd be able to have real zero-copy with no heap arrays involved until the zipkin2.Span

https://github.com/line/armeria/blob/master/grpc/src/main/java/com/linecorp/armeria/server/grpc/GrpcServiceBuilder.java#L181

@ewhauser
Copy link
Contributor Author

There are a few more dependencies with armeria:

[INFO] +- com.linecorp.armeria:armeria-grpc-shaded:jar:0.75.0:compile
[INFO] |  +- com.linecorp.armeria:armeria-shaded:jar:0.75.0:compile
[INFO] |  |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.9.7:compile
[INFO] |  |  +- com.fasterxml.jackson.core:jackson-core:jar:2.9.7:compile
[INFO] |  |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.9.7:compile
[INFO] |  |  +- io.micrometer:micrometer-core:jar:1.1.0:compile
[INFO] |  |  |  +- org.hdrhistogram:HdrHistogram:jar:2.1.9:compile
[INFO] |  |  |  \- org.latencyutils:LatencyUtils:jar:2.0.3:compile
[INFO] |  |  +- io.netty:netty-codec-haproxy:jar:4.1.31.Final:compile
[INFO] |  |  |  \- io.netty:netty-codec:jar:4.1.31.Final:compile
[INFO] |  |  +- io.netty:netty-codec-http2:jar:4.1.31.Final:compile
[INFO] |  |  |  +- io.netty:netty-codec-http:jar:4.1.31.Final:compile
[INFO] |  |  |  \- io.netty:netty-handler:jar:4.1.31.Final:compile
[INFO] |  |  +- io.netty:netty-resolver-dns:jar:4.1.31.Final:compile
[INFO] |  |  |  +- io.netty:netty-resolver:jar:4.1.31.Final:compile
[INFO] |  |  |  \- io.netty:netty-codec-dns:jar:4.1.31.Final:compile
[INFO] |  |  +- io.netty:netty-tcnative-boringssl-static:jar:2.0.19.Final:compile
[INFO] |  |  +- io.netty:netty-transport:jar:4.1.31.Final:compile
[INFO] |  |  |  \- io.netty:netty-buffer:jar:4.1.31.Final:compile
[INFO] |  |  +- io.netty:netty-transport-native-epoll:jar:linux-x86_64:4.1.31.Final:compile
[INFO] |  |  |  +- io.netty:netty-common:jar:4.1.31.Final:compile
[INFO] |  |  |  \- io.netty:netty-transport-native-unix-common:jar:4.1.31.Final:compile
[INFO] |  |  +- io.netty:netty-transport-native-unix-common:jar:linux-x86_64:4.1.31.Final:compile
[INFO] |  |  \- org.reactivestreams:reactive-streams:jar:1.0.2:compile
[INFO] |  +- com.google.code.findbugs:jsr305:jar:3.0.2:compile
[INFO] |  +- org.curioswitch.curiostack:protobuf-jackson:jar:0.3.0:compile
[INFO] |  |  +- javax.annotation:javax.annotation-api:jar:1.3.1:runtime
[INFO] |  |  +- com.google.protobuf:protobuf-java-util:jar:3.6.1:runtime

Neither netty or jackson being shaded is less than ideal. Appears that armeria is choosing not to shade netty in the future - line/armeria#705.

On the plus side, I believe you can run HTTP/1, gRPC, and gRPC-Web all from the same server instance. GRPC-Web only supports unary RPC at present, so would need to provide a unary method for sending spans as well (or alternatively drop streaming support).

@ewhauser
Copy link
Contributor Author

@anuraaga - For implementing the Span payload directly as a ByteString, I was looking into implementing io.grpc.BindableService directly and replacing the input parameter. I haven't tried that yet though so if you have a pointer or example then I'd be happy to take it.

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Dec 16, 2018 via email

@anuraaga
Copy link
Contributor

I was thinking this would be simplest by having

Public API proto, used by clients

message WriteSpansRequest {
  repeated zipkin.proto3.Span span = 1;
}

service ZipkinService {
  rpc WriteSpans (stream WriteSpansRequest) returns (whatever);
}

while the server actually implements a different, non-exposed proto

message WriteSpansRequest {
  repeated bytes span = 1;
}

service ZipkinService {
  rpc WriteSpans (stream WriteSpansRequest) returns (whatever);
}

The protos are compatible so the clients get the advantage of the generated code and easy-to-populate structs while the server would only see the ByteString and decode straight into a Java span.

It has the downside of duplicate protos but it's probably a lot less complex than a custom BindableService

@ewhauser
Copy link
Contributor Author

Got it. Yes, that is a definitely easier and a nice trick. Will give it a shot. Thanks!

@ewhauser
Copy link
Contributor Author

ewhauser commented Dec 16, 2018

So, @anuraaga's trick doesn't work - at least with gRPC Java. The issue is that the marshaller will have already copied the input stream into a ByteString prior to getting access to it.

So I took the approach of implementing a custom BindableService. This has the added benefit of removing all dependencies on Protocol Buffers entirely. Here are the dependencies introduced now:

[INFO] +- io.grpc:grpc-netty-shaded:jar:1.17.1:runtime
[INFO] |  \- io.grpc:grpc-core:jar:1.17.1:compile (version selected from constraint [1.17.1,1.17.1])
[INFO] |     +- io.grpc:grpc-context:jar:1.17.1:compile
[INFO] |     +- com.google.code.gson:gson:jar:2.7:compile
[INFO] |     +- com.google.errorprone:error_prone_annotations:jar:2.2.0:compile
[INFO] |     +- com.google.code.findbugs:jsr305:jar:3.0.2:compile
[INFO] |     +- org.codehaus.mojo:animal-sniffer-annotations:jar:1.17:compile
[INFO] |     +- com.google.guava:guava:jar:26.0-android:compile
[INFO] |     |  +- org.checkerframework:checker-compat-qual:jar:2.5.2:compile
[INFO] |     |  \- com.google.j2objc:j2objc-annotations:jar:1.1:compile
[INFO] |     +- io.opencensus:opencensus-api:jar:0.17.0:compile
[INFO] |     \- io.opencensus:opencensus-contrib-grpc-metrics:jar:0.17.0:compile
[INFO] +- io.grpc:grpc-stub:jar:1.17.1:compile

So this ends up being much less of a dependency burden than armeria - no netty, protobufs - with the downside of having the collector run on a different port. We can keep this version of gRPC in sync with the version that is being used by zipkin-gcp to avoid any issues.

However, armeria does have the advantage of everything being on the same port and it would support gRPC-Web out of the box without a proxy. Would switching to armeria mean that it would eliminate undertow and just use it for everything? If so, we shoud probably find a path to finalizing the gRPC API in this PR and moving forward with this experimental version on a different port. A separate issue could be opened for moving to armeria - given the size of that change, I would want to make sure there was plenty of support for it.

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Dec 16, 2018 via email

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Dec 16, 2018 via email

@ewhauser
Copy link
Contributor Author

I started this whole mess so I'm fine rolling up my sleeves on the switch. Just want to make sure there is support behind it.

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Dec 17, 2018 via email

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Dec 17, 2018 via email

@codefromthecrypt
Copy link
Member

here's the issue for armeria #2331

@trustin
Copy link

trustin commented Dec 17, 2018

@ewhauser wrote:

I would want to make sure there was plenty of support for it.

The Armeria project maintainers including LINE employees like me (@hyangtack and @minwoox as well) would be happy to support Zipkin community for adoption of Armeria and its sustainable development. I believe you don't need to worry too much about the future of Armeria given that it's being used in various critical parts of LINE services.

@anuraaga
Copy link
Contributor

For the copying issue, just to clarify, using the proto approach makes

ByteBuf -> zipkin.proto3.Span -> byte[] -> zipkin2.Span become

ByteBuf -> byte[] -> zipkin2.Span

and then using Armeria's unsafe option can make it become

ByteBuf -> zipkin2.Span

Proto3Codec needs to be updated to accept ByteBuffer, which is pretty simple change I think. On the flip side, I think it's not practical for it to parse the InputStream directly since doing that efficiently would probably require using the KnownLength class inside of Proto3Codec which shouldn't have gRPC dependencies.

Of course it depends on the decision of moving to armeria, but if we do, I'd still recommend using the proto trick + unsafe option.

@ewhauser
Copy link
Contributor Author

ewhauser commented Dec 17, 2018 via email

@ewhauser
Copy link
Contributor Author

ewhauser commented Dec 17, 2018 via email

@codefromthecrypt
Copy link
Member

@ewhauser I think this is ready to merge. We aren't releasing right away as the project is in the middle of an ASF move. However, the remaining task of perf testing will be easier once merged as people can build off master. The feature is currently disabled under a flag.

I don't expect perf testing to be surprising as this is not much different than vanilla armeria http, but yeah once we have a grpc sender we can hook it up and compare.

@codefromthecrypt
Copy link
Member

The zero copy thing is in a different change, btw, it isn't grpc specific #2435

@shakuzen
Copy link
Member

However, the remaining task of perf testing will be easier once merged as people can build off master.

As long as this PR is up-to-date with master, maybe this is a good opportunity to try out Jitpack for testing pre-merge?

@codefromthecrypt
Copy link
Member

@shakuzen sure.. anyone want to translate this into eric's branch name? "https://jitpack.io/com/github/apache/incubator-zipkin/zipkin-server/2.12.9/zipkin-server-2.12.9-exec.jar"

@codefromthecrypt
Copy link
Member

ps this is up to date with master which means the maven package is org.apache.zipkin

(with much rejoicin)

@@ -98,7 +98,10 @@ public void addViewControllers(ViewControllerRegistry registry) {
// could split the collector's CORS policy into a different property, still allowing POST
// with content-type by default.
.allowRequestMethods(HttpMethod.GET, HttpMethod.POST)
.allowRequestHeaders(HttpHeaderNames.CONTENT_TYPE);
.allowRequestHeaders(HttpHeaderNames.CONTENT_TYPE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trustin fyi in armeria repo I mentioned something about different modules needing to collaborate on CORS config. since this is now baked into zipkin server directly now, there is minimal impact to us.

@codefromthecrypt
Copy link
Member

oops on rebase lost update to zipkin proto 0.2.0

@codefromthecrypt
Copy link
Member

@shakuzen I just now realize how extremely complex our test setup is vs just using wire (which doesn't require a shell out to protoc). I raised a small revision to our proto as I think now's the time openzipkin/zipkin-api#64

@codefromthecrypt
Copy link
Member

will rebase once the following are in:

@codefromthecrypt
Copy link
Member

ok now using a JVM impl square/wire to compile the protos needed for the integration tests. maybe jitpack works now

@codefromthecrypt
Copy link
Member

I broke the build settings for integration test (cli only, as intellij is fine).. will fix. here's the command to get to the same fail as travis

$ ./mvnw clean verify -pl zipkin-server -o -Dinvoker.skip=true

If you need to deploy a collector-only server, you can disable other http endpoints like so:

```bash
GRPC_COLLECTOR_ENABLED=true HTTP_COLLECTOR_ENABLED=false QUERY_ENABLED=false java -jar zipkin.jar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ps I'm just now noticing our env convention is backwards.

it might be ok to keep it, to be in parity with http. or we can add COLLECTOR_HTTP_ENABLED as a fallback instead of propagating the backwards convention to grpc

@codefromthecrypt
Copy link
Member

those who want to test other language reporters against this before release can do the following:

$ curl -sSL https://jitpack.io/com/github/ewhauser/zipkin/zipkin-server/experimental-grpc-1.5.0-g94ac6cf-1164/zipkin-server-experimental-grpc-1.5.0-g94ac6cf-1164-exec.jar > zipkin.jar
$ GRPC_COLLECTOR_ENABLED=true java -jar zipkin.jar 

@codefromthecrypt
Copy link
Member

added a commit to unhook this from snapshots. will merge on green

@codefromthecrypt codefromthecrypt changed the title experimental grpc server for span collection Adds gRPC endpoint /zipkin.proto3.SpanService/Report Apr 25, 2019
@codefromthecrypt
Copy link
Member

updated description

This adds an opt-in gRPC endpoint for reporting spans. A request to the
gRPC `/zipkin.proto3.SpanService/Report` service is the same proto used
with our `POST /api/v2/spans`, `Content-Type: application/x-protobuf`
endpoint: `zipkin.proto3.ListOfSpans`.

This is currently disabled by default, as the feature is experimental.
Enable it like so:
```bash
$ COLLECTOR_GRPC_ENABLED=true java -jar zipkin.jar
```

Under the scenes the runtime is the same as the POST endpoint (Armeria),
and uses the same port (9411).
@codefromthecrypt codefromthecrypt merged commit fde0aea into openzipkin:master Apr 25, 2019
codefromthecrypt pushed a commit that referenced this pull request Apr 25, 2019
This adds an opt-in gRPC endpoint for reporting spans. A request to the
gRPC `/zipkin.proto3.SpanService/Report` service is the same proto used
with our `POST /api/v2/spans`, `Content-Type: application/x-protobuf`
endpoint: `zipkin.proto3.ListOfSpans`.

This is currently disabled by default, as the feature is experimental.
Enable it like so:
```bash
$ COLLECTOR_GRPC_ENABLED=true java -jar zipkin.jar
```

Under the scenes the runtime is the same as the POST endpoint (Armeria),
and uses the same port (9411).
abesto pushed a commit to abesto/zipkin that referenced this pull request Sep 10, 2019
This adds an opt-in gRPC endpoint for reporting spans. A request to the
gRPC `/zipkin.proto3.SpanService/Report` service is the same proto used
with our `POST /api/v2/spans`, `Content-Type: application/x-protobuf`
endpoint: `zipkin.proto3.ListOfSpans`.

This is currently disabled by default, as the feature is experimental.
Enable it like so:
```bash
$ COLLECTOR_GRPC_ENABLED=true java -jar zipkin.jar
```

Under the scenes the runtime is the same as the POST endpoint (Armeria),
and uses the same port (9411).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants