Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrumentation.AspNetCore for gRPC services omits ALL rpc.* attributes under certain conditions #1740

Closed
pcwiese opened this issue Jan 29, 2021 · 5 comments · Fixed by #1879
Labels
bug Something isn't working

Comments

@pcwiese
Copy link
Contributor

pcwiese commented Jan 29, 2021

Bug Report

NuGet package(s):

  • OpenTelemetry.Instrumentation.AspNetCore 1.0.0-rc.1 NuGet package (and tip of tree)

Runtimes:

  • netcoreapp3.1

Symptom

rpc.* attributes required for RPC Spans are not added to the created gRPC server Span under particular circumstances. All I see are http.* attributes. This happens when using a composite text map propagator made up of B3 and W3C in that order (B3 takes precedence if there). We have nginx and linkerd services sitting in front of these netcore gRPC services which understand and augment traces via the B3 headers before passing through to the gRPC service. Existing traceparent headers are treated as opaque data and forwarded unmodified as intended. The gRPC instrumentation correctly extracts the context from the B3 headers but still fails to output the rpc attributes.

This is happening because a new Activity is created in this block:

if (!this.hostingSupportsW3C || !(textMapPropagator is TraceContextPropagator))
with a name (ActivityCreatedByHttpInListener) that doesn't satisfy a check in the dotnet-grpc code which is responsible for adding a tag on the Activity called grpc.method. It looks specifically for an Activity named Microsoft.AspNetCore.Hosting.HttpRequestIn. See https://github.com/grpc/grpc-dotnet/blob/76ae64d3b0e28772bc4d97c0d8a146b3609b465d/src/Grpc.AspNetCore.Server/Internal/HttpContextServerCallContext.cs#L410

The otel instrumentation looks for the presence of that grpc.method tag before adding the rpc.* tags.

What is the expected behavior?

rpc.* attributes are present according to https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/rpc.md

Repro

Add a composite propagator to the Examples.Grpc.Service. See the last commit on my test branch: pcwiese@6cc33a1

Start the Examples.Grpc.Service and hit it with a grpcurl request similar to this one:

grpcurl.exe -proto "Z:\src\opentelemetry-dotnet\examples\GrpcService\Protos\greet.proto" -import-path "Z:\src\opentelemetry-dotnet\examples\GrpcService\Protos" -H "traceparent: 00-120dc44db5b736468afb112197b0dbd3-5dfbdf27ec544544-01" -H "x-b3-traceid: 120dc44db5b736468afb112197b0dbd3" -H "x-b3-spanid: b0966f651b9e0126" -H "x-b3-sampled: 1" localhost:44335 greet.Greeter/SayHello

The trace ids match between the traceparent header and B3 headers but the span ids do not. When you make the request you should see the rpc.* attributes outputted from the service but they aren't there.

Possible fix

Given that the Grpc.AspNetCore.Server NuGet package requires the Activity to be a certain name, we could simply create the Activity as we do now in the instrumentation library, but use the incoming Activity's OperationName instead of ActivityCreatedByHttpInListener. I tried this out locally and it does work but I'm sure what the implication of that is elsewhere.

@pcwiese pcwiese added the bug Something isn't working label Jan 29, 2021
@pcwiese pcwiese changed the title Instrumentation.AspNetCore for gRPC servers omits rpc.* attributes under certain conditions Instrumentation.AspNetCore for gRPC services omits ALL rpc.* attributes under certain conditions Jan 29, 2021
@cijothomas
Copy link
Member

@utpilla Can you investigate this?

@utpilla
Copy link
Contributor

utpilla commented Mar 8, 2021

I have created a PR #1876 to add a failing unit test for this scenario.

I see these four options to fix this issue. I have created a Draft PR for three of them:

  1. Parse the payload to get values for grpc.method and grpc.status_code and add these tags to the activity in the listener. Draft PR Update HttpInListener to add gRPC tags - By retrieving gRPC tag values from payload #1877 (we need to know the best way to obtain grpc.status_code from the payload)
  2. Activities created by framework and instrumentation store reference to each other through custom properties. We don't update Activity.Current so that the framework can add the gRPC tags and we can retrieve these in the OnStopActivity method of the listener to add these tags to the activity created by instrumenation. Draft PR Update HttpInListener to add gRPC tags - By storing the references to the activities created by framework and instrumentation #1878
  3. Instrumentation creates the new activity with the OperationName (Microsoft.AspNetCore.Hosting.HttpRequestIn) used by the framework. This leads to the framework adding gRPC tags to the activity created by instrumentation. Draft PR Update HttpInListener to add gRPC tags - By creating a new activity with the OperationName used by the framework #1879 (This breaks the existing AspNetCore instrumentation unit tests which check for a differnt OperationName for activities creatd by instrumentation.
  4. Ask the framework team to update their logic to not check for OperationName before adding gRPC tags.

@pcwiese @cijothomas Thoughts? Which of these options seems the best to go forward with?

@alanwest
Copy link
Member

alanwest commented Mar 8, 2021

I think I like option 3 the best because it continues to rely on the same side-effect of the library adding tags to a specific Activity.

I'm intrigued by option 1, because it seems to move away from relying on the library's side-effect. Though, I haven't studied this enough to know if relying on the request's ContentType and Path will be reliable. Also, as you've noted, still not sure how to handle status code.

Side note, I'm a bit embarrassed to say, but I'm still a little unclear on why we create sibling Activity in some circumstances...

@cijothomas
Copy link
Member

why we create sibling Activity in some circumstances...

These libraries (asp.net core etc), creates Activities and sets its parent as the ones extracted from traceparent header (hardcoded).
If user has configured the app to use a diff propagator,( say B3), then the parent must be extracted from B3 headers, not traceparent headers. Since the library does not abide by this rule, its activity is incorrectly parented. So the instrumentation creates a new activity, with the right parent (extracted from the user-configured propagators).

@pcwiese
Copy link
Contributor Author

pcwiese commented Mar 8, 2021

I think I like option 3 the best because it continues to rely on the same side-effect of the library adding tags to a specific Activity.

I'm intrigued by option 1, because it seems to move away from relying on the library's side-effect. Though, I haven't studied this enough to know if relying on the request's ContentType and Path will be reliable. Also, as you've noted, still not sure how to handle status code.

Side note, I'm a bit embarrassed to say, but I'm still a little unclear on why we create sibling Activity in some circumstances...

I would agree that Option 3 also sounds the most appealing and it is the path I took to patch a private version of this assembly that our team is using.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment