Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wire up DiagnosticsHandler #456

Merged
merged 22 commits into from
Nov 12, 2020

Conversation

jmezach
Copy link
Contributor

@jmezach jmezach commented Oct 7, 2020

As discussed in #292 I've wired up the DiagnosticsHandler in the ProxyHttpClientFactory as a workaround to make sure that the trace context is propagated to the downstream services. I also had to bring in the DiagnosticsHandlerLoggingStrings class to ensure that it compiles.

I've already seen it working on my local development machine in that I receive a traceparent header in the downstream service. I'm a little bit hesitant to put this into production to see if it solves our problem with distributed tracing (using New Relic) so I figured I might as well create a draft PR here first to get some input on things that could be improved.

@Tratcher
Copy link
Member

Tratcher commented Oct 7, 2020

@MihaZupan can you supervise this PR?

@Tratcher
Copy link
Member

Tratcher commented Oct 7, 2020

@jmezach are there any tests you could bring over or add for this?

@jmezach
Copy link
Contributor Author

jmezach commented Oct 9, 2020

@Tratcher I did find this test class in dotnet/runtime but that class is abstract. It looks like there's the SocketsHttpHandlerTest class which inherits from that base class but surely we don't want to include all those tests in here?

@MihaZupan I also found out that the DiagnosticsHandler I based this on was a slightly older version from dotnet/corefx rather than dotnet/runtime. The latter seems to be slightly different in its implementation. Should I base this implementation on that?

@Kahbazi
Copy link
Collaborator

Kahbazi commented Oct 9, 2020

Now that DiagnosticHandler is adding traceparent and tracestate header, what should happened to a request if it already has these headers?
Is it ok that the proxy request has two traceparent and tracestate headers?
Should the user add a transform to prevent the original traceparent to be proxied or it can be removed in DiagnosticHandler?

@jmezach
Copy link
Contributor Author

jmezach commented Oct 9, 2020

@Kahbazi That's an interesting question. I did some testing and it seems that if we send a traceparent header along with the request to YARP it will just send that exact same header to the downstream service and it adds a Request-Id header (which includes the value of traceparent among other things).

I'm not too sure about the specifics of the W3C Trace Context specification, but this behaviour does make sense to me. Essentially the reverse proxy is just another service in the chain if you're looking at distributed tracing. Were you expecting different behaviour?

@Kahbazi
Copy link
Collaborator

Kahbazi commented Oct 9, 2020

Oh my bad! I looked at the code and it's only sending the traceparent if it doesn't exists, so we won't get multiple headers.

if (!request.Headers.Contains(DiagnosticsHandlerLoggingStrings.TraceParentHeaderName))

But I'm wondering how's the distributed tracing when the request have a traceparent and YARP is sending that instead of getting it from current Activity.

If I have ZipKin distributed tracing I expect to see the charts like this. Here Backend parent is Proxy and Proxy parent is Client
image
I could be wrong but now I think we will see this which Client is parent for both Backend and Proxy.
image

@jmezach
Copy link
Contributor Author

jmezach commented Oct 9, 2020

@Kahbazi Yeah, you might be on to something there since I'm seeing the traceparent being transmitted as is to the downstream service.

Perhaps @Tratcher and @MihaZupan could chime in here. Is this what we're expecting to see here? I'm getting a feeling that we might have to do somethings differently in the case of YARP.

@jmezach
Copy link
Contributor Author

jmezach commented Oct 9, 2020

@Kahbazi I've run a couple of experiments to validate your theory. I currently have an API gateway setup using YARP and I've recompiled it using the changes in this PR. I then setup OpenTelemetry for both the API gateway and the downstream service which is an ASP.NET Core API that simply echoes the incoming headers in the response body. I've setup the Jaeger exporter with OpenTelemetry since I couldn't get the Zipkin exporter to work, or at least not on my machine.

If I don't send a traceparent header along with my request to the API gateway (YARP) I'm seeing the following in Jaeger:

Screenshot 2020-10-09 at 20 51 45

When I sent along a traceparent header with a dummy value along with my request to the API gateway (YARP) I'm seeing this:

Screenshot 2020-10-09 at 20 51 58

So yes, it seems that when we sent along a traceparent header with the incoming request to YARP it messes up the distributed tracing to the downsteam service and the chain is broken.

@jmezach
Copy link
Contributor Author

jmezach commented Oct 10, 2020

@Kahbazi @MihaZupan @Tratcher I've added an additional check for the currentActivity.ParentId. When that is set we remove the existing traceparent header before setting the new one. Not entirely sure if that is correct, but at least now I'm seeing all spans show up in Jaeger even if I send along a traceparent with my initial request to YARP.

{
if (!string.IsNullOrEmpty(currentActivity.ParentId))
{
request.Headers.Remove(DiagnosticsHandlerLoggingStrings.TraceParentHeaderName);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if a user wants to add its own traceparent header in a Transform?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, that could be an issue. Not sure how we could tackle that though. I believe transforms are performed before we get to this handler so how would we know whether the header was added by a transform or if it was in the original request? If the original request didn't have such a header then ParentId on the currentActivity would be null which is why I added the if. But if the original request has such a header and a transform has modified it in some way then I don't think we'll be able to detect that here, unless we add some metadata to the header somehow.

@jmezach
Copy link
Contributor Author

jmezach commented Oct 15, 2020

@MihaZupan Is there something more that I can do here? I think there's a couple of design issues here that we need to tackle to take this further, but would love some input on that.

@jmezach
Copy link
Contributor Author

jmezach commented Oct 26, 2020

@Tratcher Is there anything I can do to get this moving?

@MihaZupan
Copy link
Member

Sorry for the late reply.

Correct me if I'm wrong, but the main thing we want from this is to propagate the trace context via headers, to enable distributed tracing.

Do you actually need the diagnostic events written here?

Looking at all the DiagnosticHandler-related issues I gather there are a few pain points:

  1. It's internal, so can't be used when working with SocketsHttpHandler directly for example Customized SocketsHttpHandler does not support diagnostics tracking dotnet/runtime#31261 (this is fine, this PR is fixing that)
  2. It's not configurable in a reliable and reasonable manner DiagnosticsHandler doesn't support W3C trace context correctly with aspnetcore dotnet/runtime#31862
  3. Because of 2), there are problems when users don't want the hidden internal Activity creation Add an option to HttpClient's default DiagnosticHandler so that it does not create an Activity dotnet/runtime#41072

What I would suggest is to drop the writing of events and creation of activity, and only bring over the InjectHeaders portion of DiagnosticHandler. If anyone requires those events/activities, they can always layer them on top.

The behavior I would expect would be:

  • If the propagation is disabled (which is the default here) OR Activity.Current isn't set, nothing happens
  • If trace headers already exist, they are removed - this is fine as it's an opt-in feature
  • We add the trace headers with info from the Current activity

@jmezach
Copy link
Contributor Author

jmezach commented Nov 4, 2020

@MihaZupan I can totally live with that, makes everything a lot simpler imho. I just did a rebase of my branch on the latest changes and I'll get to work on this. Perhaps we should also rename the handler to ActivityPropagationHandler?

@jmezach jmezach force-pushed the feature/trace-context-propagation branch from c61ce39 to dee121e Compare November 4, 2020 16:14
@jmezach jmezach marked this pull request as ready for review November 4, 2020 20:10
Copy link
Member

@MihaZupan MihaZupan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, otherwise LGTM

Copy link
Member

@MihaZupan MihaZupan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add a test to assert the header modifications, but the behavior LGTM.

@jmezach
Copy link
Contributor Author

jmezach commented Nov 9, 2020

I've been trying to write some tests for the ActivityPropagationHandler, however I'm running into an issue with setting the current activity. I've tried doing using var activity = new Activity(), but that fails to compile stating that Activity must be implicitly convertible to System.IDisposable. As far I know the activity type is disposable, but perhaps that is only on .NET 5 and not in .NET Core 3.1.

I've also tried setting Activity.Current explicitly, but it just returns null after setting the property. I must be doing something wrong ;).

@MihaZupan
Copy link
Member

As far I know the activity type is disposable, but perhaps that is only on .NET 5 and not in .NET Core 3.1.

I believe the IDisposable interface was added in 5.0.

Would you be okay with me pushing test changes to this PR branch directly?

@jmezach
Copy link
Contributor Author

jmezach commented Nov 10, 2020

@MihaZupan Sure, go ahead :).

@MihaZupan
Copy link
Member

Looks like I don't have the perms to push onto your branch directly, could you pick up the commit/changes from my branch: MihaZupan@ff6e65f

@jmezach
Copy link
Contributor Author

jmezach commented Nov 10, 2020

@MihaZupan Done. I had to make one additional change in order for things to build on my dev machine. There was a using statement missing.

@MihaZupan
Copy link
Member

There's a conflict with the master branch now (small conflict in the test file)

@jmezach
Copy link
Contributor Author

jmezach commented Nov 10, 2020

@MihaZupan I fixed the merge conflict.

@jmezach
Copy link
Contributor Author

jmezach commented Nov 11, 2020

@MihaZupan One thing I noticed is that you've put the tests for the handler into Service/Proxy/Infrastructure while the actual handler is in Telemetry. I would expect the tests and corresponding class in the same folder structure, but do you have a preference as to whether the handler should move or the tests?

@Tratcher
Copy link
Member

Tratcher commented Nov 11, 2020

I would expect the tests and corresponding class in the same folder structure

Yes, please move ActivityPropagationHandlerTests.

@jmezach
Copy link
Contributor Author

jmezach commented Nov 12, 2020

@Tratcher Sure thing. Done :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants