Http request instrumentation with Activity (correlation part2) #16393

lmolkova · 2017-02-23T00:10:09Z

This change implements instrumentation for http requests with new System.Diagnostics.Activity API. Activity provides diagnostics context for current operation.

It introduces new events: System.Net.Http.Activity.Start and System.Net.Http.Activity.Stop.
It implements header propagation for operation context. Http Protocol proposal

Instrumentation is documented in the guide included into this PR.

lmolkova · 2017-02-23T00:11:01Z

/cc @vancem @avanderhoorn @nbilling @cwe1ss @karolz-ms @SergeyKanzhelev @brahmnes @stepro

stephentoub · 2017-02-23T03:06:05Z

src/Common/src/System/Net/HttpKnownHeaderNames.TryGetHeaderName.cs

@@ -155,6 +155,7 @@ public static unsafe bool TryGetHeaderName(IntPtr buffer, int length, out string
                        case 'K': potentialHeader = KeepAlive; goto TryMatch;  // [K]eep-Alive
                        case 'S': potentialHeader = SetCookie; goto TryMatch;  // [S]et-Cookie
                        case 'U': potentialHeader = UserAgent; goto TryMatch;  // [U]ser-Agent
+                        case 'R': potentialHeader = RequestId; goto TryMatch;  // [R]equest-Id


It doesn't hurt to have extra headers here, but I'm not familiar with this particular one. Who sends this, and when? Having this here is really only beneficial if it's coming down as part of a response.

This headers are new, declared in this change. They are injected to propagate correlation context below in DiagnosticsHandler. They only make sense in Request and not passed in response.

I noticed TryGetHeaderName function only called for response, but also all known headers including pure Request headers (e.g. User-Agent or Host) are handled there.
I'm fine to remove new ones though.

@stephentoub
Apparently, there is a test that checks all known headers are handled in TryGetHeaderName. Do you know if it's intended test behavior?
I'll return back my changes for now, but I'm open to fix the test if needed.

My primary concern with leaving it in the code is that it's taking up the potential place of another header we might want to add and would conflict with it. Go ahead and leave it for now, but please add a comment next to it indicating that it's unexpected to be needed for responses and could be replaced by another case in the future if necessary.

Actually, sorry, I take that back. I think the right fix is to just put that known request header string on a separate type, i.e. have one type for known response header names and one for known request header names.

stephentoub · 2017-02-23T03:18:12Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+                    {
+                        baggageIsEmpty = false;
+                        break;
+                    }


Could be:

bool baggageIsEmpty = !activity.Baggage.Any();

or if you want to avoid LINQ:

using (IEnumerator<KeyValuePair<string, string>> e = activity.Baggage.GetEnumerator()) { baggageIsEmpty = !e.MoveNext(); }

stephentoub · 2017-02-23T03:21:55Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+                            baggage.Add(new NameValueHeaderValue(baggageItem.Key, baggageItem.Value).ToString());
+                        }
+                        request.Headers.Add(HttpKnownHeaderNames.CorrelationContext, baggage);
+                    }


Why enumerate Baggage twice? How about replacing this and the baggageIsEmpty stuff above with something like:

using (IEnumerator<KeyValuePair<string, string>> e = activity.Baggage.GetEnumerator()) { if (e.MoveNext()) { var baggage = new List<string>(); do { KeyValuePair<string, string> item = e.Current; baggage.Add(new NameValueHeaderValue(item.Key, item.Value).ToString()); } while (e.MoveNext()); request.Headers.Add(HttpKnownHeaderNames.CorrelationContext, baggage)); } }

stephentoub · 2017-02-23T03:23:49Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+                //So we let user log it as exception happened in this 'parent' activity
+                if (s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.ExceptionEventName))
+                {
+                    s_diagnosticListener.Write(DiagnosticsHandlerLoggingStrings.ExceptionEventName, new {Exception = ex});


It's ok that events with the same name have different state associated with them? The earlier change that added use of this exception event sent additional state with the exception, like a timestamp and a request ID.

How about also including the "request" variable in the event properties? Would allow subscribers to log the requested URL with the exception etc.

@stephentoub
As you noticed below, Activity and Request/Response events are mutually exclusive. Activity event consumers will always receive Exception with single exception object. Request/Response subscribers will always receive exception with guid and timestamp.

@cwe1ss

From the logging perspective, URL may be logged in Request event along with Request-Id. And Exception will be logged with the same Id as well. But if subscriber did not instrument the request then in Exception callback there is no context for this particular request.

When exception event is received, Activity.Current holds context for this operation.
Subscriber can add any context to Activity.Current.Tags in the Request event and use them later in Exception and Response callbacks. It may cause Tags to overfill, considering that all tags should be written with every log message, it may be a problem.

Considering that Exception events should not happen too frequently, I'm fine with adding request to the payload.

stephentoub · 2017-02-23T03:25:36Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

@@ -104,6 +194,9 @@ private static void LogHttpResponse(Task<HttpResponseMessage> responseTask, Guid
                }, loggingRequestId, CancellationToken.None, TaskContinuationOptions.None, TaskScheduler.Default);
        }

+        private static readonly DiagnosticListener s_diagnosticListener =
+            new DiagnosticListener(DiagnosticsHandlerLoggingStrings.DiagnosticListenerName);


Nit: This should be at the top of the class.

stephentoub · 2017-02-23T03:28:22Z

src/System.Net.Http/tests/FunctionalTests/DiagnosticsTests.cs

+                }
+                // Poll with a timeout since logging response is not synchronized with returning a response.
+                WaitForTrue(() => cancelLogged, TimeSpan.FromSeconds(1),
+                    "Cancellation was not logged within 1 second timeout.");


This is a fairly short timeout... we've seen CI get bogged down and short timeouts like this end up causing spurious failures. Would it hurt anything to extend this to, say, 10 seconds?

This timeout is used in other tests here, which were running for quite a while and seems to be stable, at lease I cannot find any issues for test failures
Note that response is awaited, and response events are fired after response is received, so I'm not even convinced there should be any timeout here.

stephentoub · 2017-02-23T03:31:54Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+                {
+                    //null Activity.Current means that incoming request was not instrumented
+                    //and there is nothing we can do for outgoing request. 
+                    //Activity events are enabled so we don't send Request/Response events


I don't fully understand the scheme employed here. Is System.Net.Http.Activity mutually exclusive with System.Net.Http.Request/Response? I'm wondering if both sets of events could/should fire for the same request, and also if System.Net.Http.Activity is enabled but Activity.Current is null, if that should preclude the Request/Response events from being fired as you have here?

I want Activity events to substitute Request/Response events eventually. Activity events are fired at the same time and provide more flexible filtering and Activity.Current for context.
So yes, Activity events are mutually exclusive with Request/Response.
There will be listeners listening to Activity requests only and there are existing listeners for Request/Response events.
If consumer is subscribed for Activity events and Activity.Current is null, it means that current operation is not instrumented, so it turns off any diagnostics for this particular request.
If consumer wants to receive Request/Response events, they should subscribe for them only.

There may be a problem if there are 2 consumers in the same process, one listening to Activity and other listening to Request/Response. In this case, second one will never receive any events.

stephentoub · 2017-02-23T03:33:11Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+            {
+                activity = new Activity(DiagnosticsHandlerLoggingStrings.ActivityName);
+
+                s_diagnosticListener.StartActivity(activity, new { Request = request });


Sending events to a listener will synchronously invoke any registered handlers, right? What's the design if one of those handlers throws an exception? Will it emerge from here and prevent the http operation from going through, and if so, is that ok?

Assuming callback (for Write or IsEnabled) throws exception, http request will not be processed and exception will be caught (or not) in user code.

It seems it's a common approach (here, in SqlClient and ASP.NET Core), not to handle exceptions from DiagnosticListener callbacks.

It is certainly the case that callers of DiangosticSource.Write are not intended to protect themselves from thrown exceptions. It is the responsibility of subscribers to avoid throwing exceptions.

cwe1ss

Thanks for mentioning me here. I'm back from my 2 month vacation 😎 so I finally have more time for this.

As I've already written in another post, I have quite strong feelings against using the "Request-Id" and "Correlation-Context" headers (and their format) without any hook to change it.

IMO supporting a platform-independent microservices system is very important. By forcing these header names in the default HTTP handler you make it very hard for 3rd party diagnostics systems to do multi-platform (e.g. Java, .NET, ...) tracing.

Unfortunately I think that due to this I still can't use any of the new Activity stuff in my OpenTracing project and I doubt that any existing distributed tracer can do so (Application Insights is not yet a distributed tracer). I still have to use my own HTTP handler (to allow tracers to write their own headers) and I probably still have to use my own ASP.NET Core middleware (to allow tracers to read their own headers) even if aspnet/Hosting#939 gets implemented.

In other words, I'm still missing the inject/extract concepts from OpenTracing. I believe that this will become a proprietary implementation for Application Insights only if you don't support hooks for that.

cwe1ss · 2017-02-23T07:22:21Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+                //So we let user log it as exception happened in this 'parent' activity
+                if (s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.ExceptionEventName))
+                {
+                    s_diagnosticListener.Write(DiagnosticsHandlerLoggingStrings.ExceptionEventName, new {Exception = ex});


How about also including the "request" variable in the event properties? Would allow subscribers to log the requested URL with the exception etc.

cwe1ss · 2017-02-23T07:24:27Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+                {
+                    s_diagnosticListener.StopActivity(activity, new
+                    {
+                        Response = responseTask.Status == TaskStatus.RanToCompletion ? responseTask.Result : null,


"Request" and a possible "Exception" might be interesting here as well. This would allow subscribers to subscribe just to the Stop-Event and get all scenarios/information.

Request could be accesses from Response.RequestMessage
We discussed exceptions in #15971: We expect some subscribers to be interested in Exception only and the common approach is to fire Exception events separately.
So, subscriber needs to listen to both Exception AND Activity. Note: there are no IsEnabled calls for Start and Stop, there is one for "System.Net.Http.Activity"

cwe1ss · 2017-02-23T09:15:06Z

I've just seen that the changes are not in the default handler but in a separate "DiagnosticsHandler" class. Is this handler added to the pipeline automatically? Is there a way to opt-out of this handler?

lmolkova · 2017-02-23T19:33:30Z

@cwe1ss
'DiagnosticsHandler' is called if there is a subscriber to HttpHandlerDiagnosticListener. It was done in PR #15971 and #16385.

It's off by default and you can opt-in by subscribing to HttpHandlerDiagnosticListener.
Depending on the predicate provided for subscription, DiagnoticsHandler sends either Activity events or "System.Net.Http.Request" and "System.Net.Http.Response". You can find more details here

lmolkova · 2017-02-23T23:12:51Z

Thanks for mentioning me here. I'm back from my 2 month vacation 😎 so I finally have more time for this.

As I've already written in another post, I have quite strong feelings against using the "Request-Id" and "Correlation-Context" headers (and their format) without any hook to change it.

IMO supporting a platform-independent microservices system is very important. By forcing these header names in the default HTTP handler you make it very hard for 3rd party diagnostics systems to do multi-platform (e.g. Java, .NET, ...) tracing.

Unfortunately I think that due to this I still can't use any of the new Activity stuff in my OpenTracing project and I doubt that any existing distributed tracer can do so (Application Insights is not yet a distributed tracer). I still have to use my own HTTP handler (to allow tracers to write their own headers) and I probably still have to use my own ASP.NET Core middleware (to allow tracers to read their own headers) even if aspnet/Hosting#939 gets implemented.

In other words, I'm still missing the inject/extract concepts from OpenTracing. I believe that this will become a proprietary implementation for Application Insights only if you don't support hooks for that.

We are considering to have configurable incoming requests headers in ASP.NET Core, so ASP.NET Core would support multiple schemas including and defaulting to Request-Id and Correlation-Context. We can discuss it in aspnet/Hosting#939. We would be happy to follow any existing standard, if there was one.

For the outgoing requests, it's more complicated, since different endpoints presumably expect different sets of headers. If this is the case, any instrumentation, including this one, would require some level of customization to apply headers specific to endpoint. We don't prevent this: consumers are free to disable propagation and either use custom handlers or modify request object from the event payload (which is inadvisable, but still possible). So I believe we provide hooks for that.

There may be the case when ALL downstream services expect to receive same set of headers (but not the one we propose). We may support this case later to simplify configuration, but use Request-Id and Correlation-Context by default.

vancem · 2017-02-23T23:14:35Z

@cwe1ss -on the issue on wiring in the names RequestId and CorrelationContext into the code.

We do care about interoperability and are not discounting your concern. Our basic plan is

There should be a worldwide standard for this (frankly OpenTracing SHOULD have picked one). If there HAD been an RFC or even if OpenTracing had suggested something we would have chosen it and we would not be having this conversation.
Because of the lack of standards we FULLY EXPECT to have to add logic that allows flexibility at the HTTP boundary. But this should be thought of as a 'compatibility hack' until everyone gets on the standard (you could imagine removing this code eventually because it is not useful in the presence of a strong standard).
There seems to be little value in making an extensibility mechanism so that we can 'pass the problem off' to something outside of HTTP code. After all arguably information about what goes in HTTP headers and their format something that belongs in HTTP code.
Ideally we would only have to add this compatibility flexibility ON INCOMMPING Http. The reason is that it not inefficient to check for N different synonyms for a HTTP header, but it IS inefficient to have to write the fields N different ways because you are using N different logging systems that have different conventions. Our plan is to have an environment variable that allows you to map 'foreign' names to their 'standard' one. If all logging systems do this, then no one need sot fix their OUTGOING http. We will see...

However our first order of business is to get SOMETHING up and running. Certainly by getting this minimum we enable the scenario where you only use .NET code (a nice start), and even cross-logging cases if their loggers are sufficiently flexible (they have the incoming mapping as we suggest in #4). When we get this working, we will add #4 and things 'should' work. We will see.

So we ARE interested in getting cross-logging systems working. You could be very helpful in making that actually work (and ideally we get other logging systems to be adaptable (or even come with us as we propose and adopt a standard)). We really would have RATHER simply followed someone else's lead, but given that that was not there, we want to 'fix it right' (which is with a standard). We don't have to control the standard, it just needs to exist.

stepro · 2017-02-23T21:53:44Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+                if (s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.ActivityPropagateName))
+                {
+                    request.Headers.Add(HttpKnownHeaderNames.RequestId, activity.Id);
+


The Request-ID and Correlation-Context headers are not part of the HTTP standard and people could be using these names for arbitrary other purposes today. Instead of hard-coding the serialization of the activity, we should extract this code into a different method/class and allow the user to pick a built-in implementation (like the one you have here) or write a custom implementation.
Does a user opt into activity propagation or will it happen automagically by default? If automagically, then it seems like performing any propagation would be a breaking change (what if Request-ID is already specified in user code?) If opt-in, then it's reasonable to have a default implementation as shown here, but it should be extracted out.

Instrumentation is opt-in. To enable it, you need to listen to HttpHanlerDiagnosticListener and enable System.Net.Http.Activity event.

There is already an API for custom implementation - DelegatingHandler. Users can track outgoing HTTP requests there, if default Activity instrumentation does not fit. Custom DelegatingHandler does not need to repeat all this logic, since it can directly write to the logging system and do filtering in place.

Please also check @vancem reply:

@cwe1ss -on the issue on wiring in the names RequestId and CorrelationContext into the code.

We do care about interoperability and are not discounting your concern. Our basic plan is

There should be a worldwide standard for this (frankly OpenTracing SHOULD have picked one). If there HAD been an RFC or even if OpenTracing had suggested something we would have chosen it and we would not be having this conversation.

Because of the lack of standards we FULLY EXPECT to have to add logic that allows flexibility at the HTTP boundary. But this should be thought of as a 'compatibility hack' until everyone gets on the standard (you could imagine removing this code eventually because it is not useful in the presence of a strong standard).

There seems to be little value in making an extensibility mechanism so that we can 'pass the problem off' to something outside of HTTP code. After all arguably information about what goes in HTTP headers and their format something that belongs in HTTP code.

Ideally we would only have to add this compatibility flexibility ON INCOMMPING Http. The reason is that it not inefficient to check for N different synonyms for a HTTP header, but it IS inefficient to have to write the fields N different ways because you are using N different logging systems that have different conventions. Our plan is to have an environment variable that allows you to map 'foreign' names to their 'standard' one. If all logging systems do this, then no one need sot fix their OUTGOING http. We will see...

However our first order of business is to get SOMETHING up and running. Certainly by getting this minimum we enable the scenario where you only use .NET code (a nice start), and even cross-logging cases if their loggers are sufficiently flexible (they have the incoming mapping as we suggest in #4). When we get this working, we will add #4 and things 'should' work. We will see.

So we ARE interested in getting cross-logging systems working. You could be very helpful in making that actually work (and ideally we get other logging systems to be adaptable (or even come with us as we propose and adopt a standard)). We really would have RATHER simply followed someone else's lead, but given that that was not there, we want to 'fix it right' (which is with a standard). We don't have to control the standard, it just needs to exist.

vancem · 2017-02-23T23:58:48Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+
+                //inject headers unless user does not want it to happen
+                //we may pass request to IsEnabled, but not clear how useful it is
+                if (s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.ActivityPropagateName))


My bias is to NOT have the complexity of a Activity.Propagate check. My concern is that it is just one more thing that loggers have to do for the system to work as expected (and it is a bit less efficient, and the code is a bit more complex).

The rationale here is that putting users have control over propagation when things come into a machine, do they really ALSO need control when it leaves?

My preference is to remove this and if we find the need for this control we can add a 'NoPropagate' check at that time (my guess is we never will) and the system will end up simpler. Even if we do end up adding it, least that way the default will be good (by default you get propagation and you have to go out of your way to do something surprising).

vancem · 2017-02-24T00:25:27Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+            else 
+            {
+                Guid loggingRequestId = Guid.NewGuid();
+                LogHttpRequest(request, loggingRequestId);


As you note, these original logging message we would like to deprecate. At the very least, please rename the LogHttpRequest, to 'LogHttpRequestDeprecated' and LogHttpReponse to LogHttpResponseDeprecated. and similarly add a deprecated suffix to the RequestWriteName and ResponseWriteName symbols.

We believe the new functionality is a superset because activityies have both a ID and a beginning and ending timestamp.

I will start a separate discussion about deprecating DiagnosticSource events.

stepro · 2017-02-24T01:09:01Z

One thing I'm not seeing in this change is a good separation between the general need for header propagation (on receiving a request, de-serializing into an Activity and then on an outgoing request, serializing back into headers) and a domain-specific usage of this capability in diagnostics.

Not all headers that need propagation relate to diagnostics, for instance, I'm using linkerd which requires a certain set of headers (l5d-ctx-*) to be propagated for service routing purposes. Or maybe I'm using a particular feature flag framework that expects a particular header provided during ingress that should also be propagated.

Something else interesting. Maybe I've created a service that has zero interest in participating in domain-specific activities like diagnostics or doesn't reference any feature flag framework, but it should still propagate the headers needed for its downstream services to perform those activities. This indicates that there's a general need to propagate independent of domain.

Because of this, I'd defined an approach whereby an agreed upon header, say "Context-Headers", lists all the other headers that should be propagated, alleviating a given runtime/service from implementing domain-specific propagation routines. This is very flexible because it accommodates arbitrary existing HTTP headers rather than requiring everyone to latch on to putting contextual information inside the value of a single well-known HTTP header like Correlation-Context.

I recognize a lot of thought and time has gone into the choice of Request-ID and Correlation-Context as the headers involved here, but I wanted to put my thoughts out there because these headers don't work for my scenarios today (and won't work unless the services/frameworks I use ultimately end up supporting those headers which is not likely for quite some time).

As for creating a standard, I appreciate the desire and hope we can make progress there. I remain concerned (see my previous comment) about introducing this as an auto-magic default before it is even in the works to become a standard and may be considered a breaking change for people already using headers with those names for their own purposes.

If we do pursue a standard here, I feel like we'd be better off if it could more readily accommodate the current state of the world using a mechanism like I describe, allowing it to integrate with existing headers. I'm reminded of how CORS added the Access-Control-Expose-Headers header to list headers that should be propagated across domain boundaries, instead of requiring that special, new headers be used to contain information that could cross domains.

vancem · 2017-02-24T21:52:03Z

@stepro
There is no question that the fact we are wiring in logging-specific logic (e.g. names of logging fields and how they propagate information to a Activity) into the HTTP logic is annoying from a dependency / extensibility perspective. It is not a nice problem because there really needs to be a piece of code that knows about logging (e.g. Activities) AND HTTP (how to set http headers). We have chosen to make HTTP to know about logging since logging is a 'everywhere' service (any code can depend on logging APIs).

As you point out there may be other scenarios that would like to propagate information in a similar way, but generalizing the problem actually makes the design problems worse. When we are talking about just logging, it was OK to take a direct dependency on logging APIs. However for just about anything else you need an abstraction that hides this, which is tricky to design correctly without understanding your scenarios completely. That abstraction in turn just 'passes the buck' to the code that actually does the work, and it is not at all clear where that code lives and how it gets configured.

Your idea of making the HTTP request be able to specify some generic semantics (e.g. what should be passed on to subordinate requests) is an interesting way to attack the problem, but it does not form a complete solution (at least for logging), because you still need to define WHERE to propagate the data to. I could believe we might be able to fix this, but you must admit that without a fair bit of prototyping and feedback from the real user cases we are not likely to have gotten it correct.

Thus we are reluctant to generalize the problem. We feel that logging is sufficiently valuable to pretty much ALL customers that it can justify 'special casing' it right now. If later we prototype and implement a more generalized solution we can move to that transparently, but our concern right now is to get the logging working.

In short, sometimes it is better to do something simple and direct and fix it later than it is to try to come up with the generalized solution right off the bat. We have had a very poor track record of 'nailing a design the first time', so we are concentrating on simply getting the really high value case (of logging) done in a straightforward way. Any design more complex than the simple way, needs to really justify that it WILL be valuable (which is hard when the scenarios require a bunch of things to all work together).

stepro · 2017-02-25T06:39:44Z

@vancem - digesting your response. I think what's bugging me is that there are two pieces to this logging feature – activity-based logging and context propagation – that are tied together when there’s a lot of value in separating the concerns, and doing so would only add minimal complexity but gain a bunch of flexibility. Some scenarios:

I might want to use activity-based logging because AI just works with it, but no context propagation as defined because I’m calling some external service that doesn’t understand those specific headers, or worse, does understand them and interprets them in the wrong way (“Request-ID” is hardly a header someone else won’t have already used for who knows what).
I might not be using activity-based logging for my micro-service yet because I currently have some other logging solution in place, but I know that some upstream and downstream services have started to employ it and their logs need to be correlated, requiring context propagation as defined.
I might want to switch to activity-based logging to get some new features in AI but because I can’t update all my micro-services at the same time (actually, some of them I don’t even own – some other team does), I need to keep in place the current “Correlation-ID” header and propagation mechanism. I’m happy to implement a custom headers<->Activity translator to get this working so my service continues to fit into the overall scheme of things while still moving it forward.

I have no problem with a piece of code that knows how to create an activity with a HTTP request added as baggage. I also have no problem with a piece of code that knows about an activity and HTTP headers. But I don’t like a piece of code that knows about logging and HTTP headers (and especially one I can’t change). It’s fine if there is a default wire-up mechanism that registers two built-in pieces of code that a) manages an activity around a child request and b) propagates headers, but I’d at least like the second piece to be overridable (or removable).

Perhaps more concretely, I’m asking that there be two DelegatingHandlers in System.Net.Http – the current DiagnosticsHandler without the propagation code, and a new DelegatingHandler like PropagationHandler that serializes some of the current context into headers. When constructing a handler chain, the code might look like:

_winHttpHandler = new WinHttpHandler();
_propagationHandler = new PropagationHandler(_winHttpHandler);
_diagnosticsPipeline = new DiagnosticsHandler(_propagationHandler);

I’d then have the PropagationHandler let you override what it means to convert current context into HTTP headers via whatever injection method makes sense (static delegate, event? I'm not an expert at such things). The default would be what's described in this pull request, i.e. read the ID and baggage from the current Activity and serialize as Request-ID and Correlation-Context.

Do your reasons for not generalizing still apply given this kind of change? I don't think this is generalizing the problem as much as you might have thought I was suggesting.

vancem · 2017-02-27T20:23:41Z

@stepro First, your feedback is interesting. If we really can find a way to generalize the code with minimal cost, we are certainly open to it.

However I think we need more details of the design. Doing the segregations you describe is actually problematic, because we are using DiagnosticSources filtering hook to TELL US whether to propagate things. Thus propagation and logging are coupled. Now you can try to decouple them, but only at the cost of creating an extensibility mechanism for propagation. We could do that too, but it is unclear how to do that in a way that is reasonably flexible (that is looks like a real extensibility mechanism).

This kind of feature creep is what we are leery of. Maybe you can propose something that would allay our fears, but we would need more details (basically a description of what is public, and some sample code for using it to inject functionality (e.g. what DiagnosticSource needed) .

Our counter-claim, is that we have NOT precluded adding something like this in the future. Thus, yes, we have wired in code, but it is not much code (< 1page), and if we build the extension mechanism we can then put that code in the 'default' place with relative ease. So why not keep it super-simple now and work on the two problems without coupling? Sure the current code may not be factored the way the new code needs to be, but it is 1 page of code, how hard would fixing it be? As long as things are private, we have a lot of flexibility to fix things later.

So trying to tackle your other scenarios is reasonable, it is just that we want a particular scenario (uniform logging following a particular convention which we hope to standardize world wide) up and running ASAP.

stepro · 2017-02-27T21:01:15Z

Doing the segregations you describe is actually problematic, because we are using DiagnosticSources filtering hook to TELL US whether to propagate things

I'm actually okay with this coupling when it's one of many possible implementations of a propagation handler. That way, if someone doesn't like the coupling, they just choose a different implementation of propagation. They'll still get the diagnostics handler goodness, but now they own propagation. They might choose to continue having the diagnostic source be the trigger for propagation, or they might decide that there's more involved in propagation than just logging and decouple the two concepts.

I think I remember that Liudmila originally had a separate diagnostics event you could listen to that would enable or disable just the propagation piece, but that was removed as part of the code review. That would have helped somewhat but it didn't give a place that I could inject a different implementation of propagation, so it wasn't a complete story.

Overall, I'm fine with the change if we understand what a compatible path to this kind of separation would look like in the future (I'm definitely not an expert in .NET fx compatibility demands!)

P.S: The first problem I described, where sending "Request-ID" and/or "Correlation-Context" headers to a service that would do bad things given those headers: looking at the code more, I think this can be resolved if a diagnostic listener returns false for IsEnabled given a particular request object? Where/how would a user be able to plug in code that would give that behavior (given that it looks like if any listener returns IsEnabled=true, then the overall value is true). Would there be injection points in, say, an AI library that is listening to diagnostics to override IsEnabled for specific domains? This was probably my biggest concern given the generic (and rather obvious) naming of the headers.

vancem · 2017-02-27T21:23:31Z

@stepro

P.S: The first problem I described, where sending "Request-ID" and/or "Correlation-Context" headers to a service that would do bad things given those headers

This is basically a problem that the current code has. The basic assumption is that HTTP headers don't collide (thus having more of them is never a negative (except maybe perf)). If that collision happens, and there is any activity logging from any source then you will have a problem. (The receiver would also have to be inflexible).

Note that this conceptual problem happens ANY time you make a new HTTP header. Even with common names, it would take quite a bit of bad luck to have it be a real problem. Once the standard takes hold, it won't be a problem at all.

vancem · 2017-02-27T22:44:44Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandlerLoggingStrings.cs

+
+        public const string ExceptionEventName = "System.Net.Http.Exception";
+        public const string ActivityName = "System.Net.Http.Activity";
+        public const string ActivityPropagateName = "System.Net.Http.Activity.Propagate";


You can remove this definition as we are not using it anymore

…enarios

vancem · 2017-03-01T18:40:55Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+            Guid loggingRequestId = Guid.Empty;
+
+            //cache IsEnabled result for Activity name
+            if (!s_activityEventIsChecked)


I would strongly prefer to not have static caching of enabled state. They are subtle bug farms, especially if there is contract in between (like there is here). What if one subscriber just wanted exceptions, and then a bit later, another subscriber joins and wants Activities. As the code is written the second subscriber does not get his events (because of cached enabled state when the first subscriber enabled the diagnosticSource).

I have sent you a counter-proposal in E-mail.

… always propagate if there is any activity

lmolkova · 2017-03-01T22:20:55Z

We are finished changes in this PR together with @vancem.
Could you please check the final version and let us know if you have any concerns or questions.

I'd like to highlight compatibility with the 'old' "System.Net.Http.Request" and "Response" events:

Existing consumers for "Request" and "Response" events may still use them and should consider migration to Activity events. We will announce deprecation separately
If there is at least one consumer for new "System.Net.Http.Activity" events ALL consumers will receive Activity events only.
Consumers may subscribe to "System.Net.Http.Exception". The event intentionally does not have loggingRequestId, so it is not possible to map it to the 'old' Request and Response events.

karolz-ms · 2017-03-01T22:56:42Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+                //always stop activity if it was started
+                if (activity != null)
+                {
+                    activity.SetEndTime(DateTime.UtcNow);


Is the 16ms resolution of UtcNow a concern here?

CC: @pharring who just looked into this issue in AI SDK

UtcNow is precise now: dotnet/coreclr#9736

Fantastic, thx. Ship it!

As already said in #16491, I don't like that users have to call SetEndTime but not SetStartTime - especially because Stop uses DateTime.UtcNow anyway. This is completely unexpected IMO and most people who will use the Activity API will get this wrong.

Why not fix the Activity API (the StopActivity part) in a way that uses DateTime.UtcNow for Start and Stop by default and also sends the duration to the DiagnosticSource listeners?

I agree with you point, let's discuss it in #16491 and fix in another PR.

DateTime.UtcNow is more precise, but it's still not as precise as the system stopwatch. In ASP.NET Core we use DateTime.UtcNow to log the start time of the request, but we use a timestamp from Stopwatch (QPC) to compute durations.

@pharring - I suspect ASP.NET used both UtcNow and Stopwatch because up until about a week ago UtcNow was NOT very precise (16 msec). It is now upgraded to about 1usec, which should be good enough for HTTP request and response durations. That should be enough.

vancem · 2017-03-02T22:33:08Z

OK we are on the cusp of committing the merge. The upshot of this change is that is only affects logging (we can be confident of that because the DiagnosticHander was the only code changed and that only gets called when logging is on. This makes the change relative safe (only profilers/debuggers could possibly be broken). We believe that we are compatible, and only added new functionality.

We understand that we may want to generalize the code at some point if the future, but we don't want to block on that. In general anything that can be fixed compatibly is probably OK at this point.

If you have any concerns ABOUT THIS PR that rise above that, speak them now, as we are going to merge tomorrow.

cwe1ss · 2017-03-03T17:42:41Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+                        do
+                        {
+                            KeyValuePair<string, string> item = e.Current;
+                            baggage.Add(new NameValueHeaderValue(item.Key, item.Value).ToString());


Does this require some encoding? What if a baggage value contains a ","? Are there any other characters that are not allowed as header values?

This is a good point. We are trying to keep the efficiency high and the complexity low, and frankly having arbitrary strings in Baggage is not really an interesting scenario (they are control values, not arbitrary strings). This argues that we push the problem to the clients of Baggage (either fail if you try to set what we will define as illegal baggage values (and possibly give them an API that does the escaping). Thus the change will be in System.Diagnostics.Activity.

I took a look at NameValueHeaderValue (I didn't know about this class before) and it seems like it is checked there. It will throw an exception if I understood it correctly.

…activity_part2 Http request instrumentation with Activity (correlation part2) Commit migrated from dotnet/corefx@896d790

Add Activity instrumentation

76dabc6

dnfclas added the cla-already-signed label Feb 23, 2017

davidsh added the area-System.Net.Http label Feb 23, 2017

davidsh requested a review from stephentoub February 23, 2017 00:48

stephentoub reviewed Feb 23, 2017

View reviewed changes

cwe1ss reviewed Feb 23, 2017

View reviewed changes

karelz assigned lmolkova, davidsh, Priya91, vancem, CIPop and geoffkizer Feb 23, 2017

review comments

707fa0d

stepro reviewed Feb 23, 2017

View reviewed changes

vancem reviewed Feb 23, 2017

View reviewed changes

return TryGetHeaderName back

8dcc776

vancem reviewed Feb 24, 2017

View reviewed changes

Liudmila Molkova added 2 commits February 23, 2017 17:15

mark Request/Respose events as deprecated, remove Propagate check

573530c

remove Request-Id and Correlation-Context from HttpKnownHeaderNames

dcf6409

vancem reviewed Feb 27, 2017

View reviewed changes

Liudmila Molkova added 2 commits February 27, 2017 14:51

remove propagate event

0916191

refactor DiagnosticsListener: enable Exceptions only and Stop only sc…

9815a34

…enarios

vancem reviewed Mar 1, 2017

View reviewed changes

Liudmila Molkova added 2 commits March 1, 2017 13:38

enabling Activity for outgoing request if there is no parent activiy,…

1ee9ac6

… always propagate if there is any activity

call SetEndTime before stopping activity

0706536

karolz-ms reviewed Mar 1, 2017

View reviewed changes

cwe1ss reviewed Mar 3, 2017

View reviewed changes

vancem merged commit 896d790 into dotnet:master Mar 3, 2017

karelz modified the milestone: 2.0.0 Mar 7, 2017

lmolkova mentioned this pull request Mar 15, 2017

Instrument HttpClientHandler on netfx with DiagnosticSource/Activity #17141

Closed

lmolkova mentioned this pull request Apr 27, 2017

Add HTTP Correlation protocol #18887

Merged

This was referenced Jan 31, 2020

Backporting HttpClient instrumentation with DiagnosticSource to netstandard1.3 dotnet/runtime#20793

Closed

HttpClient Diagnostics: deprecated events issue dotnet/runtime#21273

Closed

Http request instrumentation with Activity (correlation part2) #16393

Http request instrumentation with Activity (correlation part2) #16393

Conversation

lmolkova commented Feb 23, 2017

lmolkova commented Feb 23, 2017

Choose a reason for hiding this comment

lmolkova Feb 23, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stephentoub Feb 23, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lmolkova Feb 23, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cwe1ss left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lmolkova Feb 23, 2017 • edited Loading

Choose a reason for hiding this comment

cwe1ss commented Feb 23, 2017

lmolkova commented Feb 23, 2017

lmolkova commented Feb 23, 2017 • edited Loading

vancem commented Feb 23, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stepro commented Feb 24, 2017 • edited Loading

vancem commented Feb 24, 2017

stepro commented Feb 25, 2017

vancem commented Feb 27, 2017

stepro commented Feb 27, 2017 • edited Loading

vancem commented Feb 27, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lmolkova commented Mar 1, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lmolkova Mar 1, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vancem commented Mar 2, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lmolkova Feb 23, 2017 •

edited

Loading

stephentoub Feb 23, 2017 •

edited

Loading

lmolkova Feb 23, 2017 •

edited

Loading

lmolkova Feb 23, 2017 •

edited

Loading

lmolkova commented Feb 23, 2017 •

edited

Loading

vancem commented Feb 23, 2017 •

edited

Loading

stepro commented Feb 24, 2017 •

edited

Loading

stepro commented Feb 27, 2017 •

edited

Loading

lmolkova Mar 1, 2017 •

edited

Loading