Refactor DiagnosticSource logging (correlation part1) #15971

lmolkova · 2017-02-09T00:01:20Z

This change moves DiagnosticSource logging to DiagnosticsHandler and fixes #13172.

It prepares HttpClient for correlation feature support, which will allow to store global context for the current operation and propagate it in HTTP Requests.

Correlation-feature background:

/cc @vancem @avanderhoorn @nbilling @cwe1ss @karolz-ms @SergeyKanzhelev @brahmnes @davidfowl

lmolkova · 2017-02-09T00:01:29Z

Sketch of changes to be implemented in part 2:

   protected internal override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancel) 
   {
       Guid loggingRequestId = LogHttpRequest(request);
       Task<HttpResponseMessage> responseTask;
     
       if (!httpListener.IsEnabled("System.Net.Http.Activity.Start"))
          responseTask = base.SendAsync(request, cancel)
       else 
          responseTask = SendAsyncWithLogging(request, cancel);
     
       LogHttpResponse(responseTask, loggingRequestId);
       return responseTask;
   }
   
   private async Task<HttpResponseMessage> SendAsyncWithLogging(HttpRequestMessage request, CancellationToken cancel)
   {
       var activity = new Activity("System.Net.Http.Activity");
       if (httpListener.IsEnabled("System.Net.Http.Activity.Start", request))
       {  
           httpListener.StartActivity(activity, new {Request});
           if (httpListener.IsEnabled("System.Net.Http.Activity.Propagate"))
           {
               request.Headers.Add(RequestIdHeaderName, activity.Id);
               request.Headers.Add(CorrelationContextHeaderName, activity.Baggage);
           }
        }
        try
        {
            var response = await base.SendAsync(request, cancel);
        }
        catch(Exception ex){...}
        finally
        {
             httpListener.StopActivity(activity, new {Response, Exception});
        }
    }

Existing System.Net.Http.Request and Response will be kept and may eventually be deprecated and substituted with System.Net.Http.Activity.* events

stephentoub · 2017-02-09T03:53:08Z

src/System.Net.Http/src/System/Net/Http/HttpClientHandler.Unix.cs

@@ -160,14 +161,15 @@ protected override void Dispose(bool disposing)

        protected internal override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
        {
-            return _curlHandler.SendAsync(request, cancellationToken);
+            return _curlHandlerPipeline.SendAsync(request, cancellationToken);


I would like to avoid all checks for whether logging is currently enabled except for one upfront, e.g. make this line something like:

return s_diagnosticListener.IsEnabled() ? _curlHandlerPipeline.SendAsync(request, cancellationToken) : _curlHandler.SendAsync(request, cancellationToken);

such that we only have the one additional static field read / bool check upfront, and then if logging wasn't enabled, we don't have to incur any of the additional costs, such as extra branches, extra function calls, extra virtual calls, etc. We then don't have to worry quite as much about what happens in the DiagnosticHandler in subsequent PRs with regards to perf overheads.

I suggest that DiagnosticsHandler should own the logic to define if it's enabled or not at cost of one non-virtual static function
In the part 2, it will start to send other events, so having this logic spread across all handlers would be quite hard to maintain.

stephentoub · 2017-02-09T03:55:21Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+        {
+            //do not write to diagnostic source if request is invalid or cancelled,
+            //let inner handler decide what to do with it
+            if (request == null || cancellationToken.IsCancellationRequested)


Why are these checks here? Seems like if we're really leaving it up to the implementation what to do with these, the implementation could very well choose to execute a request (e.g. ignore cancellation), in which case it seems you'd still want to do pre/post logging of whatever you can. I'd suggest just removing these checks and just dealing below with the possibility that request is null in LogHttpRequest.

Apparently HttpClient checks request in SendAsync before calling HttpMessageHandler, so request never could be null here anyway.

stephentoub · 2017-02-09T03:57:13Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+
+        private static Guid LogHttpRequest(HttpRequestMessage request)
+        {
+            return s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.RequestWriteName) ?


Rather than doing this more detailed check in both LogHttpRequest and LogHttpResponse, how about just doing it once in SendAsync and only calling LogHttpRequest and LogHttpResponse if it was set.

stephentoub · 2017-02-09T04:00:33Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+                        LogHttpResponseCore(t.IsFaulted ? null : t.Result, t.Exception, loggingRequestId);
+                    }
+                },
+                TaskScheduler.Default);


A few things:

Why are we not logging a response if the operation was canceled? Seems like that'd still be interesting to know, as otherwise someone looking at a log doesn't know whether the operation is still in flight or whether it was canceled.

If the intent really is though to not log anything if it's canceled, that's more efficiently achieved by passing TaskContinuationOptions.NotOnCanceled to the ContinueWith call.

The lambda is also closing over loggingRequestId, which allocates both a closure and a delegate to a method on that closure object. It'd be more efficient to pass the needed state in through the continuation's object state; since the state in this case is a Guid, it'll still be boxed, but that's better than the two larger allocations for the delegate and closure. Something like:

responseTask.ContinueWith((t,s) => LogHttpResponseCore(t.IsFaulted ? null : t.Result, t.Exception, (Guid)s), loggingRequestId, CancellationToken.None, TaskContinuationOptions.NotOnCanceled, TaskScheduler.Default);

I agree that logging cancelled requests is a good idea. It also requires passing TaskStatus to Write event, so user's won't need to guess what happened to the request

vancem · 2017-02-09T23:58:46Z

LGTM

stephentoub · 2017-02-10T00:14:46Z

src/System.Net.Http/src/System/Net/Http/HttpClientHandler.Unix.cs

@@ -160,6 +161,10 @@ protected override void Dispose(bool disposing)

        protected internal override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
        {
+            if (DiagnosticsHandler.IsEnabled())


This ends up calling:

s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.RequestWriteName);

How much slower is that than just:

s_diagnosticListener.IsEnabled();

Basically, I'm looking for the fastest possible way to know "diagnostics are definitely not enabled": it's ok if we report that diagnostics are enabled even if this particular listener may not be, but it's not ok to report that diagnostics are not enabled even if they may be. That then ensures that if no diagnostics are enabled, we as quickly as possible can take the fast path of going directly to CurlHandler (or WinHttpHandler), but if any diagnostics are enabled, then we can take the slightly slower (hopefully not too much slower) path of going through the extra checks.

DiagnosticSource does not have a no-argument IsEnabled() API.

However what is true is that if there are NO subscribers to the Listener, IsEnabled(string) returns very quickly (it checks a field for null and returns false).

It is true that we could add a no-argument IsEnabled() which would avoid the need to pass the string, the cost of passing is pretty small (Basically a memory fetch).

With Activity support, this method will be twice less-efficient.

s_diagnosticListener.IsEnabled(RequestWriteName) || s_diagnosticListener.IsEnabled(ActivityWriteName);

So having no-argument IsEnabled() seems like a good idea

DiagnosticSource does not have a no-argument IsEnabled() API.

I see. I just assumed it had a similar scheme as EventSource, with an IsEnabled() that was broad and then an overload that was more specific to a particular set of events.

However what is true is that if there are NO subscribers to the Listener, IsEnabled(string) returns very quickly (it checks a field for null and returns false).

@vancem, I was just looking at some ASP.NET traces, and IsEnabled(string) is actually showing up. It's a virtual call, so it doesn't get inlined and involves virtual dispatch, and then even if it wasn't it's got enough IL to not be considered for inlining. Seems like something that should be addressed.

cc: @benaadams, as the traces came from him (thanks, Ben)

EventSource works well where it essentially evaporates if its not enabled (using the check pattern), but DiagnosticSource always pays the virtual dispatch cost event with an Enabled test

Could add IsEnabled() as non-virtual call to DiagnosticsListener and have it return diagnosticSubscription != null?

Could add IsEnabled()

Yup, that's what I suggested offline to Vance:

public bool IsEnabled() => _subscriptions != null;

stephentoub · 2017-02-10T00:16:34Z

src/System.Net.Http/src/netcore50/System/Net/HttpClientHandler.cs

+                else
+                {
+                    response = await this.handlerToFilter.SendAsync(request, cancellationToken).ConfigureAwait(false);
+                }


It would be more efficient in the resulting IL to do this as a single await rather than two:

Task<HttpResponseMessage> responseTask = DiagnosticsHandler.IsEnabled() ? this.diagnosticsPipeline.SendAsync(...) : this.handlerToFilter.SendAsync(...); response = await responseTask.ConfigureAwait(false);

stephentoub · 2017-02-10T00:18:13Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+            CancellationToken cancellationToken)
+        {
+            //HttpClientHandler is responsible to call DelegatingHandler.IsEnabled() before forwarding request to it.
+            //so this code will never be reached if RequestWriteName is not enabled


Technically that's not 100% true: it could have been disabled between the time we checked and here. But that's true even if we were checking here, i.e. there will always be the potential for a race condition.

The code you have is fine... I'm just nit picking at the comment.

stephentoub · 2017-02-10T00:18:24Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+            Guid loggingRequestId = LogHttpRequest(request);
+            Task<HttpResponseMessage> responseTask = base.SendAsync(request, cancellationToken);
+
+            if (s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.ResponseWriteName))


Do you still need this check? The synchronous call to SendAsync should be very fast; it's unlikely this will have changed since we previously checked, and in the worst case and it has, we just pay a little expense we didn't have to.

There are actually two different strings being comared. The first with a 'Request' suffix and the latter with a 'Response' suffix. Since we are about to write the Response we do the check that that event is enabled. It is the expected cliché (always surround the write with an IsEnabled check).

One could argue that the ability to turn off 'Response' selectively even if 'Request' may not be valuable. But since we are not in the most critical path (we have already committed to doing some logging).

In thinking about it however, I think it is better to change it so that the 'master' switch is the Response rather than the Request. (and put the if check in the request part). That is because if you want to be lean, you only have one event you want it to be the response (since it has the most information, and in particular the duration). Then you can decide if you want both Request as well as the Response.

stephentoub · 2017-02-10T00:19:40Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+                        new
+                        {
+                            Response = t.Status == TaskStatus.RanToCompletion ? t.Result : null,
+                            LoggingRequestId = s,


LoggingRequestId will now be typed as Object instead of as Guid. Do you want to change this to LoggingRequestId = (Guid)s?

Good catch, we want it to be a GUID.

avanderhoorn · 2017-02-10T23:31:56Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+
+        internal static bool IsEnabled()
+        {
+            return s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.ResponseWriteName);


What happens here is someone is listening to the Request event but not the Response?

The answer to your question, is that you if you don't turn on the 'Response' event, than you can't get the Request event either. (You can however do it the other way around you can get the Response without the Request).

The goal was to be as lean as possible when logging is turned off. Thus 'Response' was chosen as the 'master' switch, that turns off both requests and responses. We chose Reponse to be the master because it has the most information available (e.g. whether it succeeded or even the duration etc).

Our assumption is that Response is the most interesting event, so there may be scenarios when user is interested in Response only or in both: Request and Response.
All subscribers I'm aware of, including Intellitrace, subscribe for both events.
Do you have any particular scenario for pure Request event?

I get that and its what I assumed, but I'm just wondering how obvious that is to a user? Like if for what ever reason I'm a user and I think I only want to know about the request and hence thats the event I subscribe to, what happens? If nothing happens, how do I find out that I should have also subscribed to the response event to get things to work?

I agree with you point.
It seems like users just have to look to this implementation to find DiagnosticSource name, events and behavior.

@nbilling Are you aware of any public documentation for HttpClient DiagnosticSource instrumentation?

The fact that when something like this goes wrong, we are expecting users to read the source vs it working the way they expect, doesn't seem like a great answer... I get that for perf reasons it was done the way it was done, but is the perf again so great that we aren't going with the path that drops people into the pit of success?

We should be clear that the number of people using DiagnosticSource is limited (probably less than 100 worldwide). Moreover it is used in the most perf-critical parts of the system. I think it is reasonable to bias things toward perf at the expense of usability, especially if that usability can be fixed with some reasonable docs. In the not too-distant furture (months) we shodul have having the and source server set up so that people can frankly just debug into things if they run into a problem with a very good experience. That is probably BETTER than docs because it will always be accurate, and clear.

There is a pretty good chance that we will use this trick other places (basically form 'groups' of events where one is considered the 'master' of the others.

@lmolkova can you add to the description of the 'IsEnabled' callback the fact that some events (typically a stop event) might used as the 'gate' for other events (e.g. the cooresponding start event).

We agreed to use no-arg IsEnabled() as a 'gate' for all events, it will address initial concern about pure Request events and would allow subscribing to Request/Response independently.
However users still would need to check this implementation to know which events are fired and their payload properties

I'm happy as long as its documented.

@avanderhoorn I'm working on Activity (new class in System.Diagnostics.DiagnosticSource) support in HttpClient.
As a part of this work, two new events will be fired: System.Net.Http.Activity.Start and Stop.
They will have other payloads, so we cannot reuse existing Request/Response events.

In the Write callback, users will receive request (response) objects and they also may access Activity.Current as a diagnostics context: Activity carries Id, start/stop time and some other properties.

Once new events are implemented, I suggest to deprecate Request/Response and let users migrate to Activity events instead, since they (along with Activity.Current context) will be more descriptive.

I'd be happy to document new events (e.g. as a .md file in this repo) and recommend to use them.
Would you be ok with this?

avanderhoorn · 2017-02-10T23:35:24Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+                            TimeStamp = timestamp,
+                            Exception = t.Exception,
+                            RequestTaskStatus = t.Status
+                        }


#13172 looks to "Send failed HTTP requests to DiagnosticSource"... does one determine an exception case by looking to see if .Exception is null or not? Not that this has to follow what we do else where but most other events that I can think of have a separate event for exceptions and try not to "overload" the exception case into the end event.

@vancem @lmolkova any thoughts on this?

It is a fair question, and currently I don't have a strong opinion. In such situation I typically take my cue from the code. What convention makes the code cleaner (both for creation and consumption). Certainly from a start-stop perspective, it is GOOD if every start has a stop and the success/failure of the activity does not change the event used as the stop.

On the other hand, often errors are much more interesting than success cases, so it seem very GOOD if errors fired their own events (and thus you can easily filter to just them). However this might be taken care of simply by having an event for ANY exceptions (which cover errors regardless of component), so you don't need special error events in most cases.

Whatever we do, ideally we would do it uniformly (we don't have different conventions for different components.

My inclination, however is to ALWAYS have stop events (independent of error, but with an indication in the payload) like we have above, BUT we may ADDTIONALLY want an event that fires on an error (before the stop event fires, because it makes it clear that the error event is nested inside the activity).

But these are just inclinations. If this proposal has bad properties that we have not yet considered, we should certainly continue the discussion...

If we thought that the exception case was more interesting for people in the exception case, which should happen more rarely than the success case, from a perf perspective, wouldn't making it a separate event be more performant - since currently people who are interested in only exceptions have to subscribe to the success case and internally check, as apposed to if there was an exception event, their paths would only be lit up on error... does that make sense?

Let me summarize: we are optimizing case when user is interested in Exceptions only.
@vancem I think it's an opportunity for the new DiagnosticSource.IdEnabled(string ,object, object) overload

if (diagListener.IsEnabled("Response", exception)) //exception could be null and user can filter out responses without exceptions { diagListener.Write("Response", new {Response, exception, ...}) } ... diagListener.Subscribe(observer, (name, exception, _) => name == "Response" && exception != null});

User who provided name-based predicate or none at all, receive all events.
It involves additional exception check. However, it keeps code clean and reduces number of events user has to be aware of.

Regarding common event for any exception: there are no conventions for additional context that may be provided with any exception and without context it may be useless. Activity.Current should solve this problem though.

avanderhoorn · 2017-02-10T23:36:04Z

Sorry for the late feedback but I'm currently on paternity leave and only just saw this.

SergeyKanzhelev · 2017-02-13T17:25:42Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+            {
+                LogHttpRequest(request,  loggingRequestId);
+            }
+            Task<HttpResponseMessage> responseTask = base.SendAsync(request, cancellationToken);


I wonder - can SendAsync throw in case of DNS error or invalid args? Should it be in try{}catch?

Other than for things like null inputs, it should not throw synchronously: failures for things like DNS resolution come out via the Task.

Can other handler be in the pipe and throw synchronously?

Not as currently implemented. Where do you believe such a handler could be plugged in? This handler is explicitly being created as part of the HttpClientHandler implementation and wrapped around either CurlHandler or WinHttpHandler.

ok, thanks for explanation. If it's not possible - I'd suggest to write a comment for future reference

SergeyKanzhelev · 2017-02-13T17:27:23Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+            responseTask.ContinueWith(
+                (t, s) =>
+                {
+                    long timestamp = Stopwatch.GetTimestamp();


I'm not clear what's the benefit of getting timestamp in production code, not in subscriber

This is to ensure that all subscribers get the same timestamp for correlation, no?

ok. if this is the intent. thanks for clarification

Yep that was the original intent... that way all subscribers are able to show users the same values.

SergeyKanzhelev · 2017-02-13T17:28:12Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+
+        private static void LogHttpResponse(Task<HttpResponseMessage> responseTask, Guid loggingRequestId)
+        {
+            responseTask.ContinueWith(


check responseTask for null?

It would be a bug in the handler if SendAsync returned null.

bug in framework? Can there be another handler that introduced this error? Should tracing handle this case?

What other handler?

You explained above that other handler can not be injected between logging and actual working handler.

SergeyKanzhelev · 2017-02-13T17:33:19Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+                        DiagnosticsHandlerLoggingStrings.ResponseWriteName,
+                        new
+                        {
+                            Response = t.Status == TaskStatus.RanToCompletion ? t.Result : null,


if there were an exception and response is not populated - is LoggingRequestId the only field I can use to understand what was the original request?

SergeyKanzhelev · 2017-02-13T17:38:15Z

src/System.Net.Http/src/System/Net/Http/DiagnosticsHandler.cs

+                            TimeStamp = timestamp,
+                            Exception = t.Exception,
+                            RequestTaskStatus = t.Status
+                        }


In case I haven't subscribed on Request - I still want to get request properties in this event.

If request did not throw an exception, you can access it's properties through Response.RequestMessage.

If exception was thrown (which is related to you previous question), you need to subscribe to Request event and map response event to request with LogginRequestId.
If you don't want to store requests to access them on failure, I'd suggest to wait for Acitivty support

…15971) * Refactor DiagnosticSource logging, prepare for Activity support Commit migrated from dotnet/corefx@de3b53d

Refactor DiagnosticSource logging, prepare for Activity support

a672fef

dnfclas added the cla-already-signed label Feb 9, 2017

karelz added the area-System.Diagnostics label Feb 9, 2017

karelz assigned lmolkova and vancem Feb 9, 2017

karelz added area-System.Net.Http and removed area-System.Diagnostics labels Feb 9, 2017

karelz assigned davidsh, Priya91, CIPop and geoffkizer Feb 9, 2017

davidsh requested a review from stephentoub February 9, 2017 02:25

fix uap and unix builds

7bc01f3

stephentoub reviewed Feb 9, 2017

View reviewed changes

review comments

d9f4f54

stephentoub reviewed Feb 10, 2017

View reviewed changes

review

6216583

stephentoub approved these changes Feb 10, 2017

View reviewed changes

stephentoub merged commit de3b53d into dotnet:master Feb 10, 2017

avanderhoorn reviewed Feb 10, 2017

View reviewed changes

SergeyKanzhelev reviewed Feb 13, 2017

View reviewed changes

karelz modified the milestone: 2.0.0 Feb 13, 2017

benaadams mentioned this pull request Feb 13, 2017

Start and Stop an Activity for each request aspnet/Hosting#939

Closed

This was referenced Feb 22, 2017

Refactor HTTP DiagnosticSource logging (part1 continued) #16385

Merged

Http request instrumentation with Activity (correlation part2) #16393

Merged

steveharter mentioned this pull request Jan 31, 2020

DiagnosticsTest.SendAsync_ExpectedDiagnosticCancelledLogging Assert failing area-System.Net.Http.Functional.Tests dotnet/runtime#20204

Closed

lmolkova mentioned this pull request Jan 31, 2020

DiagnosticListener performance optimization dotnet/runtime#20211

Closed

Refactor DiagnosticSource logging (correlation part1) #15971

Refactor DiagnosticSource logging (correlation part1) #15971

Conversation

lmolkova commented Feb 9, 2017

lmolkova commented Feb 9, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stephentoub Feb 9, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vancem commented Feb 9, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stephentoub Feb 13, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stephentoub Feb 10, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lmolkova Feb 14, 2017 • edited Loading

Choose a reason for hiding this comment

avanderhoorn commented Feb 10, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stephentoub Feb 9, 2017 •

edited

Loading

stephentoub Feb 13, 2017 •

edited

Loading

stephentoub Feb 10, 2017 •

edited

Loading

lmolkova Feb 14, 2017 •

edited

Loading