Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Refactor DiagnosticSource logging (correlation part1) #15971

Merged

Conversation

lmolkova
Copy link

@lmolkova lmolkova commented Feb 9, 2017

This change moves DiagnosticSource logging to DiagnosticsHandler and fixes #13172.

It prepares HttpClient for correlation feature support, which will allow to store global context for the current operation and propagate it in HTTP Requests.

Correlation-feature background:

/cc @vancem @avanderhoorn @nbilling @cwe1ss @karolz-ms @SergeyKanzhelev @brahmnes @davidfowl

@lmolkova
Copy link
Author

lmolkova commented Feb 9, 2017

Sketch of changes to be implemented in part 2:

   protected internal override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancel) 
   {
       Guid loggingRequestId = LogHttpRequest(request);
       Task<HttpResponseMessage> responseTask;
     
       if (!httpListener.IsEnabled("System.Net.Http.Activity.Start"))
          responseTask = base.SendAsync(request, cancel)
       else 
          responseTask = SendAsyncWithLogging(request, cancel);
     
       LogHttpResponse(responseTask, loggingRequestId);
       return responseTask;
   }
   
   private async Task<HttpResponseMessage> SendAsyncWithLogging(HttpRequestMessage request, CancellationToken cancel)
   {
       var activity = new Activity("System.Net.Http.Activity");
       if (httpListener.IsEnabled("System.Net.Http.Activity.Start", request))
       {  
           httpListener.StartActivity(activity, new {Request});
           if (httpListener.IsEnabled("System.Net.Http.Activity.Propagate"))
           {
               request.Headers.Add(RequestIdHeaderName, activity.Id);
               request.Headers.Add(CorrelationContextHeaderName, activity.Baggage);
           }
        }
        try
        {
            var response = await base.SendAsync(request, cancel);
        }
        catch(Exception ex){...}
        finally
        {
             httpListener.StopActivity(activity, new {Response, Exception});
        }
    }

Existing System.Net.Http.Request and Response will be kept and may eventually be deprecated and substituted with System.Net.Http.Activity.* events

@@ -160,14 +161,15 @@ protected override void Dispose(bool disposing)

protected internal override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
{
return _curlHandler.SendAsync(request, cancellationToken);
return _curlHandlerPipeline.SendAsync(request, cancellationToken);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to avoid all checks for whether logging is currently enabled except for one upfront, e.g. make this line something like:

return s_diagnosticListener.IsEnabled() ?
    _curlHandlerPipeline.SendAsync(request, cancellationToken) :
    _curlHandler.SendAsync(request, cancellationToken);

such that we only have the one additional static field read / bool check upfront, and then if logging wasn't enabled, we don't have to incur any of the additional costs, such as extra branches, extra function calls, extra virtual calls, etc. We then don't have to worry quite as much about what happens in the DiagnosticHandler in subsequent PRs with regards to perf overheads.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that DiagnosticsHandler should own the logic to define if it's enabled or not at cost of one non-virtual static function
In the part 2, it will start to send other events, so having this logic spread across all handlers would be quite hard to maintain.

{
//do not write to diagnostic source if request is invalid or cancelled,
//let inner handler decide what to do with it
if (request == null || cancellationToken.IsCancellationRequested)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these checks here? Seems like if we're really leaving it up to the implementation what to do with these, the implementation could very well choose to execute a request (e.g. ignore cancellation), in which case it seems you'd still want to do pre/post logging of whatever you can. I'd suggest just removing these checks and just dealing below with the possibility that request is null in LogHttpRequest.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently HttpClient checks request in SendAsync before calling HttpMessageHandler, so request never could be null here anyway.


private static Guid LogHttpRequest(HttpRequestMessage request)
{
return s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.RequestWriteName) ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than doing this more detailed check in both LogHttpRequest and LogHttpResponse, how about just doing it once in SendAsync and only calling LogHttpRequest and LogHttpResponse if it was set.

LogHttpResponseCore(t.IsFaulted ? null : t.Result, t.Exception, loggingRequestId);
}
},
TaskScheduler.Default);
Copy link
Member

@stephentoub stephentoub Feb 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things:

  • Why are we not logging a response if the operation was canceled? Seems like that'd still be interesting to know, as otherwise someone looking at a log doesn't know whether the operation is still in flight or whether it was canceled.
  • If the intent really is though to not log anything if it's canceled, that's more efficiently achieved by passing TaskContinuationOptions.NotOnCanceled to the ContinueWith call.
  • The lambda is also closing over loggingRequestId, which allocates both a closure and a delegate to a method on that closure object. It'd be more efficient to pass the needed state in through the continuation's object state; since the state in this case is a Guid, it'll still be boxed, but that's better than the two larger allocations for the delegate and closure. Something like:
responseTask.ContinueWith((t,s) => LogHttpResponseCore(t.IsFaulted ? null : t.Result, t.Exception, (Guid)s),
    loggingRequestId, CancellationToken.None, TaskContinuationOptions.NotOnCanceled, TaskScheduler.Default);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that logging cancelled requests is a good idea. It also requires passing TaskStatus to Write event, so user's won't need to guess what happened to the request

@vancem
Copy link
Contributor

vancem commented Feb 9, 2017

LGTM

@@ -160,6 +161,10 @@ protected override void Dispose(bool disposing)

protected internal override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
{
if (DiagnosticsHandler.IsEnabled())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ends up calling:

s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.RequestWriteName); 

How much slower is that than just:

s_diagnosticListener.IsEnabled();

Basically, I'm looking for the fastest possible way to know "diagnostics are definitely not enabled": it's ok if we report that diagnostics are enabled even if this particular listener may not be, but it's not ok to report that diagnostics are not enabled even if they may be. That then ensures that if no diagnostics are enabled, we as quickly as possible can take the fast path of going directly to CurlHandler (or WinHttpHandler), but if any diagnostics are enabled, then we can take the slightly slower (hopefully not too much slower) path of going through the extra checks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DiagnosticSource does not have a no-argument IsEnabled() API.

However what is true is that if there are NO subscribers to the Listener, IsEnabled(string) returns very quickly (it checks a field for null and returns false).

It is true that we could add a no-argument IsEnabled() which would avoid the need to pass the string, the cost of passing is pretty small (Basically a memory fetch).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Activity support, this method will be twice less-efficient.

s_diagnosticListener.IsEnabled(RequestWriteName) || s_diagnosticListener.IsEnabled(ActivityWriteName); 

So having no-argument IsEnabled() seems like a good idea

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DiagnosticSource does not have a no-argument IsEnabled() API.

I see. I just assumed it had a similar scheme as EventSource, with an IsEnabled() that was broad and then an overload that was more specific to a particular set of events.

Copy link
Member

@stephentoub stephentoub Feb 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However what is true is that if there are NO subscribers to the Listener, IsEnabled(string) returns very quickly (it checks a field for null and returns false).

@vancem, I was just looking at some ASP.NET traces, and IsEnabled(string) is actually showing up. It's a virtual call, so it doesn't get inlined and involves virtual dispatch, and then even if it wasn't it's got enough IL to not be considered for inlining. Seems like something that should be addressed.

cc: @benaadams, as the traces came from him (thanks, Ben)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EventSource works well where it essentially evaporates if its not enabled (using the check pattern), but DiagnosticSource always pays the virtual dispatch cost event with an Enabled test

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could add IsEnabled() as non-virtual call to DiagnosticsListener and have it return diagnosticSubscription != null?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could add IsEnabled()

Yup, that's what I suggested offline to Vance:

public bool IsEnabled() => _subscriptions != null;

else
{
response = await this.handlerToFilter.SendAsync(request, cancellationToken).ConfigureAwait(false);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more efficient in the resulting IL to do this as a single await rather than two:

Task<HttpResponseMessage> responseTask = DiagnosticsHandler.IsEnabled() ?
    this.diagnosticsPipeline.SendAsync(...) :
    this.handlerToFilter.SendAsync(...);
response = await responseTask.ConfigureAwait(false);

CancellationToken cancellationToken)
{
//HttpClientHandler is responsible to call DelegatingHandler.IsEnabled() before forwarding request to it.
//so this code will never be reached if RequestWriteName is not enabled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically that's not 100% true: it could have been disabled between the time we checked and here. But that's true even if we were checking here, i.e. there will always be the potential for a race condition.

The code you have is fine... I'm just nit picking at the comment.

Guid loggingRequestId = LogHttpRequest(request);
Task<HttpResponseMessage> responseTask = base.SendAsync(request, cancellationToken);

if (s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.ResponseWriteName))
Copy link
Member

@stephentoub stephentoub Feb 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still need this check? The synchronous call to SendAsync should be very fast; it's unlikely this will have changed since we previously checked, and in the worst case and it has, we just pay a little expense we didn't have to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are actually two different strings being comared. The first with a 'Request' suffix and the latter with a 'Response' suffix. Since we are about to write the Response we do the check that that event is enabled. It is the expected cliché (always surround the write with an IsEnabled check).

One could argue that the ability to turn off 'Response' selectively even if 'Request' may not be valuable. But since we are not in the most critical path (we have already committed to doing some logging).

In thinking about it however, I think it is better to change it so that the 'master' switch is the Response rather than the Request. (and put the if check in the request part). That is because if you want to be lean, you only have one event you want it to be the response (since it has the most information, and in particular the duration). Then you can decide if you want both Request as well as the Response.

new
{
Response = t.Status == TaskStatus.RanToCompletion ? t.Result : null,
LoggingRequestId = s,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LoggingRequestId will now be typed as Object instead of as Guid. Do you want to change this to LoggingRequestId = (Guid)s?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, we want it to be a GUID.

@stephentoub stephentoub merged commit de3b53d into dotnet:master Feb 10, 2017

internal static bool IsEnabled()
{
return s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.ResponseWriteName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens here is someone is listening to the Request event but not the Response?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The answer to your question, is that you if you don't turn on the 'Response' event, than you can't get the Request event either. (You can however do it the other way around you can get the Response without the Request).

The goal was to be as lean as possible when logging is turned off. Thus 'Response' was chosen as the 'master' switch, that turns off both requests and responses. We chose Reponse to be the master because it has the most information available (e.g. whether it succeeded or even the duration etc).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our assumption is that Response is the most interesting event, so there may be scenarios when user is interested in Response only or in both: Request and Response.
All subscribers I'm aware of, including Intellitrace, subscribe for both events.
Do you have any particular scenario for pure Request event?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that and its what I assumed, but I'm just wondering how obvious that is to a user? Like if for what ever reason I'm a user and I think I only want to know about the request and hence thats the event I subscribe to, what happens? If nothing happens, how do I find out that I should have also subscribed to the response event to get things to work?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you point.
It seems like users just have to look to this implementation to find DiagnosticSource name, events and behavior.

@nbilling Are you aware of any public documentation for HttpClient DiagnosticSource instrumentation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that when something like this goes wrong, we are expecting users to read the source vs it working the way they expect, doesn't seem like a great answer... I get that for perf reasons it was done the way it was done, but is the perf again so great that we aren't going with the path that drops people into the pit of success?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be clear that the number of people using DiagnosticSource is limited (probably less than 100 worldwide). Moreover it is used in the most perf-critical parts of the system. I think it is reasonable to bias things toward perf at the expense of usability, especially if that usability can be fixed with some reasonable docs. In the not too-distant furture (months) we shodul have having the and source server set up so that people can frankly just debug into things if they run into a problem with a very good experience. That is probably BETTER than docs because it will always be accurate, and clear.

There is a pretty good chance that we will use this trick other places (basically form 'groups' of events where one is considered the 'master' of the others.

@lmolkova can you add to the description of the 'IsEnabled' callback the fact that some events (typically a stop event) might used as the 'gate' for other events (e.g. the cooresponding start event).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We agreed to use no-arg IsEnabled() as a 'gate' for all events, it will address initial concern about pure Request events and would allow subscribing to Request/Response independently.
However users still would need to check this implementation to know which events are fired and their payload properties

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy as long as its documented.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@avanderhoorn I'm working on Activity (new class in System.Diagnostics.DiagnosticSource) support in HttpClient.
As a part of this work, two new events will be fired: System.Net.Http.Activity.Start and Stop.
They will have other payloads, so we cannot reuse existing Request/Response events.

In the Write callback, users will receive request (response) objects and they also may access Activity.Current as a diagnostics context: Activity carries Id, start/stop time and some other properties.

Once new events are implemented, I suggest to deprecate Request/Response and let users migrate to Activity events instead, since they (along with Activity.Current context) will be more descriptive.

I'd be happy to document new events (e.g. as a .md file in this repo) and recommend to use them.
Would you be ok with this?

TimeStamp = timestamp,
Exception = t.Exception,
RequestTaskStatus = t.Status
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#13172 looks to "Send failed HTTP requests to DiagnosticSource"... does one determine an exception case by looking to see if .Exception is null or not? Not that this has to follow what we do else where but most other events that I can think of have a separate event for exceptions and try not to "overload" the exception case into the end event.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vancem @lmolkova any thoughts on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a fair question, and currently I don't have a strong opinion. In such situation I typically take my cue from the code. What convention makes the code cleaner (both for creation and consumption). Certainly from a start-stop perspective, it is GOOD if every start has a stop and the success/failure of the activity does not change the event used as the stop.

On the other hand, often errors are much more interesting than success cases, so it seem very GOOD if errors fired their own events (and thus you can easily filter to just them). However this might be taken care of simply by having an event for ANY exceptions (which cover errors regardless of component), so you don't need special error events in most cases.

Whatever we do, ideally we would do it uniformly (we don't have different conventions for different components.

My inclination, however is to ALWAYS have stop events (independent of error, but with an indication in the payload) like we have above, BUT we may ADDTIONALLY want an event that fires on an error (before the stop event fires, because it makes it clear that the error event is nested inside the activity).

But these are just inclinations. If this proposal has bad properties that we have not yet considered, we should certainly continue the discussion...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we thought that the exception case was more interesting for people in the exception case, which should happen more rarely than the success case, from a perf perspective, wouldn't making it a separate event be more performant - since currently people who are interested in only exceptions have to subscribe to the success case and internally check, as apposed to if there was an exception event, their paths would only be lit up on error... does that make sense?

Copy link
Author

@lmolkova lmolkova Feb 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me summarize: we are optimizing case when user is interested in Exceptions only.
@vancem I think it's an opportunity for the new DiagnosticSource.IdEnabled(string ,object, object) overload

if (diagListener.IsEnabled("Response", exception)) 
//exception could be null and user can filter out responses without exceptions
{
    diagListener.Write("Response", new {Response, exception, ...})
}

...
diagListener.Subscribe(observer, 
    (name, exception, _) => name == "Response" && exception != null});

User who provided name-based predicate or none at all, receive all events.
It involves additional exception check. However, it keeps code clean and reduces number of events user has to be aware of.

Regarding common event for any exception: there are no conventions for additional context that may be provided with any exception and without context it may be useless. Activity.Current should solve this problem though.

@avanderhoorn
Copy link
Contributor

Sorry for the late feedback but I'm currently on paternity leave and only just saw this.

{
LogHttpRequest(request, loggingRequestId);
}
Task<HttpResponseMessage> responseTask = base.SendAsync(request, cancellationToken);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder - can SendAsync throw in case of DNS error or invalid args? Should it be in try{}catch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than for things like null inputs, it should not throw synchronously: failures for things like DNS resolution come out via the Task.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can other handler be in the pipe and throw synchronously?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not as currently implemented. Where do you believe such a handler could be plugged in? This handler is explicitly being created as part of the HttpClientHandler implementation and wrapped around either CurlHandler or WinHttpHandler.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks for explanation. If it's not possible - I'd suggest to write a comment for future reference

responseTask.ContinueWith(
(t, s) =>
{
long timestamp = Stopwatch.GetTimestamp();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear what's the benefit of getting timestamp in production code, not in subscriber

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to ensure that all subscribers get the same timestamp for correlation, no?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. if this is the intent. thanks for clarification

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that was the original intent... that way all subscribers are able to show users the same values.


private static void LogHttpResponse(Task<HttpResponseMessage> responseTask, Guid loggingRequestId)
{
responseTask.ContinueWith(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check responseTask for null?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a bug in the handler if SendAsync returned null.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bug in framework? Can there be another handler that introduced this error? Should tracing handle this case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What other handler?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You explained above that other handler can not be injected between logging and actual working handler.

DiagnosticsHandlerLoggingStrings.ResponseWriteName,
new
{
Response = t.Status == TaskStatus.RanToCompletion ? t.Result : null,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there were an exception and response is not populated - is LoggingRequestId the only field I can use to understand what was the original request?

TimeStamp = timestamp,
Exception = t.Exception,
RequestTaskStatus = t.Status
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case I haven't subscribed on Request - I still want to get request properties in this event.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If request did not throw an exception, you can access it's properties through Response.RequestMessage.

If exception was thrown (which is related to you previous question), you need to subscribe to Request event and map response event to request with LogginRequestId.
If you don't want to store requests to access them on failure, I'd suggest to wait for Acitivty support

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Send failed HTTP requests to DiagnosticSource