-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Refactor DiagnosticSource logging (correlation part1) #15971
Refactor DiagnosticSource logging (correlation part1) #15971
Conversation
Sketch of changes to be implemented in part 2: protected internal override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancel)
{
Guid loggingRequestId = LogHttpRequest(request);
Task<HttpResponseMessage> responseTask;
if (!httpListener.IsEnabled("System.Net.Http.Activity.Start"))
responseTask = base.SendAsync(request, cancel)
else
responseTask = SendAsyncWithLogging(request, cancel);
LogHttpResponse(responseTask, loggingRequestId);
return responseTask;
}
private async Task<HttpResponseMessage> SendAsyncWithLogging(HttpRequestMessage request, CancellationToken cancel)
{
var activity = new Activity("System.Net.Http.Activity");
if (httpListener.IsEnabled("System.Net.Http.Activity.Start", request))
{
httpListener.StartActivity(activity, new {Request});
if (httpListener.IsEnabled("System.Net.Http.Activity.Propagate"))
{
request.Headers.Add(RequestIdHeaderName, activity.Id);
request.Headers.Add(CorrelationContextHeaderName, activity.Baggage);
}
}
try
{
var response = await base.SendAsync(request, cancel);
}
catch(Exception ex){...}
finally
{
httpListener.StopActivity(activity, new {Response, Exception});
}
} Existing System.Net.Http.Request and Response will be kept and may eventually be deprecated and substituted with System.Net.Http.Activity.* events |
@@ -160,14 +161,15 @@ protected override void Dispose(bool disposing) | |||
|
|||
protected internal override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) | |||
{ | |||
return _curlHandler.SendAsync(request, cancellationToken); | |||
return _curlHandlerPipeline.SendAsync(request, cancellationToken); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to avoid all checks for whether logging is currently enabled except for one upfront, e.g. make this line something like:
return s_diagnosticListener.IsEnabled() ?
_curlHandlerPipeline.SendAsync(request, cancellationToken) :
_curlHandler.SendAsync(request, cancellationToken);
such that we only have the one additional static field read / bool check upfront, and then if logging wasn't enabled, we don't have to incur any of the additional costs, such as extra branches, extra function calls, extra virtual calls, etc. We then don't have to worry quite as much about what happens in the DiagnosticHandler in subsequent PRs with regards to perf overheads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest that DiagnosticsHandler should own the logic to define if it's enabled or not at cost of one non-virtual static function
In the part 2, it will start to send other events, so having this logic spread across all handlers would be quite hard to maintain.
{ | ||
//do not write to diagnostic source if request is invalid or cancelled, | ||
//let inner handler decide what to do with it | ||
if (request == null || cancellationToken.IsCancellationRequested) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these checks here? Seems like if we're really leaving it up to the implementation what to do with these, the implementation could very well choose to execute a request (e.g. ignore cancellation), in which case it seems you'd still want to do pre/post logging of whatever you can. I'd suggest just removing these checks and just dealing below with the possibility that request is null in LogHttpRequest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently HttpClient checks request in SendAsync before calling HttpMessageHandler, so request never could be null here anyway.
|
||
private static Guid LogHttpRequest(HttpRequestMessage request) | ||
{ | ||
return s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.RequestWriteName) ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than doing this more detailed check in both LogHttpRequest and LogHttpResponse, how about just doing it once in SendAsync and only calling LogHttpRequest and LogHttpResponse if it was set.
LogHttpResponseCore(t.IsFaulted ? null : t.Result, t.Exception, loggingRequestId); | ||
} | ||
}, | ||
TaskScheduler.Default); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things:
- Why are we not logging a response if the operation was canceled? Seems like that'd still be interesting to know, as otherwise someone looking at a log doesn't know whether the operation is still in flight or whether it was canceled.
- If the intent really is though to not log anything if it's canceled, that's more efficiently achieved by passing TaskContinuationOptions.NotOnCanceled to the ContinueWith call.
- The lambda is also closing over loggingRequestId, which allocates both a closure and a delegate to a method on that closure object. It'd be more efficient to pass the needed state in through the continuation's object state; since the state in this case is a Guid, it'll still be boxed, but that's better than the two larger allocations for the delegate and closure. Something like:
responseTask.ContinueWith((t,s) => LogHttpResponseCore(t.IsFaulted ? null : t.Result, t.Exception, (Guid)s),
loggingRequestId, CancellationToken.None, TaskContinuationOptions.NotOnCanceled, TaskScheduler.Default);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that logging cancelled requests is a good idea. It also requires passing TaskStatus to Write event, so user's won't need to guess what happened to the request
LGTM |
@@ -160,6 +161,10 @@ protected override void Dispose(bool disposing) | |||
|
|||
protected internal override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) | |||
{ | |||
if (DiagnosticsHandler.IsEnabled()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ends up calling:
s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.RequestWriteName);
How much slower is that than just:
s_diagnosticListener.IsEnabled();
Basically, I'm looking for the fastest possible way to know "diagnostics are definitely not enabled": it's ok if we report that diagnostics are enabled even if this particular listener may not be, but it's not ok to report that diagnostics are not enabled even if they may be. That then ensures that if no diagnostics are enabled, we as quickly as possible can take the fast path of going directly to CurlHandler (or WinHttpHandler), but if any diagnostics are enabled, then we can take the slightly slower (hopefully not too much slower) path of going through the extra checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DiagnosticSource does not have a no-argument IsEnabled() API.
However what is true is that if there are NO subscribers to the Listener, IsEnabled(string) returns very quickly (it checks a field for null and returns false).
It is true that we could add a no-argument IsEnabled() which would avoid the need to pass the string, the cost of passing is pretty small (Basically a memory fetch).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With Activity support, this method will be twice less-efficient.
s_diagnosticListener.IsEnabled(RequestWriteName) || s_diagnosticListener.IsEnabled(ActivityWriteName);
So having no-argument IsEnabled() seems like a good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DiagnosticSource does not have a no-argument IsEnabled() API.
I see. I just assumed it had a similar scheme as EventSource, with an IsEnabled() that was broad and then an overload that was more specific to a particular set of events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However what is true is that if there are NO subscribers to the Listener, IsEnabled(string) returns very quickly (it checks a field for null and returns false).
@vancem, I was just looking at some ASP.NET traces, and IsEnabled(string) is actually showing up. It's a virtual call, so it doesn't get inlined and involves virtual dispatch, and then even if it wasn't it's got enough IL to not be considered for inlining. Seems like something that should be addressed.
cc: @benaadams, as the traces came from him (thanks, Ben)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EventSource
works well where it essentially evaporates if its not enabled (using the check pattern), but DiagnosticSource
always pays the virtual dispatch cost event with an Enabled test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could add IsEnabled()
as non-virtual call to DiagnosticsListener
and have it return diagnosticSubscription != null
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could add IsEnabled()
Yup, that's what I suggested offline to Vance:
public bool IsEnabled() => _subscriptions != null;
else | ||
{ | ||
response = await this.handlerToFilter.SendAsync(request, cancellationToken).ConfigureAwait(false); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be more efficient in the resulting IL to do this as a single await rather than two:
Task<HttpResponseMessage> responseTask = DiagnosticsHandler.IsEnabled() ?
this.diagnosticsPipeline.SendAsync(...) :
this.handlerToFilter.SendAsync(...);
response = await responseTask.ConfigureAwait(false);
CancellationToken cancellationToken) | ||
{ | ||
//HttpClientHandler is responsible to call DelegatingHandler.IsEnabled() before forwarding request to it. | ||
//so this code will never be reached if RequestWriteName is not enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically that's not 100% true: it could have been disabled between the time we checked and here. But that's true even if we were checking here, i.e. there will always be the potential for a race condition.
The code you have is fine... I'm just nit picking at the comment.
Guid loggingRequestId = LogHttpRequest(request); | ||
Task<HttpResponseMessage> responseTask = base.SendAsync(request, cancellationToken); | ||
|
||
if (s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.ResponseWriteName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you still need this check? The synchronous call to SendAsync should be very fast; it's unlikely this will have changed since we previously checked, and in the worst case and it has, we just pay a little expense we didn't have to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are actually two different strings being comared. The first with a 'Request' suffix and the latter with a 'Response' suffix. Since we are about to write the Response we do the check that that event is enabled. It is the expected cliché (always surround the write with an IsEnabled check).
One could argue that the ability to turn off 'Response' selectively even if 'Request' may not be valuable. But since we are not in the most critical path (we have already committed to doing some logging).
In thinking about it however, I think it is better to change it so that the 'master' switch is the Response rather than the Request. (and put the if check in the request part). That is because if you want to be lean, you only have one event you want it to be the response (since it has the most information, and in particular the duration). Then you can decide if you want both Request as well as the Response.
new | ||
{ | ||
Response = t.Status == TaskStatus.RanToCompletion ? t.Result : null, | ||
LoggingRequestId = s, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LoggingRequestId will now be typed as Object instead of as Guid. Do you want to change this to LoggingRequestId = (Guid)s
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, we want it to be a GUID.
|
||
internal static bool IsEnabled() | ||
{ | ||
return s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.ResponseWriteName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens here is someone is listening to the Request event but not the Response?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The answer to your question, is that you if you don't turn on the 'Response' event, than you can't get the Request event either. (You can however do it the other way around you can get the Response without the Request).
The goal was to be as lean as possible when logging is turned off. Thus 'Response' was chosen as the 'master' switch, that turns off both requests and responses. We chose Reponse to be the master because it has the most information available (e.g. whether it succeeded or even the duration etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our assumption is that Response is the most interesting event, so there may be scenarios when user is interested in Response only or in both: Request and Response.
All subscribers I'm aware of, including Intellitrace, subscribe for both events.
Do you have any particular scenario for pure Request event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get that and its what I assumed, but I'm just wondering how obvious that is to a user? Like if for what ever reason I'm a user and I think I only want to know about the request and hence thats the event I subscribe to, what happens? If nothing happens, how do I find out that I should have also subscribed to the response event to get things to work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you point.
It seems like users just have to look to this implementation to find DiagnosticSource name, events and behavior.
@nbilling Are you aware of any public documentation for HttpClient DiagnosticSource instrumentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that when something like this goes wrong, we are expecting users to read the source vs it working the way they expect, doesn't seem like a great answer... I get that for perf reasons it was done the way it was done, but is the perf again so great that we aren't going with the path that drops people into the pit of success?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be clear that the number of people using DiagnosticSource is limited (probably less than 100 worldwide). Moreover it is used in the most perf-critical parts of the system. I think it is reasonable to bias things toward perf at the expense of usability, especially if that usability can be fixed with some reasonable docs. In the not too-distant furture (months) we shodul have having the and source server set up so that people can frankly just debug into things if they run into a problem with a very good experience. That is probably BETTER than docs because it will always be accurate, and clear.
There is a pretty good chance that we will use this trick other places (basically form 'groups' of events where one is considered the 'master' of the others.
@lmolkova can you add to the description of the 'IsEnabled' callback the fact that some events (typically a stop event) might used as the 'gate' for other events (e.g. the cooresponding start event).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We agreed to use no-arg IsEnabled() as a 'gate' for all events, it will address initial concern about pure Request events and would allow subscribing to Request/Response independently.
However users still would need to check this implementation to know which events are fired and their payload properties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy as long as its documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@avanderhoorn I'm working on Activity (new class in System.Diagnostics.DiagnosticSource) support in HttpClient.
As a part of this work, two new events will be fired: System.Net.Http.Activity.Start and Stop.
They will have other payloads, so we cannot reuse existing Request/Response events.
In the Write callback, users will receive request (response) objects and they also may access Activity.Current
as a diagnostics context: Activity carries Id, start/stop time and some other properties.
Once new events are implemented, I suggest to deprecate Request/Response and let users migrate to Activity events instead, since they (along with Activity.Current context) will be more descriptive.
I'd be happy to document new events (e.g. as a .md file in this repo) and recommend to use them.
Would you be ok with this?
TimeStamp = timestamp, | ||
Exception = t.Exception, | ||
RequestTaskStatus = t.Status | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#13172 looks to "Send failed HTTP requests to DiagnosticSource"... does one determine an exception case by looking to see if .Exception
is null or not? Not that this has to follow what we do else where but most other events that I can think of have a separate event for exceptions and try not to "overload" the exception case into the end event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a fair question, and currently I don't have a strong opinion. In such situation I typically take my cue from the code. What convention makes the code cleaner (both for creation and consumption). Certainly from a start-stop perspective, it is GOOD if every start has a stop and the success/failure of the activity does not change the event used as the stop.
On the other hand, often errors are much more interesting than success cases, so it seem very GOOD if errors fired their own events (and thus you can easily filter to just them). However this might be taken care of simply by having an event for ANY exceptions (which cover errors regardless of component), so you don't need special error events in most cases.
Whatever we do, ideally we would do it uniformly (we don't have different conventions for different components.
My inclination, however is to ALWAYS have stop events (independent of error, but with an indication in the payload) like we have above, BUT we may ADDTIONALLY want an event that fires on an error (before the stop event fires, because it makes it clear that the error event is nested inside the activity).
But these are just inclinations. If this proposal has bad properties that we have not yet considered, we should certainly continue the discussion...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we thought that the exception case was more interesting for people in the exception case, which should happen more rarely than the success case, from a perf perspective, wouldn't making it a separate event be more performant - since currently people who are interested in only exceptions have to subscribe to the success case and internally check, as apposed to if there was an exception event, their paths would only be lit up on error... does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me summarize: we are optimizing case when user is interested in Exceptions only.
@vancem I think it's an opportunity for the new DiagnosticSource.IdEnabled(string ,object, object) overload
if (diagListener.IsEnabled("Response", exception))
//exception could be null and user can filter out responses without exceptions
{
diagListener.Write("Response", new {Response, exception, ...})
}
...
diagListener.Subscribe(observer,
(name, exception, _) => name == "Response" && exception != null});
User who provided name-based predicate or none at all, receive all events.
It involves additional exception check. However, it keeps code clean and reduces number of events user has to be aware of.
Regarding common event for any exception: there are no conventions for additional context that may be provided with any exception and without context it may be useless. Activity.Current should solve this problem though.
Sorry for the late feedback but I'm currently on paternity leave and only just saw this. |
{ | ||
LogHttpRequest(request, loggingRequestId); | ||
} | ||
Task<HttpResponseMessage> responseTask = base.SendAsync(request, cancellationToken); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder - can SendAsync
throw in case of DNS error or invalid args? Should it be in try{}catch
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than for things like null inputs, it should not throw synchronously: failures for things like DNS resolution come out via the Task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can other handler be in the pipe and throw synchronously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not as currently implemented. Where do you believe such a handler could be plugged in? This handler is explicitly being created as part of the HttpClientHandler implementation and wrapped around either CurlHandler or WinHttpHandler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks for explanation. If it's not possible - I'd suggest to write a comment for future reference
responseTask.ContinueWith( | ||
(t, s) => | ||
{ | ||
long timestamp = Stopwatch.GetTimestamp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear what's the benefit of getting timestamp in production code, not in subscriber
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to ensure that all subscribers get the same timestamp for correlation, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. if this is the intent. thanks for clarification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep that was the original intent... that way all subscribers are able to show users the same values.
|
||
private static void LogHttpResponse(Task<HttpResponseMessage> responseTask, Guid loggingRequestId) | ||
{ | ||
responseTask.ContinueWith( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check responseTask
for null
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be a bug in the handler if SendAsync returned null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bug in framework? Can there be another handler that introduced this error? Should tracing handle this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What other handler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You explained above that other handler can not be injected between logging and actual working handler.
DiagnosticsHandlerLoggingStrings.ResponseWriteName, | ||
new | ||
{ | ||
Response = t.Status == TaskStatus.RanToCompletion ? t.Result : null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there were an exception and response is not populated - is LoggingRequestId
the only field I can use to understand what was the original request?
TimeStamp = timestamp, | ||
Exception = t.Exception, | ||
RequestTaskStatus = t.Status | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case I haven't subscribed on Request - I still want to get request properties in this event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If request did not throw an exception, you can access it's properties through Response.RequestMessage.
If exception was thrown (which is related to you previous question), you need to subscribe to Request event and map response event to request with LogginRequestId.
If you don't want to store requests to access them on failure, I'd suggest to wait for Acitivty support
…15971) * Refactor DiagnosticSource logging, prepare for Activity support Commit migrated from dotnet/corefx@de3b53d
This change moves DiagnosticSource logging to DiagnosticsHandler and fixes #13172.
It prepares HttpClient for correlation feature support, which will allow to store global context for the current operation and propagate it in HTTP Requests.
Correlation-feature background:
/cc @vancem @avanderhoorn @nbilling @cwe1ss @karolz-ms @SergeyKanzhelev @brahmnes @davidfowl