Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Http request instrumentation with Activity (correlation part2) #16393

Merged
merged 9 commits into from
Mar 3, 2017

Conversation

lmolkova
Copy link

This change implements instrumentation for http requests with new System.Diagnostics.Activity API. Activity provides diagnostics context for current operation.

  • It introduces new events: System.Net.Http.Activity.Start and System.Net.Http.Activity.Stop.
  • It implements header propagation for operation context. Http Protocol proposal

Instrumentation is documented in the guide included into this PR.

@lmolkova
Copy link
Author

@@ -155,6 +155,7 @@ public static unsafe bool TryGetHeaderName(IntPtr buffer, int length, out string
case 'K': potentialHeader = KeepAlive; goto TryMatch; // [K]eep-Alive
case 'S': potentialHeader = SetCookie; goto TryMatch; // [S]et-Cookie
case 'U': potentialHeader = UserAgent; goto TryMatch; // [U]ser-Agent
case 'R': potentialHeader = RequestId; goto TryMatch; // [R]equest-Id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't hurt to have extra headers here, but I'm not familiar with this particular one. Who sends this, and when? Having this here is really only beneficial if it's coming down as part of a response.

Copy link
Author

@lmolkova lmolkova Feb 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This headers are new, declared in this change. They are injected to propagate correlation context below in DiagnosticsHandler. They only make sense in Request and not passed in response.

I noticed TryGetHeaderName function only called for response, but also all known headers including pure Request headers (e.g. User-Agent or Host) are handled there.
I'm fine to remove new ones though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub
Apparently, there is a test that checks all known headers are handled in TryGetHeaderName. Do you know if it's intended test behavior?
I'll return back my changes for now, but I'm open to fix the test if needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My primary concern with leaving it in the code is that it's taking up the potential place of another header we might want to add and would conflict with it. Go ahead and leave it for now, but please add a comment next to it indicating that it's unexpected to be needed for responses and could be replaced by another case in the future if necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, sorry, I take that back. I think the right fix is to just put that known request header string on a separate type, i.e. have one type for known response header names and one for known request header names.

{
baggageIsEmpty = false;
break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be:

bool baggageIsEmpty = !activity.Baggage.Any();

or if you want to avoid LINQ:

using (IEnumerator<KeyValuePair<string, string>> e = activity.Baggage.GetEnumerator())
{
    baggageIsEmpty = !e.MoveNext();
}

baggage.Add(new NameValueHeaderValue(baggageItem.Key, baggageItem.Value).ToString());
}
request.Headers.Add(HttpKnownHeaderNames.CorrelationContext, baggage);
}
Copy link
Member

@stephentoub stephentoub Feb 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why enumerate Baggage twice? How about replacing this and the baggageIsEmpty stuff above with something like:

using (IEnumerator<KeyValuePair<string, string>> e = activity.Baggage.GetEnumerator())
{
    if (e.MoveNext())
    {
        var baggage = new List<string>();
        do
        {
            KeyValuePair<string, string> item = e.Current;
            baggage.Add(new NameValueHeaderValue(item.Key, item.Value).ToString());
        }
        while (e.MoveNext());
        request.Headers.Add(HttpKnownHeaderNames.CorrelationContext, baggage));
    }
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

//So we let user log it as exception happened in this 'parent' activity
if (s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.ExceptionEventName))
{
s_diagnosticListener.Write(DiagnosticsHandlerLoggingStrings.ExceptionEventName, new {Exception = ex});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok that events with the same name have different state associated with them? The earlier change that added use of this exception event sent additional state with the exception, like a timestamp and a request ID.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about also including the "request" variable in the event properties? Would allow subscribers to log the requested URL with the exception etc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub
As you noticed below, Activity and Request/Response events are mutually exclusive. Activity event consumers will always receive Exception with single exception object. Request/Response subscribers will always receive exception with guid and timestamp.

@cwe1ss

  1. From the logging perspective, URL may be logged in Request event along with Request-Id. And Exception will be logged with the same Id as well. But if subscriber did not instrument the request then in Exception callback there is no context for this particular request.
  2. When exception event is received, Activity.Current holds context for this operation.
    Subscriber can add any context to Activity.Current.Tags in the Request event and use them later in Exception and Response callbacks. It may cause Tags to overfill, considering that all tags should be written with every log message, it may be a problem.

Considering that Exception events should not happen too frequently, I'm fine with adding request to the payload.

@@ -104,6 +194,9 @@ private static void LogHttpResponse(Task<HttpResponseMessage> responseTask, Guid
}, loggingRequestId, CancellationToken.None, TaskContinuationOptions.None, TaskScheduler.Default);
}

private static readonly DiagnosticListener s_diagnosticListener =
new DiagnosticListener(DiagnosticsHandlerLoggingStrings.DiagnosticListenerName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This should be at the top of the class.

}
// Poll with a timeout since logging response is not synchronized with returning a response.
WaitForTrue(() => cancelLogged, TimeSpan.FromSeconds(1),
"Cancellation was not logged within 1 second timeout.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fairly short timeout... we've seen CI get bogged down and short timeouts like this end up causing spurious failures. Would it hurt anything to extend this to, say, 10 seconds?

Copy link
Author

@lmolkova lmolkova Feb 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This timeout is used in other tests here, which were running for quite a while and seems to be stable, at lease I cannot find any issues for test failures
Note that response is awaited, and response events are fired after response is received, so I'm not even convinced there should be any timeout here.

{
//null Activity.Current means that incoming request was not instrumented
//and there is nothing we can do for outgoing request.
//Activity events are enabled so we don't send Request/Response events
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand the scheme employed here. Is System.Net.Http.Activity mutually exclusive with System.Net.Http.Request/Response? I'm wondering if both sets of events could/should fire for the same request, and also if System.Net.Http.Activity is enabled but Activity.Current is null, if that should preclude the Request/Response events from being fired as you have here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want Activity events to substitute Request/Response events eventually. Activity events are fired at the same time and provide more flexible filtering and Activity.Current for context.
So yes, Activity events are mutually exclusive with Request/Response.
There will be listeners listening to Activity requests only and there are existing listeners for Request/Response events.
If consumer is subscribed for Activity events and Activity.Current is null, it means that current operation is not instrumented, so it turns off any diagnostics for this particular request.
If consumer wants to receive Request/Response events, they should subscribe for them only.

There may be a problem if there are 2 consumers in the same process, one listening to Activity and other listening to Request/Response. In this case, second one will never receive any events.

{
activity = new Activity(DiagnosticsHandlerLoggingStrings.ActivityName);

s_diagnosticListener.StartActivity(activity, new { Request = request });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sending events to a listener will synchronously invoke any registered handlers, right? What's the design if one of those handlers throws an exception? Will it emerge from here and prevent the http operation from going through, and if so, is that ok?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming callback (for Write or IsEnabled) throws exception, http request will not be processed and exception will be caught (or not) in user code.

It seems it's a common approach (here, in SqlClient and ASP.NET Core), not to handle exceptions from DiagnosticListener callbacks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is certainly the case that callers of DiangosticSource.Write are not intended to protect themselves from thrown exceptions. It is the responsibility of subscribers to avoid throwing exceptions.

Copy link
Contributor

@cwe1ss cwe1ss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for mentioning me here. I'm back from my 2 month vacation 😎 so I finally have more time for this.

As I've already written in another post, I have quite strong feelings against using the "Request-Id" and "Correlation-Context" headers (and their format) without any hook to change it.

IMO supporting a platform-independent microservices system is very important. By forcing these header names in the default HTTP handler you make it very hard for 3rd party diagnostics systems to do multi-platform (e.g. Java, .NET, ...) tracing.

Unfortunately I think that due to this I still can't use any of the new Activity stuff in my OpenTracing project and I doubt that any existing distributed tracer can do so (Application Insights is not yet a distributed tracer). I still have to use my own HTTP handler (to allow tracers to write their own headers) and I probably still have to use my own ASP.NET Core middleware (to allow tracers to read their own headers) even if aspnet/Hosting#939 gets implemented.

In other words, I'm still missing the inject/extract concepts from OpenTracing. I believe that this will become a proprietary implementation for Application Insights only if you don't support hooks for that.

//So we let user log it as exception happened in this 'parent' activity
if (s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.ExceptionEventName))
{
s_diagnosticListener.Write(DiagnosticsHandlerLoggingStrings.ExceptionEventName, new {Exception = ex});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about also including the "request" variable in the event properties? Would allow subscribers to log the requested URL with the exception etc.

{
s_diagnosticListener.StopActivity(activity, new
{
Response = responseTask.Status == TaskStatus.RanToCompletion ? responseTask.Result : null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Request" and a possible "Exception" might be interesting here as well. This would allow subscribers to subscribe just to the Stop-Event and get all scenarios/information.

Copy link
Author

@lmolkova lmolkova Feb 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request could be accesses from Response.RequestMessage
We discussed exceptions in #15971: We expect some subscribers to be interested in Exception only and the common approach is to fire Exception events separately.
So, subscriber needs to listen to both Exception AND Activity. Note: there are no IsEnabled calls for Start and Stop, there is one for "System.Net.Http.Activity"

@cwe1ss
Copy link
Contributor

cwe1ss commented Feb 23, 2017

I've just seen that the changes are not in the default handler but in a separate "DiagnosticsHandler" class. Is this handler added to the pipeline automatically? Is there a way to opt-out of this handler?

@lmolkova
Copy link
Author

@cwe1ss
'DiagnosticsHandler' is called if there is a subscriber to HttpHandlerDiagnosticListener. It was done in PR #15971 and #16385.

It's off by default and you can opt-in by subscribing to HttpHandlerDiagnosticListener.
Depending on the predicate provided for subscription, DiagnoticsHandler sends either Activity events or "System.Net.Http.Request" and "System.Net.Http.Response". You can find more details here

@lmolkova
Copy link
Author

lmolkova commented Feb 23, 2017

Thanks for mentioning me here. I'm back from my 2 month vacation 😎 so I finally have more time for this.

As I've already written in another post, I have quite strong feelings against using the "Request-Id" and "Correlation-Context" headers (and their format) without any hook to change it.

IMO supporting a platform-independent microservices system is very important. By forcing these header names in the default HTTP handler you make it very hard for 3rd party diagnostics systems to do multi-platform (e.g. Java, .NET, ...) tracing.

Unfortunately I think that due to this I still can't use any of the new Activity stuff in my OpenTracing project and I doubt that any existing distributed tracer can do so (Application Insights is not yet a distributed tracer). I still have to use my own HTTP handler (to allow tracers to write their own headers) and I probably still have to use my own ASP.NET Core middleware (to allow tracers to read their own headers) even if aspnet/Hosting#939 gets implemented.

In other words, I'm still missing the inject/extract concepts from OpenTracing. I believe that this will become a proprietary implementation for Application Insights only if you don't support hooks for that.

We are considering to have configurable incoming requests headers in ASP.NET Core, so ASP.NET Core would support multiple schemas including and defaulting to Request-Id and Correlation-Context. We can discuss it in aspnet/Hosting#939. We would be happy to follow any existing standard, if there was one.

For the outgoing requests, it's more complicated, since different endpoints presumably expect different sets of headers. If this is the case, any instrumentation, including this one, would require some level of customization to apply headers specific to endpoint. We don't prevent this: consumers are free to disable propagation and either use custom handlers or modify request object from the event payload (which is inadvisable, but still possible). So I believe we provide hooks for that.

There may be the case when ALL downstream services expect to receive same set of headers (but not the one we propose). We may support this case later to simplify configuration, but use Request-Id and Correlation-Context by default.

@vancem
Copy link
Contributor

vancem commented Feb 23, 2017

@cwe1ss -on the issue on wiring in the names RequestId and CorrelationContext into the code.

We do care about interoperability and are not discounting your concern. Our basic plan is

  • There should be a worldwide standard for this (frankly OpenTracing SHOULD have picked one). If there HAD been an RFC or even if OpenTracing had suggested something we would have chosen it and we would not be having this conversation.
  • Because of the lack of standards we FULLY EXPECT to have to add logic that allows flexibility at the HTTP boundary. But this should be thought of as a 'compatibility hack' until everyone gets on the standard (you could imagine removing this code eventually because it is not useful in the presence of a strong standard).
  • There seems to be little value in making an extensibility mechanism so that we can 'pass the problem off' to something outside of HTTP code. After all arguably information about what goes in HTTP headers and their format something that belongs in HTTP code.
  • Ideally we would only have to add this compatibility flexibility ON INCOMMPING Http. The reason is that it not inefficient to check for N different synonyms for a HTTP header, but it IS inefficient to have to write the fields N different ways because you are using N different logging systems that have different conventions. Our plan is to have an environment variable that allows you to map 'foreign' names to their 'standard' one. If all logging systems do this, then no one need sot fix their OUTGOING http. We will see...

However our first order of business is to get SOMETHING up and running. Certainly by getting this minimum we enable the scenario where you only use .NET code (a nice start), and even cross-logging cases if their loggers are sufficiently flexible (they have the incoming mapping as we suggest in #4). When we get this working, we will add #4 and things 'should' work. We will see.

So we ARE interested in getting cross-logging systems working. You could be very helpful in making that actually work (and ideally we get other logging systems to be adaptable (or even come with us as we propose and adopt a standard)). We really would have RATHER simply followed someone else's lead, but given that that was not there, we want to 'fix it right' (which is with a standard). We don't have to control the standard, it just needs to exist.

if (s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.ActivityPropagateName))
{
request.Headers.Add(HttpKnownHeaderNames.RequestId, activity.Id);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Request-ID and Correlation-Context headers are not part of the HTTP standard and people could be using these names for arbitrary other purposes today. Instead of hard-coding the serialization of the activity, we should extract this code into a different method/class and allow the user to pick a built-in implementation (like the one you have here) or write a custom implementation.
Does a user opt into activity propagation or will it happen automagically by default? If automagically, then it seems like performing any propagation would be a breaking change (what if Request-ID is already specified in user code?) If opt-in, then it's reasonable to have a default implementation as shown here, but it should be extracted out.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instrumentation is opt-in. To enable it, you need to listen to HttpHanlerDiagnosticListener and enable System.Net.Http.Activity event.

There is already an API for custom implementation - DelegatingHandler. Users can track outgoing HTTP requests there, if default Activity instrumentation does not fit. Custom DelegatingHandler does not need to repeat all this logic, since it can directly write to the logging system and do filtering in place.

Please also check @vancem reply:

@cwe1ss -on the issue on wiring in the names RequestId and CorrelationContext into the code.

We do care about interoperability and are not discounting your concern. Our basic plan is

  • There should be a worldwide standard for this (frankly OpenTracing SHOULD have picked one). If there HAD been an RFC or even if OpenTracing had suggested something we would have chosen it and we would not be having this conversation.
  • Because of the lack of standards we FULLY EXPECT to have to add logic that allows flexibility at the HTTP boundary. But this should be thought of as a 'compatibility hack' until everyone gets on the standard (you could imagine removing this code eventually because it is not useful in the presence of a strong standard).
  • There seems to be little value in making an extensibility mechanism so that we can 'pass the problem off' to something outside of HTTP code. After all arguably information about what goes in HTTP headers and their format something that belongs in HTTP code.
  • Ideally we would only have to add this compatibility flexibility ON INCOMMPING Http. The reason is that it not inefficient to check for N different synonyms for a HTTP header, but it IS inefficient to have to write the fields N different ways because you are using N different logging systems that have different conventions. Our plan is to have an environment variable that allows you to map 'foreign' names to their 'standard' one. If all logging systems do this, then no one need sot fix their OUTGOING http. We will see...

However our first order of business is to get SOMETHING up and running. Certainly by getting this minimum we enable the scenario where you only use .NET code (a nice start), and even cross-logging cases if their loggers are sufficiently flexible (they have the incoming mapping as we suggest in #4). When we get this working, we will add #4 and things 'should' work. We will see.

So we ARE interested in getting cross-logging systems working. You could be very helpful in making that actually work (and ideally we get other logging systems to be adaptable (or even come with us as we propose and adopt a standard)). We really would have RATHER simply followed someone else's lead, but given that that was not there, we want to 'fix it right' (which is with a standard). We don't have to control the standard, it just needs to exist.


//inject headers unless user does not want it to happen
//we may pass request to IsEnabled, but not clear how useful it is
if (s_diagnosticListener.IsEnabled(DiagnosticsHandlerLoggingStrings.ActivityPropagateName))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bias is to NOT have the complexity of a Activity.Propagate check. My concern is that it is just one more thing that loggers have to do for the system to work as expected (and it is a bit less efficient, and the code is a bit more complex).

The rationale here is that putting users have control over propagation when things come into a machine, do they really ALSO need control when it leaves?

My preference is to remove this and if we find the need for this control we can add a 'NoPropagate' check at that time (my guess is we never will) and the system will end up simpler. Even if we do end up adding it, least that way the default will be good (by default you get propagation and you have to go out of your way to do something surprising).

else
{
Guid loggingRequestId = Guid.NewGuid();
LogHttpRequest(request, loggingRequestId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you note, these original logging message we would like to deprecate. At the very least, please rename the LogHttpRequest, to 'LogHttpRequestDeprecated' and LogHttpReponse to LogHttpResponseDeprecated. and similarly add a deprecated suffix to the RequestWriteName and ResponseWriteName symbols.

We believe the new functionality is a superset because activityies have both a ID and a beginning and ending timestamp.

I will start a separate discussion about deprecating DiagnosticSource events.

@stepro
Copy link

stepro commented Feb 24, 2017

One thing I'm not seeing in this change is a good separation between the general need for header propagation (on receiving a request, de-serializing into an Activity and then on an outgoing request, serializing back into headers) and a domain-specific usage of this capability in diagnostics.

Not all headers that need propagation relate to diagnostics, for instance, I'm using linkerd which requires a certain set of headers (l5d-ctx-*) to be propagated for service routing purposes. Or maybe I'm using a particular feature flag framework that expects a particular header provided during ingress that should also be propagated.

Something else interesting. Maybe I've created a service that has zero interest in participating in domain-specific activities like diagnostics or doesn't reference any feature flag framework, but it should still propagate the headers needed for its downstream services to perform those activities. This indicates that there's a general need to propagate independent of domain.

Because of this, I'd defined an approach whereby an agreed upon header, say "Context-Headers", lists all the other headers that should be propagated, alleviating a given runtime/service from implementing domain-specific propagation routines. This is very flexible because it accommodates arbitrary existing HTTP headers rather than requiring everyone to latch on to putting contextual information inside the value of a single well-known HTTP header like Correlation-Context.

I recognize a lot of thought and time has gone into the choice of Request-ID and Correlation-Context as the headers involved here, but I wanted to put my thoughts out there because these headers don't work for my scenarios today (and won't work unless the services/frameworks I use ultimately end up supporting those headers which is not likely for quite some time).

As for creating a standard, I appreciate the desire and hope we can make progress there. I remain concerned (see my previous comment) about introducing this as an auto-magic default before it is even in the works to become a standard and may be considered a breaking change for people already using headers with those names for their own purposes.

If we do pursue a standard here, I feel like we'd be better off if it could more readily accommodate the current state of the world using a mechanism like I describe, allowing it to integrate with existing headers. I'm reminded of how CORS added the Access-Control-Expose-Headers header to list headers that should be propagated across domain boundaries, instead of requiring that special, new headers be used to contain information that could cross domains.

@vancem
Copy link
Contributor

vancem commented Feb 24, 2017

@stepro
There is no question that the fact we are wiring in logging-specific logic (e.g. names of logging fields and how they propagate information to a Activity) into the HTTP logic is annoying from a dependency / extensibility perspective. It is not a nice problem because there really needs to be a piece of code that knows about logging (e.g. Activities) AND HTTP (how to set http headers). We have chosen to make HTTP to know about logging since logging is a 'everywhere' service (any code can depend on logging APIs).

As you point out there may be other scenarios that would like to propagate information in a similar way, but generalizing the problem actually makes the design problems worse. When we are talking about just logging, it was OK to take a direct dependency on logging APIs. However for just about anything else you need an abstraction that hides this, which is tricky to design correctly without understanding your scenarios completely. That abstraction in turn just 'passes the buck' to the code that actually does the work, and it is not at all clear where that code lives and how it gets configured.

Your idea of making the HTTP request be able to specify some generic semantics (e.g. what should be passed on to subordinate requests) is an interesting way to attack the problem, but it does not form a complete solution (at least for logging), because you still need to define WHERE to propagate the data to. I could believe we might be able to fix this, but you must admit that without a fair bit of prototyping and feedback from the real user cases we are not likely to have gotten it correct.

Thus we are reluctant to generalize the problem. We feel that logging is sufficiently valuable to pretty much ALL customers that it can justify 'special casing' it right now. If later we prototype and implement a more generalized solution we can move to that transparently, but our concern right now is to get the logging working.

In short, sometimes it is better to do something simple and direct and fix it later than it is to try to come up with the generalized solution right off the bat. We have had a very poor track record of 'nailing a design the first time', so we are concentrating on simply getting the really high value case (of logging) done in a straightforward way. Any design more complex than the simple way, needs to really justify that it WILL be valuable (which is hard when the scenarios require a bunch of things to all work together).

@stepro
Copy link

stepro commented Feb 25, 2017

@vancem - digesting your response. I think what's bugging me is that there are two pieces to this logging feature – activity-based logging and context propagation – that are tied together when there’s a lot of value in separating the concerns, and doing so would only add minimal complexity but gain a bunch of flexibility. Some scenarios:

  • I might want to use activity-based logging because AI just works with it, but no context propagation as defined because I’m calling some external service that doesn’t understand those specific headers, or worse, does understand them and interprets them in the wrong way (“Request-ID” is hardly a header someone else won’t have already used for who knows what).
  • I might not be using activity-based logging for my micro-service yet because I currently have some other logging solution in place, but I know that some upstream and downstream services have started to employ it and their logs need to be correlated, requiring context propagation as defined.
  • I might want to switch to activity-based logging to get some new features in AI but because I can’t update all my micro-services at the same time (actually, some of them I don’t even own – some other team does), I need to keep in place the current “Correlation-ID” header and propagation mechanism. I’m happy to implement a custom headers<->Activity translator to get this working so my service continues to fit into the overall scheme of things while still moving it forward.

I have no problem with a piece of code that knows how to create an activity with a HTTP request added as baggage. I also have no problem with a piece of code that knows about an activity and HTTP headers. But I don’t like a piece of code that knows about logging and HTTP headers (and especially one I can’t change). It’s fine if there is a default wire-up mechanism that registers two built-in pieces of code that a) manages an activity around a child request and b) propagates headers, but I’d at least like the second piece to be overridable (or removable).

Perhaps more concretely, I’m asking that there be two DelegatingHandlers in System.Net.Http – the current DiagnosticsHandler without the propagation code, and a new DelegatingHandler like PropagationHandler that serializes some of the current context into headers. When constructing a handler chain, the code might look like:

_winHttpHandler = new WinHttpHandler();
_propagationHandler = new PropagationHandler(_winHttpHandler);
_diagnosticsPipeline = new DiagnosticsHandler(_propagationHandler);

I’d then have the PropagationHandler let you override what it means to convert current context into HTTP headers via whatever injection method makes sense (static delegate, event? I'm not an expert at such things). The default would be what's described in this pull request, i.e. read the ID and baggage from the current Activity and serialize as Request-ID and Correlation-Context.

Do your reasons for not generalizing still apply given this kind of change? I don't think this is generalizing the problem as much as you might have thought I was suggesting.

@vancem
Copy link
Contributor

vancem commented Feb 27, 2017

@stepro First, your feedback is interesting. If we really can find a way to generalize the code with minimal cost, we are certainly open to it.

However I think we need more details of the design. Doing the segregations you describe is actually problematic, because we are using DiagnosticSources filtering hook to TELL US whether to propagate things. Thus propagation and logging are coupled. Now you can try to decouple them, but only at the cost of creating an extensibility mechanism for propagation. We could do that too, but it is unclear how to do that in a way that is reasonably flexible (that is looks like a real extensibility mechanism).

This kind of feature creep is what we are leery of. Maybe you can propose something that would allay our fears, but we would need more details (basically a description of what is public, and some sample code for using it to inject functionality (e.g. what DiagnosticSource needed) .

Our counter-claim, is that we have NOT precluded adding something like this in the future. Thus, yes, we have wired in code, but it is not much code (< 1page), and if we build the extension mechanism we can then put that code in the 'default' place with relative ease. So why not keep it super-simple now and work on the two problems without coupling? Sure the current code may not be factored the way the new code needs to be, but it is 1 page of code, how hard would fixing it be? As long as things are private, we have a lot of flexibility to fix things later.

So trying to tackle your other scenarios is reasonable, it is just that we want a particular scenario (uniform logging following a particular convention which we hope to standardize world wide) up and running ASAP.

@stepro
Copy link

stepro commented Feb 27, 2017

Doing the segregations you describe is actually problematic, because we are using DiagnosticSources filtering hook to TELL US whether to propagate things

I'm actually okay with this coupling when it's one of many possible implementations of a propagation handler. That way, if someone doesn't like the coupling, they just choose a different implementation of propagation. They'll still get the diagnostics handler goodness, but now they own propagation. They might choose to continue having the diagnostic source be the trigger for propagation, or they might decide that there's more involved in propagation than just logging and decouple the two concepts.

I think I remember that Liudmila originally had a separate diagnostics event you could listen to that would enable or disable just the propagation piece, but that was removed as part of the code review. That would have helped somewhat but it didn't give a place that I could inject a different implementation of propagation, so it wasn't a complete story.

Overall, I'm fine with the change if we understand what a compatible path to this kind of separation would look like in the future (I'm definitely not an expert in .NET fx compatibility demands!)

P.S: The first problem I described, where sending "Request-ID" and/or "Correlation-Context" headers to a service that would do bad things given those headers: looking at the code more, I think this can be resolved if a diagnostic listener returns false for IsEnabled given a particular request object? Where/how would a user be able to plug in code that would give that behavior (given that it looks like if any listener returns IsEnabled=true, then the overall value is true). Would there be injection points in, say, an AI library that is listening to diagnostics to override IsEnabled for specific domains? This was probably my biggest concern given the generic (and rather obvious) naming of the headers.

@vancem
Copy link
Contributor

vancem commented Feb 27, 2017

@stepro

P.S: The first problem I described, where sending "Request-ID" and/or "Correlation-Context" headers to a service that would do bad things given those headers

This is basically a problem that the current code has. The basic assumption is that HTTP headers don't collide (thus having more of them is never a negative (except maybe perf)). If that collision happens, and there is any activity logging from any source then you will have a problem. (The receiver would also have to be inflexible).

Note that this conceptual problem happens ANY time you make a new HTTP header. Even with common names, it would take quite a bit of bad luck to have it be a real problem. Once the standard takes hold, it won't be a problem at all.


public const string ExceptionEventName = "System.Net.Http.Exception";
public const string ActivityName = "System.Net.Http.Activity";
public const string ActivityPropagateName = "System.Net.Http.Activity.Propagate";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this definition as we are not using it anymore

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

Guid loggingRequestId = Guid.Empty;

//cache IsEnabled result for Activity name
if (!s_activityEventIsChecked)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would strongly prefer to not have static caching of enabled state. They are subtle bug farms, especially if there is contract in between (like there is here). What if one subscriber just wanted exceptions, and then a bit later, another subscriber joins and wants Activities. As the code is written the second subscriber does not get his events (because of cached enabled state when the first subscriber enabled the diagnosticSource).

I have sent you a counter-proposal in E-mail.

@lmolkova
Copy link
Author

lmolkova commented Mar 1, 2017

We are finished changes in this PR together with @vancem.
Could you please check the final version and let us know if you have any concerns or questions.

I'd like to highlight compatibility with the 'old' "System.Net.Http.Request" and "Response" events:

  1. Existing consumers for "Request" and "Response" events may still use them and should consider migration to Activity events. We will announce deprecation separately
  2. If there is at least one consumer for new "System.Net.Http.Activity" events ALL consumers will receive Activity events only.
  3. Consumers may subscribe to "System.Net.Http.Exception". The event intentionally does not have loggingRequestId, so it is not possible to map it to the 'old' Request and Response events.

//always stop activity if it was started
if (activity != null)
{
activity.SetEndTime(DateTime.UtcNow);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the 16ms resolution of UtcNow a concern here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC: @pharring who just looked into this issue in AI SDK

Copy link
Author

@lmolkova lmolkova Mar 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UtcNow is precise now: dotnet/coreclr#9736

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic, thx. Ship it!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As already said in #16491, I don't like that users have to call SetEndTime but not SetStartTime - especially because Stop uses DateTime.UtcNow anyway. This is completely unexpected IMO and most people who will use the Activity API will get this wrong.

Why not fix the Activity API (the StopActivity part) in a way that uses DateTime.UtcNow for Start and Stop by default and also sends the duration to the DiagnosticSource listeners?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you point, let's discuss it in #16491 and fix in another PR.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DateTime.UtcNow is more precise, but it's still not as precise as the system stopwatch. In ASP.NET Core we use DateTime.UtcNow to log the start time of the request, but we use a timestamp from Stopwatch (QPC) to compute durations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pharring - I suspect ASP.NET used both UtcNow and Stopwatch because up until about a week ago UtcNow was NOT very precise (16 msec). It is now upgraded to about 1usec, which should be good enough for HTTP request and response durations. That should be enough.

@vancem
Copy link
Contributor

vancem commented Mar 2, 2017

OK we are on the cusp of committing the merge. The upshot of this change is that is only affects logging (we can be confident of that because the DiagnosticHander was the only code changed and that only gets called when logging is on. This makes the change relative safe (only profilers/debuggers could possibly be broken). We believe that we are compatible, and only added new functionality.

We understand that we may want to generalize the code at some point if the future, but we don't want to block on that. In general anything that can be fixed compatibly is probably OK at this point.

If you have any concerns ABOUT THIS PR that rise above that, speak them now, as we are going to merge tomorrow.

do
{
KeyValuePair<string, string> item = e.Current;
baggage.Add(new NameValueHeaderValue(item.Key, item.Value).ToString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this require some encoding? What if a baggage value contains a ","? Are there any other characters that are not allowed as header values?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. We are trying to keep the efficiency high and the complexity low, and frankly having arbitrary strings in Baggage is not really an interesting scenario (they are control values, not arbitrary strings). This argues that we push the problem to the clients of Baggage (either fail if you try to set what we will define as illegal baggage values (and possibly give them an API that does the escaping). Thus the change will be in System.Diagnostics.Activity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look at NameValueHeaderValue (I didn't know about this class before) and it seems like it is checked there. It will throw an exception if I understood it correctly.

@vancem vancem merged commit 896d790 into dotnet:master Mar 3, 2017
@karelz karelz modified the milestone: 2.0.0 Mar 7, 2017
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…activity_part2

Http request instrumentation with Activity (correlation part2)

Commit migrated from dotnet/corefx@896d790
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.