-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Connection Activities in SocketsHttpHandler #101814
Conversation
I don't think so - although related, we don't need to do a diagnostic source for this scenarios
Yes. The Aspire dashboard will be able to resolve links when we implement dotnet/aspire#2577. This will be a forcing function to make that happen.
According to @ReubenBond, we don't use it in Aspire
Http/3 usage is still uncommon, and more complicated due to a QUIC connection being more virtual than physical. We should flush out the scenarios with 1.1/2 and then look at HTTP/3 later.
It would be better if we can be consistent, but not if that causes a bunch of extra trouble for that metric.
@lmolkova probably has recommendations. |
src/libraries/System.Net.Http/src/System/Net/Http/DiagnosticsHandlerLoggingStrings.cs
Outdated
Show resolved
Hide resolved
@@ -104,6 +108,17 @@ public void MarkConnectionAsNotIdle() | |||
_connectionMetrics?.IdleStateChanged(idle: false); | |||
} | |||
|
|||
public void LinkRequestActivity() | |||
{ | |||
Activity? requestActivity = Activity.Current; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can we check _activity is null
before looking up Activity.Current
to make it a bit cheaper when disabled?
return; | ||
} | ||
|
||
requestActivity.AddLink(new ActivityLink(_activity.Context)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is tooling/infra prepared to deal with there being millions of links to a given connection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Millions of links to is not a problem, millions of links from could be!
public const string RequestActivityStopName = RequestActivityName + ".Stop"; | ||
|
||
public const string ConnectionNamespace = "System.Net.Http.Connections"; | ||
public const string ConnectionActivityName = ConnectionNamespace + ".HttpConnectionOut"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a convention to have an Out
postfix on these names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a .NET convention to do so, but not an OTel convention. In general we probably want to sync with @lmolkova on naming and conventions here. If we can determine an OTel convention that would be preferable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK this one is OperationName and I think has no OTel equivalent.
For the DisplayName
, we'd need to define a new span and span name.
Does this one last for the connection lifetime? If so I'd use something like
connection {server.address}:{server.port}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to figure out what is more clear in a trace:
System.Net.Http.Connections.HttpConnectionOut
or
System.Net.Http.Connections.HttpConnection
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK this one is OperationName and I think has no OTel equivalent.
For the DisplayName, we'd need to define a new span and span name.
OperationName is the default value used for DisplayName unless it is overriden somewhere. I don't think we get any value picking one string for OperationName and a different string for DisplayName. So we probably want OperationName = DisplayName = connection {server.address}:{server.port}
I am trying to figure out what is more clear in a trace
What gets transmitted by OTLP and rendered in the UI should be DisplayName and @lmolkova is suggesting that be "connection {server.address}:{server.port}". I think that would be clearer than either of those namespaced names no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OperationName could still be important for the DiagnosticSource (DS -> EventSource) as the event name and then it's nice to have a const string for it, but you all know much more about it than I do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OperationName could still be important for the DiagnosticSource
This PR is not adding DiagnosticSource
/ DiagnosticListener
, only ActivitySource
+ activities, because this is what #93832 is about in my understanding. Is there a reason for also adding a DiagnosticListener
like we have it in DiagnosticHandler
? I know little about how is this being consumed.
So we probably want OperationName = DisplayName = connection {server.address}:{server.port}
Given that OperationName has no OTel equivalent according to #101814 (comment), IMO it would be better if the OperationName followed the existing .NET convention for consistency. We can use a different DisplayName if we want. In that case wouldn't we also want to change the DisplayName for the request activity, or do we consider it a breaking change we better avoid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that OperationName has no OTel equivalent according to #101814 (comment), IMO it would be better if the OperationName followed the existing .NET convention for consistency.
👍
In that case wouldn't we also want to change the DisplayName or the request activity, or do we consider it a breaking change we better avoid?
I would consider it part of onboarding onto HTTP semantic conventions which define what span name looks like.
My 2c on if changing DisplayName
is breaking: existing otel http instrumentations set span name according to the conventions and it could only be breaking to something else like vanilla ActivityListener
, but they'd still have the same OperationName
as before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OperationName could still be important for the DiagnosticSource (DS -> EventSource) as the event name and then it's nice to have a const string for it, but you all know much more about it than I do.
You can filter on Activity name as part of DiagnosticSourceEventSource and ActivityListener.Start/Stop, but not as part of ActivityListener.Sample or OTel. Given that OTel is a primary scenario we want to support I'd want to ensure that ActivitySource alone is sufficiently fine-grained to do the filtering we'd like to do. It won't harm anything to pick some constant value for OperationName that differs from DisplayName, but ideally nothing should be relying on us doing that and we'd move away from showing OperationName in our tools as a distinct piece of data.
@antonfirsov - would it be possible to show the output from a basic scenario. The easiest way to do that is to use the consoleExporter for Tracing. Add a nuget ref for And then use builder.Services.AddOpenTelemetry()
.WithTracing(tracing =>
{
tracing.AddSource("System.Net.Http.Connections");
tracing.AddSource("System.Net.Http");
tracing.AddConsoleExporter();
}); |
Beware the timing of when the Activity gets sent may make the UX awkward when trying to view the telemetry data live (or close to live). The OTel SDK and exporter do not transmit any information about an Activity until it has stopped as far as I am aware. If a request completes but the connection that serviced the request doesn't close until 10 minutes later that means there will be a 10 minute period where the event timings haven't been transmitted yet. If you want the timings to be available more quickly in the UI the options I can think of are:
|
.../System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectionPool/HttpConnectionPool.cs
Outdated
Show resolved
Hide resolved
@@ -74,10 +74,12 @@ public static async ValueTask<SslStream> EstablishSslConnectionAsync(SslClientAu | |||
sslStream.AuthenticateAsClient(sslOptions); | |||
} | |||
} | |||
activity?.AddEvent(new ActivityEvent("Tls.Authenticated")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to use spans and avoid using span events at all costs:
- A child span for TLS handshake would have a duration and timestamp
- It won't be verbose since connections are long-lived (much longer than requests)
- It can live without HTTP instrumentation on
- It can be turned off (if coming from a different source) while HTTP is on
- Span events are likely to be deprecated in the future in favor of independent Event API similar to logging
- We'll have standard means to record errors/duration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to use spans and avoid using span events at all costs:
That seems fine to me. I mentioned in another comment that there was a timing issue where one long activity for the entire connection wouldn't transmit the events until the connection closed. I suggested changing the events to log messages as one potential solution, but changing them to spans also solves that problem.
It won't be verbose since connections are long-lived (much longer than requests)
I assume at the limit we might have one connection per request, but I hope such scenarios are uncommon in any high throughput service. Worst case if connection activity overhead was too high the app developer can always sample them or turn them off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the advantages that come from implementing this with sub activities. Another big advantage is that it would also work if ConnectCallback
is present, since we would implement them in downstream libraries. However, there are some challanges:
- It increases the scope of the problem, since we would need to touch multiple
System.Net.*
libraries. We may need to pull in more reviewers from the networking team. - If we want to add attributes to those sub-activities, it increases the scope even further since we would need to make decisions on the characteristics of those attributes (eg. naming, possible values for attributes like
error.type
, etc). As I also noted in Connection Activities in SocketsHttpHandler #101814 (comment), my recommendation would be to track this in a separate issue. - If there is an
Activity.Current
to be used as a parent activity, andSystem.Net.Http.Connections.HttpConnectionOut
is not enabled, the DNS/TCP/TLS activity would attach to the Http activity as parent which is somewhat unfortunate. We would need to document that users should avoid this.
Regardless, I agree sub-activities are better. If we have consensus, I will change this PR to add activities in downstream libs. @MihaZupan any concerns?
changing the events to log messages
Just to not exclude this option completely yet: @noahfalk do you mean a writing to DiagnosticListener
by this or some other technique?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is an Activity.Current to be used as a parent activity, and System.Net.Http.Connections.HttpConnectionOut is not enabled, the DNS/TCP/TLS activity would attach to the Http activity as parent which is somewhat unfortunate. We would need to document that users should avoid this.
This is why when we originally talked I said not to use Activity.Current and be explicit about passing in the activity if you can.
If the DNS and/or TLS activities are invoked and the parent is not a connection (As the source isn't enabled) , then they can look at the activity and see if its a connection or not and therefore not parent themselves to it. You will still see them on the overall timeline.
The other hack would be to put all these activities in the same source, so they can only be enabled/disabled as a group. But as I think they would be useful for other scenarios beyond http connections, that really does seem like a hack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about something that would ultimately be ILogger but later recalled that I don't think ILogger is easy to generate from the level of this code in the stack. Technically there are ways around that but I wouldn't worry about pursuing it any further. I think sub-activities are both easier and better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But as I think they would be useful for other scenarios beyond http connections
This is why I would prefer to create the sub-activities in downlevel libs. System.Net.Sockets|NameResolution|Security
would all have their own ActivitySource-s.
passing in the activity if you can.
It's not possible to pass it to downlevel libraries without significant (and IMO bloated) API additions, which we likely don't want.
switch (_kind) | ||
{ | ||
case HttpConnectionKind.Http: | ||
case HttpConnectionKind.Https: | ||
case HttpConnectionKind.ProxyConnect: | ||
stream = await ConnectToTcpHostAsync(_originAuthority.IdnHost, _originAuthority.Port, request, async, cancellationToken).ConfigureAwait(false); | ||
stream = await ConnectToTcpHostAsync(_originAuthority.IdnHost, _originAuthority.Port, request, async, activity, cancellationToken).ConfigureAwait(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a chance that we'd need to instrument connections outside of HTTP client stack?
If so, would we regret not adding these ones to Socket
and friends?
@@ -666,13 +680,19 @@ private async ValueTask<Stream> ConnectToTcpHostAsync(string host, int port, Htt | |||
{ | |||
if (async) | |||
{ | |||
await socket.ConnectAsync(endPoint, cancellationToken).ConfigureAwait(false); | |||
IPAddress[] addresses = await Dns.GetHostAddressesAsync(host, cancellationToken).ConfigureAwait(false); | |||
activity?.AddEvent(new ActivityEvent("Dns.Resolved")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here on events. Spans would provide much more value and flexibility than events.
IPAddress[] addresses = await Dns.GetHostAddressesAsync(host, cancellationToken).ConfigureAwait(false); | ||
activity?.AddEvent(new ActivityEvent("Dns.Resolved")); | ||
await socket.ConnectAsync(addresses, port, cancellationToken).ConfigureAwait(false); | ||
activity?.AddEvent(new ActivityEvent("Transport.Connected")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metric that shows how long it takes to establish a connection and a span that shows an establishment attempt is extremely valuable, while event only allow to visually see it on a trace.
Query that would calculate duration between parent activity start time to event timestamp would be not a great experience.
{ | ||
Stream? stream = null; | ||
IPEndPoint? remoteEndPoint = null; | ||
Activity? activity = CreateActivity()?.Start(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's make sure we don't forget to pick relevant attributes for this one and hopefully TLS, DNS and connect spans.
@@ -689,9 +709,10 @@ private async ValueTask<Stream> ConnectToTcpHostAsync(string host, int port, Htt | |||
} | |||
catch (Exception ex) | |||
{ | |||
activity?.Stop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll need to record error.type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR doesn't add attributes and it would be weird to record only error.type
. We are tracking the addition of attributes to the request activity in #93019. We may consider adding attributes to the connection span as well, but I would do it in a separate PR together with or after resolving #93019. @samsp-msft do you think think this would be valuable? If yes, my recommendation is to open a separate tracking issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good!
…andlerLoggingStrings.cs Co-authored-by: Miha Zupan <[email protected]>
Co-authored-by: Noah Falk <[email protected]>
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it. |
Resolves #93832 by introducing an activity for every HTTP/1.1 and HTTP/2.2 connection, emitting
ActivityEvent
at DNS/TCP/TLS completion and linking the connection activity to the request activity when the request is assigned to a connection from the pool.Marking as a draft for now, since there are several open questions:
DiagnosticListener
like inDiagnosticsHandler
or are we OK just publishing the activities?ActivityEvent
doesn't publish notifications onActivityListener
s. However, when a request is completed, the linked connection activity should contain all the events by that time. Would this be "interactive enough" for Aspire?ConnectCallback
is set. To overcome this, we would need to grabActivity.Current
in downlevel libraries, create sub-activities instead of events, or expose an API so users can emit these events manually in their callbacks. Do we expect Aspire users to useConnectCallback
frequently enough to make it worth the trouble?System.Net.Quic
(again either by emitting events onActivity.Current
or by creating sub-activities instead of events).http.client.connection.duration
metric, since the latter starts measuring time only after a successful TLS handshake. Do we need to harmonize the two?cc @samsp-msft @noahfalk @MihaZupan @lmolkova