-
Notifications
You must be signed in to change notification settings - Fork 842
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add YARP Telemetry instrumentation #501
Conversation
The failing tests are the timing tests. They're flaky, I will figure out how to make those more stable. |
[Event(3, Level = EventLevel.Informational)] | ||
public void ProxyStage(ProxyStage stage) | ||
{ | ||
if (IsEnabled(EventLevel.Informational, EventKeywords.All)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should ProxyStage and ContentTransferring use EventLevel.Verbose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could change the EventLevel for these, or we could also think about introducing Keyword flags for allowing more granular control over which events we fire.
I don't believe we will reach a significant number of ProxyStage values either, so even those could be made into flags instead, allowing users to have fine-grained control over exactly which events they actually care about.
@@ -5,6 +5,7 @@ | |||
<TargetFrameworks>net5.0;netcoreapp3.1</TargetFrameworks> | |||
<OutputType>Library</OutputType> | |||
<RootNamespace>Microsoft.ReverseProxy</RootNamespace> | |||
<AllowUnsafeBlocks>true</AllowUnsafeBlocks> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EventSource 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better
/// </summary> | ||
internal interface IUptimeClock | ||
{ | ||
long TickCount { get; } | ||
|
||
long GetStopwatchTimestamp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
long GetStopwatchTimestamp(); | |
// See Stopwatch.GetTimestamp | |
long Timestamp { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to be very explicit about what it is to avoid more confusion.
For example TickCount is in ms while Timestamp is in whatever frequency the underlying Stopwatch is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does IUptimeClock also need to expose the frequency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the need for it, this is just an internal abstraction over fetching time so that we can have stable tests.
If we wanted a cleaner API I think we should just expose a TimeSpan - we don't right now to save a few instructions on repeated conversions.
And yes, renaming the clock abstraction to IClock makes sense. |
// PR REVIEW: | ||
// Should this event be logged on failure as well (instead of XContentTransferStop)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes? It might be good to know how much content transferred before the failure. Instead of XContentTransferStop? Don't you still need start/stop pairs? Would you need to do both ContentTransferred and XContentTransferStop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now we log XContentTransferStart
and XContentTransferStop
ProxyStage events in try/finally.
If we were to log ContentTransferred on failure as well, there'd be no need to have the Stop
ProxyStage event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, try it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of comments, but overall it definitely looks good.
@@ -6,7 +6,7 @@ namespace Microsoft.ReverseProxy.Service.Proxy | |||
/// <summary> | |||
/// Errors reported when proxying a request to the destination. | |||
/// </summary> | |||
public enum ProxyError | |||
public enum ProxyError : int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this? We don't specify the enum base type explicitly in other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just being really explicit about the backing type, as we can't change it in the future.
{ | ||
_ = input ?? throw new ArgumentNullException(nameof(input)); | ||
_ = output ?? throw new ArgumentNullException(nameof(output)); | ||
|
||
var telemetryEnabled = ProxyTelemetry.Log.IsEnabled(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the linker recognize this pattern and trim the code when telemetry is disabled? As far as I remember, in the networking telemetry we made those checks directly in the if
clauses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it recognizes telemetryEnabled
can't change, maybe?
We're not doing IsEnabled
checks inline in any product code here (YARP), which is the same pattern AspNetCore uses. Personally I prefer the readability of that approach over doing IsEnabled checks everywhere - it moves that responsibility onto the EventSource.
In general, I don't think we care too much about how trimmed down this package is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsEnabled is runtime state, right? The linker is a compile time operation and shouldn't apply in this scenario. The Jitter is the one that would be optimizing for the true/false code paths at runtime, and it can do that regardless of how you structure it.
Using telemetryEnabled
is simpler than repeating ProxyTelemetry.Log.IsEnabled();
in each if.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a runtime check, but the linker has an option to remove EventSource code entirely by stubbing out EventSource.IsEnabled()
to be const false and then removing unreachable code.
Using telemetryEnabled is simpler than repeating ProxyTelemetry.Log.IsEnabled(); in each if.
I agree. In this case it's also a question about correctness, as IsEnabled()
can start returning true once we're already in the method and we could then work with invalid timing data (e.g. never setting the initial time/time between events)
{ | ||
ProxyTelemetry.Log.ProxyStage(isRequest ? ProxyStage.RequestContentTransferStart : ProxyStage.ResponseContentTransferStart); | ||
|
||
stopwatchTicksBetweenTransferringEvents = Stopwatch.Frequency; // 1 second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Should not it be exposed via IClock
interface too? AFAIK, we agreed on adding a Resolution
or Frequency
property there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #501 (comment), I don't think it's needed
|
||
static long StopwatchTicksToDateTimeTicks(long stopwatchTicks) | ||
{ | ||
var dateTimeTicksPerStopwatchTick = (double)TimeSpan.TicksPerSecond / Stopwatch.Frequency; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same question as above about a direct access to Stopwatch.Frequency
.
|
||
namespace Microsoft.ReverseProxy.Telemetry | ||
{ | ||
internal enum ProxyStage : int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need an explicit base type here?
@@ -25,7 +26,7 @@ public async Task CopyToAsync_InvokesStreamCopier() | |||
var source = new MemoryStream(sourceBytes); | |||
var destination = new MemoryStream(); | |||
|
|||
var sut = new StreamCopyHttpContent(source, autoFlushHttpClientOutgoingStream: false, CancellationToken.None); | |||
var sut = new StreamCopyHttpContent(source, autoFlushHttpClientOutgoingStream: false, new Clock(), CancellationToken.None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a plan to add telemetry verification here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're just delegating to StreamCopier
inside, which is already tested, so duplicating that here seems redundant.
Is there another scenario you see that should be covered?
DisplayName = "Requests Started", | ||
}; | ||
|
||
_startedRequestsPerSecondCounter ??= new IncrementingPollingCounter("requests-started-rate", this, () => Volatile.Read(ref _startedRequests)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work? It says the counter contains requests per seconds but it's reading the same value as "requests-started"
counter. Is there any computational magic behind IncrementingPollingCounter
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, IncrementingPollingCounter
has an internal field for last-value that it substracts, so it only reports the change in the last interval.
@@ -22,28 +25,68 @@ internal static class StreamCopier | |||
/// Based on <c>Microsoft.AspNetCore.Http.StreamCopyOperationInternal.CopyToAsync</c>. | |||
/// See: <see href="https://github.com/dotnet/aspnetcore/blob/080660967b6043f731d4b7163af9e9e6047ef0c4/src/Http/Shared/StreamCopyOperationInternal.cs"/>. | |||
/// </remarks> | |||
public static async Task<(StreamCopyResult, Exception)> CopyAsync(Stream input, Stream output, CancellationToken cancellation) | |||
public static async Task<(StreamCopyResult, Exception)> CopyAsync(bool isRequest, Stream input, Stream output, IClock clock, CancellationToken cancellation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rename StreamCopier
to something that would tie it together with HttpProxy
?
It shouldn't be used for anything else than copying proxy contents now, if I understand this change properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't be used for anything else than copying proxy contents now, if I understand this change properly.
Correct.
Should we rename StreamCopier
It's also going to be using Pipes in the future, so maybe just ContentCopier
?
I can make the change in a future PR if we want to change the name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the ContentCopier
name! But I don't insist on changing it here in this PR, we can do it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BodyCopier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Contributes to #158
IOperationLogger
and associated glueMicrosoft.ReverseProxy
EventSource
and the relatedHttpProxy
/StreamCopier
instrumentationProxyStage
enumMarked some open questions with
TODO
orPR REVIEW
comments.