Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add YARP Telemetry instrumentation #501

Merged
merged 19 commits into from
Nov 6, 2020
Merged

Conversation

MihaZupan
Copy link
Member

Contributes to #158

  • Removes IOperationLogger and associated glue
  • Adds a Microsoft.ReverseProxy EventSource and the related HttpProxy/StreamCopier instrumentation
  • Adds public ProxyStage enum

Marked some open questions with TODO or PR REVIEW comments.

@MihaZupan
Copy link
Member Author

The failing tests are the timing tests. They're flaky, I will figure out how to make those more stable.

src/ReverseProxy/Service/Proxy/HttpProxy.cs Outdated Show resolved Hide resolved
src/ReverseProxy/Service/Proxy/StreamCopier.cs Outdated Show resolved Hide resolved
src/ReverseProxy/Telemetry/ProxyStage.cs Outdated Show resolved Hide resolved
[Event(3, Level = EventLevel.Informational)]
public void ProxyStage(ProxyStage stage)
{
if (IsEnabled(EventLevel.Informational, EventKeywords.All))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should ProxyStage and ContentTransferring use EventLevel.Verbose?

Copy link
Member Author

@MihaZupan MihaZupan Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could change the EventLevel for these, or we could also think about introducing Keyword flags for allowing more granular control over which events we fire.

I don't believe we will reach a significant number of ProxyStage values either, so even those could be made into flags instead, allowing users to have fine-grained control over exactly which events they actually care about.

src/ReverseProxy/Telemetry/ProxyStage.cs Outdated Show resolved Hide resolved
test/ReverseProxy.Tests/Service/Proxy/StreamCopierTests.cs Outdated Show resolved Hide resolved
@@ -5,6 +5,7 @@
<TargetFrameworks>net5.0;netcoreapp3.1</TargetFrameworks>
<OutputType>Library</OutputType>
<RootNamespace>Microsoft.ReverseProxy</RootNamespace>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EventSource 😢

Copy link
Member

@Tratcher Tratcher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better

src/ReverseProxy/Service/Proxy/HttpProxy.cs Outdated Show resolved Hide resolved
src/ReverseProxy/Telemetry/ProxyTelemetry.cs Outdated Show resolved Hide resolved
src/ReverseProxy/Telemetry/ProxyStage.cs Outdated Show resolved Hide resolved
src/ReverseProxy/Telemetry/ProxyTelemetry.cs Outdated Show resolved Hide resolved
/// </summary>
internal interface IUptimeClock
{
long TickCount { get; }

long GetStopwatchTimestamp();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
long GetStopwatchTimestamp();
// See Stopwatch.GetTimestamp
long Timestamp { get; }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to be very explicit about what it is to avoid more confusion.

For example TickCount is in ms while Timestamp is in whatever frequency the underlying Stopwatch is

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does IUptimeClock also need to expose the frequency?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the need for it, this is just an internal abstraction over fetching time so that we can have stable tests.

If we wanted a cleaner API I think we should just expose a TimeSpan - we don't right now to save a few instructions on repeated conversions.

src/ReverseProxy/Utilities/MonotonicTimer.cs Outdated Show resolved Hide resolved
@Tratcher
Copy link
Member

Tratcher commented Nov 3, 2020

And yes, renaming the clock abstraction to IClock makes sense.

src/ReverseProxy/Service/Proxy/StreamCopier.cs Outdated Show resolved Hide resolved
Comment on lines 91 to 92
// PR REVIEW:
// Should this event be logged on failure as well (instead of XContentTransferStop)?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes? It might be good to know how much content transferred before the failure. Instead of XContentTransferStop? Don't you still need start/stop pairs? Would you need to do both ContentTransferred and XContentTransferStop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now we log XContentTransferStart and XContentTransferStop ProxyStage events in try/finally.

If we were to log ContentTransferred on failure as well, there'd be no need to have the Stop ProxyStage event.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, try it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

Copy link
Contributor

@alnikola alnikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments, but overall it definitely looks good.

@@ -6,7 +6,7 @@ namespace Microsoft.ReverseProxy.Service.Proxy
/// <summary>
/// Errors reported when proxying a request to the destination.
/// </summary>
public enum ProxyError
public enum ProxyError : int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this? We don't specify the enum base type explicitly in other places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just being really explicit about the backing type, as we can't change it in the future.

{
_ = input ?? throw new ArgumentNullException(nameof(input));
_ = output ?? throw new ArgumentNullException(nameof(output));

var telemetryEnabled = ProxyTelemetry.Log.IsEnabled();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the linker recognize this pattern and trim the code when telemetry is disabled? As far as I remember, in the networking telemetry we made those checks directly in the if clauses.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it recognizes telemetryEnabled can't change, maybe?

We're not doing IsEnabled checks inline in any product code here (YARP), which is the same pattern AspNetCore uses. Personally I prefer the readability of that approach over doing IsEnabled checks everywhere - it moves that responsibility onto the EventSource.

In general, I don't think we care too much about how trimmed down this package is?

Copy link
Member

@Tratcher Tratcher Nov 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsEnabled is runtime state, right? The linker is a compile time operation and shouldn't apply in this scenario. The Jitter is the one that would be optimizing for the true/false code paths at runtime, and it can do that regardless of how you structure it.

Using telemetryEnabled is simpler than repeating ProxyTelemetry.Log.IsEnabled(); in each if.

Copy link
Member Author

@MihaZupan MihaZupan Nov 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a runtime check, but the linker has an option to remove EventSource code entirely by stubbing out EventSource.IsEnabled() to be const false and then removing unreachable code.

Using telemetryEnabled is simpler than repeating ProxyTelemetry.Log.IsEnabled(); in each if.

I agree. In this case it's also a question about correctness, as IsEnabled() can start returning true once we're already in the method and we could then work with invalid timing data (e.g. never setting the initial time/time between events)

{
ProxyTelemetry.Log.ProxyStage(isRequest ? ProxyStage.RequestContentTransferStart : ProxyStage.ResponseContentTransferStart);

stopwatchTicksBetweenTransferringEvents = Stopwatch.Frequency; // 1 second
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Should not it be exposed via IClock interface too? AFAIK, we agreed on adding a Resolution or Frequency property there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #501 (comment), I don't think it's needed


static long StopwatchTicksToDateTimeTicks(long stopwatchTicks)
{
var dateTimeTicksPerStopwatchTick = (double)TimeSpan.TicksPerSecond / Stopwatch.Frequency;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same question as above about a direct access to Stopwatch.Frequency.


namespace Microsoft.ReverseProxy.Telemetry
{
internal enum ProxyStage : int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need an explicit base type here?

@@ -25,7 +26,7 @@ public async Task CopyToAsync_InvokesStreamCopier()
var source = new MemoryStream(sourceBytes);
var destination = new MemoryStream();

var sut = new StreamCopyHttpContent(source, autoFlushHttpClientOutgoingStream: false, CancellationToken.None);
var sut = new StreamCopyHttpContent(source, autoFlushHttpClientOutgoingStream: false, new Clock(), CancellationToken.None);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a plan to add telemetry verification here as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're just delegating to StreamCopier inside, which is already tested, so duplicating that here seems redundant.

Is there another scenario you see that should be covered?

src/ReverseProxy/Telemetry/ProxyTelemetry.cs Show resolved Hide resolved
DisplayName = "Requests Started",
};

_startedRequestsPerSecondCounter ??= new IncrementingPollingCounter("requests-started-rate", this, () => Volatile.Read(ref _startedRequests))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work? It says the counter contains requests per seconds but it's reading the same value as "requests-started" counter. Is there any computational magic behind IncrementingPollingCounter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, IncrementingPollingCounter has an internal field for last-value that it substracts, so it only reports the change in the last interval.

@@ -22,28 +25,68 @@ internal static class StreamCopier
/// Based on <c>Microsoft.AspNetCore.Http.StreamCopyOperationInternal.CopyToAsync</c>.
/// See: <see href="https://github.com/dotnet/aspnetcore/blob/080660967b6043f731d4b7163af9e9e6047ef0c4/src/Http/Shared/StreamCopyOperationInternal.cs"/>.
/// </remarks>
public static async Task<(StreamCopyResult, Exception)> CopyAsync(Stream input, Stream output, CancellationToken cancellation)
public static async Task<(StreamCopyResult, Exception)> CopyAsync(bool isRequest, Stream input, Stream output, IClock clock, CancellationToken cancellation)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename StreamCopier to something that would tie it together with HttpProxy?
It shouldn't be used for anything else than copying proxy contents now, if I understand this change properly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be used for anything else than copying proxy contents now, if I understand this change properly.

Correct.

Should we rename StreamCopier

It's also going to be using Pipes in the future, so maybe just ContentCopier?
I can make the change in a future PR if we want to change the name

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the ContentCopier name! But I don't insist on changing it here in this PR, we can do it later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BodyCopier

Copy link
Member

@ManickaP ManickaP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MihaZupan MihaZupan merged commit 0a5ae94 into microsoft:master Nov 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants