Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial common infrastructure for wiring up distributed tracing #6916

Merged
merged 17 commits into from
Jul 16, 2019
Merged
11 changes: 8 additions & 3 deletions eng/Packages.Data.props
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,6 @@
<PackageReference Update="Microsoft.Azure.Services.AppAuthentication" Version="[1.0.3, 2.0.0)" />
<PackageReference Update="Microsoft.Azure.Storage.Blob" Version="9.4.2" />
<PackageReference Update="Microsoft.Azure.Test.HttpRecorder" Version="[1.13.1, 2.0.0)" />
<PackageReference Update="Microsoft.Bcl.AsyncInterfaces" Version="1.0.0-preview6.19259.10" />
<PackageReference Update="Microsoft.Bcl.Json.Sources" Version="4.6.0-preview3.19128.7" />
<PackageReference Update="Microsoft.Build.Tasks.Core" Version="15.5.180" />
<PackageReference Update="Microsoft.Build" Version="15.5.180" />
Expand Down Expand Up @@ -81,7 +80,6 @@
<PackageReference Update="System.IdentityModel.Tokens.Jwt" Version="[5.4.0, 6.0.0)" />
<PackageReference Update="System.IO.Compression" Version="4.3.0" />
<PackageReference Update="System.Linq" Version="4.3.0" />
<PackageReference Update="System.Memory" Version="4.5.2" />
<PackageReference Update="System.Net.Http" Version="4.3.4" />
<PackageReference Update="System.Net.WebSockets.Client" Version="4.3.2" />
<PackageReference Update="System.Numerics.Vectors" Version="4.5.0" />
Expand All @@ -95,12 +93,19 @@
<PackageReference Update="System.Security.Cryptography.Cng" Version="4.3.0" />
<PackageReference Update="System.Security.Cryptography.Primitives" Version="4.3.0" />
<PackageReference Update="System.Security.Cryptography.X509Certificates" Version="4.3.2" />
<PackageReference Update="System.Text.Json" Version="4.6.0-preview6.19259.10" />
<PackageReference Update="System.Threading.Tasks.Extensions" Version="4.5.2" />
<PackageReference Update="System.ValueTuple" Version="4.5.0" />
<PackageReference Update="WindowsAzure.ServiceBus" Version="5.1.0" />
<PackageReference Update="WindowsAzure.Storage" Version="9.3.3" />
<PackageReference Update="xunit.runner.visualstudio" Version="2.4.1" />
<PackageReference Update="xunit" Version="2.4.1" />
</ItemGroup>
<!-- Track 2 specific versions -->
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a conditional section for Track 2 libraries because I'd like to update some package versions that might be used in track 1 clients too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that DiagnosticsSource and Unsafe are the only 2 packages that are shared? I'm OK with keeping these separate at least for now while we are consuming preview versions of those packages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, those were the ones I was worried about.

<ItemGroup Condition="'$(IsClientLibrary)' == 'true'">
<PackageReference Update="System.Memory" Version="4.5.3" />
<PackageReference Update="Microsoft.Bcl.AsyncInterfaces" Version="1.0.0-preview6.19259.10" />
<PackageReference Update="System.Text.Json" Version="4.6.0-preview6.19259.10" />
<PackageReference Update="System.Diagnostics.DiagnosticSource" Version="4.6.0-preview6.19259.10" />
<PackageReference Update="System.Runtime.CompilerServices.Unsafe" Version="4.6.0-preview6.19259.10" />
</ItemGroup>
</Project>

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions sdk/core/Azure.Core/src/Azure.Core.csproj
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
<ItemGroup>
<PackageReference Include="Microsoft.Bcl.AsyncInterfaces" />
<PackageReference Include="System.Buffers" />
<PackageReference Include="System.Diagnostics.DiagnosticSource" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, we need to use DiagSource 4.6.0 preview, it uses W3C trace-context and we should never use old Id format in Azure SDKs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the format supposed to be controlled by the app?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<PackageReference Include="System.Memory" />
<PackageReference Include="System.Numerics.Vectors" />
<PackageReference Include="System.Runtime.CompilerServices.Unsafe" />
Expand Down
86 changes: 86 additions & 0 deletions sdk/core/Azure.Core/src/Pipeline/AzureOperationScope.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

using System;
using System.Diagnostics;

namespace Azure.Core.Pipeline
{
public struct DiagnosticScope: IDisposable
{
private Activity _activity;

private readonly string _name;

private readonly DiagnosticListener _source;

internal DiagnosticScope(string name, DiagnosticListener source)
{
_name = name;
_source = source;
_activity = _source.IsEnabled() ? new Activity(_name) : null;
}

public bool IsEnabled => _activity != null;

public void AddAttribute(string name, string value)
{
_activity?.AddTag(name, value);
}

public void AddAttribute<T>(string name, T value)
{
if (_activity != null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd avoid doing this. Activity tags are always strings. ToString() can be called by caller and it's better to make it clear from API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows us to avoid formatting to string when there is no diagnostic source attached.

Copy link
Member

@lmolkova lmolkova Jul 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a special Activity.Recorded flag now - it should be false if there is no listener and if you badly need tags - you should add them only if activity is recorded (i.e. sampled in). OpenTelemetry should take care of setting it to true when needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a point in creating activity at all if it's not being recorded?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, for propagation: you don't record it, but service before you and after you did - they can correlate telemetry (but causation is broken)

{
AddAttribute(name, value.ToString());
}
}

public void AddAttribute<T>(string name, T value, Func<T, string> format)
pakrym marked this conversation as resolved.
Show resolved Hide resolved
{
if (_activity != null)
{
AddAttribute(name, format(value));
}
}

public void Start()
{
if (_activity != null && _source.IsEnabled(_name))
{
_source.StartActivity(_activity, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Http requests we should pass payload. It is more efficient to pass it (and parse it) then create multiple attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not for http requests, this activity is for client level operations.

}
}

public void Dispose()
{
if (_activity == null)
{
return;
}

if (_source != null)
{
_source.StopActivity(_activity, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, we should have payload

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't always have payload at this level and we won't be able to have an adapter library per client SDK anyway.

}
else
{
_activity?.Stop();
}
}

public void Failed(Exception e)
{
if (_activity == null)
{
return;
}

_source?.Write(_activity.OperationName + ".Exception", e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely it will be ignored by tracing system as it is likely handled and just noise. Unhandled exceptions are taken care of by web framework, So I think we don't need it for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we care about setting a status flag on spans?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do, can we do it better than Exception instance? If not, that's ok, but we need a plan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can, but using an exception should be an option too.

_activity?.Stop();

_activity = null;

}
}
}
6 changes: 5 additions & 1 deletion sdk/core/Azure.Core/src/Pipeline/HttpPipeline.cs
Original file line number Diff line number Diff line change
Expand Up @@ -15,11 +15,13 @@ public class HttpPipeline
private readonly ResponseClassifier _responseClassifier;
private readonly ReadOnlyMemory<HttpPipelinePolicy> _pipeline;

public HttpPipeline(HttpPipelineTransport transport, HttpPipelinePolicy[] policies = null, ResponseClassifier responseClassifier = null)
public HttpPipeline(HttpPipelineTransport transport, HttpPipelinePolicy[] policies = null, ResponseClassifier responseClassifier = null, ClientDiagnostics clientDiagnostics = null)
{
_transport = transport ?? throw new ArgumentNullException(nameof(transport));
_responseClassifier = responseClassifier ?? new ResponseClassifier();

Diagnostics = clientDiagnostics ?? new ClientDiagnostics(true);

policies = policies ?? Array.Empty<HttpPipelinePolicy>();

var all = new HttpPipelinePolicy[policies.Length + 1];
Expand All @@ -32,6 +34,8 @@ public HttpPipeline(HttpPipelineTransport transport, HttpPipelinePolicy[] polici
public Request CreateRequest()
=> _transport.CreateRequest();

public ClientDiagnostics Diagnostics { get; }

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public async Task<Response> SendRequestAsync(Request request, CancellationToken cancellationToken)
{
Expand Down
4 changes: 3 additions & 1 deletion sdk/core/Azure.Core/src/Pipeline/HttpPipelineBuilder.cs
Original file line number Diff line number Diff line change
Expand Up @@ -40,9 +40,11 @@ public static HttpPipeline Build(ClientOptions options, bool bufferResponse = tr
policies.Add(BufferResponsePolicy.Shared);
}

policies.Add(new RequestActivityPolicy());

policies.RemoveAll(policy => policy == null);

return new HttpPipeline(options.Transport, policies.ToArray(), options.ResponseClassifier);
return new HttpPipeline(options.Transport, policies.ToArray(), options.ResponseClassifier, new ClientDiagnostics(options.Diagnostics.IsLoggingEnabled));
}

// internal for testing
Expand Down
28 changes: 28 additions & 0 deletions sdk/core/Azure.Core/src/Pipeline/HttpPipelineDiagnostics.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

using System.Diagnostics;

namespace Azure.Core.Pipeline
{
public sealed class ClientDiagnostics
{
private readonly bool _isActivityEnabled;

public ClientDiagnostics(bool isActivityEnabled)
{
_isActivityEnabled = isActivityEnabled;
}

private static readonly DiagnosticListener s_source = new DiagnosticListener("Azure.Clients");

public DiagnosticScope CreateScope(string name)
{
if (!_isActivityEnabled)
{
return default;
}
return new DiagnosticScope(name, s_source);
}
}
}
117 changes: 117 additions & 0 deletions sdk/core/Azure.Core/src/Pipeline/Policies/RequestActivityPolicy.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

using System;
using System.Diagnostics;
using System.Globalization;
using System.Threading.Tasks;

namespace Azure.Core.Pipeline.Policies
{
internal class RequestActivityPolicy : HttpPipelinePolicy
{
private const string TraceParentHeaderName = "traceparent";
private const string TraceStateHeaderName = "tracestate";
private const string RequestIdHeaderName = "Request-Id";

public static RequestActivityPolicy Shared { get; } = new RequestActivityPolicy();

private static readonly DiagnosticListener s_diagnosticSource = new DiagnosticListener("Azure.Pipeline");

public override Task ProcessAsync(HttpPipelineMessage message, ReadOnlyMemory<HttpPipelinePolicy> pipeline)
{
return ProcessAsync(message, pipeline, true);
}

public override void Process(HttpPipelineMessage message, ReadOnlyMemory<HttpPipelinePolicy> pipeline)
{
ProcessAsync(message, pipeline, false).EnsureCompleted();
}

private static async Task ProcessAsync(HttpPipelineMessage message, ReadOnlyMemory<HttpPipelinePolicy> pipeline, bool isAsync)
{
if (!s_diagnosticSource.IsEnabled())
{
await ProcessNextAsync(message, pipeline, isAsync).ConfigureAwait(false);

return;
}

var activity = new Activity("Azure.Core.Http.Request");
activity.AddTag("http.method", message.Request.Method.Method);
Copy link
Member

@lmolkova lmolkova Jul 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest passing HttpRequest and response objects - and avoid using tags for http pipeline.

If you have a strong need to use them to avoid more work in bridge, check if activity is sampled in first, before adding them.

We should talk more about how to support it better with Activity.Recorded which comes with next DiagnosticSource 4.6.0-preview), it is propagated from parent and it could be changed in Start events by the listener.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some friction with updating to DiagnosticSource 4.6.0-preview because of the central version management in the repo. I'll file an issue to follow up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a strong need to use them to avoid more work in bridge, check if activity is sampled in first, before adding them.

Isn't checking for s_diagnosticSource.IsEnabled(activity.OperationName, message.Request) enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if IsEnabled is false, then Activity is not created at all - nothing gets tracked or propagated. Activity.Recorded = false means activity is created, and propagated, but augmenting it with anything does not make sense.

You see why we need OpenTelemetry in general - there are some many caveats around Activity...

Copy link
Contributor Author

@pakrym pakrym Jul 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

activity is created, and propagated, but augmenting it with anything does not make sense.

What is the point of this activity then? It carries no information except ID and operation name. Especially considering AzureSDK does not need to create and incoming request activity like ASP.NET COre does

activity.AddTag("http.url", message.Request.UriBuilder.ToString());
if (message.Request.Headers.TryGetValue("User-Agent", out string userAgent))
{
activity.AddTag("http.user_agent", userAgent);
}

var diagnosticSourceActivityEnabled = s_diagnosticSource.IsEnabled(activity.OperationName, message);

if (diagnosticSourceActivityEnabled)
{
s_diagnosticSource.StartActivity(activity, message);
}
else
{
activity.Start();
}

pakrym marked this conversation as resolved.
Show resolved Hide resolved

if (isAsync)
{
await ProcessNextAsync(message, pipeline).ConfigureAwait(false);
}
else
{
ProcessNext(message, pipeline);
}

activity.AddTag("http.status_code", message.Response.Status.ToString(CultureInfo.InvariantCulture));

if (diagnosticSourceActivityEnabled)
{
s_diagnosticSource.StopActivity(activity, message);
}
else
{
activity.Stop();
}
}

private static async Task ProcessNextAsync(HttpPipelineMessage message, ReadOnlyMemory<HttpPipelinePolicy> pipeline, bool isAsync)
{
var currentActivity = Activity.Current;

if (currentActivity != null)
{
if (currentActivity.IdFormat == ActivityIdFormat.W3C)
{
if (!message.Request.Headers.Contains(TraceParentHeaderName))
{
message.Request.Headers.Add(TraceParentHeaderName, currentActivity.Id);
if (currentActivity.TraceStateString != null)
{
message.Request.Headers.Add(TraceStateHeaderName, currentActivity.TraceStateString);
}
}
}
else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think supporting Request-Id doesn't make sense. Azure Services will never implement Request-Id header so propagating it won't help anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what should we do in case of non-W3C activity?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two options

  1. Insist on W3C activity (in preview7 there will be an instance method Activity.SetIdFormat()). Then if a user had activity in the old format, the one created by Azure SDK will not correlate to the old one. But anything that happens in and after Azure SDK (e.g. EventHub and message processing or Storage logs and EventGrid notifications followed by Function) will be correlated.

  2. Tolerate old format but not inject headers. Then everything before and including Azure SDK will correlate, anything after will become a new trace.

Looking into this, there is no good solution for format mismatch. But I think this is not Azure SDK responsibility to solve this issue - tracing system should control the format and enforce W3C on the incoming boundary.

In the bright future, there will be nothing else than W3C and we won't have this problem.

So I suggest do 2.

{
if (!message.Request.Headers.Contains(RequestIdHeaderName))
{
message.Request.Headers.Add(RequestIdHeaderName, currentActivity.Id);
}
}
}

if (isAsync)
{
await ProcessNextAsync(message, pipeline).ConfigureAwait(false);
}
else
{
ProcessNext(message, pipeline);
}
}
}
}
Loading