This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Changes to support W3C style IDs and propagation #33207

Merged on Feb 26, 2019 (42 commits)

vancem
Contributor

@vancem vancem commented Nov 1, 2018

See https://w3c.github.io/trace-context for more on what W3C Ids are.

This is mostly for design discussion purposes. Adding NO MERGE flag until we work through the details. (we will need a design review as well, but first things first).

Also see https://github.com/aspnet/Hosting/pull/1577/files for changes to the ASP.NET hosting code to support incoming HTTP propagation of the ID and TraceState.

This implementation was meant to be pretty much the 'minimal' change from where we are today that would seem to support the W3C style IDs. It does not (yet) support some important considerations including

  1. 'Always On' capability.
  2. Perf / Sampling. To always be on, it has to be cheap.
  3. Versioning. We are likely to need to tweak it multiple times over the next year or so. Thus we need a way of allowing this without forcing everyone to update their .NET Framework.

This work includes feedback from Noah where we try to keep things super-simple and avoid 'support both' functionality if at all possible. Ideally in 5 years we only have one format (the W3C format), and everything else is only available via compatibility switches (or not supported).

You can see the changes to Activity are pretty minimal. Basically

  1. We add TraceStateString (which is really just a first-class baggage item).
  2. We provide a new IDFormat property, as well as the statics DefaultIdFormat and ForceDefaultIdFormat, which tell Activity which format to use (the default is used when there is no information on a parent OR when ForceDefaultIdFormat is set).
  3. Added TraceId. This is basically a type that provides another way of getting at the RootId property when it is in W3C ID format. This is useful because it can be more efficient than using strings (which is what RootId is). It is a 16-byte binary ID (often expressed as a hex string).
  4. Added SpanId. This is a type that provides a way of getting at the 'rest' of the ID that is not the TraceId (for W3C ID format). It is an 8-byte binary ID (often expressed as a hex string).
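To make the 16-byte / 8-byte split concrete, here is a minimal sketch of pulling the two IDs out of a W3C traceparent header value. It assumes the version 00 layout from the trace-context spec (version-traceid-spanid-flags, all lowercase hex). The class and method names are hypothetical and are not part of this PR.

```csharp
using System;

// Hypothetical helper for illustration only; not the Activity API from this PR.
static class TraceParentSketch
{
    // Splits "version-traceid-spanid-flags" and checks the field widths:
    // 32 hex chars (16 bytes) for the trace id, 16 hex chars (8 bytes) for the span id.
    public static bool TryParse(string header, out string traceId, out string spanId)
    {
        traceId = string.Empty;
        spanId = string.Empty;
        if (header == null) return false;

        string[] parts = header.Split('-');
        if (parts.Length != 4) return false;
        if (parts[0].Length != 2 || parts[1].Length != 32 ||
            parts[2].Length != 16 || parts[3].Length != 2) return false;
        if (!IsLowerHex(parts[1]) || !IsLowerHex(parts[2])) return false;

        traceId = parts[1];
        spanId = parts[2];
        return true;
    }

    private static bool IsLowerHex(string s)
    {
        foreach (char c in s)
        {
            bool ok = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
            if (!ok) return false;
        }
        return true;
    }
}
```

For the example value from the spec, 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01, the trace id is the 32-character field and the span id is the 16-character field. The sketch keeps them as strings; the point of the TraceId and SpanId types described above is to avoid that where possible.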

@noahfalk @lmolkova @glennc @SergeyKanzhelev @davidfowl @pakrym

@vancem vancem added the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Nov 1, 2018
Member

@noahfalk noahfalk left a comment

I like it. I suggested one API tweak to use enum instead of bool for exposing format.

@lmolkova

lmolkova commented Nov 2, 2018

let's assume new Activity is used with old HttpClient & old Asp.Net Core which are not aware of W3C.
It's totally possible for customers that run the latest AppInsights on e.g. netcoreapp 2.0.

Depending on UseW3CFormat, Activity may generate a W3C Id, which HttpClient will inject into the Request-Id header, where it will not be useful to anyone.

If we want to be backward compatible we can try to detect such cases on tracing system level and inject traceparent ourselves, but I'd like to avoid this level of hacks.
So I still propose to add another property with public getter: call it TraceParent or W3CId, which constructs new w3c Id. And obsolete Id and make it return |traceid.spanId. This will allow tracing systems to glue up different protocols and enable a transition story for our customers.

HttpClient in this case will send

if (currentActivity.IsWC3Id)
  request.Headers.Add(DiagnosticsHandlerLoggingStrings.TraceParentHeaderName, currentActivity.W3CId);
else
  request.Headers.Add(DiagnosticsHandlerLoggingStrings.RequestIdHeaderName, currentActivity.Id);

@noahfalk
Member

noahfalk commented Nov 2, 2018

let's assume new Activity is used with old HttpClient & old Asp.Net Core which are not aware of W3C.

Under what situation would someone want to set Activity.UseW3CFormat = true, while at the same time continuing to use legacy software components in their services that do not support W3C?

The closest scenario I am envisioning looks like this:
a) I have a big distributed system and updating the software in it is hard to do, or not entirely under my control. Let's say it has nodes X, Y, and Z in it. Requests arrive at X, then X sends requests to Y and Z.
b) Currently this system communicates using the RequestID standard and various portions of it are old, so those parts have no support for anything else.
c) But I want to make X better by sending requests to a new service W, and this external service only speaks W3C.
d) It's important that I can get distributed tracing across the whole thing.
e) Thankfully X, Y, Z, and W all use a logging framework that is able to understand both RequestID and W3C, and it can correlate the two together as long as the IDs on both sides follow some conventions for translating.

In this case I'd still be scared to set Activity.UseW3CFormat = true in node X. In particular, with HttpClient implemented as suggested, that means that all my requests to Y and Z could switch from RequestID to TraceParent, and by assumption Y and Z probably have old software in them that doesn't understand TraceParent. Because updating the software in the whole system will be hard, I'd want to make a very localized change that lets me send TraceParent to W, while leaving everything else untouched to minimize work and risk of regression. To accomplish this I might want to logically do something like this:

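// Sketch only: ConvertRequestIDToW3CID is a hypothetical helper, not an existing API.
Activity oldCurrent = Activity.Current;
Activity w3cActivity = new Activity("CallServiceW"); // Activity requires an operation name
Activity.Current = w3cActivity.SetParentId(ConvertRequestIDToW3CID(oldCurrent.Id));
// send http to W
Activity.Current = oldCurrent; // restore the original activity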

@vancem
Contributor Author

vancem commented Nov 2, 2018

@lmolkova I don't think we have a scenario where you use old httpClient and new activity. Both are part of the .NET core framework and would be updated together. So I don't think your scenario happens.

But on a larger, more philosophical note, I think we should keep our versioning scenarios few and as simple as possible. Clearly the 'all new' case has to work, as well as the 'all old' case. We MAY want to consider cases where some machines are old and some are new, but even in that case the main compelling scenario I want is the ability to UPGRADE nodes of the system incrementally. People can stay on the old system until the whole system is able to use the new system.

My skepticism here is that what I have found is that if things like this are not SUPER SIMPLE, users don't end up doing them anyway (they just live without it or use the old system). I am also SUPER worried that it is fragile (we do our part, but other parts of the system do not, so it actually has no value to the end user).

Thus my mantra here is to do the EASIEST/SIMPLEST thing first (the homogeneous new case, while allowing the old system to keep working with the new code, with no forced update), and only add new scenarios beyond this if they have been vetted as useful AND do not introduce much complexity.

For what it is worth...

@lmolkova

lmolkova commented Nov 2, 2018

Regarding old HttpClient + new DiagnosticSource. They do ship together, but the pace users update is different.

ApplicationInsights will enforce new DiagnosticSource, while it may take years before users update to .NET Core 3.0.

As a tracing system, we cannot require users to upgrade to the latest .NET version. What you propose is that we detect that users do not have W3C-enabled ASP.NET Core & HttpClient and turn off W3C for them.

This will de-facto block W3C adoption and will create a discrepancy between different versions.

Note that there is no way to convert Request-Id to W3C - old root Id is not compatible with new TraceId format.
So even on the tracing system side, we cannot support both formats without huge implications on the backend performance.

@SergeyKanzhelev SergeyKanzhelev left a comment

As an absolute minimum of changes to support W3C it may work. But we clearly need to keep working on this. For instance, expose and allow overriding span-id, make tracestate a list, measure perf and consider using specialized types for trace-id and span-id. In the spirit of moving forward, can you please address the feedback (mostly renamings) before merging?

/// it is expected to be special cased (it has its own HTTP header), it is more
/// convenient/efficient if it is not lumped in with other baggage.
/// </summary>
public string TraceState

So the fact that it is a string, not a list, is still up for discussion, correct? Should the comments note that this API is not final?

/// it is expected to be special cased (it has its own HTTP header), it is more
/// convenient/efficient if it is not lumped in with other baggage.
/// </summary>
public string TraceState

Spec uses tracestate as a word. Please do not upper case State. See discussion census-instrumentation/opencensus-specs#171

Other http headers/consts that I know of in Microsoft/System.Net are written with .net naming conventions.

Tracestate follows naming convention if considered to be a single word.

Contributor Author

I am at best ambivalent about changing the case. Sure, we would be consistent with the spec, but is that the important thing? It seems more useful to allow users (who probably never read the spec) to 'guess' the correct name. I would argue that they would guess TraceState (after all, it is two English words).

In the end, this is a small thing (intellisense fixes it and frankly, the number of users of this API is very small), so will not fight things one way or the other. My recommendation is to let the API review people decide. I will leave it for now.

/// to store vendor-specific information that must flow.
/// Logically it is just a kind of baggage (if flows just like baggage), but because
/// it is expected to be special cased (it has its own HTTP header), it is more
/// convenient/efficient if it is not lumped in with other baggage.

I don't think this summary is helpful for the person who uses this API. I'd suggest the following text:

The tracestate field carries information supplemental to the trace identity contained in traceparent. The list of key-value pairs carried by tracestate conveys information about the request's position in multiple distributed tracing graphs. It is typically used by distributed tracing systems and should not be used as general-purpose baggage, as this use may break correlation of a distributed trace.

Contributor Author

I will add this to the comment

@lmolkova

lmolkova commented Nov 5, 2018

As an absolute minimum of changes to support W3C it may work.

I do not agree. We should understand what the default and the configurable behavior are for the Id format. Right now it is defined by the upstream service.

We should allow enforcing either Request-Id or W3C: deterministic behavior controlled by the current service.

@SergeyKanzhelev

@lmolkova I agree with you that we need to make it deterministic. This implementation makes it an ASP.NET decision, as the parsing logic may decide to switch Activity to the new format when receiving Request-ID. But it will require a setting for ASP.NET, which may not be desirable.

Thus I'm saying it is an OK first step that may require many more steps

@vancem
Contributor Author

vancem commented Nov 15, 2018

I have added a strawman for Activity sampling. The idea is to make an 'always on' capability cheap enough that we will indeed leave it always on (and just turn the sampling down).

Basically on incoming HTTP request (in RecordRequestStartEventLog) we now will do something like this

            Activity activity = Activity.CreateWithWithSampling(ActivityName, requestId);
            if (activity != null)
            {
                // pull out all baggage and other state from the headers
            }

Logging systems can basically issue config (requests) to have a certain sampling level (SamplingPercent). Setting this to 15 will cause on average 15 out of every 100 events to be sampled.

The activity class keeps track of all the outstanding request for sampling and basically chooses the largest. When the logging system disposes the config then that request for sampling goes away.

The sampling system tries to keep the given sampling rate as well as honor any explicit requests to sample (e.g. from a W3C id or from SetRecordingDesired).
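The bookkeeping described above (several outstanding sampling requests, the largest one wins, disposing a request withdraws it, explicit requests are always honored) could be sketched roughly as follows. All names here are hypothetical, and the deterministic accumulator used to hit the rate is an assumption made for illustration; the strawman in the PR may well choose samples differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the described bookkeeping; not the strawman's actual code.
sealed class SamplingRequestsSketch
{
    private readonly object _lock = new object();
    private readonly List<Registration> _requests = new List<Registration>();
    private int _accumulator; // carries fractional progress toward the next sample

    // A logging system asks for a sampling level; disposing the result withdraws it.
    public IDisposable Request(int percent)
    {
        if (percent < 0 || percent > 100) throw new ArgumentOutOfRangeException(nameof(percent));
        var registration = new Registration(this, percent);
        lock (_lock) { _requests.Add(registration); }
        return registration;
    }

    // The largest outstanding request wins; 0 when nobody is asking.
    public int EffectivePercent
    {
        get
        {
            lock (_lock)
            {
                int max = 0;
                foreach (Registration r in _requests) max = Math.Max(max, r.Percent);
                return max;
            }
        }
    }

    // Explicit requests are always honored; otherwise sample at the effective rate.
    public bool ShouldSample(bool explicitlyRequested = false)
    {
        if (explicitlyRequested) return true;
        int percent = EffectivePercent;
        lock (_lock)
        {
            _accumulator += percent;
            if (_accumulator < 100) return false;
            _accumulator -= 100;
            return true;
        }
    }

    private sealed class Registration : IDisposable
    {
        private readonly SamplingRequestsSketch _owner;
        public readonly int Percent;

        public Registration(SamplingRequestsSketch owner, int percent)
        {
            _owner = owner;
            Percent = percent;
        }

        public void Dispose()
        {
            lock (_owner._lock) { _owner._requests.Remove(this); }
        }
    }
}
```

With requests for 15 and 5 outstanding, the effective level is 15, and the accumulator yields 15 samples per 100 calls. A random draw per request would hit the same rate only on average, which is closer to the "on average 15 out of every 100" wording above.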

@SergeyKanzhelev

@vancem would it be possible to make the sampling proposal a separate PR? That way we can keep making progress, and the sampling discussion will not block other discussions.

@vancem
Contributor Author

vancem commented Nov 19, 2018

I have pulled the activity sampling logic into its own PR #33605.

@karelz
Member

karelz commented Dec 19, 2018

@vancem this seems like a stale PR with accidental merge instead of rebase (which makes it nearly impossible to review). I will close it for now.
Feel free to reopen it once the commit history is fixed.

@karelz karelz closed this Dec 19, 2018
@karelz karelz added this to the 3.0 milestone Dec 21, 2018
@SergeyKanzhelev

@karelz this is an active PR. Perhaps re-opening to clean up commits is a good idea =). @vancem, can you please mention us if you decide to open a separate PR?

@vancem vancem reopened this Jan 7, 2019
@vancem
Contributor Author

vancem commented Jan 7, 2019

Reopening. I am updating the branch to fix the accidental merge.

@karelz
Member

karelz commented Jan 8, 2019

Is the "NO MERGE" still needed? Do we have a rough ETA for the PR?

@vancem
Contributor Author

vancem commented Jan 8, 2019

Is the "NO MERGE" still needed? Do we have a rough ETA for the PR?

Yes, the PR is still in a state that we would not want to check in. We want this in for 3.0, so we expect to finalize it this month.

@ericstj
Member

ericstj commented Feb 21, 2019

I added a workaround for the package cycle. This is going to fail package testing due to the dangling ref. Let me add a workaround for that.

@vancem vancem removed the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Feb 21, 2019
@vancem
Contributor Author

vancem commented Feb 22, 2019

Reminding myself that the API review issue for this new API is https://github.com/dotnet/corefx/issues/34828


<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup Condition="$(TargetFramework.StartsWith('netstandard1.')) OR $(TargetFramework.StartsWith('netcoreapp1.'))">
<PackageReference Include="System.Memory" Version="4.5.1" />
Member

Why pick this version over 4.5.2?

Contributor Author

This is a fix that @joperezr suggested when @ericstj was out. I don't have a strong opinion, but in general isn't it better to use the oldest version that works?

Member

This is just a temporary workaround in the package testing infra. It's nothing that persists into the product. You can tell folks to manually reference the latest.

@@ -38,6 +38,9 @@
<PackageReference Include="$(TargetingPackNugetPackageId)">
<Version >$(_TargetingPackVersion)</Version>
</PackageReference>
<PackageReference Include="System.Memory" Condition="'$(TargetGroup)' == 'net45' OR '$(TargetGroup)' == 'net46'">
Member

What about other netfx versions?

Contributor Author

This is a question for @joperezr or @ericstj.

@@ -2,11 +2,8 @@
<Project ToolsVersion="14.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PropertyGroup>
<BuildConfigurations>
netstandard1.1;
Member

Why are we removing these test configurations?

Contributor Author

We got failures trying to build these test configurations because of the way we build with xunit. After some discussion with @ericstj we concluded that these configurations did not make sense (it did not increase coverage over the other configurations that are still present). I can forward the e-mail conversation on it if you wish.

Uses Utf8 helpers where possible.
@vancem
Contributor Author

vancem commented Feb 25, 2019

OK, unless someone vetoes it, I am merging this later today. I have incorporated @ahsonkhan's feedback. I do want to get packages to the App-Insights folks.

@ahsonkhan
Member

OK, unless someone vetoes it, I am merging this later today

Question: Would we address API review feedback (if any) in a subsequent commit? I imagine people having opinions about the isUtf8Chars and overloading ReadOnlySpan<byte> to mean both binary and utf-8 text in this API which is something we don't have precedent for:

public ActivitySpanId(System.ReadOnlySpan<byte> idData, bool isUtf8Chars = false) { throw null; }

if (idData.Length != 16)
throw new ArgumentOutOfRangeException(nameof(idData));
idData.CopyTo(outBuff);
_id1 = BinaryPrimitives.ReverseEndianness(_id1);
Member

I am not 100% confident about reversing the endianness here, mainly because I am not sure why we get the bytes as big endian which ends up requiring this reversal. I would have thought we would want: if (!BitConverter.IsLittleEndian) { reverse }, i.e. reverse only for BigEndian. Just highlighting it.

Contributor Author

I believe we want it the way it currently is. Basically the standard wants the bytes written in byte-by-byte order, which is 'standard' for network stuff. This happens to correspond to big-endian if you try to clump it as larger integers as we are doing here.
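As a small illustration of the byte-order point, the sketch below reads an 8-byte span id two ways. BinaryPrimitives.ReadUInt64BigEndian interprets the bytes in wire order on any host. Copying the bytes into a ulong in host order gives the same number only after a reversal, and only a little-endian host needs that reversal. The class and method names are hypothetical, and the sketch does not claim to mirror the PR's internal layout.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical illustration; not the ActivitySpanId implementation from this PR.
static class SpanIdBytesSketch
{
    // Interprets the 8 bytes in wire (network) order, independent of the host.
    public static ulong ReadPortable(ReadOnlySpan<byte> id)
    {
        return BinaryPrimitives.ReadUInt64BigEndian(id);
    }

    // Copies the bytes into a ulong in host order first, then fixes up the order.
    public static ulong ReadViaHostOrder(ReadOnlySpan<byte> id)
    {
        ulong raw = BitConverter.ToUInt64(id.ToArray(), 0);
        // On a little-endian host the raw value is byte-reversed relative to wire order.
        return BitConverter.IsLittleEndian ? BinaryPrimitives.ReverseEndianness(raw) : raw;
    }
}
```

For the bytes 00 f0 67 aa 0b a9 02 b7, both methods return 0x00f067aa0ba902b7. An unconditional ReverseEndianness after a raw copy matches this on little-endian hosts, which is the common case for .NET.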

Contributor Author

We did not get any feedback in the API review on this API. We can change it, but the sooner the better, as it is a breaking change. The alternative is to keep the span constructor for one meaning and use a static method for the other (or to make both methods that take strings (as Spans) static). If there is precedent for resolving this, I am happy to follow it.
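One shape the static-method alternative could take is sketched below: two named factories, so a ReadOnlySpan<byte> never has to carry a flag saying whether it holds raw bytes or UTF-8 hex text. The type and method names are hypothetical and are not what the PR or the API review settled on.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical sketch of the "named static factories" alternative.
readonly struct SpanIdSketch
{
    private readonly ulong _value;

    private SpanIdSketch(ulong value) { _value = value; }

    public ulong Value { get { return _value; } }

    // Input is the 8 raw bytes of the id, in wire order.
    public static SpanIdSketch FromBytes(ReadOnlySpan<byte> bytes)
    {
        if (bytes.Length != 8) throw new ArgumentOutOfRangeException(nameof(bytes));
        return new SpanIdSketch(BinaryPrimitives.ReadUInt64BigEndian(bytes));
    }

    // Input is 16 UTF-8 (ASCII) characters of lowercase hex.
    public static SpanIdSketch FromUtf8Hex(ReadOnlySpan<byte> utf8Hex)
    {
        if (utf8Hex.Length != 16) throw new ArgumentOutOfRangeException(nameof(utf8Hex));
        ulong value = 0;
        foreach (byte b in utf8Hex)
        {
            value = (value << 4) | HexDigit(b);
        }
        return new SpanIdSketch(value);
    }

    private static ulong HexDigit(byte b)
    {
        if (b >= (byte)'0' && b <= (byte)'9') return (ulong)(b - '0');
        if (b >= (byte)'a' && b <= (byte)'f') return (ulong)(b - 'a' + 10);
        throw new FormatException("Expected a lowercase hex digit.");
    }
}
```

The call site then states which interpretation is meant, and the length checks differ per method (8 versus 16 bytes), so the two inputs cannot be confused through a forgotten boolean argument.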

/// and returns the resulting string.
/// </summary>
internal static string SpanToHexString(ReadOnlySpan<byte> bytes)
{
Debug.Assert(bytes.Length <= 16); // We want it to not be very bing
Member

nit: very bing

@vancem vancem merged commit fa07e4f into dotnet:master Feb 26, 2019
@davidfowl
Member

Nice!

macrogreg pushed a commit to open-telemetry/opentelemetry-dotnet-instrumentation that referenced this pull request Sep 24, 2020
* First set of changes to support W3C style IDs and propagation

see https://w3c.github.io/trace-context

This is mostly for discussion purposes.

* Changed UseW3CFormat to be DefaultIdFormat

* Review feedback

* More comments.  Small Renames

* Added Sampling Support, Review feedback.

* Added placeholder for SetRecordingDesired

* Separated out the Sampling support into its own PR.

* Remove more sampling support

* Fix IsWC3Id -> IdFormat

* Added Sampling Support, Review feedback.

* Separated out the Sampling support into its own PR.

* Add ForceW3C option.

* Introduce ForceDefaultIdFormat

* Adding SpanId and TraceId support

* Change ulong->long in SpanID (probably temporary)

* Remove undesired file changes

* Fix up reference assembly.

Not complete because of questions about Span<byte>, but closer now.

* Turn on code that used Span<byte>

* Defer setting IDFormat until start.

This ensures that the IDFormat property is either unknown or a given value (that never changes from there).

* More implementation, made the interface more uniform.

* Support to avoid using strings whenever possible

Basically the Id, SpanId, and TraceId properties are set lazily and only converted lazily.

* Added some tests

* First round of testing (and bugfixes)

* More testing

* Added Equality operators

* Rename SpanId -> ActivitySpanId TraceId ->ActivityTraceId

* Add Comments

* Fix bad XML comment

* Review feedback

* Change AsBytes -> CopyTo

Lifetime issues prevent returning Span<T> (which is what AsBytes does). Reverting to CopyTo instead.
Also added System.Memory ref in attempt to resolve build errors (that don't reproduce locally).

* Attempt to fix build break in netfx build

* Deal with Verification errors.

* Provide full namespace for SecuritySafeCriticalAttribute

* More securitySafeCritical annotations to fix test failures on desktop

* Another attempt on setting SecuritySafeCritical

* Remove the readonly ref (to avoid perf issue)

* Fix most  -buildAllConfigurations issues

* Tentative change to see if testing works if we ignore the older netstandard configs.

* Workaround package cycle involving DiagnosticSource and Memory

* Satisfy dangling System.Memory reference in package tests

* Add notes to remove the workarounds when Unsafe is fixed.

* Review feedback

Uses Utf8 helpers where possible.


Commit migrated from dotnet/corefx@fa07e4f
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Commit migrated from dotnet/corefx@fa07e4f
10 participants