Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for tracestate collection in Activity of DiagnosticSource #26463

Open
SergeyKanzhelev opened this issue Jun 12, 2018 · 12 comments
Open
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Diagnostics.Tracing feature-request tracking This issue is tracking the completion of other related issues.
Milestone

Comments

@SergeyKanzhelev
Copy link

Motivation

W3C spec for distributed tracing context defines a new concept of tracestate. In order to implement W3C specification this concept should be implemented in Activity class.

Tracestate collection should allow to make multiple types of modifications:
- remove name/value pair.
- add or modify name/value pair that will be applied to every child Activity.
- add or modify name/value pair for a certain scope. For instance - just before making an outgoing call to dependent service. This modification should only be applied to this scope, not to the parent activities.

CC: @seth-capistron, @vancem, @jacpull, @lmolkova, @glennc, @davidfowl

Proposed API

namespace System.Diagnostics
{
    public class Activity
    {
        ...
        // <summary>
        //  List of key/value pairs representing the tracestate field of distributed tracing context.
        //  https://github.com/w3c/distributed-tracing/blob/master/trace_context/HTTP_HEADER_FORMAT.md#tracestate-field
        //  TraceState is a list of named opaque values. Both key and value has a set of limitation imposed by W3C specification.
        //      key:  
        //          Identifiers are short (up to 256 characters) textual identifiers. Note that identifiers 
        //          MUST begin with a lowercase letter, and can only contain lowercase letters a-z, digits 0-9, 
        //          underscores _, dashes -, asterisks *, and forward slashes /.
        //      value: 
        //          Opaque string up to 256 characters printable ASCII RFC0020 characters (i.e., the range 0x20 to 0x7E) 
        //          except comma , and =. Note that this also excludes tabs, newlines, carriage returns, etc.
        //  There is also a limit on overall length of the TraceState collection. For complete specification refer to W3C document.
        //
        //  Setting of key/value pair with incorrect key or value will result in ArgumentException.
        // </summary>
        public SortedDictionary<KeyValuePair<string, string>> TraceState { get; }
        ...
    }
}

Notes

  • Proposed API will throw ArgumentException for incorrectly formatted values. This may be unexpected as collection is defined as string/string collection. However throwing exception is better alternative to defining custom class.
  • Mutation of entries requires shift of values in the list. Most recently accessed key/value pairs should be moved to the begginning of the list. This should be noted by implementors. This may also cause confusion as .Add method will add value to the beggining of the list, not to the end as one may expect.
  • W3C spec is in draft form and may change. This implementation may be an input for the change of W3C spec.

Usage

Read tracestate key's value

This example shows how the value of sampling-score which is double can be read from TraceState.

var samplingScoreStr = Activity.Current.TraceState["sampling-score"];
var samplingScore = string.IsNullOrEmpty(samplingScoreStr) ? 1 : Convert.ToDouble(samplingScoreStr);
if (samplingScore > sampling)
{
    // Do not track telemetry.
}

Change of tracestate should be reflected in every child activity of this activity. Supported mutations of trace state:

Update key value

Activity.Currnet.TraceState["sampling-score"] = (0.23).ToString();

Add new key-value pair

Any other values with the same name should be removed from the list.

Activity.Currnet.TraceState["sampling-score"] = (0.23).ToString();

Delete the key-value pair

Activity.Currnet.TraceState.Remove("sampling-score");
@SergeyKanzhelev
Copy link
Author

Some additional thoughts on proposed API after discussion with @adriancole:

  • most common operation is expected to be take 0 element. This is because in many distributed apps single instrumentation library is used across the app
  • number of elements will always be small. So perhaps IList or IEnumerable may suffice instead of random access dictionary that will take more space and memory
  • also with list other elements can be lazily parsed on demand
  • however any lazy parsing will require either pluggable implementation or bake in a few serialization formats into Activity class itself

I will update Usage section to indicate common vs. uncommon scenarios. Please comment if you have feedback

@SergeyKanzhelev
Copy link
Author

Comment from @lmolkova before - APIs of Activity do not throw generally. Altermative is to swallow exception or clean up data (e.g. lowercase the key). @karelz any guidance here?

@karelz
Copy link
Member

karelz commented Jun 14, 2018

I let area experts - @vancem @brianrob to chime in. IMO we should be resilient to invalid data. Ignoring them or cleaning them up sounds reasonable.

@vancem
Copy link
Contributor

vancem commented Jun 14, 2018

I would like to separate the 'HTTP-like stuff' from the Activity class. Its job is to be a container that propagates things in a way useful for causality tracining. Stuff associated with HTTP conventions for encoding and parsing stuff does not really belong here. Also the W3C rules for propagating this value don't belong in activity.

My knee-jerk reaction is that 'tracestate' should be just a piece of baggage (That is the whole string that encodes all the pieces just gets put into the 'traceState'). To do this well I think it is useful to have a UTF8String version of the baggage APIs (so you don't have to convert your UTF8 to strings which is nice), and I am also open to a 'Remove' API for baggage if we need it.

Effectively 'traceState' just becomes one more piece of baggage, which you already have to enumerate tu support 'correlation-context'. Thus you just need to watch for 'traceState' as you process that and encode/process it specially when you see it.

I do think having UTF8String (bascially Span) versions of the Baggage APIs makes sense so as to avoid lots of allocation / parsing.

@SergeyKanzhelev
Copy link
Author

@vancem tracestate is not a collection of tags associated with distributed trace. It is a current position in a distributed graph of a specific vendor. It may be updated on every request to indicate the "last seen span-id" that belong to the current vendor. There is an explanation in section 2.1 of https://w3c.github.io/distributed-tracing/report-trace-context.html

Also the concept is not http specific. Unless your comment was about lazy parsing. Than I agree that base Activity class may not be the best place for it.

That's said the idea to put it in Correlation Context will require some extra parsing and memory allocations. Especially if one will need to update the value in tracestate for every outgoing call. Which is the case for any hierarchical state implementation.

Does it make sense with this semantics of tracestate explanation to create a dedicated collection? Or you feel strongly about placing it into Correlation Context collection?

@vancem
Copy link
Contributor

vancem commented Jun 14, 2018

@SergeyKanzhelev. Yes I know that tracestate is a list of IDs that represent transition points. But it is also just a string that needs to be propagated through user-code so that when other operations (e.g. HTTP sends) happen, knowledgeable code can update the state and send it out.

In that sense FROM ACTIVITY's point of view, tracestate is just a piece of baggage (something it remembers so that other things can use it. The only thing that Activity has to do is remember it (and propagate it to all its children).

Yes, something needs to update tracestate. I am just saying that we have the choice of what that is, and arguably all that logic ideally would be together. Thus I am arguing that logic that hooks request can also do the update and update the 'tracestate' key of the baggage to be the new string that will be written to output requests. Activity's job is simply to remember it.

Note that Activity's baggage is actually NOT a dictionary, and in fact, my suggestion is we make it so that you pass in Span for the key and values as UTF8 blobs, and Activity pretty much simply saves a big blob of concatenated blobs.

Also note that hopefully tracestate is in fact empty most of the time and tends not to be modified (this is the single vendor case). It seems like other pieces of the correlation context might be at least as likely to be updates as this piece (is there is not a strong perf reason to treat it specially, we need to make fiddling with the correlation context reasonably efficient)

I am not trying to be difficult, but I do think the devil is in the details, and we are talking about very hot code paths (anything that is per-request or faster needs careful consideration). This means we want to be super lazy about parsing (especially if common scenario may be pass through, or a simple morph that could be done without exploding the data into some data structure (like a SortedDictionary).

I am OK with having a string or UTF8String blob tracestate property on Activity (in general I am OK with 'well known' pieces of baggage, since it encourages type safety, discoverability. Do you want to do that?

I think the more interesting/probelmatic issue is what we do about IDs.

@SergeyKanzhelev
Copy link
Author

SergeyKanzhelev commented Jun 15, 2018

@vancem I'll need to think about it and prototype a little to understand the most common patterns. Two concerns I have immediately are:

  1. With unparsed tracestate is that it moves knowledge of format from technology that recieved that field in one form or another (http vs. grpc vs. amqp...) to consumer of that information that ideally should be protocol-neutral. We need to understand whether tracestate is compresseable at all or will have the same format in every protocol.
  2. We discussed that there potentially will be no truly generic tracers. And every tracer may just copy current traceparent to tracestate under it's name. So tracestate is never empty. However talking to people implementing platforms (for instance azure services) I see a strong desire to avoid doing it. Mostly because they don't know which vendor will consume traces in the end of the day. Actual behavior here will contribute into what's typical pattern and what's not.

I like the idea to keep tracestate in correlation context for now as a workaround. @lmolkova it may be a good idea to work you are doing in Azure Functions. At least for the short term.

I am not trying to be difficult

I'm not trying to be pushy and see where you are coming from =)

@codefromthecrypt
Copy link

codefromthecrypt commented Jun 16, 2018 via email

@cwe1ss
Copy link
Contributor

cwe1ss commented Jun 21, 2018

I'm with @vancem on this one. We have to be super careful with what gets added to Activity. Activity's main purpose is code "instrumentation" and that code shouldn't have to know about tracer-specific stuff like propagation headers etc.

If baggage is not sufficient for this, we must make sure that this has a general reason of existence - independent of any specification. The W3C spec is not yet finalized and even if it will be, it's "just one spec". It might win, it might not... There might be a new spec (or a new version of this spec) in 5 years.

There already are the proprietary Request-Id and Correlation-Context headers which have been hard-coded into the system. This mistake should not happen again. If the .NET framework decides to implement a specification, it should give people a choice to choose whichever specification they want.

@codefromthecrypt
Copy link

codefromthecrypt commented Jun 26, 2018

Baggage has been a liability in practice due to its lack of definition (like ttl), bloat, likelihood of leaking secrets, or abstraction breaking (ex encouraging business code to use tracing apis to do business functionality). It is problems with OT baggage (including entry control, privacy etc) which led to some of the delays in correlation context, ex how to make sure these problems aren't re-introduced under a new name. Many have mentioned baggage was a bit half-baked, if we are talking about OT baggage as opposed to the original.

TLDR I would steer very clear from having request scoping behavior being in any way tainted with OT behavior which in practice is propagate anything forever downstream via headers.

@cwe1ss
Copy link
Contributor

cwe1ss commented Jun 26, 2018

.NET's Activity type is roughly based on OT's Span but it definitely isn't an OT implementation.

I had another look at the W3C spec document. Seems like what we currently have with Baggage more closely relates to W3C's Correlation-Context, right?

But I'm now confused by @SergeyKanzhelev example in this issue where he used sampling-rate as a key. Shouldn't that also be a Correlation-Context value? The W3C spec seems to only talk about vendor-specific span ids in its definition of the tracestate header.

@mconnew
Copy link
Member

mconnew commented Nov 2, 2018

I know I'm coming late to this game, but I have some questions, some specifically about how this would relate to WCF. The W3C spec only talks about how this pertains to HTTP, but in the discussion there's talk about grpc and amqp. WCF also has it's own protocols so anything that is done has to be transport agnostic.
The W3C spec also quickly breaks down and fails hard when WCF is used. The tracestate can be up to 512 bytes long, and each state item can be up to 513 bytes (256 bytes for key, = character, 256 bytes for the value). Ignoring that off by 1 bug in the spec, you are only guaranteed to be able to have a single item in the tracestate. The whole purpose of the spec is to allow correlation across different tracing technologies. WCF has it's own end to end correlation tracing which if turned on means you have two different logging systems on at the same time (DiagnosticsSource based and WCF'd own tracing). This means you must propagate two tracing states or you will break one of the tracing correlation mechanisms. You aren't guaranteed to be able to do that.
But if this is the correlation standard is the one we are going to hitch our horse to, then there should be an API to manage the data. Otherwise WCF will need to write the parsing code and truncation logic to add our state to the data. But it shouldn't be part of the Activity class, it should be a separate implementation which can handle mutating the state correctly and exposing data strongly typed and have the ability to easily and cheaply import/export the values in a way that it can be added to the baggage as an opaque value. ApplicationInsights can have extra handling to pull the value out of the baggage and put it as it's own http header if needed. Frameworks such as WCF can add it as a SOAP header which carries the baggage transparently so the data flows and if this is the industry winner, add smarter handling later to add our own state.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@tommcdon tommcdon removed the untriaged New issue has not been triaged by the area owner label Mar 7, 2020
@tommcdon tommcdon added the tracking This issue is tracking the completion of other related issues. label Jul 8, 2020
@tommcdon tommcdon modified the milestones: 5.0.0, 6.0.0 Jul 8, 2020
@tommcdon tommcdon modified the milestones: 6.0.0, 7.0.0 Jun 16, 2021
@tommcdon tommcdon modified the milestones: 7.0.0, 8.0.0 Jun 27, 2022
@tommcdon tommcdon removed this from the 8.0.0 milestone Jul 24, 2023
@tommcdon tommcdon added this to the Future milestone Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Diagnostics.Tracing feature-request tracking This issue is tracking the completion of other related issues.
Projects
None yet
Development

No branches or pull requests

10 participants