-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support customized Activity trace id generation #46704
Comments
Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti Issue DetailsBackground and MotivationThe .NET runtime provides APIs to generate Activity, which can be adopted and processed as trace context and eventually emitted to multiple backend vendors by OpenTelemetry .NET SDK for the purpose of tracing. Currently .NET runtime only allows an activity's trace id to be randomly generated and there is no public API available to override it. While for some backend vendors, trace context with a non-random customized trace id is more preferred. For instance, AWS X-Ray backend will only process trace context with a trace id containing a timestamp. In this case, is it possible for the runtime to support customized activity trace id, like adding API similar as Activity.SetTraceId()? Previous discussion: open-telemetry/opentelemetry-specification#896 Proposed APIpseudocode
Usage ExamplesAlternative DesignsRisks
|
@lupengamzn I don't think overriding the trace id is a good idea here. if you do so, it can break other consumers of the Activity objects which used the original trace id before overriding it. Why you can't create the Activity object passing the ActivityContext with the desired trace id? CC @noahfalk |
I think the ask is not to override traceid. The ask to control trace_id generation algorithm. Its currently random, but some backends like AWS, requires the traceid to be in a particular format. So essentially asking for a 'hook', which will be called whenever Activity is ever going to generate a TraceId, and use that, instead of currently randmonid generation. |
The proposed API above isn't in the form of a hook? (I think hook == callback or event) @lupengamzn's design assumes that there are certain Activities the developer will create directly and only those Activities will have the customized trace ids. The API above feels simpler to me relative to a hook because there is no ambiguity about who has control of the ID - whatever code creates the Activity has the control. In the case of hooks we get issues such as what if two different components try to set the hook and they don't agree about which ID to create. Its pretty awkward if multiple components are fighting over which of them gets to control singleton global state. Thanks @lupengamzn! I can somewhat see how this scenario would work but I am still trying to fit together the pieces : ) Don't take this as me pushing back, I just want to ensure that if we build something its really going to solve the problems that it needs to solve.
|
Discussion moved from the #46706:
@lupengamzn I think this is reasonable. The only concern is this can affect the current value we return from Activity.ParentId which I am not sure if this will be high risk for the app compatibility. Could you please elaborate more on why you need this change? what scenario you think current behavior would be problematic? CC @noahfalk
Hey @tarekgh , Thanks for the quick response! We're planing to override the trace id of a root activity which is created by .NET runtime and we found out that there is no public API for us to do so. The only API available for us to touch the trace id is through SetParentId() so we use it and pass new trace id, default parent span id, and the original trace flags of activity hoping to get an activity with the new trace id, but parent span id and original trace flags the same as previous. We found out that, if we pass default as parent span id to this API, the activity's parent span id is no longer null, but overridden to 0000000000000000 instead, which will give us 00-new trace id-0000000000000000-00 when we call activity.ParentId, instead of null as we expected. We're thinking if it's more suitable for this API to keep the parent span id as null when we pass default as the parent span id as the second parameter. After that, when we call this API again and only passing the new trace id and keep the second and third parameters the same as before, only the trace id of activity will be overridden and the activity.ParentId will return null still.
@lupengamzn thanks for clarifying that. @noahfalk what you think about this change? I think the app compat would be minimal and the change looks desired.
I'd like to find out why the broader scenario needs the dev to have direct control of the trace id for the Activity? I'm only aware of two typical scenarios for activities and neither of them needs that ability: root -> create a new random id Fwiw if we do this I think we should consider adding a new SetTraceId() API rather than modifying SetParentId(). It feels counter-intuitive to me that after calling SetParentId(x,y) it might still be the case that ParentId == null. I get that is the goal in this case, but I don't think it is what anyone else looking at this API would expect to have happen. Anyone else who wanted the same behavior would probably not be able to discover how to do it.
Hey @noahfalk , Thanks for the response! Changing the trace id of a root activity by calling SetParentId() turns out counter-intuitive for us too, so adding a new API such as SetTraceId() would be the most ideal way as currently there seems no direct APIs for us to touch the trace id of an activity. We're working with OTel .NET SDK team and as OTel .NET SDK adopts trace context from .NET runtime, it would be great that you can provide OTel .NET developers an API to override the original random trace id as other OTel SDKs already support such feature. The use case of such API so far is to override the original trace id with a non-random one for a root activity so that it (and its child activities) can be further recognized and processed by specific backend vendor (like AWS X-Ray). I opened an issue discussing this and the workaround on it would be much appreciated if it's decided to target .NET 6.
@lupengamzn do you control the root activity creation? I mean how to decide which Activity is the root activity? Can we close this issue and continue the discussion in #46704 to not have teared discussion?
Hey @tarekgh , We're using activity listener to detect every activity and we're able to know if it's a root activity by checking if it's parentId is null or not. If it's root activity, we're calling SetParentId() to it to override the trace id. It seems a little bit tricky but is so far the only way we're able to come up with based on the APIs we can utilize from activity class. |
Ah I didn't realize you were intending to call this from the callback inside ActivityListener. I think you are relying on unintended behavior - the SetParentId() function is missing a check whether the Activity is already started. The activity's ID and parent are both intended to be immutable at that point. For OpenTelemetry the requirement is a bit stronger because the ID is used for sampling decisions before the Activity is started. If we gave the sampler id X and it makes it decision based on that we don't want to transmute the activity to have new ID Y because that might invalidate the sampling decision. We can look at designs that set the ID earlier to avoid these kinds of issues, it just probably wouldn't be the API above any longer (at least not for the ActivitySource.StartActivity() case). It also sounds like your goal is to control the IDs of all Activities generated within the app, not just the ones that your code is directly responsible for creating? Datadog asked for something similar and I initially rebuffed them, but now that we've got multiple companies asking I feel like it at least deserves a 2nd look. I'm still trying to figure out what happens when an external trace ID comes in on the wire and it doesn't follow the ID generation rules your system knows how to handle. Do you fail to log it or you hope to convert it using some extensibility mechanism for intercepting inbound IDs? |
Hey @noahfalk , Sorry for the late reply and thank you so much for the explanation. 😃 For your questions:
If using X-Ray SDK, AWS X-Ray backend will abandon all trace context which doesn't have a compatible trace id and log them out. If using OTel SDK, X-Ray collector will discard them and log them out.
We plan to provide APIs which can generate X-Ray compatible trace id that app developers can use to pass in the API provided by .NET runtime to eventually override the root activity's trace id. And developers will need to have such instrumentation in their application so as to get trace context recognized by X-Ray backend (if they would like to use X-Ray). Actually X-Ray's trace id is in similar format as activity's trace id, they're both 32 character hex string, except that X-Ray requires the first 8 character to be the epoch time and the remaining 24 character to be random. That being said, X-Ray's trace id should be compatible for other backend vendors as long as they're compatible with activity's trace id. In this case, instrumenting this API will not affect the behavior when developers are also using other tracing services for their application at the same time. In fact, other OTel SDKs have already provided such API to achieve this, to which developer can pass X-Ray compatiable trace id to override the root's trace id, so we're contacting with OTel .NET team to discuss about this feature. And since OTel .NET SDK adopts trace context from .NET runtime, we may need your assistance to realize this as well. 😃
We realized this behavior that sampling decision is made before an activity is created while using
Yes, trace id needs to be uniform in format as X-Ray will trace the entire application and all its downstream services, which could either be AWS services or the other instrumented applications. So far, only the root activity's trace id needs to be updated as the new ids will be inherited by its child activity.
So far for X-Ray the only mechanism for manipulating trace id lies in the SDK, and there is no external mechanism for our backend system to convert them except logging. |
Thanks @lupengamzn, that clears up/confirms everything I was curious about. Also wanted to add @macrogreg and mention #42786 which is a similar (though slightly larger scoped) request from DataDog. I did push back when @macrogreg asked for similar functionality earlier but I suppose my resistance is dwindling as I think about it more and see the request replicated. Based on what I know now my suggestion is that we add some API similar to this: partial class Activity
{
// Whenever a root Activity is first sampled or started, this delegate will be invoked to determine the new ID. This property
// will be initialized to a default random number generation function provided by .NET Runtime which should be suitable
// for most use-cases.
public static Func<ActivityTraceId> TraceIdGenerationFunction { get; set; }
} I'll admit I grimace a little bit suggesting this because I worry that it creates opportunities for incompatibility. For example if someone tried to log to both DataDog and XRay your systems have mutually incompatible expectations for what proper IDs look like. To me this feels like it goes against the intent of the W3C TraceContext specification which intended to let vendors interoperate using a standardized definition of the trace id. However for practical purposes I'd guess that most developers are going to pick a single vendor and your efforts to increase compatibility can be decided based on requests from your customers rather abstract mandates of a specification. From the .NET side I do want to be up-front that we have no ability to control what code would set that property. Hopefully a cottage industry of NuGet library authors won't spring up with new uses for this, but if they do and their property setting overrides your setting then it would be up to you to figure out how to handle the incompatibilities. @tarekgh @cijothomas - what do you think of this? |
From Otel spec, it allows customization of both TraceId and SpanId, although it looks like AWS X-Ray requirement is only around TraceId. We may allow similar for SpanId generation as well. |
I am not comfortable with this proposal as this delegate would be global and can cause confusion if more than one party opt-in doing that. Also, there will not be easy way for anyone else to create the Ids differently when this delegate is set. Would it be better to have this delegate passed to the Activity creation instead and not be global? or at least we should have option to the Activity creators to opt-out of using the global delegate if needed. Or we have the mix of these 2 options, have the global static delegate as the default and then Activity creation can have option to choose something else if needed. Last, we need to get some feedback from different parties about this behavior and if they think if they can break because of that. Also, we need to know the perf implication if someone provided a delegate which consume a lot of time. |
we have the same issue with Activity.IdFormat as well, right? Its global and anyone could set/change it affecting everyone else. |
|
I agree this is possible, but I am relying on it being both unlikely and resolvable when it does occur. Do you think such a scenario would be likely?
This is conceptually easier but it doesn't solve the issue XRay and DataDog are trying to solve. Their code isn't the one that calls ActivitySource.StartActivity(), they are only a listener.
My guess is that that this requirement wouldn't occur often enough to be worth adding that extra complexity, but I'm not opposed to it if we needed it. A rough comparision might be looking at how often existing code invokes Activity.SetIdFormat() to override the global setting.
I'm glad to loop anyone you think we should be talking to but not sure who you had in mind?
I think the perf implications are the same as writing slow delegates for many of the other ActivityListener callbacks. If customers value high performance but a vendor implements something slow then customers will complain and/or switch to a new vendor. |
We don't know yet if this scenario is likely or not. As soon as we enable this feature, we'll start discover that I guess. That is why I am trying to be conservative to avoid any surprises. but if you think this is really unlikely, of course I trust your judgment.
Yes I know that and that is why I suggested to add the ability for overriding the TraceId during the creation too.
aspnet, Azure SDK or any other Azure teams. I think you know better than me here :-)
That make sense. do we have to care about generating colliding Ids especially Cijo asking to allow generating the Span Ids too? or would be same to complain about the owner of the generator? |
I have discussed this issue offline with @noahfalk and we agree with the Noah's proposal #46704 (comment). One change think is we'll make this delegate nullable public static Func<ActivityTraceId>? TraceIdGenerationFunction { get; set; } // alternative name would be GenerateTraceId per https://docs.microsoft.com/en-us/dynamicsax-2012/developer/naming-conventions-delegates-and-event-handlers? If you agree, I'll modify the proposal in the top with that and we can proceed. For the generating span id, we would wait to find more demand on that and we can discuss the details as needed as there is some concerns with it if we take the same approach as we did with the trace Id. |
This sounds great for my side, thanks for the workaround! |
@macrogreg please let's know if the proposal will work for DataDog before we proceed. |
@tarekgh sorry, I was out for a few days, just seeing this. I'll take a close look in the next 2 days. |
Thanks @macrogreg. Sorry interrupting you during your time off. I hope you can get back to us before Thursday as I expect the review will be scheduled that day. If you cannot do it, I can try to remove the ready for review mark till you respond. Please let me know what work best for you. |
Ok, @tarekgh , I'll review tomorrow morning and if it takes longer I'll add a comment here. |
Hi, again, folks, :) Sorry for having missed this discussion previously. It is, indeed, very similar to what I requested previously in #42786, slightly scoped down. This particular discussion is about the Trace Id only. Customising Span Id may (or may not) be necessary, but it is out of score here. :) The API looks good, I like that the change is small. However, I would like to make sure that we have a clear answer on how it would address the relevant scenarios. If the answer is already known and I overlooked it while catching up on the discussion, then sorry, please just point it out. If not, perhaps we can discuss a little more before locking it in. :) I. Considering parent activity when generating the Trace Id.So the scenario is that a service received a request. The request may or may not already have a trace context in the headers. Let's think through the possibilities.
The application server (or vendor code) handling the service call entry point will start the local root activity, which is in this case also the global trace root activity. That code will call one of the existing
The application server (or vendor code) handling the service call entry point will extract the trace context info from the request. It will create an instance of
The vendor will make a call of the form Activity.TraceIdGenerationFunction = ( () => { ActivityTraceId id = DoStuff(); return id; } ); They will want to do this once, during initialization.
This is the bit I an concerned about in the context of the current proposal. So it seems that a) This my preferred suggestion because it the the simplest. We should be aware, however, that it does not allow a vendor to consider information for Trace Id creation, that does not fit into b) c) Personally, I tend to prefer (a) for its simplicity or (c) to avoid the allocation, despite the complexity. based on the above, I would suggest the API shape to be public static Func<ActivityContext, ActivityTraceId> TraceIdGenerationFunction { get; set; } or public static Func<T, ActivityTraceId> TraceIdGenerationFunction { get; set; } What do you guys think? P.S. I have two more considerations to raise. I will post them in separate comments later today. :) |
Minor point: TraceIdCreationFunction, instead of TraceIdGenerationFunction? |
I removed ready for review tag for now till we resolve the feedback comments.
That is a good question. I can see the both ways. if For point #4, I believe if anyone specified a context when calling By the way, the public static Func<ActivityContex, ActivityTraceId> TraceIdGenerationContextFunction { get; set; }
public static Func<string, ActivityTraceId> TraceIdGenerationstringFunction { get; set; }
I'll add this as alternative proposed name in the proposal above. |
Agreed.
Yes. For the activities that have a remote parent (and thus a
public static Func<ActivityContex, ActivityTraceId> TraceIdGenerationContextFunction { get; set; }
public static Func<string, ActivityTraceId> TraceIdGenerationstringFunction { get; set; } Good point. We could consider some workaround similar to: class Activity
{
private static object s_traceIdCreationFunction = null;
public static Func<T, ActivityTraceId> GetTraceIdCreationFunction<T>()
{
if (_traceIdCreationFunction == null)
{
return null;
}
if (! typeof(Func<T, ActivityTraceId>).Equals(_traceIdCreationFunction.GetType()))
{
throw new InvalidOperationException($"You are requesting a TraceIdCreationFunction of type"
+ $" Func<T, ActivityTraceId> where T = {typeof(T).GetName()}."
+ $" However, for the the current TraceIdCreationFunction"
+ $" T is a different type. Make sure the respective types match.}");
}
return _traceIdCreationFunction;
}
public static void SetTraceIdCreationFunction<T>(Func<T, ActivityTraceId> func)
{
_traceIdCreationFunction = func;
}
} However, I am not saying that the complexity is warranted. I am only saying that the option exists. :) |
That is right.
This is the point that is unclear to me till now. Should the custom creator/generator be able to change the trace id of the current request? if so, then that is the concern I raised that we should honor the context passed to the StartActivity call without doing any change to it. Or you are saying the custom creator/generator will consider that in the future trace id generation when there is no trace id passed to StartActivity? Or I missing something else here? |
I am not sure I understand this bit:
Overall, that is exactly the topic of my second larger point which I am writing up right now. If we explicitly do not support (b) we can simplify, but then we should state that explicitly. However, the way I understand AWS's request (b) is a requirement. In a generic case for .NET, (b) can certainly occur, but whether or not .NET wants to support it right now is, of course to be decided. :) I'll finish my other point and post it :) |
I don't think your (b) was a requirement for AWS. From above in the conversation:
I would be hesitant to use this extensibility point for your (b) goal. I suspect we might want a separate feature that looks at an incoming ActivityContext and decides whether or not to create a new root activity. If it does create a new root Activity it probably also wants to generate various customizations of links, tags, baggage, etc which this callback would not allow you to do. This may also be related to extensible propagation that still needs to be discussed. I'd recommend that we keep this scoped to just the (a) goal for now. |
I am absolutely OK with scoping to only (a) as described in the earlier comment, however, are we sure it is enough then: Say, AWS X-Ray cannot process Trace IDs that are not "like they need them". If a call chain begins in an X-Ray monitored service, we cover everything by (a) and all is good. What if the call chain does not begin in a X-Ray monitored service? From what I gather from the above discussion, either X-Ray gets the Trace ID to be the way they need it to be, or they won't log the data. Am I missing something? (I am not sure that "log them out" above means. :) ) |
As mentioned earlier, here my second consideration. :) II. Making sure custom Trace Id plays nice with trace context propagation in a heterogeneous monitoring environment.Consider the scenario where vendor A uses default Trace Id creation, vendor B uses a custom Trace Id creation and vendor C uses a custom Trace Id creation that is different from B. Consider the following distributed call chains where Sn(X) means a service n monitored by vendor X. Request 1: user -> S1(A) -> S2(B) -> S3(C) -> S4(B) -> S5(A) Note:
Consider S2. I have previously discussed how that can be achieved here and there. That particular discussion was Datadog specific, but I think it can be generalized easily. .NET cannot enforce that vendors adhere to the process I describe in those links (or any other guidance that may or may not be based on that discussion), however:
When I generalize the conclusions made in the above links in my mind, I end up with the following.
There are different ways to achieve it. I will make a proposal, but there may be better ones:
delegate bool TryCreateCustomActivityTraceId(ActivityContext? parentContext, out ActivityTraceId customTraceId);
The public ActivityTraceId GetNonCustomTraceId(); If a custom generator was not set OR if This will allow context propagators to propagate the "non-custom id" using OTel propagation mechanisms and the custom generated id in vendor specific propagation headers. This should allow things to work in a heterogeneous environment in a manner discussed in the above links. |
My last point is shorter :) III. The full ID of a Span is really the
|
Thanks for feedback Greg!
My understanding is that the X-Ray developers acknowledge this as a limitation in the scenarios they can support. I don't think you are missing anything.
The current scope of this feature does not include any ID replacement capability. It is solely intended to handle creating the id for a new root activity. I would handle those other out-of-scope scenarios by either:
I suspect OTel or the W3C trace id spec discussion could be a better forum since I am guessing the interop questions you are raising aren't .NET specific? If there are areas that are .NET specific I'm happy to chat.
I agree its conceivable, but don't think that should be the bar for adding more features.
This part didn't feel persuasive for a few reasons:
|
@macrogreg per discussion we'll stick with the original proposal. and for the more broader scenarios you brought we can track addressing those in future. |
namespace System.Diagnostics
{
public partial class Activity
{
public static Func<ActivityTraceId>? TraceIdGenerator { get; set; }
}
} |
Thanks for the responses.
This is, actually, a very good point. |
Background and Motivation
The .NET runtime provides APIs to generate Activity, which can be adopted and processed as trace context and eventually emitted to multiple backend vendors by OpenTelemetry .NET SDK for the purpose of tracing. Currently .NET runtime only allows an activity's trace id to be randomly generated and there is no public API available to override it. While for some backend vendors, trace context with a non-random customized trace id is more preferred. For instance, AWS X-Ray backend will only process trace context with a trace id containing a timestamp. In this case, is it possible for the runtime to support customized activity trace id in Activity class, like adding API similar as Activity.SetTraceId()?
Previous discussion: open-telemetry/opentelemetry-specification#896
Proposal Edited by @tarekgh
Proposed API
Usage Examples
The text was updated successfully, but these errors were encountered: