Support OpenTelemetry concepts on Activity #31373
Comments
Looks great! Thanks @tarekgh and @noahfalk! One question related to code like this:
using (Activity foo = new Activity(...))
{
    throw new FoobarException(...);
}
Is it possible to capture the exception and associate it with the Activity? OpenTelemetry Python has such a feature (although I think this is more a question for the language/runtime than for the API), where we know whether there was an exception and derive the status code from it. |
Thanks @reyang for your review. I don't think we can automatically capture the exception inside a using block without a try-catch. The user can write code like:
using (Activity foo = Activity.StartNew(...))
{
    try
    {
        throw new FoobarException(...);
    }
    catch (Exception e)
    {
        foo?.SetCustomProperty("Exception", e);
    }
}
If there is any listener, it will get the notification when the activity stops and can then check whether the exception is set there using |
If developers perform exception handling in a central place (e.g. the evil global exception handler, or catching exceptions from top-level code), is there a way for them to know what the enclosing activity is? (I guess not?)
try {
    using (var foo = new Activity("foo"))
    {
        using (var baz = new Activity("baz"))
        {
            DoSomething();
        }
        DoSomething();
    }
    using (var bar = new Activity("bar"))
    {
        DoSomething();
    }
} catch (FoobarException ex) {
    log(
        ex,
        MagicalFunctionWhichReturnsTheEnclosingActivity(ex).Name, // this is hard since the activity is already disposed
        MagicalFunctionWhichReturnsTheVeryContext(ex).SpanId
    );
} |
Can you also share examples of creating an Activity from a parent activity ID only, specifically around async workflows? We have lots of use cases where we might spawn off an async task while the sync portion of the API needs to resume in the existing Activity context. In that case, we would still want to link the async task's activity to the API activity. There are a couple of options I can think of Option 2 |
Thanks Tarek! It's a little tricky to comment on in GitHub because all the text is in the body of an issue. I'll use quotes, but if the comments grow it might be helpful to put the text as a markdown document in a PR to allow inline commenting.
I think there is another major value prop worth calling out explicitly: it is now reasonable for a listener such as OpenTelemetry to observe Activity instrumentation from any code component in the app. Previously OpenTelemetry would have needed a-priori knowledge of the strings that would be used for DiagnosticSource name and IsEnabled() callbacks for each Activity it wanted to receive callbacks for.
I don't think the current proposal makes any performance improvement? Previously we had ActivitySource, which offered a faster path, but we've removed it.
Similar to above, I don't think the current proposal is faster?
Naming suggestions: For ActivityListener I assume the callback behavior is that OnActivityStarted/OnActivityStopped get invoked for every Activity, but ShouldCreateActivity only gets invoked for Activities created via Activity.StartNew? We should clarify that. I've been thinking a bit about how we handle cases where someone is using Hierarchical ids rather than W3C ids. I think there may be an easy path where we add an overload to Activity.StartNew and ActivityListener.ShouldCreateActivity that uses a string for the id instead of the ActivityContext. |
If we had a defined place on the Activity where Exception should be stored then the BCL (or any of our users) could make a helper method that lets you write code like this:
That helper could be implemented something like:
In a catch handler no, but in an exception filter potentially yes. However if the exception is caught and then rethrown at some intermediate frame (as often happens with async) then the filter won't help either. |
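To illustrate the exception-filter point above, here is a minimal sketch. It assumes the activity was started with Start() so that it is visible through Activity.Current; ExceptionFilterSketch, CaptureCurrentActivity, and CapturedName are hypothetical names used only for this illustration and are not part of the proposal. The filter is evaluated before the inner finally block stops the activity, so the enclosing activity is still current at that moment. As noted above, this stops working if an intermediate frame catches and rethrows.

```csharp
#nullable enable
using System;
using System.Diagnostics;

public static class ExceptionFilterSketch
{
    // Hypothetical storage for what the filter observed.
    public static string? CapturedName;

    // Runs during the first pass of exception handling, before any
    // inner finally block has executed.
    private static bool CaptureCurrentActivity(Exception ex)
    {
        CapturedName = Activity.Current?.OperationName;
        return true; // still enter the catch block
    }

    public static void Run()
    {
        try
        {
            // Start() makes the activity the ambient Activity.Current.
            Activity foo = new Activity("foo").Start();
            try
            {
                throw new InvalidOperationException("boom");
            }
            finally
            {
                // Executes only after the filter below has been evaluated.
                foo.Stop();
            }
        }
        catch (InvalidOperationException ex) when (CaptureCurrentActivity(ex))
        {
            Console.WriteLine($"{ex.Message} was thrown inside activity '{CapturedName}'");
        }
    }
}
```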
@noahfalk thanks for the feedback. I have updated the description according to your comments. Please let me know if I missed anything there. @chandramouleswaran thanks for the feedback too, I'll try to get back to you soon. |
Is there a fast way to check if activity should be created? Sometimes collecting context and links might be expensive, it would be nice to have a performant way to know beforehand if someone is listening on the activity. |
@pakrym if you are talking about the listener side and want to avoid dealing with context/links there, we could add a callback to the listener that doesn't take a context or links, but I am not sure whether this is the case that concerns you.
namespace System.Diagnostics
{
    public class ActivityListener : IDisposable
    {
        ...
        public virtual bool ShouldCreateActivity(string activityName, ActivityContext parent, IEnumerable<ActivityLink>? links);
        public virtual bool ShouldCreateActivity(string activityName);
        ...
    }
} |
As a producer of activities, I want to know when any particular activity is enabled so I don't have to extract links/ActivityContext from, for example, an incoming request when no one is listening. |
Got it. Would adding something like public static bool ActivityListener.IsEnabled(string activityName) address your scenario? |
Would it be as fast as |
If there are no registered listeners, then yes: we'll just check a static field value and, if it is null, return false. If there are any registered listeners, we'll need to call them to find out whether any of them is interested in creating an activity with the given name. |
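The behavior described above could be sketched roughly as follows. This is only an illustration of the null-check fast path and the per-listener slow path; SketchListener, NameListener, and SketchListenerRegistry are hypothetical stand-ins, not the proposed ActivityListener API.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for the listener type discussed in the thread.
public abstract class SketchListener
{
    public abstract bool ShouldCreateActivity(string activityName);
}

// Hypothetical listener that is only interested in one activity name.
public sealed class NameListener : SketchListener
{
    private readonly string _name;
    public NameListener(string name) => _name = name;
    public override bool ShouldCreateActivity(string activityName) => activityName == _name;
}

public static class SketchListenerRegistry
{
    // null means "no listeners registered"; the array is replaced as a whole
    // on registration so readers never observe a partially updated list.
    private static volatile SketchListener[]? s_listeners;
    private static readonly object s_lock = new object();

    public static void Register(SketchListener listener)
    {
        lock (s_lock)
        {
            SketchListener[] old = s_listeners ?? Array.Empty<SketchListener>();
            var updated = new SketchListener[old.Length + 1];
            Array.Copy(old, updated, old.Length);
            updated[old.Length] = listener;
            s_listeners = updated;
        }
    }

    public static bool IsEnabled(string activityName)
    {
        SketchListener[]? listeners = s_listeners;
        if (listeners is null)
        {
            return false; // fast path: one static field read
        }

        // Slow path: one virtual call per registered listener.
        foreach (SketchListener listener in listeners)
        {
            if (listener.ShouldCreateActivity(activityName))
            {
                return true;
            }
        }
        return false;
    }
}
```

With no listeners the cost is a single field read; with N listeners it is up to N virtual calls per check.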
Then I don't think it addresses my scenario; N virtual calls just to know whether an activity should be created seems way too high for something like Kestrel. |
Agreed, it certainly scales differently. So far I can't come up with a scenario where that scaling appears to matter. If there are scenarios that create 100s or 1000s of Activities per request but the requests are somehow also really fast, it would be an issue, but I don't foresee that happening in practice. If we had a plausible hypothetical example I might see it differently. |
Summary to the status of the discussions
|
Thanks @tarekgh!
I am concerned that if we add that setter people will use it in ways that break pre-existing code. All .NET code for the past decade has been able to rely on OperationName being an immutable value. I recommend we do not add the setter and get feedback from @SergeyKanzhelev or @reyang to understand what the use case is to search for an alternate solution.
I had an open question (to @SergeyKanzhelev and @reyang) in that portion that needs to be answered before we can resolve IsRecording: "The spec isn't clear about what the value prop is to create Spans where IsRecording is false given that they won't be reported to SpanProcessors?"
It is hard to provide feedback on this before the code changes are visible. Ideally new work would show up as changes to the APIs in the design doc. Changes that people are proposing but that we ultimately decide not to make can show up in the Q&A section to record the request and the rationale for why it didn't happen or what the alternative is. Earlier I commented about needing to handle the hierarchical ID case. I think we need an overload of ShouldCreateActivity that uses a string instead of ActivityContext. |
FYI - this PR has some context about the span name semantic. I don't think users would be blocked if there is no
I think the spec does not ask SDK to create a new span. Most implementations would return a dummy span singleton when we know it is not being reported to |
Isn't that the entire design though? This is tracing through a latency chain of many components, not a single request; it's N requests occurring sequentially, so it is directly N x additive to the latency. |
@noahfalk I have changed the design doc to reflect the latest changes. We may need to add more bullets in the |
Thanks! Reading through the PR I still felt like I was grasping at the scenario where a user would need to call UpdateName(). The closest explanation I could find was this sentence at the end: "Instrumentation MUST NOT default to using URI path as span name, but MAY provide hooks to allow custom logic to override the default span name." If I follow correctly, the worry was that the code creating the Span either can't access the URI or doesn't know which portions of it are dynamic. Some code later on does have that information and wants to use it to improve the descriptive text shown in the UI. If that is accurate, it suggests there are two different name concepts being squeezed into one property: the initial one is for classification and sampling within the app, and the later improved one is only intended to aid end-user understanding. If we wanted to model that in .Net, my first thought would be to have two properties, Activity.OperationName and Activity.DisplayName. The default value of DisplayName would be identical to OperationName, but it could be changed. However, this feels like a place where we'd want to engage on the broader spec rather than coming up with our own scheme in isolation. I agree that the best path in the immediate future would be to leave it out.
Cool, then it sounds like there is no issue if .Net returns a null Activity to represent this case rather than a dummy.
By single request I am referring to a single trace-id from the point it enters the .Net Core app until the point we send a response message that completes the request. Certainly it is possible that the code implementing this request has many sequential steps within it, including the creation of many Activities. However, I don't think there is a scenario where the implementation both creates a large number of Activities and is already extremely fast. For example, do we expect a scenario where a request that completes in 100us needs 100 Activities to instrument all of its dependencies? Being fast is usually mutually exclusive with doing a large amount of work.
Thanks! |
I don't know how to comment on this thread. I'll try. Pretty-please, can we switch to google doc?
|
I agree a single inline thread is hard to follow - I am already trying to move discussion to the PR which does allow for inline annotations and replies to different threads of conversation. I think this is similar to properties we'd get from a google doc. I don't really want to move yet again to google doc because (a) it is different than the convention every other .NET design issue uses, (b) every time we move the discussion forum it breaks the flow and makes it that much harder to understand the history or accumulate issues in one place. It is very tempting (I am guilty too) to keep replying back to this issue. To try and break that pattern I copied each of the issues you raised to comments in relevant portions of the PR and responded there instead. |
@noahfalk yes, PR in designs repo is OK. |
Just FYI, I have updated the design proposal dotnet/designs#98 to reflect the latest discussions and decisions. |
Any updates here? |
@chandramouleswaran, sorry for the late reply; I missed this one unintentionally. Thanks for the reminder. The ActivitySource.StartActivity overloads provide multiple options for specifying the parent context.
/// Creates a new Activity object if there is any listener to the Activity, returns null otherwise.
public Activity? StartActivity(string name, ActivityKind kind = ActivityKind.Internal);
/// Creates a new <see cref="Activity"/> object if there is any listener to the Activity events, returns null otherwise.
public Activity? StartActivity(
    string name,
    ActivityKind kind,
    ActivityContext parentContext,
    IEnumerable<KeyValuePair<string, string?>>? tags = null,
    IEnumerable<ActivityLink>? links = null,
    DateTimeOffset startTime = default);
/// Creates a new <see cref="Activity"/> object if there is any listener to the Activity events, returns null otherwise.
public Activity? StartActivity(
    string name,
    ActivityKind kind,
    string parentId,
    IEnumerable<KeyValuePair<string, string?>>? tags = null,
    IEnumerable<ActivityLink>? links = null,
    DateTimeOffset startTime = default);
Here are the options:
Please let me know if you have more questions or anything unclear here. |
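As a rough illustration of the async scenario asked about earlier, the sketch below captures the API activity's context and passes it explicitly to the StartActivity overload that takes an ActivityContext. It assumes the ActivitySource and ActivityListener types as they shipped in the System.Diagnostics.DiagnosticSource package for .NET 5, whose listener shape differs from the virtual-method proposal earlier in this thread. ParentContextSketch, DoBackgroundWork, and the "Sketch.Source" name are hypothetical.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ParentContextSketch
{
    // "Sketch.Source" is a hypothetical source name.
    private static readonly ActivitySource Source = new ActivitySource("Sketch.Source");

    public static async Task<bool> RunAsync()
    {
        // Use W3C ids so TraceId/SpanId are populated on the context.
        Activity.DefaultIdFormat = ActivityIdFormat.W3C;
        Activity.ForceDefaultIdFormat = true;

        // Without a listener, StartActivity returns null.
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == "Sketch.Source",
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData
        };
        ActivitySource.AddActivityListener(listener);

        Task<bool> background;
        using (Activity? api = Source.StartActivity("api", ActivityKind.Server))
        {
            if (api is null)
            {
                return false;
            }

            // Capture the context, then let the synchronous part finish
            // while the background work continues.
            ActivityContext apiContext = api.Context;
            background = Task.Run(() => DoBackgroundWork(apiContext));
        }

        return await background;
    }

    private static bool DoBackgroundWork(ActivityContext parent)
    {
        // Parent is supplied explicitly rather than taken from Activity.Current.
        using Activity? work = Source.StartActivity("background", ActivityKind.Internal, parent);
        return work != null
            && work.TraceId == parent.TraceId
            && work.ParentSpanId == parent.SpanId;
    }
}
```

Passing the context explicitly keeps the parent link intact even when the API activity has already stopped by the time the background activity starts.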
(marked as approved because that's what we decided in the notes but I didn't update the item during the review) |
Use new Activity to Replace OT Span: open-telemetry/opentelemetry-dotnet#660 |
This issue tracks the work as part of the issue https://github.com/dotnet/corefx/issues/42305
Please look at dotnet/designs#98 for the detailed design.