-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite span status section and standardize preamble in manual instrumentation #3694
Conversation
instrumentation library may decide to set a span status to `Error`, but that may | ||
actually be an incorrect status to assign in your application's specific | ||
context. In such a case, you can change the status to `Ok`, causing an | ||
Observability backend to not count this span as an error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even if an error may have actually occurred
I don't like this emphasis on overriding error, this is not the primary use case for OK status. The primary use case is the application owner with domain knowledge explicitly sets this status to OK. It does not mean that they are changing it from Error.
For example, an instrumentation library may decide to set a span status to
Error
Personally, I don't feel like this answers the original question. To me the question is: is instrumentation ever allowed to set anything but Unset? Can they only set Error but not OK? The OTEP says they should not set OK. The reasoning doesn't make sense to me. I would argue that when a span is Ended, if its stuatus is still Unset it should be change to OK, otherwise we get the justifiable confusion from end users who cannot find any OK spans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in general I agree that the spec made the wrong design here. And moreover, I think there might need to be better guidance for users of Observability backends around how you query this kind of data. In generally, "show me what wasn't an error" is what you want, but some UIs (like Tempo) let you filter specifically by a span status and that's where some confusion arises. If the spec detailed things in a more user-friendly way then this wouldn't be an issue, but alas I don't think it can be changed now.
The primary use case is the application owner with domain knowledge explicitly sets this status to OK
I'd agree, but I'd love to have a more specific example though. When there's no real difference between an actually-complete-and-no-error span set to OK and left Unset, then I'm not sure what the benefit to explicitly setting it to OK is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was an odd critique in the OTEP about the approach that OpenTracing had, where we had error=true
attribute. I actually think it was pretty good, much better than this confusion between Unset and OK. Since that status was elevated to the data model (instead of semconv in OT), I would prefer the status to be OK or Error and not have Unset as an option. Unset is really an intermediate state that could be managed in the SDK until the span is finished. Once it's finished, if the error status was not set then the span is automatically OK. Since OTEL already has Unset in a stable spec, I think the path forward is to tweak the spec to say that Tracer should change Unset->OK when the span is finished. The other train of thought in the OTEP that some backend components might change Unset to OK is completely alien to me - why would a backend have any better knowledge about what happened in the application than the application & instrumentation that produced the data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a valuable discussion, can we turn this into a spec issue? For the docs I think we need to document the as is state and then later update it accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the interim we should tweak the example(s) in prose a bit, more to this point:
I don't like this emphasis or overriding error, this is not the primary use case for OK status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro I adjusted the wording here and in the preamble. PTAL?
Co-authored-by: Yuri Shkuro <[email protected]>
layouts/shortcodes/docs/instrumentation/span-status-preamble.md
Outdated
Show resolved
Hide resolved
I love that 😍 |
Co-authored-by: Kayla Reopelle (she/her) <[email protected]>
Co-authored-by: Kayla Reopelle (she/her) <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@kaylareopelle I applied suggestions -- any further feedback? |
Feedback all addressed and multiple approvals, so I'll merge |
fixes #3685