Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite span status section and standardize preamble in manual instrumentation #3694

Merged
merged 9 commits into from
Jan 9, 2024

Conversation

cartermp
Copy link
Contributor

fixes #3685

@cartermp cartermp changed the title Rewrite span status page and standardize preamble in manual instrumentation Rewrite span status section and standardize preamble in manual instrumentation Dec 16, 2023
content/en/docs/concepts/signals/traces.md Outdated Show resolved Hide resolved
instrumentation library may decide to set a span status to `Error`, but that may
actually be an incorrect status to assign in your application's specific
context. In such a case, you can change the status to `Ok`, causing an
Observability backend to not count this span as an error.
Copy link
Member

@yurishkuro yurishkuro Dec 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even if an error may have actually occurred

I don't like this emphasis on overriding error, this is not the primary use case for OK status. The primary use case is the application owner with domain knowledge explicitly sets this status to OK. It does not mean that they are changing it from Error.

For example, an instrumentation library may decide to set a span status to Error

Personally, I don't feel like this answers the original question. To me the question is: is instrumentation ever allowed to set anything but Unset? Can they only set Error but not OK? The OTEP says they should not set OK. The reasoning doesn't make sense to me. I would argue that when a span is Ended, if its stuatus is still Unset it should be change to OK, otherwise we get the justifiable confusion from end users who cannot find any OK spans.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in general I agree that the spec made the wrong design here. And moreover, I think there might need to be better guidance for users of Observability backends around how you query this kind of data. In generally, "show me what wasn't an error" is what you want, but some UIs (like Tempo) let you filter specifically by a span status and that's where some confusion arises. If the spec detailed things in a more user-friendly way then this wouldn't be an issue, but alas I don't think it can be changed now.

The primary use case is the application owner with domain knowledge explicitly sets this status to OK

I'd agree, but I'd love to have a more specific example though. When there's no real difference between an actually-complete-and-no-error span set to OK and left Unset, then I'm not sure what the benefit to explicitly setting it to OK is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was an odd critique in the OTEP about the approach that OpenTracing had, where we had error=true attribute. I actually think it was pretty good, much better than this confusion between Unset and OK. Since that status was elevated to the data model (instead of semconv in OT), I would prefer the status to be OK or Error and not have Unset as an option. Unset is really an intermediate state that could be managed in the SDK until the span is finished. Once it's finished, if the error status was not set then the span is automatically OK. Since OTEL already has Unset in a stable spec, I think the path forward is to tweak the spec to say that Tracer should change Unset->OK when the span is finished. The other train of thought in the OTEP that some backend components might change Unset to OK is completely alien to me - why would a backend have any better knowledge about what happened in the application than the application & instrumentation that produced the data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a valuable discussion, can we turn this into a spec issue? For the docs I think we need to document the as is state and then later update it accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the interim we should tweak the example(s) in prose a bit, more to this point:

I don't like this emphasis or overriding error, this is not the primary use case for OK status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurishkuro I adjusted the wording here and in the preamble. PTAL?

@svrnm
Copy link
Member

svrnm commented Dec 18, 2023

{{% docs/instrumentation/span-status-preamble %}}

I love that 😍

content/en/docs/instrumentation/net/manual.md Outdated Show resolved Hide resolved
content/en/docs/instrumentation/net/manual.md Outdated Show resolved Hide resolved
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@cartermp
Copy link
Contributor Author

@kaylareopelle I applied suggestions -- any further feedback?

@cartermp cartermp dismissed kaylareopelle’s stale review January 9, 2024 16:56

Feedback addressed

@cartermp
Copy link
Contributor Author

cartermp commented Jan 9, 2024

Feedback all addressed and multiple approvals, so I'll merge

@cartermp cartermp merged commit b906f22 into main Jan 9, 2024
17 checks passed
@cartermp cartermp deleted the cartermp/span-status-rework branch January 9, 2024 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Explain rationale for Status attribute Unset vs OK
5 participants