Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarification required for Span.setStatus #1584

Closed
iNikem opened this issue Mar 29, 2021 · 25 comments · Fixed by #1685
Closed

Clarification required for Span.setStatus #1584

iNikem opened this issue Mar 29, 2021 · 25 comments · Fixed by #1685
Assignees
Labels
area:api Cross language API specification issue area:error-reporting Related to error reporting spec:trace Related to the specification/trace directory

Comments

@iNikem
Copy link
Contributor

iNikem commented Mar 29, 2021

One specific situation is currently absent from https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#set-status. Namely, imagine we have a web application running with Otel auto-instrumentation. Instrumentation creates a SERVER span for an incoming request. During this request, some exception is thrown, which sets response code to 400. But framework provides an exception callback, during which application developer retrieves the SERVER span from the currently active context and sets its status to OK, based on some application logic.

Now, when auto-instrumentation is going to end this SERVER span, then based on http semantic convention it MUST set spans status to ERROR. And spec referenced above allows it to do so. But I think the intention was to allow application developer or operator to override any automatic decision by manually setting span status to OK. Should this mean that auto-instrumentation MUST NOT override span status OK? Please note also, that currently there is no way to read span status from Otel API.

@iNikem iNikem added the spec:trace Related to the specification/trace directory label Mar 29, 2021
@carlosalberto
Copy link
Contributor

But I think the intention was to allow application developer or operator to override any automatic decision by manually setting span status to OK.

That's correct, based on my understanding. @tedsuo ?

@lkrzyzanek
Copy link

It's quite common practice to use 400 (bad request) on REST API to indicate that client provided wrong input parameters.

We use it for any kind of validation of input parameters including also some business logic like to find appropriate records in DB. That's why it's quite important to have possibility to indicate that 400 is not an error.

@pyohannes
Copy link
Contributor

I am confused by this too. I have an use case where HTTP auto-instrumentation returns a 404, but the application can handle that gracefully (by hitting a fallback endpoint), or in some cases even expects a 404.

I was wondering about this sentence in the spec:

Analysis tools SHOULD respond to an Ok status by suppressing any errors they would otherwise generate. For example, to suppress noisy errors such as 404s.

To me it's not clear whether that sentence refers to the HTTP span itself, to its parent span, or to any span in the trace.

So, given a trace with two spans:

  • Span 1: Unset, Span 2: Error. The backend/analysis tool should treat this as an error.
  • Span 1: OK, Span 2: Error. The backend/analysis tool should not treat this as an error, but suppress the error status on Span 2?

I also posted a related question on slack.

@reyang reyang assigned reyang and carlosalberto and unassigned reyang Apr 2, 2021
@reyang
Copy link
Member

reyang commented Apr 2, 2021

Discussed during the Friday issue triage meeting, assigned this to @carlosalberto, it might take some time since this seems to be a challenging topic.

@Oberon00
Copy link
Member

Oberon00 commented Apr 6, 2021

Please also consider the original OTEP https://github.com/open-telemetry/oteps/blob/main/text/trace/0136-error_flagging.md which has suggestions on how to handle this with the Status Source attribute of the status.

@iNikem
Copy link
Contributor Author

iNikem commented Apr 6, 2021

If Span's status were readable attribute then we could have a very simple solution: "if span has OK status set, then auto-instrumentations MUST NOT override it". That's it, isn't it?

@Oberon00
Copy link
Member

Oberon00 commented Apr 7, 2021

How would one put the OK status in the same span that the auto-instrumentation would set the status of?

@iNikem
Copy link
Contributor Author

iNikem commented Apr 7, 2021

How would one put the OK status in the same span that the auto-instrumentation would set the status of?

My original example describes exactly this situation. Auto-instrumentation sets the status of the span, which corresponds to the incoming http request, after/during the response is being sent out. Application developer can grab the current span in their code while processing that request and set span's status. Way before auto-instrumentation attempts to do so.

@Oberon00
Copy link
Member

Oberon00 commented Apr 7, 2021

Sorry for not reading properly 🙈

I think this example is perfectly solved in the original, accepted but only partially spec'd out OTEP: The user would call SetStatus(Status.OK, StatusSource.USER) and the instrumenation would then later call SetStatus(Status.ERROR, StatusSource.INSTRUMENTATION). Then the backend should normally prefer the USER status "as it is a communication from the application developer or operator and contains valuable information".

CC @tedsuo

@Oberon00 Oberon00 added area:api Cross language API specification issue area:error-reporting Related to error reporting labels Apr 7, 2021
@iNikem
Copy link
Contributor Author

iNikem commented Apr 7, 2021

Status source will work as well, certainly. I still think that my proposal is easier :)

@Oberon00
Copy link
Member

Oberon00 commented Apr 8, 2021

Reading the status would violate the design principle that spans are write-only (except for SpanContext which is immutable, fixed from creation, and required for propagation). Also, every instrumentation would need to check the status and if it forgets, the user-status is overwritten.

A solution somewhere in the middle would be to provide a SetStatusIfUnset API.

@iNikem
Copy link
Contributor Author

iNikem commented Apr 8, 2021

All three solutions work for me :)

@carlosalberto
Copy link
Contributor

Status source will work as well, certainly. I still think that my proposal is easier :)

I also like your solution more, but I'm afraid that providing a getter for Status may be hard (impossible), e.g. a streaming implementation (besides what @Oberon00 already mentioned regarding the write-only design).

I'd be up for adding the Instrumentation part to if languages can offer this without breaking their APIs, which I'm not totally sure it's feasible (might need to ask maintainers).

SetStatusIfUnset looks not as clean but it might be a good trade off definitely. I will bring this up in our SIG call today ;)

@tedsuo
Copy link
Contributor

tedsuo commented Apr 13, 2021

From the spec meeting today:

We discussed the options. There is general agreement that adding state to the API (via a status getter) is not a feasible solution.

We discussed adding a StatusSource field with two states, USER and INSTRUMENTATION. The USER status always wins. In practice, we add this feature by changing the current SetStatus function to set the source as USER, to avoid breaking user code. An additional SetStatus method or option would be added to set INSTRUMENTATION as the source.

Adding StatusSource has been approved as a solution to this problem, but we are reluctant to add this complexity and extra state until it is proven that we need it.

As a simpler solution, we are proposing that we add a SetStatus option to End. This status option would only be applied when the status is currently UNSET. All instrumentation would then be required to use this option. This allows the user to override instrumentation when setting the status. This incremental approach would be compatible with adding StatusSource at a later date, if we determine that we need it.

@iNikem volunteered to investigate whether adding a SetStatus option on End is a feasible solution.

  • Investigate whether the current Java instrumentation could meet the requirement of only setting the status when the span ends.
  • Investigate whether exception handling works with this solution.

It would be helpful to check other languages as well, and confirm that this solution is feasible before we try it. Please chime in if you, dear reader, can spend some time this month investigating this issue.

@pyohannes
Copy link
Contributor

@tedsuo I have use cases where StatusSource would help, but the simpler solution (overriding status on End) wouldn't.

One use case is overriding error status of HTTP client instrumentations. With several HTTP client instrumentations I worked with there is no way to access the span that is created by the instrumentation. Furthermore there might be scenarios where application developers use libraries which in turn use instrumented HTTP clients - it would be very hard for them to directly access those spans across libraries.

StatusSource provides a way to override status without having direct access to spans created by instrumentations, and this is a need users have already expressed to me.

@iNikem
Copy link
Contributor Author

iNikem commented Apr 14, 2021

StatusSource provides a way to override status without having direct access to spans created by instrumentations

@pyohannes Can you elaborate on this, please? StatutSource still needs to be passed to a span, how will it help you if you don't access to spans?

@pyohannes
Copy link
Contributor

Can you elaborate on this, please? StatutSource still needs to be passed to a span, how will it help you if you don't access to spans?

@iNikem On backend or exporter side (maybe triggered by configuration) one could say: ignore error status values when StatusSource is INSTRUMENTATION, but honor the error status when the StatusSource is USER.

That is how I interpret this sentence in the OTEP:

Analysis tools MAY disregard status codes, in favor of their own approach to error analysis. However, it is strongly suggested that analysis tools SHOULD pay attention to the status codes when set by USER, as it is a communication from the application developer or operator and contains valuable information.

Then any instrumentation might set error status codes, but the backend configured in that way will only mark traces as failed when there's an error status code where StatusSource=USER.

Not the most elegant. But I think it can get very tedious for application developers to re-set status codes of instrumentations.

@iNikem
Copy link
Contributor Author

iNikem commented Apr 15, 2021

If backend "ignore error status values when StatusSource is INSTRUMENTATION", then it means all application developer has to set status code whenever they expect the span to be an error. This is even more work than re-setting status when there is a conflict between developer expectation and auto-instrumentation decision.

@pyohannes
Copy link
Contributor

If backend "ignore error status values when StatusSource is INSTRUMENTATION", then it means all application developer has to set status code whenever they expect the span to be an error.

The backend could "ignore error status values when StatusSource is INSTRUMENTATION only if there is a span in the trace with StatusSource=USER and Status=OK".

However, another possibility without the need for StatusSource would be to "ignore error status values if there is a span in the trace with Status=OK". I wonder if this section of the current specification could be interpreted in this way; I tried to get clarification on this, with no success yet:

Analysis tools SHOULD respond to an Ok status by suppressing any errors they would otherwise generate. For example, to suppress noisy errors such as 404s.

@iNikem Do you think a solution in those lines could solve your problem? Otherwise I'll open another issue to not mix up different problems and requirements.

@iNikem
Copy link
Contributor Author

iNikem commented Apr 15, 2021

I think that ignoring status of one span based on the status of another span is a VERY special case and certainly is not something that Otel specification can recommend.

@iNikem
Copy link
Contributor Author

iNikem commented May 4, 2021

As a simpler solution, we are proposing that we add a SetStatus option to End. This status option would only be applied when the status is currently UNSET. All instrumentation would then be required to use this option. This allows the user to override instrumentation when setting the status. This incremental approach would be compatible with adding StatusSource at a later date, if we determine that we need it.

@iNikem volunteered to investigate whether adding a SetStatus option on End is a feasible solution.

  • Investigate whether the current Java instrumentation could meet the requirement of only setting the status when the span ends.
  • Investigate whether exception handling works with this solution.

I have a POC changes as described above in Otel Java API/SDK and Java instrumentation repo. They demonstrate that the approach above can work. Only one or two instrumentations out of many in Java instrumentation repo may require somewhat elaborate changes, in the vast majority of cases changes are trivial.

During implementation and following discussion with @jkwatson we found that:

  • Having Span.end(Status) in API may confuse application developers into thinking that they should use this method to end a span, instead of Span.end() which should be used in the majority of cases.
  • A simpler approach may be to change Span.setStatus method behaviour to "freeze" status if it was called with OK and ignore any later calls to setStatus.

In both cases the behaviour of API methods may be surprising to the end-users, because in some cases provided Status will be ignored. In both cases we have to rely on documentation to clear confusion.

@carlosalberto
Copy link
Contributor

carlosalberto commented May 4, 2021

From the SIG meeting: a take on option B (freezing Span.setStatus) will be done in the DotNet SIG, and after that the change will be proposed against the Specification.

@iNikem
Copy link
Contributor Author

iNikem commented May 5, 2021

For the record, my POC of option A above:
API/SDK
Instrumentations

@reyang
Copy link
Member

reyang commented May 6, 2021

.NET PoC of Option B is here.
We've discussed during the OpenTelemetry .NET SIG meeting and folks are good with this approach.

@jkwatson
Copy link
Contributor

jkwatson commented May 7, 2021

I'm wondering, if we go with option B and the SDK is now going to start enforcing these ideas, should the SDK also ignore any explicit call to set the status to UNSET?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue area:error-reporting Related to error reporting spec:trace Related to the specification/trace directory
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants