OTEP: Recording exceptions as log based events #4333
base: main
## Motivation

OTel recommends recording exceptions using span events available through Trace API. Outside of OTel world, exceptions are usually recorded by user apps and libraries using logging libraries.
OTel recommends recording exceptions using span events
Do we actually make such a recommendation?
https://github.com/open-telemetry/semantic-conventions/tree/main/docs/exceptions lists conventions for storing exceptions in spans and logs, but I don't see any recommendation being made there. Is there another place I am missing?
You're right, I'll change the wording.
What I mean is that the only way documented in the spec is https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/exceptions.md. Combined with the lack of a user-facing Logs API until recently, this makes span events the only guaranteed way to record exceptions for an instrumentation library that doesn't want to depend on a third-party logging facade. That's not a problem in some languages, but it is in others.
Log-based exception events have the following advantages over span events:
Agree with all of this. A minor point worth adding: one advantage of using SpanEvents for exceptions is that they are automatically sampled along with the corresponding spans. It is possible to achieve a similar effect with logs, but users have to do extra work to ensure logs are sampled consistently with spans.
I feel like it's a disadvantage, since users don't have a choice. A similar effect can be achieved with a configuration option and logger.IsEnabled(..., context). I.e., I don't think this is a fundamental problem: we can make log sampling almost as easy as with span events.
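To make the IsEnabled(..., context) point concrete, here is a minimal Java sketch. It assumes hypothetical types (TraceContext, ExceptionLogSampling, ExceptionLogger) that are not part of the OpenTelemetry API: a configuration option decides whether exception logs follow the span's sampling decision, and the check receives the current context before any record is built.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are OpenTelemetry API classes.

/** Minimal stand-in for the sampling-relevant part of a span context. */
record TraceContext(String traceId, boolean sampled) {}

/** Hypothetical configuration option for exception log sampling. */
enum ExceptionLogSampling {
    /** Record exception logs regardless of the span sampling decision. */
    ALWAYS,
    /** Record exception logs only when the current span is sampled. */
    FOLLOW_TRACE
}

final class ExceptionLogger {
    private final ExceptionLogSampling mode;
    private final List<String> emitted = new ArrayList<>();

    ExceptionLogger(ExceptionLogSampling mode) {
        this.mode = mode;
    }

    /** Plays the role of logger.IsEnabled(..., context): checked before building a record. */
    boolean isEnabled(TraceContext context) {
        if (mode == ExceptionLogSampling.ALWAYS) {
            return true;
        }
        // Assumption: with no active span there is no decision to follow, so the log is kept.
        return context == null || context.sampled();
    }

    void recordException(Throwable error, TraceContext context) {
        if (!isEnabled(context)) {
            return; // dropped together with the unsampled span
        }
        emitted.add(error.getClass().getName() + ": " + error.getMessage());
    }

    List<String> emitted() {
        return List.copyOf(emitted);
    }
}
```

Keeping logs for operations without any active span is a design choice in this sketch; it preserves the "no tracing instrumentation required" property of log-based events.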
- they can be recorded for operations that don't have any tracing instrumentation
- they can be sampled along with or separately from spans
- they can have different severity levels to reflect how critical the exception is
- they are already reported natively by many frameworks and libraries
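A minimal Java sketch of what a log-based exception event could carry, assuming a hypothetical ExceptionLogRecord type rather than the real Logs API. It needs no active span and the caller picks the severity. The attribute keys exception.type, exception.message and exception.stacktrace come from the exception semantic conventions; the numbers 13, 17 and 21 are the base values of the WARN, ERROR and FATAL ranges in the OpenTelemetry log data model.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: ExceptionLogRecord is illustrative, not the OpenTelemetry Logs API.

/** Base values of the WARN, ERROR and FATAL severity ranges in the log data model. */
enum Severity {
    WARN(13), ERROR(17), FATAL(21);

    final int number;

    Severity(int number) {
        this.number = number;
    }
}

record ExceptionLogRecord(Severity severity, Map<String, String> attributes) {

    /** Builds a record from an exception; no active span is needed. */
    static ExceptionLogRecord of(Throwable error, Severity severity) {
        Map<String, String> attributes = new LinkedHashMap<>();
        // Attribute keys from the exception semantic conventions.
        attributes.put("exception.type", error.getClass().getName());
        if (error.getMessage() != null) {
            attributes.put("exception.message", error.getMessage());
        }
        attributes.put("exception.stacktrace", stackTraceOf(error));
        return new ExceptionLogRecord(severity, Collections.unmodifiableMap(attributes));
    }

    private static String stackTraceOf(Throwable error) {
        StringWriter buffer = new StringWriter();
        error.printStackTrace(new PrintWriter(buffer));
        return buffer.toString();
    }
}
```

For example, a retried I/O failure might be recorded at WARN while an exception that aborts a request is recorded at ERROR, something a span event has no field for.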
One more advantage: SpanEvents can be affected by the Max_SpanEvents_Per_Span limit.
Curious why it is an advantage? I think it's a safety belt that prevents buffering an unbounded number of events on spans. With log-based events we have the batching processor for that, and we can also do more interesting things like log throttling in the pipeline.
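As an illustration of throttling in the pipeline, here is a sketch of a hypothetical processor (not an existing SDK component) that exports at most a fixed number of records per exception type within a fixed time window. Unlike a per-span event limit, it only drops repeats of an exception type that has already used up its budget in the current window.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pipeline component, not an existing SDK processor.
final class ExceptionThrottlingProcessor {
    private final int maxPerWindow;
    private final long windowMillis;
    // exception type -> {window start (ms), records exported in this window}
    private final Map<String, long[]> windows = new HashMap<>();

    ExceptionThrottlingProcessor(int maxPerWindow, long windowMillis) {
        if (maxPerWindow < 1 || windowMillis < 1) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
    }

    /** Returns true if a record for this exception type should be exported at nowMillis. */
    synchronized boolean shouldExport(String exceptionType, long nowMillis) {
        long[] window = windows.get(exceptionType);
        if (window == null || nowMillis - window[0] >= windowMillis) {
            // First record of this type, or the previous window expired: start a new window.
            windows.put(exceptionType, new long[] {nowMillis, 1});
            return true;
        }
        if (window[1] < maxPerWindow) {
            window[1]++;
            return true;
        }
        return false; // throttled: this exception type already hit its budget
    }
}
```

Passing the timestamp in, instead of reading a clock inside, keeps the sketch deterministic and easy to test.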
I meant that, when recording exceptions as span events, there is a chance that my exception is the one that gets dropped because the span's events are already filled up by other things. Yes, the limit is there for safety, but losing the exception is also bad...
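The concern can be modeled with a small Java sketch of a bounded per-span event list. This is hypothetical code, not SDK code, and it assumes a drop-newest policy; real SDKs may handle overflow differently. In the OpenTelemetry SDKs the limit is configured through OTEL_SPAN_EVENT_COUNT_LIMIT (default 128).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a span's bounded event list; this is not SDK code.
// Assumes a drop-newest policy: once the limit is reached, later events are discarded.
final class BoundedSpanEvents {
    private final int limit;
    private final List<String> events = new ArrayList<>();
    private int dropped;

    BoundedSpanEvents(int limit) {
        this.limit = limit;
    }

    void addEvent(String name) {
        if (events.size() >= limit) {
            dropped++; // the event is lost regardless of how important it is
            return;
        }
        events.add(name);
    }

    List<String> events() {
        return List.copyOf(events);
    }

    int dropped() {
        return dropped;
    }
}
```

With a limit of 2, adding "cache.miss", "retry" and then "exception" keeps only the first two, so the exception is exactly the event that gets lost.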
I think this is a related issue:
```java
// we're rethrowing an exception here since the underlying
// platform code may or may not record exception logs depending on JRE,
// configuration, and other implementation details
logger.eventBuilder("exception")
```
"exception" here would be the EventName, right? Is that the intent? Shouldn't it be more fully qualified than just "exception"? I was thinking of something like Namespace.Networking.SocketChannel.Write.Exception?
Are we sure that we want to model it as an OTel Event? How would we define the event name for concrete exception types? Maybe it should just be a Log Record (see #4234).
"exception" is almost as abstract as an "event".
Different instrumentation libraries may want to add additional contextual attributes related to the exception. There may also be use cases where one would like to set a complex body.
On the other hand, we can say that OTel Events have "minimal" requirements regarding their structure, and instrumentations can add any additional data.
Great points!
I agree that the "exception" event name is not particularly useful. We don't have to record exceptions as events at all. I changed this OTEP to default to logs. There is also a recommendation in the text to define custom error events.
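A sketch of the two options discussed here, using hypothetical types (LogRecordData, ErrorReporting) rather than the OpenTelemetry API. By default the exception becomes a plain log record without an event name; instrumentation that wants a domain-specific error event sets a name describing the failed operation. The event name com.example.socket.write.failed is made up for illustration; network.peer.address is an existing semantic-convention attribute.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: LogRecordData and ErrorReporting are illustrative, not OpenTelemetry API.

record LogRecordData(Optional<String> eventName, String severityText, Map<String, String> attributes) {}

final class ErrorReporting {
    private ErrorReporting() {}

    /** Default: a plain log record with exception attributes and no event name. */
    static LogRecordData exceptionLog(Throwable error) {
        return new LogRecordData(Optional.empty(), "ERROR", Map.copyOf(exceptionAttributes(error)));
    }

    /** Opt-in: a domain-specific error event named after the failed operation. */
    static LogRecordData socketWriteFailed(Throwable error, String peerAddress) {
        Map<String, String> attributes = exceptionAttributes(error);
        attributes.put("network.peer.address", peerAddress);
        // Made-up event name, for illustration only.
        return new LogRecordData(
            Optional.of("com.example.socket.write.failed"), "ERROR", Map.copyOf(attributes));
    }

    private static Map<String, String> exceptionAttributes(Throwable error) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("exception.type", error.getClass().getName());
        if (error.getMessage() != null) {
            attributes.put("exception.message", error.getMessage());
        }
        return attributes;
    }
}
```

The event name identifies the operation that failed, while the exception type stays an attribute, which sidesteps the question of deriving event names from concrete exception types.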
1. OpenTelemetry should provide configuration options and APIs that allow users to (among other things):
- Record unhandled exceptions only (the default documented in this guidance)
Sharing some learnings from .NET https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/trace/reporting-exceptions#unhandled-exception
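A minimal Java sketch of the "unhandled exceptions only" default, assuming a hypothetical ExceptionRecorder and mode enum; only Thread.setDefaultUncaughtExceptionHandler and Thread.UncaughtExceptionHandler are standard JDK API. Handled exceptions are recorded only in the ALL mode, while exceptions that escape a thread are always recorded and then passed on to any previously installed handler.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: ExceptionRecordingMode and ExceptionRecorder are illustrative.

enum ExceptionRecordingMode { UNHANDLED_ONLY, ALL }

final class ExceptionRecorder {
    private final ExceptionRecordingMode mode;
    // Thread-safe, since the uncaught-exception handler runs on the failing thread.
    private final List<String> recorded = new CopyOnWriteArrayList<>();

    ExceptionRecorder(ExceptionRecordingMode mode) {
        this.mode = mode;
    }

    /** Called by instrumentation for exceptions it catches and handles. */
    void onHandled(Throwable error) {
        if (mode == ExceptionRecordingMode.ALL) {
            recorded.add("handled " + error.getClass().getSimpleName());
        }
    }

    /** Called for exceptions nobody caught; recorded in every mode. */
    void onUnhandled(Throwable error) {
        recorded.add("unhandled " + error.getClass().getSimpleName());
    }

    /** Routes exceptions that escape any thread to onUnhandled, then to the previous handler. */
    void install() {
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            onUnhandled(error);
            if (previous != null) {
                previous.uncaughtException(thread, error);
            }
        });
    }

    List<String> recorded() {
        return List.copyOf(recorded);
    }
}
```

The default handler is JVM-wide, so in practice installing it would belong to SDK or agent initialization rather than to individual instrumentations.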
Related to open-telemetry/semantic-conventions#1536
Changes
Recording exceptions as span events is problematic since it
This OTEP provides guidance on how to record exceptions using OpenTelemetry logs focusing on minimizing duplication and providing context to reduce the noise.
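One possible way to minimize duplication when the same exception instance propagates (for example, is rethrown) through several instrumented layers is to remember which instances were already recorded. The sketch below is hypothetical and not necessarily the mechanism this OTEP recommends. It relies on Throwable not overriding equals/hashCode, so the set compares instances by identity for typical exceptions, and on WeakHashMap so recorded exceptions can still be garbage-collected.

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

// Hypothetical sketch, not necessarily the mechanism this OTEP recommends.
final class ExceptionDeduplicator {
    // Throwable keeps Object's identity-based equals/hashCode, so this set compares
    // instances by identity; weak keys let recorded exceptions be garbage-collected.
    private final Set<Throwable> recorded;

    ExceptionDeduplicator() {
        Set<Throwable> weakSet = Collections.newSetFromMap(new WeakHashMap<Throwable, Boolean>());
        this.recorded = Collections.synchronizedSet(weakSet);
    }

    /** Returns true only the first time a given exception instance is seen. */
    boolean shouldRecord(Throwable error) {
        return recorded.add(error);
    }
}
```

A wrapping exception is a different instance, so this sketch would still record it; walking getCause() is one way to extend it if wrapped duplicates matter.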
If accepted, the follow-up spec changes are expected to replace existing (stable) documents:
- Related OTEP(s) #
- CHANGELOG.md file updated for non-trivial changes
- spec-compliance-matrix.md updated if necessary