-
-
Notifications
You must be signed in to change notification settings - Fork 503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic OpenTelemetry support #1876
Conversation
For users who already use OpenTelemetry instrumentation or would prefer to use it instead of Sentry's own tracing API, provide a means to disable Sentry's automatic trace instrumentation and a span processor that exports OpenTelemetry spans to Sentry. Usage: ```ruby Sentry.init do |config| # ... # Enable trace collection config.traces_sample_rate = 1.0 # or use traces_sampler # Disable automatic span instrumentation config.auto_trace_instrumentation = false end OpenTelemetry::SDK.configure do |c| # ... c.add_span_processor(Sentry::OpenTelemetry::SpanProcessor.new) end ``` Setting `auto_instrument_traces` to `false` in `Sentry.init` stops Sentry from automatically creating any transactions or spans. (Tracing must still be enabled with `traces_sample_rate` or `traces_sampler`.) `Sentry::OpenTelemetry::SpanProcessor` is notified whenever an OTel span is started or stopped. In reaction, it starts or stops a Sentry transaction (if the span has no parent) or span. When the transaction is finished it is picked up and processed by Sentry's worker thread like any other transaction would be. If the child service uses the `Sentry::Rack::CaptureExceptions` middleware (see order caveat below), then its transactions will be linked inside Sentry with the parent service's transaction. Caveats: sentry-rails is not yet supported (haven't looked into it yet). `Sentry.init` must be run before `OpenTelemetry::SDK.configure` (Sentry's `Net::HTTP` monkey patch to add the `sentry-trace` header must run after OTel's monkey-patch that adds an HTTP span) `Sentry::Rack::CaptureExceptions` must be before `OpenTelemetry::Instrumentation::Rack::Middlewares::TracerMiddleware` in the middleware stack (the hub and scope must be set up before OTel creates any spans) Since Sentry spans are started and stopped in reaction to OTel spans starting and stopping, the durations reported might be slightly different. (This should be fixable by overriding the timestamps on the Sentry spans we create to match OTel's). In the case where an OTel root span has a child span that starts after the root span has finished, this will be represented in Sentry as two transactions (one for the original root span, and one for the late child).
Are span processors common in otel sdks? aka could we do this everywhere? |
It's a part of the core spec - https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#span-processor, and supported by every major SDK. |
This comment was marked as outdated.
This comment was marked as outdated.
ok I'm taking this up now so let's talk about this
Currently only net/http and redis span creation is disabled by this flag. What do we want from this flag? Do we want to turn off ALL instrumentation by sentry if otel is present? To prevent otel/sentry from stepping on each others' toes, we should just no-op most of the performance part of the SDK (except sampling) and that's a bit more work. If that's what we want, I can think of what the best way to do that is. |
Ideally we should turn off all automatic instrumentation that Sentry does with this flag - but still keep user's manual instrumentation. The reason I say this is because if we turn off all instrumentation, then we'll have API issues because we need to access Turning off all instrumentation is maybe cleaner though, because we can just instruct folks to switch to OpenTelemetry usage completely. Perhaps it's easier to start with turning off all automatic instrumentation? And then we can see how that looks/feels? |
Codecov ReportBase: 98.33% // Head: 98.39% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #1876 +/- ##
==========================================
+ Coverage 98.33% 98.39% +0.06%
==========================================
Files 151 156 +5
Lines 9306 9669 +363
==========================================
+ Hits 9151 9514 +363
Misses 155 155
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
ea75bc7
to
781793b
Compare
ok this is backwards compat now with a new |
a2f8f8f
to
0ba42c2
Compare
83f8efb
to
7ec4748
Compare
@st0012, @AbhiPrasad this is very big now but it's ready to ship I think. We can fix small things and iterate on them later after shipping. Please let me know if you have questions! I will comment on the files themselves to explain some decisions for @st0012. I still wanna write some integration tests but you guys can start reviewing whenever you have time. Summary
Spec |
TraceData = Struct.new(:trace_id, :span_id, :parent_span_id, :parent_sampled, :baggage) | ||
|
||
class SpanProcessor < ::OpenTelemetry::SDK::Trace::SpanProcessor | ||
include Singleton |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it a singleton because span_map
needs to be accessed in the propagator and I didn't want to add a global variable. This also makes it explicit that there is only one SpanProcessor
globally in the whole system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By globally I assume you mean within the threads of the same process?
When reading the doc, I see
This means that the hub used in onStart is the same hub used in onEnd and it means that hubs fork properly in async contexts.
But in Ruby SDK, every thread would have their own Hub
. So I'm not sure if a global SpanProcessor
can achieve that when it'll probably always use the main hub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this part of the spec is outdated, since we no longer use scope to do anything, it shouldn't matter.
@AbhiPrasad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup the spec is outdated, will fix in develop!
|
||
sentry_parent_span = @span_map[trace_data.parent_span_id] if trace_data.parent_span_id | ||
|
||
sentry_span = if sentry_parent_span |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the root span from otel becomes a transaction and all other child spans become child spans of the transaction in sentry
span_context = ::OpenTelemetry::Trace::SpanContext.new( | ||
trace_id: [trace_id].pack('H*'), | ||
span_id: [span_id].pack('H*'), | ||
# we simulate a sampled trace on the otel side and leave the sampling to sentry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sampling the transaction is left to sentry, so we enforce sampling on the otel side, we might iterate on this later for better sampling compatibility with otel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with OpenTelemetry
so I'm learning as I review this and have some questions:
- Is the goal to let users use opentelemetry-ruby but send the events to Sentry? If that's true, I assume we want them to opt-in to one side completely? e.g. Not tracing some parts with open-telemetry and the rest with Sentry's current tracing APIs/integrations.
- What concurrency model does
opentelemetry-ruby
support and does it fit with our a Hub/thread design? - The PR description says
... disable Sentry's automatic trace instrumentation and a span processor that exports OpenTelemetry spans to Sentry.
- What'd happen if some Sentry's automatic traces slip in?
Suggestions:
-
We probably want to add something in
sentry-opentelemetry
to enforce/warn users aboutThe SDK is initialized before the OpenTelemetry SDK is initialized.
-
I feel a helper like
Sentry.with_native_tracing?
could be helpful. Like in sentry-rails's railties, we can just NOT register tracing subscribers if it'sfalse
. Then we won't need to touch the subscribers' logic. -
Most of the changes in
sentry-ruby
could be extracted to a separate PR.
@@ -92,7 +93,9 @@ def start_transaction(transaction: nil, custom_sampling_context: {}, **options) | |||
transaction | |||
end | |||
|
|||
def with_child_span(**attributes, &block) | |||
def with_child_span(instrumenter: :sentry, **attributes, &block) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could users use both instrumenters (sentry
and otel
) at the same time? Or it's simpler like: if users chose otel
, they'll give up sentry
tracing completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for now, it's one or the other. I don't know what direction we will take on this on the future, depends on a lot of parameters like adoption/future of SDKs etc
TraceData = Struct.new(:trace_id, :span_id, :parent_span_id, :parent_sampled, :baggage) | ||
|
||
class SpanProcessor < ::OpenTelemetry::SDK::Trace::SpanProcessor | ||
include Singleton |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By globally I assume you mean within the threads of the same process?
When reading the doc, I see
This means that the hub used in onStart is the same hub used in onEnd and it means that hubs fork properly in async contexts.
But in Ruby SDK, every thread would have their own Hub
. So I'm not sure if a global SpanProcessor
can achieve that when it'll probably always use the main hub.
Yes, exactly, we want to make it easier for users using otel to switch over to sentry for performance while continuing to use sentry for errors.
Yes, this is what the
The initial design was using scope but we removed that so our hub/scope management should not matter here anymore.
It's fine, not the worst. This is very much an MVP/experiment so we'll just fix it if someone reports a bug.
Apologies for the size, we were iterating on this a fair bit so I wasn't sure what shape it'd finally take. |
closing in favour of split PRs |
Closes #1907
For users who already use OpenTelemetry instrumentation or would prefer to use it instead of Sentry's own tracing API, provide a means to disable Sentry's automatic trace instrumentation and a span processor that exports OpenTelemetry spans to Sentry.
Usage
Setting
auto_instrument_traces
tofalse
inSentry.init
stops Sentry from automatically creating any transactions or spans. (Tracing must still be enabled withtraces_sample_rate
ortraces_sampler
.)Sentry::OpenTelemetry::SpanProcessor
is notified whenever an OTel span is started or stopped. In reaction, it starts or stops a Sentry transaction (if the span has no parent) or span. When the transaction is finished it is picked up and processed by Sentry's worker thread like any other transaction would be.If the child service uses the
Sentry::Rack::CaptureExceptions
middleware (see order caveat below), then its transactions will be linked inside Sentry with the parent service's transaction.Caveats
Sentry.init
must be run beforeOpenTelemetry::SDK.configure
(Sentry'sNet::HTTP
monkey patch to add thesentry-trace
header must run after OTel's monkey-patch that adds an HTTP span)Sentry::Rack::CaptureExceptions
must be beforeOpenTelemetry::Instrumentation::Rack::Middlewares::TracerMiddleware
in the middleware stack (the hub and scope must be set up before OTel creates any spans)