
Add basic OpenTelemetry support #1876

Closed · wants to merge 54 commits

Conversation

@mjq-sentry commented Aug 26, 2022

Closes #1907


For users who already use OpenTelemetry instrumentation or would prefer to use it instead of Sentry's own tracing API, provide a means to disable Sentry's automatic trace instrumentation and a span processor that exports OpenTelemetry spans to Sentry.

Usage

```ruby
Sentry.init do |config|
  # ...

  # Enable trace collection
  config.traces_sample_rate = 1.0  # or use traces_sampler

  # Disable automatic span instrumentation
  config.auto_instrument_traces = false
end

OpenTelemetry::SDK.configure do |c|
  # ...

  c.add_span_processor(Sentry::OpenTelemetry::SpanProcessor.new)
end
```

Setting `auto_instrument_traces` to `false` in `Sentry.init` stops Sentry from automatically creating any transactions or spans. (Tracing must still be enabled with `traces_sample_rate` or `traces_sampler`.)

`Sentry::OpenTelemetry::SpanProcessor` is notified whenever an OTel span is started or stopped. In reaction, it starts or stops a Sentry transaction (if the span has no parent) or a Sentry span. When the transaction is finished, it is picked up and processed by Sentry's worker thread like any other transaction.
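For illustration, a minimal sketch of what such a span processor can look like (this is not the PR's implementation: the `span_map` bookkeeping shown here is an assumption, and trace propagation, sampling, and timestamp handling are omitted):

```ruby
require "sentry-ruby"
require "opentelemetry-sdk"

# Sketch: map OTel span lifecycle events onto Sentry transactions/spans.
class SentrySpanProcessorSketch < OpenTelemetry::SDK::Trace::SpanProcessor
  def initialize
    # OTel hex span id => Sentry transaction or span
    @span_map = {}
  end

  def on_start(otel_span, _parent_context)
    parent = @span_map[otel_span.parent_span_id&.unpack1("H*")]

    sentry_span =
      if parent
        # Child OTel span => child Sentry span.
        parent.start_child(op: otel_span.name)
      else
        # Root OTel span => Sentry transaction.
        Sentry.start_transaction(name: otel_span.name, op: otel_span.name)
      end

    @span_map[otel_span.context.hex_span_id] = sentry_span if sentry_span
  end

  def on_finish(otel_span)
    @span_map.delete(otel_span.context.hex_span_id)&.finish
  end
end
```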

If the child service uses the `Sentry::Rack::CaptureExceptions` middleware (see the order caveat below), its transactions will be linked inside Sentry with the parent service's transaction.

Caveats

  • sentry-rails is not yet supported (haven't looked into it yet).
  • Sentry.init must be run before OpenTelemetry::SDK.configure (Sentry's Net::HTTP monkey patch that adds the sentry-trace header must run after OTel's monkey patch that adds an HTTP span).
  • Sentry::Rack::CaptureExceptions must come before OpenTelemetry::Instrumentation::Rack::Middlewares::TracerMiddleware in the middleware stack, since the hub and scope must be set up before OTel creates any spans (see the config.ru sketch after this list).
  • Since Sentry spans are started and stopped in reaction to OTel spans starting and stopping, the reported durations might differ slightly. (This should be fixable by overriding the timestamps on the Sentry spans we create to match OTel's.)
  • In the case where an OTel root span has a child span that starts after the root span has finished, this will be represented in Sentry as two transactions (one for the original root span, and one for the late child).
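For the middleware-ordering caveat, a hypothetical `config.ru` might look like this (illustrative only; it assumes the opentelemetry-instrumentation-rack gem is installed and configured, and the app is a placeholder):

```ruby
# config.ru (illustrative): CaptureExceptions must run before OTel's Rack
# middleware so the Sentry hub and scope exist before OTel creates any spans.
# Sentry.init and OpenTelemetry::SDK.configure are omitted here (see Usage above).
require "sentry-ruby"
require "opentelemetry-sdk"
require "opentelemetry-instrumentation-rack" # assumed gem require

use Sentry::Rack::CaptureExceptions
use OpenTelemetry::Instrumentation::Rack::Middlewares::TracerMiddleware
run ->(env) { [200, { "Content-Type" => "text/plain" }, ["ok"]] }
```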

@sl0thentr0py sl0thentr0py self-requested a review August 26, 2022 20:05
@sl0thentr0py sl0thentr0py self-assigned this Aug 26, 2022
@dcramer (Member) commented Aug 26, 2022

Are span processors common in otel sdks? aka could we do this everywhere?

@AbhiPrasad (Member) commented Aug 26, 2022

It's a part of the core spec - https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#span-processor, and supported by every major SDK.


@vladanpaunovic mentioned this pull request Sep 30, 2022
@vladanpaunovic linked an issue Sep 30, 2022 that may be closed by this pull request
@sl0thentr0py (Member):

ok, I'm taking this up now.

So let's talk about this `auto_instrument_traces` stuff first.

> Setting auto_instrument_traces to false in Sentry.init stops Sentry from automatically creating any transactions or spans.

Currently only net/http and redis span creation is disabled by this flag. What do we want from this flag? Do we want to turn off ALL instrumentation by sentry if otel is present? To prevent otel/sentry from stepping on each other's toes, we should just no-op most of the performance part of the SDK (except sampling), and that's a bit more work. If that's what we want, I can think of what the best way to do that is.
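For illustration, a gating pattern along those lines could look like this (hypothetical sketch, not this PR's code; the `auto_instrument_traces` flag and the patch shape are assumptions):

```ruby
require "net/http"
require "sentry-ruby"

# Hypothetical: skip Sentry's automatic HTTP span when auto instrumentation
# is turned off, so OTel's own net/http instrumentation owns the span.
module GatedHTTPInstrumentationSketch
  def request(req, body = nil, &block)
    return super unless Sentry.initialized?
    return super unless Sentry.configuration.auto_instrument_traces # assumed flag

    Sentry.with_child_span(op: "http.client", description: "#{req.method} #{req.path}") do |_span|
      super
    end
  end
end

Net::HTTP.prepend(GatedHTTPInstrumentationSketch)
```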

@AbhiPrasad (Member):

Ideally we should turn off all automatic instrumentation that Sentry does with this flag, but still keep users' manual instrumentation. The reason I say this is that if we turned off all instrumentation, we'd have API issues, because we need to access startTransaction inside the collector itself.

Turning off all instrumentation is maybe cleaner though, because we can just instruct folks to switch to OpenTelemetry usage completely. Perhaps it's easier to start with turning off all automatic instrumentation? And then we can see how that looks/feels?

@codecov-commenter commented Oct 10, 2022

Codecov Report

Base: 98.33% // Head: 98.39% // Increases project coverage by +0.06% 🎉

Coverage data is based on head (8a1c99f) compared to base (5bfdf80).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
```
@@            Coverage Diff             @@
##           master    #1876      +/-   ##
==========================================
+ Coverage   98.33%   98.39%   +0.06%     
==========================================
  Files         151      156       +5     
  Lines        9306     9669     +363     
==========================================
+ Hits         9151     9514     +363     
  Misses        155      155              
```

| Impacted Files | Coverage Δ |
| --- | --- |
| sentry-opentelemetry/lib/sentry-opentelemetry.rb | 100.00% <100.00%> (ø) |
| ...entelemetry/lib/sentry/opentelemetry/propagator.rb | 100.00% <100.00%> (ø) |
| ...lemetry/lib/sentry/opentelemetry/span_processor.rb | 100.00% <100.00%> (ø) |
| ...metry/spec/sentry/opentelemetry/propagator_spec.rb | 100.00% <100.00%> (ø) |
| ...y/spec/sentry/opentelemetry/span_processor_spec.rb | 100.00% <100.00%> (ø) |
| ...ls/lib/sentry/rails/tracing/abstract_subscriber.rb | 74.07% <100.00%> (ø) |
| ...ntry/rails/tracing/action_controller_subscriber.rb | 100.00% <100.00%> (ø) |
| ...b/sentry/rails/tracing/active_record_subscriber.rb | 100.00% <100.00%> (ø) |
| .../sentry/rails/tracing/active_storage_subscriber.rb | 100.00% <100.00%> (ø) |
| sentry-ruby/lib/sentry/configuration.rb | 98.47% <100.00%> (+0.03%) ⬆️ |
| ... and 4 more | |


@sl0thentr0py (Member):

ok, this is backwards compatible now with a new `config.instrumenter` option which defaults to `:sentry`.
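For illustration, opting in would look roughly like this (a sketch based on the description above, not copied from the PR):

```ruby
Sentry.init do |config|
  # ...
  config.traces_sample_rate = 1.0

  # Defaults to :sentry, which keeps Sentry's own automatic instrumentation.
  # Setting it to :otel hands span creation over to OpenTelemetry.
  config.instrumenter = :otel
end
```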

@sl0thentr0py marked this pull request as ready for review November 18, 2022 14:44
@sl0thentr0py (Member) commented Nov 18, 2022

@st0012, @AbhiPrasad this is very big now but it's ready to ship I think. We can fix small things and iterate on them later after shipping. Please let me know if you have questions! I will comment on the files themselves to explain some decisions for @st0012.

I still wanna write some integration tests but you guys can start reviewing whenever you have time.


Summary

  • new gem sentry-opentelemetry to be used as described here
  • all sentry instrumentation will be disabled by a new `config.instrumenter` option so that sentry and otel do not step on each other's toes
  • the 2 main things are
    • a `SpanProcessor` that hooks into `on_start`/`on_finish` whenever OpenTelemetry starts/finishes a span. This in turn creates sentry transactions/spans as necessary and keeps track of them in an internal `span_map` hash.
    • a `Propagator` to make sure distributed tracing and dynamic sampling work by hooking into `inject`/`extract` for incoming/outgoing `sentry-trace`/`baggage` header management (a rough inject sketch follows below).
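As a rough illustration of the inject side of such a propagator (a sketch only, not the PR's code: the `SpanProcessor.instance.span_map` accessor is assumed from the Singleton design discussed below, and baggage/dynamic-sampling handling is omitted):

```ruby
# Sketch of the inject half of a Sentry propagator for OTel.
# Assumes a Singleton SpanProcessor exposing its span_map (see discussion below).
class SentryPropagatorSketch
  SENTRY_TRACE_KEY = "sentry-trace"

  def inject(carrier, context: ::OpenTelemetry::Context.current)
    otel_span = ::OpenTelemetry::Trace.current_span(context)
    return unless otel_span.context.valid?

    # Look up the Sentry span created for this OTel span by the span processor.
    sentry_span = SpanProcessor.instance.span_map[otel_span.context.hex_span_id]
    return unless sentry_span

    # "<trace_id>-<span_id>-<sampled>", as produced by Sentry::Span#to_sentry_trace.
    carrier[SENTRY_TRACE_KEY] = sentry_span.to_sentry_trace
  end
end
```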

Spec

https://develop.sentry.dev/sdk/performance/opentelemetry/

```ruby
TraceData = Struct.new(:trace_id, :span_id, :parent_span_id, :parent_sampled, :baggage)

class SpanProcessor < ::OpenTelemetry::SDK::Trace::SpanProcessor
  include Singleton
```
@sl0thentr0py (Member) commented Nov 18, 2022:
I made it a singleton because span_map needs to be accessed in the propagator and I didn't want to add a global variable. This also makes it explicit that there is only one SpanProcessor globally in the whole system.

Collaborator:
By globally I assume you mean within the threads of the same process?

When reading the doc, I see:

> This means that the hub used in onStart is the same hub used in onEnd and it means that hubs fork properly in async contexts.

But in Ruby SDK, every thread would have their own Hub. So I'm not sure if a global SpanProcessor can achieve that when it'll probably always use the main hub.

Member:
I think this part of the spec is outdated, since we no longer use scope to do anything, it shouldn't matter.
@AbhiPrasad (Member):
Yup the spec is outdated, will fix in develop!


```ruby
sentry_parent_span = @span_map[trace_data.parent_span_id] if trace_data.parent_span_id

sentry_span = if sentry_parent_span
```
Member:
the root span from otel becomes a transaction and all other child spans become child spans of the transaction in sentry

```ruby
span_context = ::OpenTelemetry::Trace::SpanContext.new(
  trace_id: [trace_id].pack('H*'),
  span_id: [span_id].pack('H*'),
  # we simulate a sampled trace on the otel side and leave the sampling to sentry
```
Member:
sampling the transaction is left to sentry, so we enforce sampling on the otel side, we might iterate on this later for better sampling compatibility with otel.

@st0012 (Collaborator) left a comment:

I'm not familiar with OpenTelemetry so I'm learning as I review this and have some questions:

  • Is the goal to let users use opentelemetry-ruby but send the events to Sentry? If that's true, I assume we want them to opt-in to one side completely? e.g. Not tracing some parts with open-telemetry and the rest with Sentry's current tracing APIs/integrations.
  • What concurrency model does opentelemetry-ruby support and does it fit with our Hub/thread design?
  • The PR description says ... disable Sentry's automatic trace instrumentation and a span processor that exports OpenTelemetry spans to Sentry.
    • What'd happen if some of Sentry's automatic traces slip in?

Suggestions:

  • We probably want to add something in sentry-opentelemetry to enforce/warn users that the Sentry SDK is initialized before the OpenTelemetry SDK is initialized.

  • I feel a helper like `Sentry.with_native_tracing?` could be helpful (a rough sketch follows this list). Like in sentry-rails's railties, we could just NOT register tracing subscribers if it's false. Then we won't need to touch the subscribers' logic.

  • Most of the changes in sentry-ruby could be extracted to a separate PR.
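A rough sketch of what such a helper could look like (hypothetical, following the suggestion above; it assumes the `instrumenter` option this PR ends up adding):

```ruby
require "sentry-ruby"

# Hypothetical helper, not part of the PR.
module Sentry
  def self.with_native_tracing?
    initialized? && configuration.instrumenter == :sentry
  end
end

# e.g. in sentry-rails' railtie (illustrative):
#   subscribe_tracing_events if Sentry.with_native_tracing?
```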

```diff
@@ -92,7 +93,9 @@ def start_transaction(transaction: nil, custom_sampling_context: {}, **options)
   transaction
 end
 
-def with_child_span(**attributes, &block)
+def with_child_span(instrumenter: :sentry, **attributes, &block)
```
Collaborator:

Could users use both instrumenters (sentry and otel) at the same time? Or is it simpler: if users choose otel, they give up sentry tracing completely?

Member:
for now, it's one or the other. I don't know what direction we will take on this in the future, depends on a lot of parameters like adoption/future of SDKs etc.


@sl0thentr0py (Member) commented Nov 22, 2022

> Is the goal to let users use opentelemetry-ruby but send the events to Sentry?

Yes, exactly, we want to make it easier for users using otel to switch over to sentry for performance while continuing to use sentry for errors.

> If that's true, I assume we want them to opt-in to one side completely? e.g. Not tracing some parts with open-telemetry and the rest with Sentry's current tracing APIs/integrations.

Yes, this is what the `instrumenter` option does, it's a switch between either sentry/otel. We might generalize this later depending on adoption, but for now this is what we will ship.

> What concurrency model does opentelemetry-ruby support and does it fit with our Hub/thread design?

The initial design was using scope but we removed that, so our hub/scope management should not matter here anymore. All concurrency handling in otel is done by the Context and we just rely on it to do the right thing.

> What'd happen if some of Sentry's automatic traces slip in?

It's fine, not the worst. This is very much an MVP/experiment, so we'll just fix it if someone reports a bug. I've tried to make sure that doesn't happen, though.

> Most of the changes in sentry-ruby could be extracted to a separate PR.

Apologies for the size, we were iterating on this a fair bit so I wasn't sure what shape it'd finally take. I'll split this up into smaller PRs later today. 👍

@sl0thentr0py marked this pull request as draft November 22, 2022 13:14
@sl0thentr0py (Member):
closing in favour of split PRs

Development

Successfully merging this pull request may close these issues:

  • Ruby SDK basic OTEL Support

9 participants