Migrate Rails instrumentation to use only ActiveSupport::Notifications #218

ahayworth · 2022-12-06T17:59:43Z

We recently uncovered an issue wherein our action_pack instrumentation caused a production problem. There are many ways to consider such an issue, but one way is to acknowledge that monkey-patching is fraught with peril, and that we should consider different methods of instrumentation.

For Rails, we should be able to use ActiveSupport::Notifications. We already use this to great effect in our action_view instrumentation, and we think we can expand this further to the rest of our Rails instrumentation.

To do this, we need to:

ahayworth · 2022-12-06T18:10:29Z

This is the PR that improved safety in Rails 7: rails/rails#43282

plantfansam · 2022-12-06T21:33:28Z

I think this is a great idea. Even if we discover issues with ASN, that sounds like a great reason to contribute fixes upstream (to Rails) 😄

richardmcmillen · 2022-12-08T04:29:00Z

This is cool to see!

I noted in my MR here #204 that the ActionPack instrumentation could probably be refactored to use the same subscriber as I created in the MR. Just in case anyone wants to have a look.

robbkidd · 2022-12-13T17:45:32Z

Talked about this during the SIG today. Attendees are fans of the idea to move away from monkeypatching and towards AS::N subscriptions.

tjefferson08 · 2022-12-23T19:38:30Z

Can AS::N support current_span?

One thing I do a lot in my apps:

# in a controller
def index
  OpenTelemetry::Trace.current_span.add_attributes(interesting_stuff)
  # ... rest of action
end

I'm not super familiar with AS::N, but I'd think deferring span creation to after the execution of index would break this use case.

arielvalentin · 2022-12-27T04:39:22Z

@ahayworth I started to look into this a little bit today and noticed a few constraints and dependencies that will make it difficult to use the existing AS Notifications instrumentation outside of a Rails application.

The SpanSubscriber has a direct dependency on Rails and assumes it is installed (introduced here open-telemetry/opentelemetry-ruby#993):

https://github.com/open-telemetry/opentelemetry-ruby-contrib/blob/main/instrumentation/active_support/lib/opentelemetry/instrumentation/active_support/span_subscriber.rb#L33

Some applications, e.g. the GitHub Monolith, do not load the entire Rails environment at startup and in some cases only load a subset of ActiveSupport, ActiveJob or ActiveRecord. We currently do not use ActiveSupport instrumentation in the monolith but we do use the current ActiveRecord instrumentation.

We would need to make changes to the existing AS Notifications gem in order to remove this dependency on Rails.

This makes it possible to use this instrumentation in non-Rails applications that leverage AS::Notifications. In addition to that it mitigates a race condition where a subscription may be registered by a separate thread before obtaining a lock on the notifications object. See open-telemetry#218

arielvalentin · 2022-12-28T17:28:31Z

I've opened this PR, which should address issues with using AS Notifications outside of Rails applications

#242

arielvalentin · 2023-01-12T17:59:32Z

cc: #269 #268

* fix: Drop Rails dependency for ActiveSupport Instrumentation
  This makes it possible to use this instrumentation in non-Rails applications that leverage AS::Notifications. In addition to that it mitigates a race condition where a subscription may be registered by a separate thread before obtaining a lock on the notifications object. See #218
* chore: Update linter rule for Metrics/MethodLength
  In most cases long methods are going to be OK since we do not want to minimize the amount of method dispatches we have to make in our instrumentations.
* fix: Linter errors
* squash: linter fix
* squash: fix whitespace

ahayworth · 2023-01-17T17:23:00Z

Apologies for disappearing for a while on this - I took a break from all work for a long time. On the upside, I'm fully refreshed and ready to work on things. 😄

> Some applications, e.g. the GitHub Monolith, do not load the entire Rails environment at startup

@arielvalentin thank you for making changes to support this use case! It's exceedingly uncommon (as you and I both know), but an important one nonetheless. ❤️

> Can AS::N support current_span?

@tjefferson08 It depends ™️ primarily on when the AS::Notification is created. Many things in Rails do something like this (really inaccurate pseudocode, apologies):

inside_notification(name: blah) do
  do_some_work
end

In that case, accessing current_span should work well. However, if the notification is generated completely after the thing it instruments, then accessing current_span would not work as expected. We'll need to figure out how widespread each variant is within Rails, and I don't know the answer to that off the top of my head. I'll add it to the todo list.
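
To make the two variants concrete, here is a plain-Ruby sketch. `ToyNotifier`, `ToySpan`, and the `:toy_span` thread-local key are hypothetical stand-ins, not ActiveSupport or OpenTelemetry APIs. The sketch only assumes that the block-style variant opens its span before the wrapped work runs, while the after-the-fact variant creates it once the work is already done.

```ruby
# Toy stand-ins: none of these names come from ActiveSupport or OpenTelemetry.
ToySpan = Struct.new(:name, :attributes)

module ToyNotifier
  # Block style: the "span" is opened before the work runs, so the work can see it.
  def self.instrument(name)
    previous = Thread.current[:toy_span]
    span = ToySpan.new(name, {})
    Thread.current[:toy_span] = span
    yield
    span
  ensure
    Thread.current[:toy_span] = previous
  end

  # After-the-fact style: the event is only published once the work is done.
  def self.publish(name)
    ToySpan.new(name, {})
  end

  def self.current_span
    Thread.current[:toy_span]
  end
end

def index_action
  # Mirrors the controller example: annotate whatever span is current, if any.
  ToyNotifier.current_span&.attributes&.store("interesting", "stuff")
end

# Variant 1: the work runs inside the notification block.
wrapped = ToyNotifier.instrument("process_action.toy") { index_action }

# Variant 2: the work runs first, the notification is generated afterwards.
index_action
late = ToyNotifier.publish("process_action.toy")

p wrapped.attributes # the attribute set inside the block is recorded
p late.attributes    # empty: no span existed while the work ran
```

In the second variant the attribute is silently dropped, which is the failure mode the current_span question is worried about.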

ahayworth · 2023-01-24T20:48:13Z

@arielvalentin I took a look through our instrumentation, mapping things to AS::N calls. Basically, active_record is really the sticky one - I'm not sure how we'd be able to get things like User#find with precisely the same semantics as we have before.

I'm curious what you think; take a look at the updated checklist with some notes. Maybe this is something we should talk to upstream about?

arielvalentin · 2023-01-24T23:53:19Z

What did dd-trace-rb, beelines, NewRelic, or signal-fx do?

arielvalentin · 2023-01-26T14:36:47Z

cc: #293

ahayworth · 2023-01-27T13:04:04Z

It looks like they just subscribe to the two events - instantiate and sql - and create pretty basic spans around them. They don't create different span names depending on the different methods called (#create vs #destroy, etc) like we do: https://github.com/DataDog/dd-trace-rb/tree/a7c8aa5d81b67d136665dfb86d2345b783fb2290/lib/datadog/tracing/contrib/active_record

arielvalentin · 2023-01-27T13:16:43Z

I think that's ok. Low-cardinality span names and high-cardinality tags are, I think, the better way to go.

How do others feel? @open-telemetry/ruby-contrib-approvers

arielvalentin · 2023-02-05T18:11:12Z

Looking at the notification events I see some interesting things at least for ActiveRecord:

https://guides.rubyonrails.org/active_support_instrumentation.html#active-record

| Key | Value | OTel | Description |
| --- | --- | --- | --- |
| :sql | SQL statement | `db.statement` | Optional, since we probably favor the DB driver |
| :name | Name of the operation | `span.name` | Per the example, it looks like it would be `Post Load` |
| :connection | Connection object | `net.*` | We can extract attributes about the DB itself |
| :binds | Bind parameters | N/A | |
| :type_casted_binds | Typecasted bind parameters | N/A | |
| :statement_name | SQL statement name | ? | Unclear if this would be helpful to record. Is this the prepared statement name? |
| :cached | true is added when cached queries are used | ? `sql.cached` `rails.activerecord.query.cached` ? | Unclear if this should be custom |
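
As a rough sketch of how that mapping might be applied, the helper below turns a payload hash into a span name and attributes. `span_fields_for`, its `record_statement:` option, and the fallback span name are hypothetical, and the cached attribute name is just one of the candidates floated above, not a settled convention. The payload is a hand-built Hash shaped like the Rails guide example, so no Rails or database is involved.

```ruby
# Hypothetical helper: turns a sql.active_record-style payload into a span name
# and attributes, following the mapping proposed in the table above.
CACHED_ATTRIBUTE = "rails.activerecord.query.cached" # one candidate name, not settled

def span_fields_for(payload, record_statement: false)
  attributes = {}
  # The table marks the statement as optional, so it is opt-in here.
  attributes["db.statement"] = payload[:sql] if record_statement && payload[:sql]
  attributes[CACHED_ATTRIBUTE] = true if payload[:cached]
  # :binds and :type_casted_binds are deliberately not recorded (N/A in the table).
  { name: payload[:name] || "sql.active_record", attributes: attributes }
end

# Hand-built payload shaped like the Rails guide example; no database involved.
payload = {
  sql: 'SELECT "posts".* FROM "posts"',
  name: "Post Load",
  binds: [],
  cached: true
}

fields = span_fields_for(payload)
with_statement = span_fields_for(payload, record_statement: true)

p fields
p with_statement[:attributes].keys
```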

ericmustin · 2023-02-06T12:09:50Z

@ahayworth @arielvalentin Regarding span names not having the same semantics in dd-trace-rb: it's worth understanding that dd-trace-rb is tightly coupled to the Datadog agent, where a combination of obfuscation and normalization is applied to the SQL statement (attached in Datadog's tracing system as span.resource). This creates the same behavior of “low cardinality span resource, high cardinality span tags” as OpenTelemetry's “low cardinality span name, high cardinality span attributes” approach. But since OpenTelemetry lacks any standard SQL obfuscation and normalization, and we don't currently take that approach for span names anyway, @ahayworth is correct that migrating to this style of instrumentation would effectively be a breaking change to our instrumentation output (although not according to SemVer 2.0, since pre-1.0 software offers no such guarantees). Whether or not it's worth that break, I don't know. Perhaps we could introduce this gracefully, so users could stay on the old style of instrumentation, or we could use the 1.0 release as an opportunity to change the span naming convention.

arielvalentin · 2023-02-06T13:42:48Z

Perhaps the best way to move forward is to catalog ActiveSupport notifications and create a mapping guide, which may help better inform our decisions:

https://docs.google.com/spreadsheets/d/1T7uYBewovl6YGo0uAsvKLLSWO0z_m6-rRJJtzoPb264/edit

Here's my 2 cents:

I'm personally not very concerned about the breaking changes for span names, especially in cases where the names deviate from the specification, which I think will be a welcome change (e.g. ActionPack).

I'm in favor of ripping the bandaid off and skipping adding backward compatibility for these. If folks want the old span names and instrumentation I'd advocate that we tell them to pin to an earlier version.

ahayworth · 2023-02-15T02:05:10Z

@arielvalentin I wasn't able to edit that spreadsheet, so I made a copy and updated it here (you should have edit access through your work account): https://docs.google.com/spreadsheets/d/1mObwGGdJtmeG2YCWCQxWEwwpBEOA-3EXb0WT83bWc8I/edit#gid=379636133

I haven't mapped any additional types yet, but I did finish pulling over all the events and info from the rails guides. Click the little expandy-thing by each group to see them all. I'll try to do some mapping tomorrow! 😄

github-actions · 2023-04-27T01:52:36Z

👋 This issue has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this issue will be closed eventually by the stale bot.

arielvalentin · 2023-05-05T18:38:53Z

👋🏼 @jhawthorn The OTel Ruby instrumentation currently relies on monkey patching Rails libraries and I am going to start to work on migrating them to use Active Support Notifications.

While investigating next steps for this implementation I saw that this PR rails/rails#43390 removes start/finish methods from the Subscriber class, which is what we currently use in some instrumentations to know when to start and finish a span.

Here is an example of a custom subscriber that relies on those template methods: https://github.com/open-telemetry/opentelemetry-ruby-contrib/blob/main/instrumentation/racecar/lib/opentelemetry/instrumentation/racecar/process_message_subscriber.rb

This is important because it allows us to propagate the parent span context so that when we create additional child spans they are properly linked.

If I understand the changes correctly there will be backward compatibility so we will be able to continue to use the start and finish methods after Rails 7.1, is that correct?

Is there a plan to drop support for those lifecycle methods in favor of something else?

arielvalentin · 2023-09-02T16:36:40Z

Hello friends: @robbkidd @chrisholmes @ahayworth

I started a bit of an internal refactoring (#641) to see what it would take to separate the logic of payload handling and the subscriber registration logic so that users would be able to customize the logic around creating spans, setting span attributes or adding events, etc. However, this refactoring has left me with more questions than answers.

Why did we choose to set the span name to name.split('.')[0..1].reverse.join(' ').freeze instead of using the name of the event given to the subscriber?

Did we ever intend on supporting regular expressions for the subscriber like AS::Notifications supports?

Did we ever intend on adding support for Span Links or did we implicitly only ever want to use Parent-Child?

The payload does not include the context object and instead has a token. Is the only way for children to access the context to use the thread-local current context?

Do we envision a scenario ever where AS::Notifications switches to out of thread processing making the current thread context stack invalid?
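
The concern in that last question can be illustrated with nothing but `Thread` from the standard library. The `:toy_context` key is made up for this sketch and is not claimed to be how OpenTelemetry stores its context; the point is only that state kept on the publishing thread is not visible from a different thread.

```ruby
# Stand-in for a thread-local "current context" stack; the key is made up.
Thread.current[:toy_context] = ["parent-span"]

# A subscriber running inline sees the publisher's context.
seen_inline = Thread.current[:toy_context]

# A subscriber running on another thread starts with an empty slot.
seen_in_worker = Thread.new { Thread.current[:toy_context] }.value

p seen_inline    # the parent is visible on the publishing thread
p seen_in_worker # nil on the worker thread
```

If subscribers were ever dispatched out of thread, the parent context would have to travel explicitly, for example in the payload, rather than through thread-local state.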

arielvalentin · 2023-12-17T14:43:24Z

We have started moving forward with the work here and have completed the migration for active_job and action_pack.

Here is what we have come up with so far:

Prefer Semantic Conventions whenever possible
Use the Rails-defined event name when semantic conventions do not exist, i.e., not name.split('.')[0..1].reverse.join(' ').freeze
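
For reference, this small sketch shows what the older naming expression produces next to the raw event name. The event names are ones listed in the Rails guides; `legacy_span_name` is a hypothetical wrapper name used only here.

```ruby
EVENT_NAMES = %w[
  sql.active_record
  process_action.action_controller
  perform.active_job
].freeze

# The expression quoted above, wrapped in a method for illustration.
def legacy_span_name(event_name)
  event_name.split(".")[0..1].reverse.join(" ").freeze
end

EVENT_NAMES.each do |event_name|
  puts "#{event_name} -> #{legacy_span_name(event_name)}"
end
# sql.active_record -> active_record sql
# process_action.action_controller -> action_controller process_action
# perform.active_job -> active_job perform
```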

arielvalentin · 2023-12-17T15:08:41Z

@jhawthorn @bensheldon @composerinteralia I started looking into switching ActiveRecord instrumentation to use sql.active_record notifications but wanted to highlight what @ahayworth points out in the description of this issue:

> sql.active_record is great, but it's low-level and won't capture whether we are doing a #find or a #destroy, etc. Nor will it capture callbacks.

The current instrumentation patches AR persistence and query methods that add non-semconv internal spans but as a result they capture time spent in callbacks.

Do you think Rails upstream would be amenable to adding notifications around AR methods so we are able to capture some callback timings?

If not, then I think we have to stick with patching AR and use sql.active_record to enrich spans with information about whether or not the query was cached or if it was performed async.

The lower level DB drivers are the ones performing sanitization for non-AR users but we could potentially have some cost savings here if the bind parameters are able to give us a hint that the query is already sanitized and use that in lieu of obfuscating the db query.

Some bug reports we received about our current instrumentation come from the fact that some gems patch AR incorrectly and change the method signature of the AR public API. I know it's not something that we should worry about, but I also do not want the instrumentation to fall behind changes in Rails that would result in the same problem.

I am considering updating the instrumentation to use argument forwarding instead of redefining method signatures:

# This is the current definition in the gem
def decrement!(attribute, by = 1, touch: nil)
  tracer.in_span("#{self.class}#decrement!") do
    super
  end
end

# using forwarding to mitigate method signature changes
def decrement!(...)
  tracer.in_span("#{self.class}#decrement!") do
    super
  end
end
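
A self-contained sketch of why forwarding helps, using toy classes only. `ToyRecord`, both patch modules, and the extra `note:` keyword are invented for illustration and do not correspond to anything in ActiveRecord; the tracing wrapper is left out so the signature behavior stands on its own.

```ruby
# Toy model: `ToyRecord` and the extra `note:` keyword are invented for this sketch.
class ToyRecord
  def decrement!(attribute, by = 1, touch: nil, note: nil)
    [attribute, by, touch, note]
  end
end

# Patch that copies the signature as it looked before `note:` existed.
module PinnedSignaturePatch
  def decrement!(attribute, by = 1, touch: nil)
    super
  end
end

# Patch that forwards whatever it receives.
module ForwardingPatch
  def decrement!(...)
    super
  end
end

pinned = Class.new(ToyRecord) { prepend PinnedSignaturePatch }
forwarding = Class.new(ToyRecord) { prepend ForwardingPatch }

pinned_result =
  begin
    pinned.new.decrement!(:stock, 2, note: "audit")
  rescue ArgumentError => e
    e.class
  end

forwarding_result = forwarding.new.decrement!(:stock, 2, note: "audit")

p pinned_result     # the pinned patch rejects the keyword it does not know about
p forwarding_result # the forwarding patch passes everything through
```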

Taking that a step further I am considering proposing lower cardinality span names with semconv attributes:

# using forwarding to mitigate method signature changes
def decrement!(...)
  tracer.in_span("active_record#decrement!", attributes: { "code.namespace" => self.class.name }) do
    super
  end
end
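
To show the effect on cardinality, this sketch compares the two naming schemes over a handful of made-up model names. `per_class_span` and `low_cardinality_span` are hypothetical helpers that only build the name and attributes; no tracer is involved.

```ruby
# Toy model names standing in for an application's ActiveRecord classes.
MODEL_NAMES = %w[User Post Comment Invoice].freeze

# Current style: the class name is baked into the span name.
def per_class_span(model_name, method_name)
  { name: "#{model_name}##{method_name}", attributes: {} }
end

# Proposed style: fixed span name, class name moved to an attribute.
def low_cardinality_span(model_name, method_name)
  { name: "active_record##{method_name}", attributes: { "code.namespace" => model_name } }
end

current = MODEL_NAMES.map { |m| per_class_span(m, "decrement!") }
proposed = MODEL_NAMES.map { |m| low_cardinality_span(m, "decrement!") }

puts "distinct names (current):  #{current.map { |s| s[:name] }.uniq.size}"
puts "distinct names (proposed): #{proposed.map { |s| s[:name] }.uniq.size}"
p proposed.first
```

With the proposed style the number of distinct span names no longer grows with the number of models, while the class is still available as an attribute for filtering.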

What are y'alls thoughts on this?

jhawthorn · 2024-01-08T22:28:22Z

> sql.active_record is great, but it's low-level and won't capture whether we are doing a #find or a #destroy, etc. Nor will it capture callbacks.

payload[:name] captures these two. It likely won't report 1:1 exactly what otel previously reported, but that may be a good thing (to avoid user confusion in having different things logged in development vs reported in production). There may be some missing (ex. increment/decrement are reported as "update", which is correct, but maybe not as specific as desired). I'd be happy to help fill in those gaps.

>> ActiveSupport::Notifications.subscribe("sql.active_record") {|_,_,_,_,payload| p payload[:name] }
>> User.first
"User Load"
>> user = User.create!
"TRANSACTION"
"User Create"
"TRANSACTION"
>> user.destroy!
"TRANSACTION"
"User Destroy"
"TRANSACTION"

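If span names or attributes were derived from these names, the parsing could look roughly like the sketch below. `parse_sql_event_name` is hypothetical, and the "<Model> <Operation>" shape is an assumption drawn only from the session output above; other payload names may not follow it.

```ruby
# Names as printed in the irb session above.
OBSERVED_NAMES = ["User Load", "TRANSACTION", "User Create", "User Destroy"].freeze

# Hypothetical helper: splits "<Model> <Operation>" names and passes anything
# else (such as "TRANSACTION") through as an operation with no model.
def parse_sql_event_name(name)
  model, operation = name.split(" ", 2)
  return { model: nil, operation: name } if operation.nil?

  { model: model, operation: operation }
end

OBSERVED_NAMES.each { |name| p parse_sql_event_name(name) }
```
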
arielvalentin · 2024-04-03T13:25:07Z

> payload[:name] captures these two. It likely won't report 1:1 exactly what otel previously reported, but that may be a good thing (to avoid user confusion in having different things logged in development vs reported in production).
> There may be some missing (ex. increment/decrement are reported as "update", which is correct, but maybe not as specific as desired). I'd be happy to help fill in those gaps.

Then in these cases using the name key may be enough for naming spans.

What about measuring time spent in AR callbacks? AFAICT sql.active_record timing does not include that.

olepbr · 2024-11-20T13:58:03Z

I believe #1258 is another example of "monkey-patching is fraught with peril" =)

ahayworth added the feature (New feature or request), help wanted (Extra attention is needed), and instrumentation labels on Dec 6, 2022

ahayworth self-assigned this Dec 6, 2022

arielvalentin mentioned this issue Dec 27, 2022

fix: Drop Rails dependency for ActiveSupport Instrumentation #242

Merged

arielvalentin mentioned this issue Jan 12, 2023

ActiveRecord persistence class methods patch signature is too specific #269

Closed

ahayworth mentioned this issue Jan 27, 2023

Explore additional Rails instrumentation #298

Open

github-actions bot added the stale (Marks an issue/PR stale) label on Apr 27, 2023

arielvalentin added the keep (Ensures stale-bot keeps this issue/PR open) label and removed the stale (Marks an issue/PR stale) label on May 5, 2023

arielvalentin mentioned this issue Sep 2, 2023

refactor: Extract custom span subscriber logic #641

Closed

arielvalentin mentioned this issue Oct 11, 2023

feat!(active_job): Use ActiveSupport instead of patches #677

Merged

xuan-cao-swi mentioned this issue Oct 19, 2023

feat!(action_pack): Use ActiveSupport instead of patches #703

Merged

xuan-cao-swi mentioned this issue Sep 27, 2024

Change in behavior in activerecord from 6.1 to 7.0+ #1183

Closed