active record: many spans created for internal operations #35

blumamir · 2022-02-24T14:08:00Z

Description of the bug

This is not actually a bug. I investigate heavy traces with many many active record spans and found the following:

active record instrument the public api functions, few of those are: create, save, transaction.
When create is invoked, active record checks if the argument is an array, and if so, it invokes create for each element in the array:

      def create(attributes = nil, &block)
        if attributes.is_a?(Array)
          attributes.collect { |attr| create(attr, &block) }
        else
          object = new(attributes, &block)
          object.save
          object
        end
      end

Each create invokes the save function on each object, which is also instrumented
There is also I think an invocation of the transaction method which is also instrumented. Could not reproduce it myself but I can see it generated in a some other environment.

For example, the following function invocation:

User.create([{:username => "foo", :password => "1234"}, {:username => "bar", :password => "5678"}])

Creates the following output in Jaeger:

I am thinking about whether it gives good value to instrument the public methods when they are invoked from code inside the active record module itself. This follows the actual code being executed in reality, but create large traces with many spans:

More costs to users
Visual load in UI
More performance impact on user's application
Confusing traces: create span with n children of same operation create with child of save each. This, I suspect, is not intuitive to the user which was only invoking create once. I was confues myself until I looked at the rails source code to make sense of what I am seeing.

I post here this issue to hear opinions and consult if you are open to changing this behavior to only include functions invoked directly by users. This can also be implemented as instrumentation config option I guess.

The text was updated successfully, but these errors were encountered:

ericmustin · 2022-02-28T14:46:53Z

In the example above, it seems like perhaps there's some pg or mysql instrumentation the level under activerecord, which if that's too noisy, can be disabled by simply disabling those db instrumentations.

As far as the behavior around .create vs .save, yes when you pass in multiple hashes in an array into create, it iterates over each one, creates a model for it, attempts to save it, and then returns that model.

.create and .save in active record are not alias methods. While .save makes up a percentage of work within create, their performance profiles would be different, I think the user loses value by dropping those spans and also realistically speaking i'm not sure how changing this behavior to only include functions invoked directly by users. would be reliably determined by just autoinstrumentation. Perhaps some manipulation of baggage or tracestate, or using custom attributes and a custom Sampler, in a user's actual application would make this possible, but nothing that I'm aware of that we could easily support upstream.
I think it's confusing that .create can itself be a wrapper around multiple individual .create invocations. It would be reasonable to me to expose a configuration option like option :trace_create_arrays, default: true, validate: :boolean. If false, instrumentation does not attempt to create a "wrapper" span for create/create! invocations called with an Array attributes arg.

//pseudo code-ish

def create(attributes = nil, &block)
  if !config[:trace_create_arrays] && attributes.is_a?(Array)
    super
  else
    tracer.in_span("#{self}.create") do
      super
    end
  end
end

def create!(attributes = nil, &block)
  if !config[:trace_create_arrays] && attributes.is_a?(Array)
    super
  else
    tracer.in_span("#{self}.create") do
      super
    end
  end
end

private

def config
  ActiveRecord::Instrumentation.instance.config
end

This is an incredibly similar pattern to what we do in Sidekiq. Using configuration to deal with noisy instrumentation that is "technically correct but noisy and not useful for a backend system" is completely fine with me as long as it's got reasonable defaults and is based on a reasonable user use case, which imo this is.

As far as a more general "changing this behavior to only include functions invoked directly by users.", I'm not against having a configuration option for something to this effect, but as i mentioned above, practically speaking i'm not sure how it would be implemented in a robust way.

blumamir · 2022-03-02T14:09:16Z

Thanks eric,

.create and .save in active record are not alias methods. While .save makes up a percentage of work within create, their performance profiles would be different, I think the user loses value by dropping those spans

I agree that it's losing value. If taken to the extreme, we can imagine theoretical instrumentation of each function invocation in the application which will give an amazing performance profile but introduce a big cost and traces that will be harder to visualize. I think that noisy instrumentation should offer some "slim" or "non-verbose" mode which users can hop in to reduce their cost and visual load in the UI (which is important to some of our users).

how changing this behavior to only include functions invoked directly by users. would be reliably determined by just autoinstrumentation. Perhaps some manipulation of baggage or tracestate, or using custom attributes and a custom Sampler, in a user's actual application would make this possible, but nothing that I'm aware of that we could easily support upstream.

As far as a more general "changing this behavior to only include functions invoked directly by users.", I'm not against having a configuration option for something to this effect, but as i mentioned above, practically speaking i'm not sure how it would be implemented in a robust way.

Regarding instrumenting only top-level API calls, this can be achieved by adding a custom key into the context, and skipping instrumentation if this key is already present. we implemented this approch for nodeJS typeorm instrumentation and it's working good so far. I believe this can also be implemented in ruby if I'm not missing anything.

I think it's confusing that .create can itself be a wrapper around multiple individual .create invocations. It would be reasonable to me to expose a configuration option like option :trace_create_arrays, default: true, validate: :boolean. If false, instrumentation does not attempt to create a "wrapper" span for create/create! invocations called with an Array attributes arg.

I imagined only instrumenting the top-level create and suppressing the internal creates invocations per array element. I guess both options are valid but one will potentially generate more spans than the other, with the benefit of more timing info.

Thanks for taking the time to read this issue and respond

ericmustin · 2022-03-02T14:57:43Z

@blumamir that sounds novel and reasonable to me. I think that's a permissible use of context according to a quick reading of the specification. I'd been keen to hear if other folks have concerns about this approach, @open-telemetry/ruby-approvers , if not i'm happy to assign this issue to you or your colleagues, and can happily review a PR.

For posterity (and because i feel like writing excessively long messages 😄), a path I'd prefer not to go down with instrumentation is something you mentioned:

I think that noisy instrumentation should offer some "slim" or "non-verbose" mode which users can hop in to reduce their cost and visual load in the UI (which is important to some of our users).

While I agree that certain instrumentations should be designed with different modes of verbosity (beside sidekiq, another example would be web frameworks which have excessively high numbers of low-value chained middleware spans), I think it's dangerous to begin to write instrumentation that is overly worried about "their cost and visual load in the UI", since UIs are explicitly not a concern of OpenTelemetry, afaik. Personally speaking that mindset is little silly to me because obviously instrumentation doesn't exist in isolation, it exists as part of an observability system, but this is a choice OpenTelemetry has been intentional about and I'm trying to approach these problems as a representative of OpenTelemetry. I'm not sure what the right mental model is when writing instrumentation for opentelemetry, maybe whether the amount of data generated (high volume of traces, high volumes of spans in a trace, and traces which continue ad infinitum, etc) has value exceeded by the cost of having to transform that data into a wire format and send it across the wire. Maybe just "as long as observer-effect isn't noticeable". I'm not sure.

Anyway, tl;dr we're doing similar-ish stuff in other instrumentations already, and if it unblocks end users i'm in favor of it (as an off by default config option). If there are practical issues voiced by others with this approach instead of just theoretical ones, I'm open to hearing them (a part of me wonders if this isn't something a sampler should be handling and not the instrumentation itself), but imo i think this can move to implementation at this point.

fbogsany · 2022-03-02T15:40:04Z

I think it's totally reasonable to use context in this way to suppress noisy instrumentation invoked recursively by the framework that's instrumented. From a support perspective, I think it's useful to default to suppressing the nested AR spans, since noisy traces are generally only useful for debugging a problem once it has been identified, and are generally just annoying (or worse) if things are 👌 .

This suppression should be configured by an option on the AR instrumentation and should only affect the AR instrumentation. That means that if you have DB driver instrumentation enabled, its spans will still be present as direct children of the top-level AR span instead of under the now-elided nested AR spans. If the DB spans are noisy at that point, I'd recommend either a custom sampler or removing the DB instrumentation.

blumamir · 2022-03-02T16:59:31Z

Great iterating over this with you guys, like the brainstorming :)

I think it's dangerous to begin to write instrumentation that is overly worried about "their cost and visual load in the UI", since UIs are explicitly not a concern of OpenTelemetry, afaik

Interesting 🤔 I wonder if we can find a solution for AR that better fits OpenTelemetry, is more generic and requires less code changes. Is it reasonable to use a custom sampler for that?
Actually, if the sampler would get the "instrumnetationLibrary" (or "InstrumentationScope") as a parameter to the should_sample? function, it would be possible to implement this "verbosity reducing logic" as a custom sampler without introducing any change to any instrumentation, right? That way, instrumentations could always be very verbose, and users can configure their own logic to drop some of the spans from the trace which they have no interest in.

ericmustin · 2022-03-02T18:15:06Z

@blumamir I would imagine a custom sampler could handle this (though i'm not 100% sure, i think you'd need to maintain the state that a context key otherwise would somewhere), or alternatively something like the approach you outlined above. Without diving into the code it's hard for me to say for sure, ActiveRecord is probably a bit more complex than whatever ORM this is being modeled after and there are probably some edge cases we're not considering. I think this is good discussion but at this point I'd encourage you to spike on an implementation that you think represents a good start to this work, and push up a PR or branch (can be a draft), and we can discuss this more concretely and work toward getting this behavior supported. If you dont have bandwidth let me know and i can see if i can carve out time.

blumamir · 2022-03-08T10:13:00Z

Thanks very much @ericmustin

At first I planned on adding configuration per noisy instrumentation in order to reduce verbosity. This is what we've done so far in node's instrumentations.
The discussion here encouraged me to look for other, more general and idiomatic alternatives. After some code digging, I believe that the issue can be resolved with a custom sampler, and plan to prototype it and share the result. It might take me some time to do it, as I think InstrumentationLibrary first needs to be added to the sampler in the specification for the solution to work. There is already this issue for that.

I'm working on other features in parallel but will update here once I have news. In the meantime, any thoughts and concerns are welcome on this thread

blumamir added the bug Something isn't working label Feb 24, 2022

ericmustin assigned blumamir Mar 2, 2022

blumamir mentioned this issue Mar 8, 2022

Instrumentation library info and Resources are not available to Samplers open-telemetry/opentelemetry-specification#1588

Open

plantfansam transferred this issue from open-telemetry/opentelemetry-ruby Jun 14, 2022

blumamir mentioned this issue Dec 19, 2022

New component: in-trace sampler open-telemetry/opentelemetry-collector-contrib#17086

Closed

2 tasks

open-telemetry locked and limited conversation to collaborators Dec 29, 2022

arielvalentin converted this issue into discussion #260 Dec 29, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This issue was moved to a discussion.

active record: many spans created for internal operations #35

active record: many spans created for internal operations #35

blumamir commented Feb 24, 2022

ericmustin commented Feb 28, 2022

blumamir commented Mar 2, 2022

ericmustin commented Mar 2, 2022 •

edited

Loading

fbogsany commented Mar 2, 2022

blumamir commented Mar 2, 2022

ericmustin commented Mar 2, 2022

blumamir commented Mar 8, 2022

This issue was moved to a discussion.

This issue was moved to a discussion.

active record: many spans created for internal operations #35

active record: many spans created for internal operations #35

Comments

blumamir commented Feb 24, 2022

ericmustin commented Feb 28, 2022

blumamir commented Mar 2, 2022

ericmustin commented Mar 2, 2022 • edited Loading

fbogsany commented Mar 2, 2022

blumamir commented Mar 2, 2022

ericmustin commented Mar 2, 2022

blumamir commented Mar 8, 2022

This issue was moved to a discussion.

ericmustin commented Mar 2, 2022 •

edited

Loading