
Sinatra tracing not working with 0.35+ #1035

Closed · mscrivo opened this issue May 8, 2020 · 30 comments · Fixed by #1034
Labels: bug (Involves a bug) · community (Was opened by a community member) · integrations (Involves tracing integrations)

mscrivo (Contributor) commented May 8, 2020

We currently have Sequel tracing working with the default service_name of "postgres", but want to split it out so that we have separate reporting on each of our 3 databases. I'm trying to follow the guide and use:

Datadog.configure do |conf|
  conf.use :sequel
end

db = Sequel.connect('postgres://user:password@host:port/main')
replica = Sequel.connect('postgres://user:password@host:port/replica')

Datadog.configure(db, service_name: 'primary-db')
Datadog.configure(replica, service_name: 'replica-db')

When I enable the test adapter and watch the traces locally, I can see the traces have "service=primary-db", for example; but when we deploy to prod, the traces are missing postgres and these new service names altogether.

I'm not sure what I'm missing or doing wrong.

Using version 0.35.1

Also, how does one separate databases when you use Sequel's sharding mechanism and don't actually create a separate connection for the replica as described here?
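For context, Sequel's sharding mechanism means a single database object routing queries to several servers rather than opening separate connections. A sketch of that setup, with placeholder connection details (hosts, credentials, and table names here are made up):

```ruby
require 'sequel'

# Placeholder connection details. A single Database object routes
# queries to the :default (primary) server or a named replica via
# Sequel's :servers option, instead of opening separate connections.
DB = Sequel.connect(
  'postgres://user:password@primary-host:5432/main',
  servers: { replica: { host: 'replica-host' } }
)

DB[:users].all                    # runs against the primary (:default)
DB[:users].server(:replica).all   # runs against the replica
```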

mscrivo (Contributor, Author) commented May 8, 2020

hmm .. I just tried with 0.34.2 and it seems to be working, perhaps a regression?

delner (Contributor) commented May 8, 2020

Thanks for the report @mscrivo! I'll try to take a look at this soon, see if I can replicate and understand why this isn't working. Are you able to replicate this in an environment where you can safely test it?

mscrivo (Contributor, Author) commented May 9, 2020

thanks @delner, yeah I'm able to deploy a single pod and watch how it's working.

delner (Contributor) commented May 12, 2020

@mscrivo So I did some testing for this on 0.35.0 and another branch I had based on 0.35.2.

The spec is as follows:

it do
  Datadog.configure { |c| c.use :sequel }

  sequel = Sequel.sqlite(':memory:').tap do |db|
    db.create_table(:table) do
      String :name
    end
  end

  sequel_two = Sequel.sqlite(':memory:').tap do |db|
    db.create_table(:table) do
      String :name
    end
  end

  Datadog.configure(sequel, service_name: 'one')
  Datadog.configure(sequel_two, service_name: 'two')

  @traces = []
  allow(Datadog.tracer.writer).to receive(:write) do |trace|
    @traces << trace
  end

  sequel[:table].insert(name: 'data1')
  sequel_two[:table].insert(name: 'data2')

  @traces = @traces.flatten
  expect(@traces.first.service).to eq('one')
  expect(@traces.last.service).to eq('two')
end

This test fails on 0.35.0, because it appears the Sequel integration was eagerly assigning a tracer before it was replaced. I think we fixed this in #1034. I then tried this on #1037 (which is effectively 0.35.2 plus some changes) and it worked just fine. So I think the latest release should have fixed this.

On your test pod, can you try 0.35.2? (Maybe #1037 too if 0.35.2 doesn't work.) Let me know if the problem is still present.

@delner delner self-assigned this May 12, 2020
@delner delner added bug Involves a bug community Was opened by a community member integrations Involves tracing integrations labels May 12, 2020
@delner delner linked a pull request May 12, 2020 that will close this issue
mscrivo (Contributor, Author) commented May 12, 2020

Thanks for looking into that @delner ... for whatever reason, after I upgrade to 0.35.2, I'm not getting any traces at all. Going back to 0.34.2 makes them all come back. I'm still trying to figure out why; perhaps it's something wrong with our config, but I'm not sure yet. The logs on the pod don't seem to indicate any problem, so it's a bit of a mystery.

delner (Contributor) commented May 12, 2020

Does your tracer point to a non-default host or port? If you reconfigured the tracer in this way, this was the scenario that had the potential to cause the bug prior to 0.35.2. Basically the instrumentation was using the wrong tracer, and it would be good to confirm that's still not the case in your application.

A simple way to check this is to compare the object IDs of the tracer after all your configuration is complete.

# These values should be identical
Datadog.tracer.object_id
Datadog.configuration[:sequel][:tracer].object_id
Datadog::Pin.get_from(sequel).object_id # Your Sequel database object

If they are different objects, then that would be a concern.

mscrivo (Contributor, Author) commented May 12, 2020

The first 2 are identical, the last one is different.

Here's how we have our config setup:

TRACING_ENABLED = ENV['Prod'] && !ENV['ENABLE_DATADOG_TRACER'].nil?

Datadog.configure do |conf|
  if TRACING_ENABLED
    conf.use :aws
    conf.use :http
    conf.use :rack
    conf.use :redis
    conf.use :sidekiq
    conf.use :sinatra
    conf.use :sequel
  end
end

Datadog.tracer.configure(
  enabled: TRACING_ENABLED,
  hostname: ENV['DATADOG_APM_AGENT_SERVICE_SERVICE_HOST'] || 'localhost',
  sampler: Datadog::RateSampler.new(0.5),
  priority_sampling: true,
  port: 8126,
)

# Then to be able to identify primary vs replica calls, we're patching sequel's synchronize like this:

def synchronize(server=nil)
  if !Datadog.tracer.enabled
    return synchronize_without_timing(server) {|conn| yield conn}
  end

  # Setup databases in datadog so we know which one queries are being run on.
  if server == :replica
    Datadog.configure(DB, service_name: 'postgres/replica')
  else
    Datadog.configure(DB, service_name: 'postgres/primary')
  end

  span = Datadog.tracer.trace('sequel.synchronize')
  span.service = DATADOG_SERVICE
  span.resource = 'sequel.synchronize'
  span.span_type = Datadog::Ext::AppTypes::CUSTOM
  finished = false

  begin
    @pool.hold(server || :default) do |conn|
      span.finish(Time.now)
      finished = true
      yield conn
    end
  rescue => e
    span.finish(Time.now) if !finished
    raise e
  end
end

delner (Contributor) commented May 12, 2020

Okay, so from what I can tell, the Datadog::Pin on your Sequel database having a different tracer than Datadog.tracer is likely the problem. If we fix that, I think we fix the problem.

I'd suggest rewriting your configuration a bit. Remove the Datadog.tracer.configure section (which only configures an instance of a tracer), and replace it with the following in Datadog.configure:

Datadog.configure do |c|
  c.tracer.enabled = TRACING_ENABLED
  c.tracer.hostname = ENV['DATADOG_APM_AGENT_SERVICE_SERVICE_HOST'] || 'localhost'
  c.tracer.sampler = Datadog::RateSampler.new(0.5)
  c.tracer.priority_sampling = true
  c.tracer.port = 8126

  # Activate your integrations here...
end

mscrivo (Contributor, Author) commented May 13, 2020

Even with your suggested configuration change, I'm unable to get our tracing going with 0.35.2. Anything else I could look at to debug?

delner (Contributor) commented May 13, 2020

@mscrivo It's going to be hard to suggest a clear solution without being able to reproduce the problem on my end. If you can reproduce the issue in an RSpec test or sample app that I can run on my end, then I should be able to debug it for you.

Otherwise, I think you might have to poke around a bit at the internals of your application with break points or log statements. If you can identify why Datadog::Pin.get_from(sequel).object_id is different than Datadog.tracer.object_id, then I think we will solve the problem.

If I were debugging it, I'd look closely at how the application initializes Datadog::Tracer during its startup phase, to see what's causing tracer instances to initialize, and which tracer instance it ends up using for Sequel tracing. A simple way of doing this would be to add print/debug statements to Datadog::Tracer#initialize to see what object_id each instance initializes with, then add a print/debug statement to Datadog::Tracer#trace to see which tracer instance (by object_id) is producing traces.
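One non-invasive way to add those print statements is Module#prepend, rather than editing the gem source. A minimal sketch against a stand-in class (FakeTracer and TracerDebug are made-up names; in a real app you would prepend to Datadog::Tracer instead):

```ruby
# Non-invasive debug logging via Module#prepend. FakeTracer stands in
# for Datadog::Tracer; in a real app you would call
# Datadog::Tracer.prepend(TracerDebug) instead of editing gem source.

class FakeTracer
  def trace(name)
    "traced #{name}"
  end
end

# Records which tracer instance (by object_id) initializes and traces.
module TracerDebug
  def initialize(*)
    super
    LOG << [:initialize, object_id]
  end

  def trace(name)
    LOG << [:trace, object_id]
    super
  end
end

LOG = []
FakeTracer.prepend(TracerDebug)

t = FakeTracer.new
t.trace('sequel.query')

p LOG.map(&:first)       # prints [:initialize, :trace]
p LOG[0][1] == LOG[1][1] # prints true (same tracer instance both times)
```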

My suspicion is the Sequel instrumentation is somehow using a stale tracer instance with default trace settings thus causing traces to be swallowed. Some things to keep in mind:

  • Datadog.configure(object, settings) will create a Datadog::Pin on the object in question.
  • Datadog::Pin#new calls Datadog.tracer and caches the resulting tracer onto the Datadog::Pin
  • Calling Datadog.tracer or other components before Datadog.configure runs will cause the tracer to initialize with default settings.
  • When Datadog.configure is passed a block, upon completing the block, it will rebuild Datadog.tracer and set it to a new instance of Datadog::Tracer with the configuration settings. This can cause Datadog.tracer.object_id to change over the lifecycle of an application, and may play a role in this issue.

So it could be that you're calling Datadog.configure(sequel, service_name: 'my-service') before Datadog.configure in the initializer file runs, causing it to cache an old tracer instance. But it's hard for me to confirm that without seeing the code in action. I would investigate this first.
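That ordering pitfall can be illustrated without ddtrace at all. The sketch below uses toy stand-ins (the Tracer, Pin, and Datadog definitions here are simplified imitations, not the real classes) to show how a pin created before configure ends up caching a stale tracer:

```ruby
# Toy stand-ins (not the real ddtrace classes) for the global tracer
# lifecycle: configure replaces the global tracer instance, while a
# Pin caches whatever tracer existed when the Pin was created.

class Tracer; end

module Datadog
  class << self
    # Lazily builds a default tracer on first access.
    def tracer
      @tracer ||= Tracer.new
    end

    # After the block completes, the global tracer is rebuilt,
    # so Datadog.tracer.object_id changes.
    def configure
      yield if block_given?
      @tracer = Tracer.new
    end
  end
end

class Pin
  attr_reader :tracer

  def initialize
    @tracer = Datadog.tracer # cached at creation time
  end
end

early_pin = Pin.new     # created before configure: caches the default tracer
Datadog.configure { }   # rebuilds Datadog.tracer
late_pin  = Pin.new     # created after configure: caches the new tracer

puts early_pin.tracer.equal?(Datadog.tracer) # prints false (stale tracer)
puts late_pin.tracer.equal?(Datadog.tracer)  # prints true
```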

mscrivo (Contributor, Author) commented May 13, 2020

thanks @delner that's super helpful .. feel free to close this issue, I'll report back when I figure out what it was.

delner (Contributor) commented May 13, 2020

Hey @mscrivo I have one more thing you could try, in the spirit of my last comment. #1041 makes the tracer on the pin "lazy", which I think could make an impact here. Can you try this PR, see if it helps?

mscrivo (Contributor, Author) commented May 13, 2020

Would love to try, is there an easy way to do that without having a gem available?

Also, I tried what you suggested earlier and Datadog::Tracer#initialize printed 1 object_id when the first integration was configured, and a different one when the configure block completed. Is that expected?

Inside of Datadog::Tracer#trace it always printed the same object_id, the one that was created when the configure block finished, it seems.

Also, I should note that I'm not getting any tracing, not just the sequel stuff .. I even removed the Datadog.configure(sequel, service_name: 'my-service') and still no luck.

btw, the way I'm trying to see the traces in DD dashboard is by env. We set DD_ENV as an env var in our app pods, did anything happen to change with that in this release?

delner (Contributor) commented May 13, 2020

@mscrivo Yes, you can try it with:

source 'http://gems.datadoghq.com/prerelease' do
  gem 'ddtrace', '0.35.2.fix.pin.use.current.tracer.64656'
end

Also, I tried what you suggested earlier and Datadog::Tracer#initialize printed 1 object_id when the first integration was configured, and a different one when the configure block completed. Is that expected?

Yes, this is a known issue. When the first integration is configured, applying the patch causes Datadog.logger to be invoked, which in turn causes all trace components (including the tracer) to initialize with default settings. I've addressed it in #1037, which is in master, and this PR should have those changes as well.

Overall though, I don't think it should be harmful if the integration is using Datadog.tracer and that tracer is configured properly.

If you're in a safe environment, you can also try activating:

Datadog.configure do |c|
  c.diagnostics.debug = true
end

This will produce log output for every span in every trace produced by the application. I recommend against running it in production or other sensitive environments because it is extremely verbose and can easily overwhelm logs. It's best to try this in a test environment where you can control the load yourself.

btw, the way I'm trying to see the traces in DD dashboard is by env. We set DD_ENV as an env var in our app pods, did anything happen to change with that in this release?

This is probably all tied in with same problem; trace settings are not being used because the tracer is likely misconfigured or the correct tracer is not being used. If you can see #trace is being invoked on Datadog.tracer, then I would inspect that tracer instance and look at its state. You'd want to look at its writer to see that it has your agent settings, and its tags to see env has been set.

mscrivo (Contributor, Author) commented May 13, 2020

Trying it locally with the test adapter on and debug on, all I see are these, over and over:

1 unfinished spans:
D, [2020-05-13T15:37:13.339849 #98243] DEBUG -- ddtrace: [ddtrace] (/Users/mscrivo/proj/vendor/bundle/ruby/2.5.0/gems/ddtrace-0.35.2.fix.pin.use.current.tracer.64656/lib/ddtrace/context.rb:133:in `block (2 levels) in close_span') unfinished span: Span(name:sinatra.request,sid:3869509896388384557,tid:6389607625795167708,pid:8148550737466063632)
D, [2020-05-13T15:37:13.340857 #98243] DEBUG -- ddtrace: [ddtrace] (/Users/mscrivo/proj/vendor/bundle/ruby/2.5.0/gems/ddtrace-0.35.2.fix.pin.use.current.tracer.64656/lib/ddtrace/context.rb:129:in `block in close_span') root span redis.command closed but has 1 unfinished spans:
D, [2020-05-13T15:37:13.340990 #98243] DEBUG -- ddtrace: [ddtrace] (/Users/mscrivo/proj/vendor/bundle/ruby/2.5.0/gems/ddtrace-0.35.2.fix.pin.use.current.tracer.64656/lib/ddtrace/context.rb:133:in `block (2 levels) in close_span') unfinished span: Span(name:sinatra.request,sid:1300474489609260202,tid:4850733079752190718,pid:6636370913950261182)

the only traces that seem to work are these:

D, [2020-05-13T15:37:13.781321 #98243] DEBUG -- ddtrace: [ddtrace] (/Users/mscrivo/proj/vendor/bundle/ruby/2.5.0/gems/ddtrace-0.35.2.fix.pin.use.current.tracer.64656/lib/ddtrace/tracer.rb:344:in `write') [
 Name: js.view_manager_init
Span ID: 1593462590364861097
Parent ID: 0
Trace ID: 3114477366909342577
Type:
Service: browser
Resource: js.view_manager_init
Error: 0
Start: 1589398630304197120
End: 1589398633780861952
Duration: 3476665000
Allocations: 36
Tags: [
   page => dashboard]
Metrics: [
   _sample_rate => 0.5,
   _dd.agent_psr => 1.0,
   system.pid => 98243.0,
   _sampling_priority_v1 => 1.0]]

delner (Contributor) commented May 13, 2020

Ah, okay! Unfinished spans will cause the entire trace to drop, so that would explain things. It looks like the Sinatra span isn't closing properly. This might be related to #1015, which was released in 0.35.0.
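To illustrate why one unfinished span drops everything, here is a toy model (not ddtrace's actual implementation) of a trace context that only flushes once every span it tracks has finished:

```ruby
# Toy model (not ddtrace's actual code) of why one unfinished span
# drops the whole trace: the context refuses to flush until every
# span it has tracked is finished.

class Span
  attr_reader :name

  def initialize(name)
    @name = name
    @finished = false
  end

  def finish!
    @finished = true
  end

  def finished?
    @finished
  end
end

class Context
  def initialize
    @spans = []
  end

  def add(span)
    @spans << span
  end

  # Returns the trace only once all spans are closed; otherwise nil,
  # i.e. the trace is considered incomplete and dropped.
  def flush
    return nil unless @spans.all?(&:finished?)
    @spans
  end
end

ctx = Context.new
root  = Span.new('redis.command')
child = Span.new('sinatra.request')
ctx.add(root)
ctx.add(child)

root.finish!            # root closed, but sinatra.request never finishes
p ctx.flush             # prints nil: the entire trace is dropped

child.finish!
p ctx.flush.map(&:name) # prints ["redis.command", "sinatra.request"]
```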

@marcotc any thoughts on this?

marcotc (Member) commented May 13, 2020

@delner @mscrivo Taking a look at this soon! I'll report back.

marcotc (Member) commented May 14, 2020

@mscrivo I just wanted to update you that we found an issue with the new Sinatra integration when it's used with another Sinatra/Rack middleware that is also instrumented by ddtrace. We are still figuring out the best way to solve it, but it's a good start.

Despite that, I wasn't able to reproduce the error as originally reported. I want to make sure our changes actually fix your issue, so I'd like to ask you for a few pieces of information regarding your application.

  1. Could you provide us with the base code that you use to set up your Sinatra application? No need to provide the code for the request handlers themselves, nor the route names.
    Depending on whether you are using a modular or classic setup, this could be your class MyApp < Sinatra::Base definition, alongside any #register and #use statements you have, to help us know what plugins and middlewares are registered. I want to be able to set up a "blank" Sinatra app just like yours in order to debug this issue.
  2. What version of Sinatra are you running?

@marcotc marcotc self-assigned this May 14, 2020
mscrivo (Contributor, Author) commented May 15, 2020

class Base < Sinatra::Base
  # Configuration for all environments (prod/dev/test).
  configure do
    set :root, File.expand_path('..', __dir__)
    use Rack::Parser
  end

  # Track exceptions with rollbar in production.
  configure :production do
    # Rollbar configuration takes place in config/rollbar.rb
    use Rollbar::Middleware::Sinatra
  end

  register Sinatra::Hashfix

  include Helpers

  # Setup debug probes (as middleware)
  use Controllers::DebugProbes

  # Log passed args to the logs/[environment].log file.
  def self.log(*args)
    if !@logger
      environment = ENV['RACK_ENV'] || 'development'
      log_path = File.expand_path("../../logs/#{environment}.log", __FILE__)
      @logger = ::Logger.new(log_path)
    end

    @logger.info(*args)
  end
end

We're using Sinatra 2.0.3

@mscrivo mscrivo changed the title Sequel multiple database tracing not working Sinatra tracing not working with 0.35+ May 21, 2020
mscrivo (Contributor, Author) commented May 21, 2020

I'm trying to upgrade our Sinatra/Rack versions to see if that helps at all .. but there are breaking changes for our app, so it might take some time.

mscrivo (Contributor, Author) commented May 26, 2020

Sinatra 2.0.8.1/Rack 2.1.2 upgrade complete. Unfortunately, that did not fix the issue, still getting no traces.

mscrivo (Contributor, Author) commented May 26, 2020

@delner I went back to 0.34.2 for now, but wanted to keep the configuration changes you suggested:

Datadog.configure do |c|
  c.tracer.enabled = TRACING_ENABLED
  c.tracer.hostname = ENV['DATADOG_APM_AGENT_SERVICE_SERVICE_HOST'] || 'localhost'
  c.tracer.sampler = Datadog::RateSampler.new(0.5)
  c.tracer.priority_sampling = true
  c.tracer.port = 8126

  c.use :aws
  c.use :http
  c.use :rack
  c.use :redis
  c.use :sidekiq
  c.use :sinatra
  c.use :sequel
end

but when running with 0.34.2, it complains on startup with:

NoMethodError: undefined method `hostname=' for #<Datadog::Tracer:0x0000000118c3bea0>
  /config/datadog-config.rb:11:in `block in <main>'
  /vendor/bundle/ruby/2.5.0/gems/ddtrace-0.34.2/lib/ddtrace/configuration.rb:15:in `configure'
  /config/datadog-config.rb:7:in `<main>'

I'm not sure what's going on here ... but I'm pretty sure it's from my lack of Ruby knowledge as I only have about 3 months of working experience in Ruby. Any help you can provide would be greatly appreciated.

marcotc (Member) commented May 26, 2020

Hey @mscrivo, because you are using 0.34.2, you'll have to use the old configuration syntax: https://github.com/DataDog/dd-trace-rb/blob/v0.34.2/docs/GettingStarted.md#advanced-configuration

Here's what it would look like:

Datadog.configure do |c|
  c.tracer enabled: TRACING_ENABLED,
           hostname: ENV['DATADOG_APM_AGENT_SERVICE_SERVICE_HOST'] || 'localhost',
           sampler: Datadog::RateSampler.new(0.5),
           priority_sampling: true,
           port: 8126

  c.use :aws
  c.use :http
  c.use :rack
  c.use :redis
  c.use :sidekiq
  c.use :sinatra
  c.use :sequel
end

Let us know if this configuration still doesn't work for you.

mscrivo (Contributor, Author) commented May 26, 2020

🤦 Thanks that worked!

mscrivo (Contributor, Author) commented Jun 26, 2020

Just to circle back on this: I tried 0.37, to no avail. So I went back over the docs to see if I was missing anything, since 0.34 and lower work perfectly fine with our code. I saw that for the new modular Sinatra apps, you're supposed to add the middleware. So I added:

 register Datadog::Contrib::Sinatra::Tracer

right below:

class Base < Sinatra::Base

in our main app base class. Note: we already had the rack middleware setup using:

use Datadog::Contrib::Rack::TraceMiddleware

I also moved our Datadog config setup before any of the routes are defined, as it wasn't before. Now I am getting traces, but they look messed up, sort of double-counted, like this:

(screenshot "Screen Shot 2020-06-26 at 12 56 54 PM": trace flame graph showing duplicated, double-counted spans)

I can't figure out what I'm missing or doing wrong with the new version

delner (Contributor) commented Jun 27, 2020

@mscrivo I don't think 0.37.0 included a fix for this, but it is on our radar. @marcotc has been looking into it quite a bit and probably has some insight; we'll follow up on this soon.

mscrivo (Contributor, Author) commented Jun 28, 2020

Ah ok, thanks @delner ... good to know, I was convinced it was something I was doing wrong.

marcotc (Member) commented Sep 8, 2020

👋 @mscrivo, we've shipped changes that should fix the issues in this ticket (#1145) as part of 0.40.0.

Let us know if it addresses the issues you are seeing!

mscrivo (Contributor, Author) commented Sep 8, 2020

👋 @mscrivo, we've shipped changes that should fix the issues in this ticket (#1145) as part of 0.40.0.

Let us know if it addresses the issues you are seeing!

It does seem to be working properly again, thanks so much @marcotc!

marcotc (Member) commented Sep 9, 2020

That's awesome to hear! I'll close this issue as resolved.
But please let us know if the issue persists.

@marcotc marcotc closed this as completed Sep 9, 2020