make sidekiq.job as a child for the sidekiq.publish span #1273

Open
bolahanna44 opened this issue Nov 29, 2024 · 5 comments
Labels: bug (Something isn't working)

Comments

@bolahanna44

The trace context is not being propagated from the Sidekiq publish span to the Sidekiq job span.

Share details about your runtime

Operating system details: Linux, Ubuntu 20.04 LTS
RUBY_ENGINE: "ruby"
RUBY_VERSION: "3.3.0"
RUBY_DESCRIPTION: "ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]"

Share a simplified reproduction if possible

initializer/otel.rb

require 'opentelemetry/sdk'
require 'opentelemetry/instrumentation/all'
require 'opentelemetry/exporter/jaeger'

ENV["OTEL_TRACES_EXPORTER"] ||= "console"

OpenTelemetry::SDK.configure do |c|
  c.use_all({
    'OpenTelemetry::Instrumentation::Sidekiq' => {
      span_naming: :job_class,
      propagation_style: :child,
      trace_poller_wait: true,
      trace_poller_enqueue: true,
      trace_processor_process_one: true,
    },
  })
  c.service_name = "test-otel"
  c.add_span_processor(
    OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new(
      # OpenTelemetry::Exporter::Jaeger::AgentExporter.new(
      #   host: 'localhost',
      #   port: '6831'
      # )
      OpenTelemetry::Exporter::Jaeger::CollectorExporter.new(endpoint: 'http://localhost:14268/api/traces')
    )
  )
end

controllers/enqueue_controller.rb

class EnqueueController < ApplicationController
  def create
    MyJob.perform_in(2.seconds, 'bob', 5)
    render plain: OpenTelemetry::Trace.current_span.context.trace_id.unpack1('H*')
  end
end

Here I can see the Sidekiq publish span:
[Screenshot 2024-11-28 at 11 37 24 PM]

And here is the Sidekiq job span, in a separate trace:

[Screenshot 2024-11-29 at 2 01 20 AM]

My question is: how can I make the Sidekiq job span in the second screenshot a child of the Sidekiq publish span in the first screenshot?

bolahanna44 added the bug label on Nov 29, 2024
@zvkemp

zvkemp commented Dec 13, 2024

Is trace propagation otherwise working in your app? If not, have you set a propagator in your config (e.g. OTEL_PROPAGATORS=tracecontext)?
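
For reference, the propagator could also be pinned in the initializer, in the same style as the OTEL_TRACES_EXPORTER line in the reproduction. This is only a minimal sketch: it assumes the SDK reads OTEL_PROPAGATORS when `OpenTelemetry::SDK.configure` runs, and that `tracecontext` and `baggage` are the names it accepts for W3C Trace Context and Baggage. Note that the value defined by the OpenTelemetry specification is `tracecontext`, without an underscore.

```ruby
# Set this before OpenTelemetry::SDK.configure runs so the SDK can pick it up.
# `||=` leaves any value already provided by the environment untouched.
ENV['OTEL_PROPAGATORS'] ||= 'tracecontext,baggage'

puts ENV['OTEL_PROPAGATORS']
```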

@bolahanna44

> Is trace propagation otherwise working in your app? If not, have you set a propagator in your config (e.g. OTEL_PROPAGATORS=tracecontext)?

I realized that trace correlation works whenever I use perform_async, but whenever I use perform_in the Sidekiq worker executes in a separate trace.

@zvkemp

zvkemp commented Dec 13, 2024

It's possible this is a Jaeger issue, if it's an issue at all.

By explicitly delaying the start of the Sidekiq job, you are decoupling the request from the job. The job span will most likely start after the root span has ended, implying they are not the same trace, so this is probably more appropriately modelled with a span link. I don't know enough about Jaeger to know whether it treats that situation as invalid, so it would be useful to check whether your Sidekiq spans actually have the trace context on them (e.g. to narrow down whether Jaeger or opentelemetry-ruby/sidekiq is responsible). Can you set OTEL_TRACES_EXPORTER=console and inspect the output?
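
One way to carry out the check suggested here is to compare the trace id the controller renders with the trace id carried on the enqueued job. The sketch below is a rough illustration, not part of opentelemetry-ruby. It assumes the W3C Trace Context propagator is in use and that the context ends up in the job hash under a `traceparent` key of the form `version-trace_id-span_id-flags`. The helper names `parse_traceparent` and `same_trace?` and the sample ids are hypothetical.

```ruby
# Hypothetical helper: split a W3C `traceparent` value into its four fields.
# Returns nil when the value is missing or malformed.
TRACEPARENT_PATTERN = /\A(\h{2})-(\h{32})-(\h{16})-(\h{2})\z/

def parse_traceparent(value)
  match = TRACEPARENT_PATTERN.match(value.to_s)
  return nil unless match

  { version: match[1], trace_id: match[2], span_id: match[3], flags: match[4] }
end

# Hypothetical check: does the job payload carry the same trace id as the request?
def same_trace?(job_payload, request_trace_id)
  context = parse_traceparent(job_payload['traceparent'])
  !context.nil? && context[:trace_id] == request_trace_id
end

# Sample values, made up for illustration only.
request_trace_id = '4bf92f3577b34da6a3ce929d0e0e4736'
job_payload = {
  'class' => 'MyJob',
  'args' => ['bob', 5],
  'traceparent' => "00-#{request_trace_id}-00f067aa0ba902b7-01"
}

puts same_trace?(job_payload, request_trace_id)
```

If the ids match on the payload but the job span printed by the console exporter shows a different trace id, the context is being lost or replaced somewhere between enqueue and execution rather than in Jaeger.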

@bolahanna44

bolahanna44 commented Jan 16, 2025

> It's possible this is a Jaeger issue, if it's an issue at all.
>
> By explicitly delaying the start of the Sidekiq job, you are decoupling the request from the job. The job span will most likely start after the root span has ended, implying they are not the same trace, so this is probably more appropriately modelled with a span link. I don't know enough about Jaeger to know whether it treats that situation as invalid, so it would be useful to check whether your Sidekiq spans actually have the trace context on them (e.g. to narrow down whether Jaeger or opentelemetry-ruby/sidekiq is responsible). Can you set OTEL_TRACES_EXPORTER=console and inspect the output?

I tried the New Relic agent, and it puts both spans in the same trace. I also added some debug points and found that when the job is enqueued for the first time, it is pushed to Redis with the correct trace context. The Sidekiq producer middleware then runs again when the job is due to run, which overwrites the trace_id.
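
The behaviour described in this comment can be modelled without Sidekiq or OpenTelemetry. The following is a toy sketch only: the class names, the `enqueue_twice` helper and the context values are all hypothetical, and it is not the gem's actual code. It shows how a producer middleware that injects unconditionally replaces the stored context on its second pass, while one that preserves an existing `traceparent` does not.

```ruby
# Toy model only: none of these classes come from Sidekiq or opentelemetry-ruby.

# Injects the "current" trace context into the job hash on every call,
# overwriting whatever was stored there before.
class OverwritingProducerMiddleware
  def call(job, current_traceparent)
    job['traceparent'] = current_traceparent
    job
  end
end

# Keeps a context that is already on the job and only injects when none exists.
class PreservingProducerMiddleware
  def call(job, current_traceparent)
    job['traceparent'] ||= current_traceparent
    job
  end
end

# Made-up contexts: one for the web request, and one that is active later,
# when the scheduled job is pushed onto its queue.
REQUEST_CONTEXT = '00-11111111111111111111111111111111-aaaaaaaaaaaaaaaa-01'
LATER_CONTEXT   = '00-22222222222222222222222222222222-bbbbbbbbbbbbbbbb-01'

# Runs the middleware twice: once at perform_in time, once when the job is due.
def enqueue_twice(middleware)
  job = { 'class' => 'MyJob', 'args' => ['bob', 5] }
  middleware.call(job, REQUEST_CONTEXT)
  middleware.call(job, LATER_CONTEXT)
  job['traceparent']
end

puts enqueue_twice(OverwritingProducerMiddleware.new) == REQUEST_CONTEXT
puts enqueue_twice(PreservingProducerMiddleware.new) == REQUEST_CONTEXT
```

Whether preserving an existing context is the right behaviour for the instrumentation is a design question for the maintainers. The sketch only illustrates why perform_in could end up in a different trace than perform_async if the middleware runs a second time.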
