Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rack Span not available in Rack::BodyProxy #341

Closed
arielvalentin opened this issue Feb 17, 2023 · 10 comments · Fixed by #342
Closed

Rack Span not available in Rack::BodyProxy #341

arielvalentin opened this issue Feb 17, 2023 · 10 comments · Fixed by #342
Assignees
Labels
feature New feature or request instrumentation

Comments

@arielvalentin
Copy link
Collaborator

arielvalentin commented Feb 17, 2023

Rack::BodyProxy defers executing a user specified block until after the request is complete, however the Rack instrumentation middleware closes the span before the block is evaluated and then detaches the current context.

When users try to access the current span in the block, Rack returns an invalid span. In the snippet below I was able to reproduce the issue using RSpec as a test harness:

require "spec_helper"

RSpec.describe "Rack::BodyProxy Integration" do
  include Rack::Test::Methods

  let(:app) do
    Rack::Builder.new {
      use OpenTelemetry::Instrumentation::Rack::Middlewares::TracerMiddleware
      run lambda { |_arg|
        puts "app.call #{OpenTelemetry::Instrumentation::Rack.current_span.context.hex_trace_id}"
        body = Rack::BodyProxy.new([]) { puts "BodyProxy.close #{OpenTelemetry::Instrumentation::Rack.current_span.context.hex_trace_id}" }
        [
          200,
          { "Content-Type" => "application/json" },
          body,
        ]
      }
    }
  end

  it "does something" do
    get "/"
    expect(last_response).to be_ok
  end
end

In the mix of the spec output you can see that the trace id is that of an INVALID_SPAN:

Rack::BodyProxy Integration
app.call 65a385eada410a721a79431171960814
BodyProxy.close 00000000000000000000000000000000
  does something

Finished in 0.08231 seconds (files took 2.31 seconds to load)
1 example, 0 failures

I am unsure what the correct behavior should be here but this is related since it is basically my problem where we are unable to correlate Rack Requests to Traces when using Rack::BodyProxy or any of its descendants. DataDog/dd-trace-rb#1746

@arielvalentin
Copy link
Collaborator Author

Is using Rack::Events the best way to handle this going forward? https://www.rubydoc.info/gems/rack/Rack/Events

Order

on_start is called on the handlers in the order that they were passed to the constructor. on_commit, on_send, on_finish, and on_errorare called in the reverse order.on_finishhandlers are called inside anensureblock, so they are guaranteed to be called even if something raises an exception. If something raises an exception in aon_finish` method, then nothing is guaranteed.

@ahayworth
Copy link
Contributor

I didn't know about this before today; it's interesting. (To be fair, the "things I don't know" vastly outnumber the things I do know. 😆 )

I think Rack::Events might be a sensible way to do it, and it looks like it's been in Rack since 2.0 (or, around 2015-2016). I think it would be safe to rely on it if we wanted. However, I don't know if we'd be able to support precisely the same features as our current rack instrumentation. The documentation for this explicitly calls out:

  ### This middleware provides hooks to certain places in the request /
  # response lifecycle.  This is so that middleware that don't need to filter
  # the response data can safely leave it alone and not have to send messages
  # down the traditional "rack stack".

... and we do allow things like logging some response headers. But it may be worth investigating; and it may be that we can start / finish spans inside these hooks and then attempt to update the span from inside middleware like we do today.

@arielvalentin
Copy link
Collaborator Author

I think we need both!

So far in my initial spike the events hooks are doing exactly what we expect.

on_start creates a recording span

on_finish closes the span.

on_error records exceptions!

We can keep a in stack middleware that adds headers to the response.

@ahayworth
Copy link
Contributor

That's exciting, then! The only things that come to mind now are:

  • This may change the overall duration of rack spans; perhaps they may look a little longer than they used to. I don't know that this is actually a bad thing, though.
  • I have a vague, likely-unfounded and thoroughly-unresearched concern that we may end up annotating the wrong span somehow from the middleware. But... that's probably me just being a worrywart for no reason.
  • The "frontend" span stuff... I never really understood precisely what that was all about, but we probably don't want to break it. 😆

@arielvalentin
Copy link
Collaborator Author

The frontend span is a stand-in for adding spans for proxies, LB, or webservers like nginx that may not have instrumentation.

That way you can visualize queue time using X-Request-Start offset by the current time.

@arielvalentin
Copy link
Collaborator Author

A little spike... this gets us the bear minimum.

class MyEvents
  include Rack::Events::Abstract
  def on_start(request, response)
    extracted_context = OpenTelemetry.propagation.extract(
      request.env,
      getter: OpenTelemetry::Common::Propagation.rack_env_getter
    )
    span = OpenTelemetry.tracer_provider.tracer("rack").start_span("HTTP #{request.request_method}}", with_parent: extracted_context, kind: :server)
    ctx = OpenTelemetry::Trace.context_with_span(span)
    request.env["otel.context.token"] = OpenTelemetry::Context.attach(ctx)
  end

  def on_finish(request, response)
    span = OpenTelemetry::Trace.current_span
    span.finish
    OpenTelemetry::Context.detach(request.env["otel.context.token"])
  end
end

Something I do not quite understand about our current implementation is why Rack has its own Context Key for Current Span and why its sets its own Current context instance instead of using the default Context key.

Do you know anything about that?

@arielvalentin
Copy link
Collaborator Author

OK after some spelunking I got the gist of it.

The same object instance is being added to multiple Context keys that happen to have the same name current-span, this is a convenience method so that other instrumentations are able to look up the Rack Span instance since it may be different from the current span context.

I will need to preserve that behavior for sure.

@arielvalentin
Copy link
Collaborator Author

arielvalentin commented Feb 17, 2023

I believe this is the final incantation:

# frozen_string_literal: true

require "spec_helper"

class MyEvents
  include Rack::Events::Abstract
  def on_start(request, response)
    extracted_context = OpenTelemetry.propagation.extract(
      request.env,
      getter: OpenTelemetry::Common::Propagation.rack_env_getter
    )
    span = OpenTelemetry::Instrumentation::Rack::Instrumentation.instance.tracer.start_span("HTTP #{request.request_method}}", with_parent: extracted_context, kind: :server)
    ctx = OpenTelemetry::Trace.context_with_span(span)
    rack_ctx = OpenTelemetry::Instrumentation::Rack.context_with_span(span, parent_context: ctx)
    request.env["otel.context"] = OpenTelemetry::Context.attach(ctx)
    request.env["otel.context.rack_token"] = OpenTelemetry::Context.attach(rack_ctx)
    request.env["otel.span"] = span
  end

  def on_error(request, response, e)
    request.env["otel.span"].record_exception(e)
    request.env["otel.span"].status = OpenTelemetry::Trace::Status.error("Server Error")
    puts "on_error"
  end

  def on_finish(request, response)
    puts "on_finish"
    begin request.env["otel.span"].finish; rescue nil; end
    begin OpenTelemetry::Context.detach(request.env["otel.context.token"]); rescue nil; end
    begin OpenTelemetry::Context.detach(request.env["otel.context.rack_token"]); rescue nil; end
  end
end

RSpec.describe "Rack::BodyProxy Integration" do
  include Rack::Test::Methods

  before do
    OpenTelemetry::SDK.configure
  end

  context "good" do
    let(:app) do
      Rack::Builder.new {
        use Rack::Events, [MyEvents.new]
        run lambda { |_arg|
          puts "#{OpenTelemetry::Trace.current_span.context.hex_trace_id} #{OpenTelemetry::Trace.current_span.context.hex_span_id} recording? #{OpenTelemetry::Trace.current_span.recording?} | before"

          OpenTelemetry.tracer_provider.tracer("rack").in_span("test") do |span|
            puts "#{span.context.hex_trace_id} #{span.context.hex_span_id} recording? #{span.recording?} | action explicit span"
            puts "#{OpenTelemetry::Trace.current_span.context.hex_trace_id} #{OpenTelemetry::Trace.current_span.context.hex_span_id} recording? #{OpenTelemetry::Trace.current_span.recording?} | action implicit span"

            body = Rack::BodyProxy.new([]) do
              puts "#{span.context.hex_trace_id} #{span.context.hex_span_id} recording? #{span.recording?} | BodyProxy.close closure"
              puts "#{OpenTelemetry::Trace.current_span.instance_variable_get("@parent_span_id").unpack1('H*')} | BodyProxy.close closure parent_span_id "
              puts "#{OpenTelemetry::Instrumentation::Rack.current_span.instance_variable_get("@parent_span_id").unpack1('H*')} | BodyProxy.close rack parent_span_id "

              puts "#{OpenTelemetry::Trace.current_span.instance_variable_get("@parent_span_id").unpack1('H*')} | BodyProxy.close parent_span_id "
              puts "#{OpenTelemetry::Trace.current_span.context.hex_trace_id} #{OpenTelemetry::Trace.current_span.context.hex_span_id} recording? #{OpenTelemetry::Trace.current_span.recording?} | BodyProxy.close Trace.current_span "
              puts "#{OpenTelemetry::Instrumentation::Rack.current_span.context.hex_trace_id} #{OpenTelemetry::Instrumentation::Rack.current_span.context.hex_span_id} recording? #{OpenTelemetry::Instrumentation::Rack.current_span.recording?} | BodyProxy.close Rack.current_span"
            end

            [
              200,
              {
                "Content-Type" => "application/json",
              },
              body,
            ]
          end
        }
      }
    end

    it "does something" do
      get "/", {}, { "traceparent" => "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01" }
      expect(last_response).to be_ok
    end
  end

  context "error" do
    let(:app) do
      Rack::Builder.new {
        use Rack::Events, [MyEvents.new]
        run lambda { |_arg|
          puts "#{OpenTelemetry::Trace.current_span.context.hex_trace_id} #{OpenTelemetry::Trace.current_span.context.hex_span_id} recording? #{OpenTelemetry::Trace.current_span.recording?} | before"

          OpenTelemetry.tracer_provider.tracer("rack").in_span("test") do |span|
            puts "#{span.context.hex_trace_id} #{span.context.hex_span_id} recording? #{span.recording?} | action explicit span"
            puts "#{OpenTelemetry::Trace.current_span.context.hex_trace_id} #{OpenTelemetry::Trace.current_span.context.hex_span_id} recording? #{OpenTelemetry::Trace.current_span.recording?} | action implicit span"

            body = Rack::BodyProxy.new([]) do
              puts "#{span.context.hex_trace_id} #{span.context.hex_span_id} recording? #{span.recording?} | BodyProxy.close closure"
              puts "#{OpenTelemetry::Trace.current_span.instance_variable_get("@parent_span_id").unpack1('H*')} | BodyProxy.close closure parent_span_id "
              puts "#{OpenTelemetry::Instrumentation::Rack.current_span.instance_variable_get("@parent_span_id").unpack1('H*')} | BodyProxy.close rack parent_span_id "

              puts "#{OpenTelemetry::Trace.current_span.instance_variable_get("@parent_span_id").unpack1('H*')} | BodyProxy.close parent_span_id "
              puts "#{OpenTelemetry::Trace.current_span.context.hex_trace_id} #{OpenTelemetry::Trace.current_span.context.hex_span_id} recording? #{OpenTelemetry::Trace.current_span.recording?} | BodyProxy.close Trace.current_span "
              puts "#{OpenTelemetry::Instrumentation::Rack.current_span.context.hex_trace_id} #{OpenTelemetry::Instrumentation::Rack.current_span.context.hex_span_id} recording? #{OpenTelemetry::Instrumentation::Rack.current_span.recording?} | BodyProxy.close Rack.current_span"
            end

            raise "server error"
          end
        }
      }
    end

    it "does something" do
      expect { get "/", {}, { "traceparent" => "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01" } }.to raise_error("server error")
    end
  end
end
Rack::BodyProxy Integration
  good
0af7651916cd43dd8448eb211c80319c c3270cce54cc3d17 recording? true | before
0af7651916cd43dd8448eb211c80319c 3ab95871c3a7dd1f recording? true | action explicit span
0af7651916cd43dd8448eb211c80319c 3ab95871c3a7dd1f recording? true | action implicit span
0af7651916cd43dd8448eb211c80319c 3ab95871c3a7dd1f recording? false | BodyProxy.close closure
b7ad6b7169203331 | BodyProxy.close closure parent_span_id 
b7ad6b7169203331 | BodyProxy.close rack parent_span_id 
b7ad6b7169203331 | BodyProxy.close parent_span_id 
0af7651916cd43dd8448eb211c80319c c3270cce54cc3d17 recording? true | BodyProxy.close Trace.current_span 
0af7651916cd43dd8448eb211c80319c c3270cce54cc3d17 recording? true | BodyProxy.close Rack.current_span
on_finish
    does something
  error
0af7651916cd43dd8448eb211c80319c fa06bef232c82f05 recording? true | before
0af7651916cd43dd8448eb211c80319c 1ba37d3d6fda74f4 recording? true | action explicit span
0af7651916cd43dd8448eb211c80319c 1ba37d3d6fda74f4 recording? true | action implicit span
on_error
on_finish
    does something

Finished in 0.10305 seconds (files took 2.45 seconds to load)
2 examples, 0 failures

@robbkidd
Copy link
Member

  1. I see a context stored at request.env["otel.context"] but then detached with request.env["otel.context.token"]. Those hash keys ought to be the same, right?

  2. Detach goes in reverse order from the attaches/creations, yea? So should it be …

    request.env["otel.context.token"] = OpenTelemetry::Context.attach(ctx)
    request.env["otel.context.rack_token"] = OpenTelemetry::Context.attach(rack_ctx)

    begin OpenTelemetry::Context.detach(request.env["otel.context.rack_token"]); rescue nil; end
    begin OpenTelemetry::Context.detach(request.env["otel.context.token"]); rescue nil; end

@arielvalentin
Copy link
Collaborator Author

@robbkidd I think I am going to go a simpler route... something like this...

  def on_start(req, _)
  # code
    request.env[TOKENS_KEY] = [OpenTelemetry::Context.attach(ctx), OpenTelemetry::Context.attach(rack_ctx)]
  end


  def on_finish(req,resp)
     # some more code 
     request.env[TOKENS_KEY]&.each do |token|
          OpenTelemetry::Context.detach(token)
     rescue StandardError => e
          OpenTelemetry.handle_error(message: 'Unable to detach Rack Context', exception: e)
     end
  end

arielvalentin added a commit to arielvalentin/opentelemetry-ruby-contrib that referenced this issue Feb 18, 2023
This is an alternate implementation of the Rack instrumentation that leverages the [`Rack::Events` API](https://www.rubydoc.info/gems/rack/Rack/Events) instead of a custom`Rack::Middleware`.

Why am I suggesting we change this instrumentation? At GitHub we leverage `Rack::BodyProxy` to write Rack Request logs after the request is complete, however Rack span is already finished and its related `Context` has already been detached. This means we are not able to correlate request logs to our traces.

The advantage of using `Rack::Events` is that handlers are triggered during difference stages of a request, including for deferred operations like [`Rack::BodyProxy`](https://www.rubydoc.info/gems/rack/Rack/BodyProxy), as opposed to middlewares who _only_ are invoked inline.

The disadvantage of this API is that it makes managing the request more difficult and we have to track the Context positions to detach in the `Rack::Env`.

This implementation will be released along side the existing instrumtation to give users the option to use the middleware instead of the `Rack::Events` handler until we are able to run this in some of our heavy production workloads.

Fixes open-telemetry#341
Related DataDog/dd-trace-rb#1746
@arielvalentin arielvalentin self-assigned this Feb 28, 2023
@arielvalentin arielvalentin added feature New feature or request instrumentation labels Feb 28, 2023
arielvalentin added a commit to arielvalentin/opentelemetry-ruby-contrib that referenced this issue Mar 2, 2023
This is an alternate implementation of the Rack instrumentation that leverages the [`Rack::Events` API](https://www.rubydoc.info/gems/rack/Rack/Events) instead of a custom`Rack::Middleware`.

Why am I suggesting we change this instrumentation? At GitHub we leverage `Rack::BodyProxy` to write Rack Request logs after the request is complete, however Rack span is already finished and its related `Context` has already been detached. This means we are not able to correlate request logs to our traces.

The advantage of using `Rack::Events` is that handlers are triggered during difference stages of a request, including for deferred operations like [`Rack::BodyProxy`](https://www.rubydoc.info/gems/rack/Rack/BodyProxy), as opposed to middlewares who _only_ are invoked inline.

The disadvantage of this API is that it makes managing the request more difficult and we have to track the Context positions to detach in the `Rack::Env`.

This implementation will be released along side the existing instrumtation to give users the option to use the middleware instead of the `Rack::Events` handler until we are able to run this in some of our heavy production workloads.

Fixes open-telemetry#341
Related DataDog/dd-trace-rb#1746
arielvalentin added a commit to arielvalentin/opentelemetry-ruby-contrib that referenced this issue Mar 5, 2023
This PR makes it possible for users to toggle between using the Rack Events and Middleware based instrumentation

See open-telemetry#341
Depends on open-telemetry#342
arielvalentin added a commit to arielvalentin/opentelemetry-ruby-contrib that referenced this issue Mar 5, 2023
This PR makes it possible for users to toggle between using the Rack Events and Middleware based instrumentation

See open-telemetry#341
Depends on open-telemetry#342
arielvalentin added a commit to arielvalentin/opentelemetry-ruby-contrib that referenced this issue Mar 5, 2023
This PR makes it possible for users to toggle between using the Rack Events and Middleware based instrumentation

See open-telemetry#341
Depends on open-telemetry#342
arielvalentin added a commit that referenced this issue Apr 5, 2023
* feat: Use Rack::Events for instrumentation

This is an alternate implementation of the Rack instrumentation that leverages the [`Rack::Events` API](https://www.rubydoc.info/gems/rack/Rack/Events) instead of a custom`Rack::Middleware`.

Why am I suggesting we change this instrumentation? At GitHub we leverage `Rack::BodyProxy` to write Rack Request logs after the request is complete, however Rack span is already finished and its related `Context` has already been detached. This means we are not able to correlate request logs to our traces.

The advantage of using `Rack::Events` is that handlers are triggered during difference stages of a request, including for deferred operations like [`Rack::BodyProxy`](https://www.rubydoc.info/gems/rack/Rack/BodyProxy), as opposed to middlewares who _only_ are invoked inline.

The disadvantage of this API is that it makes managing the request more difficult and we have to track the Context positions to detach in the `Rack::Env`.

This implementation will be released along side the existing instrumtation to give users the option to use the middleware instead of the `Rack::Events` handler until we are able to run this in some of our heavy production workloads.

Fixes #341
Related DataDog/dd-trace-rb#1746

* squash: additional feature parity

* squash: add allowed response headers

* squash: url quantization

* squash: Now with new Lemon Scented response headers

* squash: we are now at parity

* squash: use instrumetation config

* squash: Use declarative config options

* squash: fix bad refactoring

* convert proxy span to an event

* refactor: move configurations to instrumentation install

* squash: add test converage

* squash: make response headers a little more resilient

* squash: Ensures event middleware will not cause the application to crash when it encounters errors

* squash: fix linter error

* feat: Add middleware args helper for ActionPack and Sinatra

* fix: test case

* fix: Rack Events is autoloaded so if the parent module is present so are submodules

* fix: More precise error handling

* fix: Ensure config is cleared/setup during installation

* fix: Sinatra 1.4 compatability

* fix: bad merge

* fix: invalid responses in test case

* squash: Added a few more test cases
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request instrumentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants