-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: instrumentation for racecar #72
feat: instrumentation for racecar #72
Conversation
Thanks @chrisholmes! Are you set up with a tracing UI locally? If so, would it be possible to include a screenshot of a |
@plantfansam yep, makes sense. Here's the view in jaeger: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for this submission!
I am curious about the choice to mixin the instrumentation separately of the processor object vs mixing into the built in instrumenter1 or decorating process
, and process_batch
.
Would you mind sharing your thought process behind the overall design?
Footnotes
MINIMUM_VERSION = Gem::Version.new('2.0') | ||
|
||
compatible do | ||
Gem.loaded_specs['racecar'].version >= MINIMUM_VERSION |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid using Gem.loaded_specs
whenever possible and reference the VERSION
specified in the gem itself.
Some background into why: https://github.com/open-telemetry/opentelemetry-ruby/issues/988#issuecomment-1018676853
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, let me know if I'd done what you'd expect.
I've tried a few different ways, but this form (though it is quite ugly) is the only format I can get the tests to pass so its spans are consistent to the other kafka instrumentations. This is my understanding of the impact of the methods you've mentioned: When patching As far as I can tell, it is not fully possible to adapt Racecar's instrumentation interface as the What do you think about this? Is it worth omitting batch message links or errors for a simpler looking instrumentation? |
assigned myself to this so i can remember to review while ariel is out, apols on the delay here chris. |
2c9c897
to
27d8ff9
Compare
059bcb2
to
c0611b0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good.
There are a few TODOs you have that still need to be completed,
The only real blocker to me is it's not clear to me why we need to take rdkafka
as a hard dependency and install the instrumentation, I'd like to understand this better and whether it would be possible to decouple them. If not, I'd like to make sure we respect any env var's that disable the rdkafka instrumentation as well as make sure there's no edge cases in version requirements.
Nice work as always, apols on the delay on review
instrumentation/racecar/Appraisals
Outdated
## TODO: Include the supported version to be tested here. | ||
## Example: | ||
# appraise 'rack-2.1' do | ||
# gem 'rack', '~> 2.1.2' | ||
# end | ||
|
||
# appraise 'rack-2.0' do | ||
# gem 'rack', '2.0.8' | ||
# end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted, this needs to be updated to run appraisals for racecar
instrumentation/racecar/lib/opentelemetry/instrumentation/racecar/instrumentation.rb
Outdated
Show resolved
Hide resolved
end | ||
|
||
install do |_config| | ||
OpenTelemetry::Instrumentation::Rdkafka::Instrumentation.instance.install({}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused here. So is the idea that the racecar instrumentation depends on the rdkafka instrumentation?
- Is it possible to check if the instrumentation is already installed before attempting to re-install it?
- What's the desired behavior if the rdkafka instrumentation has been intentionally disabled by the user via env var? Are we essentially overwriting that choice.
2a. Hypothetical since there's no config options at the moment, but won't we automatically overwriting any configuration the user has supplied for theirrdkafka
instrumentation should it be aded in the future?
I guess i have to think a bit more closely about what the "right" behavior should be, but imo we should be respecting if this instrumentation has been disabled and, imo, its not clear to me why we need to take rdkafka as a dependancy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, my intention here was to rely on the Rdkafka implementation of producer spans as adding a Racecar implementation would duplicate them like this:
racecar send span:
rdkafka send span:
consumer span
There is a similar dependency between the Rack instrumentation with Rails and Sinatra so I did assume it would be alright to follow that pattern, but it is being used slightly different here and less extensively.
- Is it possible to check if the instrumentation is already installed before attempting to re-install it?
Yes, there is an interface we could use.
- What's the desired behavior if the rdkafka instrumentation has been intentionally disabled by the user via env var? Are we essentially overwriting that choice.
2a. Hypothetical since there's no config options at the moment, but won't we automatically overwriting any configuration the user has supplied for theirrdkafka
instrumentation should it be aded in the future?
Yes, I agree, there probably needs to be some more consideration here. If we don't want to load rdkafka's instrumentation then we should probably load some instrumentation here for producer spans.
As alternatives to this, I can think of the following options:
- Duplicate the spans (there is precedence for this with Faraday and it wrapping of other clients like net/http)
- Have the racecar producer instrumentation used
untraced
so that the rdkafka producer spans are dropped (I think this would break context propagation so it's probably not a good idea).
What do you think?
|
||
spec.add_dependency 'opentelemetry-api', '~> 1.0' | ||
spec.add_dependency 'opentelemetry-instrumentation-base', '~> 0.21.0' | ||
spec.add_dependency 'opentelemetry-instrumentation-rdkafka', '~> 0.2.0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why ~> 0.2.0
? This is lower than the min version of our rdkafka instrumentation (0.10.0)
Line 12 in 26a9b3b
MINIMUM_VERSION = Gem::Version.new('0.10.0') |
Should we be making the rdkafka instrumentation more permissive? As mentioned before, currently the coupling between these two instrumentations feels unclear and maybe a little bit unnecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why it's like this actually. I'll update it depending on the conversation above.
673b5e7
to
a2395b6
Compare
hey @chrisholmes just ack'ing your followups, plan to catch up here otw. hectic week for me there's been some internal stuff going on at work. one thing we discussed in the SIG mtg on tuesday was moving this approach to use ASN notifications. I think everyone agreed that if it could be accomplished, it would be preferrable. I'm going to try to see if i can't fix the issues you'd mentioned when you tried the approach, if it is manageable to fix, this would probably be the preferred way forward. |
66a0332
to
00db269
Compare
thanks @ericmustin!
Sure, I'm happy for you to try. I should have a working copy somewhere with my experiment too that I can push to a branch though I did it by implementing a customer Racer::Instrumenter rather than hoooking into ASN. For reference, the only real issue I had was with the Racecar's runner not providing batch message headers to the instrumentation layer. If we're happy to omit links for batch messages then it should be a simply implementation. |
@chrisholmes Thank you again for your patience and for being willing to try a different approach here. How would you feel about submitting an upstream PR to see if @dschierbeck or @bquorning would be willing to include the batch message headers so we are able to use links? |
Co-authored-by: Eric Mustin <[email protected]>
Rather than counting the messages received by the tests consumer the tests will now wait count for an expected number of spans received at the exporter. This has an equivalent outcome, but simplifies the contents of the consumers.
d9819eb
to
ae6dee4
Compare
@@ -50,6 +50,7 @@ Gem::Specification.new do |spec| | |||
spec.add_dependency 'opentelemetry-instrumentation-net_http', '~> 0.21.0' | |||
spec.add_dependency 'opentelemetry-instrumentation-pg', '~> 0.22.0' | |||
spec.add_dependency 'opentelemetry-instrumentation-que', '~> 0.4.0' | |||
spec.add_dependency 'opentelemetry-instrumentation-racecar', '~> 0.1.0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker, but a thought: perhaps worth seeing how this performs in production circumstances before adding to instrumentation-all
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was considering something like that. Has this happened before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, but maybe @open-telemetry/ruby-contrib-maintainers might know? I don't think there's any problem with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally speaking I'm ok with adding it. If folks that use all don't want it they should disable it.
I minor bump with the change log/Release notes should be enough of a signal
headers ||= {} | ||
|
||
tracer.in_span("#{topic} send", attributes: attributes, kind: :producer) do | ||
OpenTelemetry.propagation.inject(headers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting the reliance on headers
interface, which is documented in rdkafka
here.
end | ||
|
||
def start(_name, _id, payload) | ||
attrs = attributes(payload) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, but I don't think attributes
is called anywhere else, so we could just keep this method in start
🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, this introduces a ABC rubocop failure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. We can always enable/disable the cop but I don't think it's worth the time 😄
def start(_name, _id, payload) | ||
attrs = attributes(payload) | ||
|
||
parent_context = OpenTelemetry.propagation.extract(payload[:headers], getter: OpenTelemetry::Common::Propagation.symbol_key_getter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
til symbol_key_getter
def add_subscribers | ||
require_relative 'process_message_subscriber' | ||
subscriber = ProcessMessageSubscriber.new | ||
::ActiveSupport::Notifications.subscribe('process_message.racecar', subscriber) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we link to what emits this?
Racecar::Instrumentation.instance.tracer | ||
end | ||
|
||
def start(_name, _id, payload) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we're satisfying the ActiveSupport::Subscriber
interface here. If so, can we explicitly say so in a comment? IDK if it makes sense to subclass the Subscriber
class, but that would perhaps be another way of achieving this.
attributes | ||
end | ||
|
||
def finish(name, id, payload) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this instrumentation is adapted from the ActiveSupport
instrumentation's SpanSubscriber
(link), so maybe these questions were answered over there. At the risk of being told to RTFM, are we sure that:
start
andfinish
will be called sequentially, with all context.attach calls matched with a context.detach call.start
andfinish
will be invoked by the sameThread
.
If user code attaches to a new context between start
and finish
, but neglects to detach from that context, then detach
will not work as expected. Of course, they'd see an error in their logs, so ... that'll probably help them fix it.
Similarly, if finish
is called in a different thread than start
, it'll be using the wrong Context stack, since that's stored on Thread.current
.
I don't know much about ActiveSupport notifications (or Rdkafka! or Racecar!), so I'm most likely being paranoid here, but thought it was worth asking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahayworth might have insight, since he worked on a similar thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, it's not 100% in manual, but ActiveSupport::Notifications
will call in that order and in the same thread (they leave to subscribers to decide if async is needed). The code isn't terribly complicated so you could follow on the source
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still like to see my open comments resolved at some point, but I don't think it's worth blocking merge. Happy to put this in the wild and see how it fares.
Head branch was pushed to by a user without write access
835f9b3
to
7625877
Compare
@arielvalentin @plantfansam: I've resolved some conflicts and I think that I've switched over to the new CI structure. Please let me know if I've missed anything. |
This PR implements #15
Racecar is built on top of rdkafka, but it doesn't use the API that is implemented by the rdkafka instrumentation so an racecar specific adapter is required.
Racecar's main loop repeatedly calls a rdkafka consumer's
#poll
method and then sends the returned messages to its own consumer classes (aka processors).This instrumentation works by patching the Runner class and then its processor (Racecar::Consumer objects) at initialization.