-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support gRPC #403
Support gRPC #403
Conversation
Hello, we would need this feature as well. Can someone from the maintainers review it? Much appreciated! |
Hey @adomokos, we're as excited about this pull request as you are, and are actively working with @Jared-Prime on this. Rest assured its coming! |
💥 Thanks @delner! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great so far, some misc feedback.
# inject trace context information into gRPC metadata prior to | ||
# sending the request to the server. | ||
class Client < Base | ||
def trace(keywords) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might want to catch and handle errors here. In the (unlikely) event the tracing code raises an error, the actual gRPC call would not happen. Generally speaking, the goal of trace code like this is to be minimally impactful, so it might be prudent to do some kind of begin
rescue
ensure
else
setup here.
Same applies to the Server
code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm uncertain how to proceed with this error handling. Maybe you had something specific in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think doing:
begin
keywords[:metadata].each do |header, value|
span.set_tag(header, value)
end
Datadog::GRPCPropagator
.inject!(span.context, keywords[:metadata])
rescue StandardError => e
Datadog::Tracer.log.debug("GRPC client trace failed: #{e}")
end
yield
Or something to that effect. It isn't bulletproof exactly without really chopping up the code, but might help. You could also consider extracting that code into another method named something like annotate!
then add the error handling in there, if that seems cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I get it now. I like the annotate!
approach and will try that
# The `#intercept!` method is implemented in gRPC; this module | ||
# will be prepended to the original class, effectively injecting | ||
# our tracing middleware into the head of the call chain. | ||
module InterceptWithDatadog |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this constant need to be defined in GRPC::InterceptionContext
? (As opposed to the Datadog::Contrib::GRPC
one?) We generally try to avoid opening/modifying classes or injecting new constants into integration namespaces, if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does and is defined here. If you'd like, we can opt for the direct insertion of interceptors as in this example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand: for the default use :grpc
scenario, we're not sure what RPC service to inject into, so we just prepend our module into the InterceptionContext
yeah? If so, I think I should clarify what I meant; I think we can move GRPC::InterceptionContext::InterceptWithDatadog
to Datadog::Contrib::GRPC::InterceptWithDatadog
or something similar. The point of which is mostly to just avoid defining constants in the GRPC namespace, even if we do have to inject modules into constants defined there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Different question here: what happens if a user defines something like:
# Activate default GRPC tracing
Datadog.configure do |c|
c.use :grpc
end
# Override configuration for one particular client
alternate_client = Demo::Echo::Service.rpc_stub_class.new(
'localhost:50051',
:this_channel_is_insecure,
:interceptors => [configured_interceptor]
)
Would this cause this alternate_client
to have two Datadog interceptors in it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. The login in already_prepended?
should prevent that from happening. The intercept!
call happens during the RPC interaction, so it should only add the default datadog interceptor to the call chain if the trace has not begun and if no other datadog interceptor has been supplied.
end | ||
|
||
def choose_datadog_interceptor(args) | ||
if args.key?(:metadata) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the reasoning here for how an interceptor is chosen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the gRPC Ruby core library, Client and Server interceptors have the same method names, but different keyword arguments. Clients always have a metadata
keyword whereas Servers do not; servers always have a call
keyword. A metadata
is very similar to a header object in many HTTP libraries; it may be just a Hash
in some cases. A call
is an internal object ( a GRPC::ActiveCall
) that provides access to the ongoing communication between client and server; it provides access to the metadata sent by the client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, is there any more concrete/reliable way of differentiating between client and server calls? It'd be worthwhile if possible, otherwise we can just go with what will work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not finding anything more concrete. I'll keep looking, but the internals of the grpc
gem aren't revealing anything more direct.
lib/ddtrace/contrib/grpc/patcher.rb
Outdated
|
||
def prepend_interceptor | ||
::GRPC::InterceptionContext | ||
.prepend(::GRPC::InterceptionContext::InterceptWithDatadog) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What versions of Ruby are we supporting with GRPC? A limited set is okay, but I can see this won't work with 1.9.3, and might not work with 2.0 (I think you might have to do send(:prepend)
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to it's use of keyword arguments, only 2.x is supported.
I think if you're uncomfortable with modifying the existing object then we could simply require the more involved configuration. Otherwise, I'll look into and test some earlier 2.x
rubies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requiring the more involved configuration seems "cleaner" and simpler, however, I think for now, prepending into the GRPC module is okay, for the sake of having some out of the box, easy GRPC tracing.
Let's just verify its working with 2.0+ for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looks like the grpc
gem itself has a dependency on jwt
, which requires a Ruby >= 2.1
. Since 2.1
is past EOL, how about we just require >= 2.2
? A quick test on my local machine works out OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems fine with me; seems like a feature that should require sufficiently new Ruby anyways. We'll just have to be sure to add that to the documentation.
metadata[GRPC_METADATA_TRACE_ID] = context.trace_id.to_s | ||
metadata[GRPC_METADATA_PARENT_ID] = context.span_id.to_s | ||
metadata[GRPC_METADATA_SAMPLING_PRIORITY] = context.sampling_priority.to_s | ||
metadata.delete(GRPC_METADATA_SAMPLING_PRIORITY) unless context.sampling_priority |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor but can we just do:
metadata[GRPC_METADATA_SAMPLING_PRIORITY] = context.sampling_priority.to_s if context.sampling_priority
end | ||
|
||
def sampling_priority | ||
value = @metadata.fetch(GRPC_METADATA_SAMPLING_PRIORITY, -1).to_i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Datadog::Ext::Priority::USER_REJECT
has a value of -1
which is legitimate. Don't we want to propagate that?
Something like?
value = @metadata.fetch(GRPC_METADATA_SAMPLING_PRIORITY, nil)
value.nil? ? nil : value.to_i
# Or you could do:
value && value.to_i
|
||
let(:configured_client_interceptor) do | ||
Datadog::Contrib::GRPC::DatadogInterceptor::Client.new do |c| | ||
c.service_name = 'cepsr' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, like the name!
require 'ddtrace' | ||
|
||
RSpec.describe 'tracing on the client connection' do | ||
subject { Datadog::Contrib::GRPC::DatadogInterceptor::Client.new } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest giving subject
blocks names. Something like subject(:client)
would be good here.
@@ -0,0 +1,88 @@ | |||
require 'grpc' | |||
|
|||
module GRPCHelper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a huge deal, but food for thought: consider turning this into an RSpec shared context that you can compose into the other example files.
Then you can stub the constants to make them more composable. A reason for doing so is that it allows the RSpec example to optionally override and tweak behavior, without having to define constants per example.
e.g. a brief example of this could look like:
# Shared context
RSpec.shared_context 'GRPC service' do
def run_request_reply(address = '0.0.0.0:50052', client = nil)
runner(address, client) { |c| c.basic(message_class.new) }
end
# Stubbed constant
# This message class can be overidden in an example to change behavior
let(:message_class) do
stub_const('TestMessage', Class.new do
class << self
def marshal(_o)
''
end
def unmarshal(_o)
new
end
end
end)
end
end
# Compose it into a test
RSpec.describe "GRPC test" do
include_context 'GRPC service'
end
For a better example in practice, check out the Resque job
shared context here: https://github.com/DataDog/dd-trace-rb/pull/405/files
This is only a suggestion though; only implement it if you think it will benefit these tests.
I think we're really close to merging this one now. Final steps:
|
Thanks! |
I don't see why we can't add a rough draft for it now. Feel free to add something to |
docs/GettingStarted.md
Outdated
|
||
| Key | Description | Default | | ||
| --- | --- | --- | | ||
| ``service_name`` | Service name used for `grpc` instrumentation | grape | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to use a default other than grape
😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops 🤦♂️
docs/GettingStarted.md
Outdated
@@ -1102,6 +1154,30 @@ Then you must enable the request queuing feature in the integration handling the | |||
|
|||
For Rack based applications, see the [documentation](#rack) for details for enabling this feature. | |||
|
|||
### HTTP request queuing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did this end up in this PR? I think this got merged to 0.13-dev
already. Bad rebase? Looking at this, this PR has a lot of commits it shouldn't have in it (which have already been merged to 0.13-dev
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting. Fixing....
…C::Interceptor registry.This leads to cleaner separation between client side and server sidetracing implementation and testing.
use service name from the pin
some rubocop cleanup
- change module name to `Datadog::Contrib::GRPC::InterceptWithDatadog` - allow sampling priority value less than zero ( for user reject ) - use named subjects in spec
3b01ffc
to
94b1212
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'm satisfied with this. Given the size of this, I'm going to request one other review from another maintainer on our side, but otherwise I think we should be good to merge this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impressive contribution! Thank you very much @Jared-Prime! It's a huge improvement to our supported libraries!
Datadog::GRPCPropagator | ||
.inject!(span.context, metadata) | ||
rescue StandardError => e | ||
Datadog::Tracer.log.debug("GRPC client trace failed: #{e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Thank you for your help in completing this, @delner |
Hey @Jared-Prime , I merged this to Seems like these two examples are flaky, tend to break randomly:
|
@delner Do you think we need to set sampling priority on these two examples? |
@Jared-Prime Probably. Distributed tracing is supposed to always distribute the same |
I think this has been solved in #412. |
@Jared-Prime Just want to say, thanks for your amazing contribution! This is a really cool feature, and you did a great job with it, as well as being on top of things throughout the PR process. It's been a pleasure working with you on this from the maintainer side! Looking forward to seeing others enjoy this new feature as well! 🎉 🎉 🎉 |
This PR provides an implementation for distributed tracing for Ruby gRPC services and clients.
Screenshot from Integration Using Demo Application: https://github.com/Jared-Prime/grpc-demo
In gRPC nomenclature, an "interceptor" is similar to middleware. When running a service or a client, the
#patch
method prepends the datadog interceptor onto the default middleware call chain, ensuring traces span across the scope of the protocol implementation.The following types of gRPC calls have been implemented and tested:
A demo application is available at https://github.com/Jared-Prime/grpc-demo . A version of this patch is undergoing internal testing at my workplace ( Kenna Security ) as well.
This Pull Request closes #379 if accepted and merged; it closes #378 ( a previous attempt at this feature ) as well.
Review, suggestions, and refinements are welcome :)