feat: add support for federated tracing #16
Out of curiosity do you have thoughts on how non-federated use cases that send traces directly to the Apollo ingress endpoint should be supported e.g. what should live in this gem vs. apollo-tracing-ruby vs. graphql-ruby? We (Salsify) don't have any immediate plans to work on our fork of apollo-tracing-ruby because we decided not to move forward with Apollo due to some missing features.
@jturkel These are great questions and I honestly don't have strong opinions. My use cases are completely tied to managed schema federation with Apollo, but if someone makes a case for organizing this code differently, I'm all ears.
That being said, thanks for the work done so far in apollo-tracing-ruby! It was hugely helpful in getting this to work.
Makes sense. I agree that the small amount of potentially shared code (i.e. the |
Excited to try this, thanks for the hard work @lennyburdette !!! 🎉
Sorry for the delay. I've been on vacation. I'll take a look at it this week.
lib/apollo-federation/tracing.rb
  root: trace[:node_map].root,
)

json = result.to_h
Should this line be after you do your additional `result` changes?
whoops, yes! not sure how this is working today
lib/apollo-federation/tracing.rb
module Tracing
  KEY = :ftv1

  def self.use(schema)
Mind wrapping these in a `class << self`?
do you have a preference between `class << self` vs `extend self`?
lol never mind ... rubocop enforces module_function
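For reference, a minimal sketch of the three styles mentioned in this thread. The module and method names (`WithSingletonClass`, `WithExtendSelf`, `WithModuleFunction`, `greet`) are hypothetical and do not come from this PR; the point is only to show how each style exposes a module-level method.

```ruby
# Hypothetical modules illustrating three ways to define module-level methods.

# 1) class << self: methods are defined directly on the module's singleton class.
module WithSingletonClass
  class << self
    def greet
      'hello'
    end
  end
end

# 2) extend self: the module extends itself, so its public instance methods
#    are also callable on the module.
module WithExtendSelf
  extend self

  def greet
    'hello'
  end
end

# 3) module_function: each following method gets a public module-level copy,
#    and the instance-method version becomes private.
module WithModuleFunction
  module_function

  def greet
    'hello'
  end
end
```

All three allow `SomeModule.greet`. The visible difference is what happens when the module is included elsewhere: `extend self` leaves `greet` public on includers, while `module_function` makes it private there.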
3c8d73e to 1a787c8
So.....can we get this merged?
Sorry for the delay on this folks, I'll take a pass at it today.
  execute_field_lazy(data, &block)
else
  yield
end
I'm confused about the implementation of the tracer. I couldn't find any documentation of these four events (`execute_query`, `execute_query_lazy`, `execute_field`, `execute_field_lazy`) beyond just their names here:
https://graphql-ruby.org/api-doc/1.9.11/GraphQL/Tracing
So I don't understand what the difference is between the lazy and non-lazy ones and whether the tracer is tracking each one correctly. Can you shed more light on how you figured this out?
The "Step 1", "Step 2"... comments are also confusing. Is there some documentation that guarantees the events always occur in that order?
Finally, why is the `execute_query` event used to record only a `start_time` timestamp and not an `end_time` timestamp after the `block.call` line? And similarly, why is `execute_query_lazy` only recording the `end_time`? It seems like we're assuming that both of these events always occur in sequence. Is there documentation for that?
@noaelad TBH, I probably wouldn't have figured this out on my own. I used this in-progress PR as reference: https://github.com/salsify/apollo-tracing-ruby/blob/feature/new-apollo-api/lib/apollo_tracing/tracer.rb
I believe the steps are guaranteed to execute in the order described in my comments. The naming is definitely confusing; might have to poke rmosolgo for an explanation.
My concern is around relying on undocumented behavior in `graphql-ruby`, especially where that undocumented behavior doesn't make a lot of immediate sense to me.
Have you tried setting both `start_time` and `end_time` in both `execute_query` and `execute_query_lazy`?
I pulled your branch and tried playing around. When both methods (`execute_query` and `execute_query_lazy`) have the same implementation, the only spec failures result from the fact that timestamps are increased twice and not just once (which makes sense in today's world). I'd prefer an implementation like this that is more robust and relies less on the fact that `graphql-ruby` happens to call both `execute_query` and `execute_query_lazy` in a specific sequence. WDYT?
By the way, the specs would still be brittle to changes in `graphql-ruby`, but I don't have a good suggestion for mitigating that.
def self.execute_query(data, &block)
  query = data.fetch(:query)
  return block.call unless query.context && query.context[:tracing_enabled]

  trace = query.context.namespace(ApolloFederation::Tracing::KEY)
  trace.merge!(
    start_time: Time.now.utc,
    start_time_nanos: Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond),
    node_map: NodeMap.new,
  )

  result = block.call

  trace.merge!(
    end_time: Time.now.utc,
    end_time_nanos: Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond),
  )

  result
end

def self.execute_query_lazy(data, &block)
  execute_query(data, &block)
end
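The suggestion above records two kinds of time: a wall-clock timestamp (`Time.now.utc`) and a monotonic reading (`Process.clock_gettime`). A small sketch of how the two are typically combined follows. The `measure` helper and its hash keys are hypothetical and not part of this PR; the wall clock gives an absolute time to report, while the monotonic clock gives a duration that is unaffected by system clock adjustments.

```ruby
# Hypothetical helper: pairs a wall-clock timestamp with a monotonic duration.
def measure
  started_at = Time.now.utc # absolute time, suitable for reporting
  start_nanos = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)

  result = yield

  end_nanos = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)

  {
    result: result,
    started_at: started_at,
    # Subtracting two monotonic readings yields an elapsed time in nanoseconds.
    duration_nanos: end_nanos - start_nanos,
  }
end

timing = measure do
  sleep(0.01)
  :done
end
```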
So this was a fun rabbit hole, but I think I understand tracing mechanics well now. I added a fat comment that explains the ordering of trace events. The key was figuring out that there are two execution phases: 1) non-lazy, 2) lazy. I could not figure out a way for `execute_query_lazy` to not fire, so I think it's still a good place to record ending times.
Here's a graphql-ruby test that shows execution order. It always has `eql` as the last step in the event list.
We could record an end time at the end of `execute_query`, but it will always get thrown away.
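To illustrate the two-phase ordering described in this comment, here is a small simulation that does not use graphql-ruby at all. `TimingTracer` and `FakeExecutor` are hypothetical stand-ins. The only assumption carried over from the discussion is that `execute_query` fires first and `execute_query_lazy` fires afterwards, once per query.

```ruby
# Hypothetical tracer that mimics a `trace(key, data) { ... }` callback shape.
class TimingTracer
  attr_reader :events

  def initialize
    @events = []
  end

  def trace(key, data)
    @events << key
    case key
    when 'execute_query'
      # Phase 1 (non-lazy): record the start before any work runs.
      data[:trace][:start_nanos] = now_nanos
      yield
    when 'execute_query_lazy'
      # Phase 2 (lazy): record the end only after lazy values are resolved.
      result = yield
      data[:trace][:end_nanos] = now_nanos
      result
    else
      yield
    end
  end

  private

  def now_nanos
    Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
  end
end

# Hypothetical executor: fires the two query-level events in sequence.
class FakeExecutor
  def initialize(tracer)
    @tracer = tracer
  end

  def run
    data = { trace: {} }
    pending = nil
    # Eager work happens here; lazy work is only queued.
    @tracer.trace('execute_query', data) { pending = -> { :lazy_value } }
    # Queued lazy work is resolved in the second phase.
    value = @tracer.trace('execute_query_lazy', data) { pending.call }
    [value, data[:trace]]
  end
end

tracer = TimingTracer.new
value, trace = FakeExecutor.new(tracer).run
```

Under that ordering assumption, an end time written inside `execute_query` would be overwritten by the one written in `execute_query_lazy`, which matches the point made above.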
Thanks for digging into this
7dc4a21 to e0152e8
@lennyburdette this makes sense to me, thanks for digging into tracing.
Federated GraphQL services should include timing and error information as a Base64-encoded protocol buffer message in the `"extensions.ftv1"` field. The gateway requests traces by adding a special header to the GraphQL request, and combines traces from all federated services into a single trace.

This change includes a Tracer that uses the graphql-ruby [tracing API][t] to record field timings and info and store it on the execution context. It also includes methods on the `ApolloFederation::Tracing` module to pluck the info from the context, convert it to an encoded string, and attach it to the query result's extensions.

I used the Apollo Server typescript code as reference:

* https://github.com/apollographql/apollo-server/blob/master/packages/apollo-engine-reporting/src/federatedExtension.ts
* https://github.com/apollographql/apollo-server/blob/master/packages/apollo-engine-reporting/src/treeBuilder.ts

As well as an unfinished fork of apollo-tracing-ruby:

* https://github.com/salsify/apollo-tracing-ruby/blob/feature/new-apollo-api/lib/apollo_tracing/tracer.rb
* https://github.com/salsify/apollo-tracing-ruby/blob/feature/new-apollo-api/lib/apollo_tracing/trace_tree.rb

Federated tracing documentation: https://www.apollographql.com/docs/apollo-server/federation/metrics/

Addresses Gusto#14

[t]: https://graphql-ruby.org/queries/tracing.html
* `module_function` vs repetitive `self.`
* fix a call to `result.to_h` in `.attach_trace_to_result`
e59edf6 to 634edc2
…est to cover lazy fields will squash later
634edc2 to dd70bf4
@noaelad got the tests passing!
Thanks for contributing this very useful feature!
🎉 This PR is included in version 0.4.0 🎉
The release is available on GitHub release
Your semantic-release bot 📦🚀
Thank you for the review!!
Federated GraphQL services should include timing and error information as a Base64-encoded protocol buffer message in the `"extensions.ftv1"` field. The gateway requests traces by adding a special header to the GraphQL request, and combines traces from all federated services into a single trace.

This change includes a Tracer that uses the graphql-ruby tracing API to record field timings and info and store it on the execution context. It also includes methods on the `ApolloFederation::Tracing` module to pluck the info from the context, convert it to an encoded string, and attach it to the query result's extensions.
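As a rough illustration of the last step (encoding and attaching), here is a minimal sketch. It assumes the trace has already been serialized to bytes; `fake_trace_bytes` is an arbitrary stand-in rather than a real protocol buffer message, and the `result` hash is a hypothetical query result, not the gem's actual implementation. Only the `extensions` / `ftv1` key names come from the description above.

```ruby
# Stand-in for the serialized protocol buffer message. Real code would produce
# these bytes from a protobuf encoder; the bytes here are arbitrary.
fake_trace_bytes = "\x0A\x03foo".b

# Array#pack with 'm0' produces Base64 without line breaks.
encoded = [fake_trace_bytes].pack('m0')

# Hypothetical query result as a plain hash.
result = { 'data' => { 'hello' => 'world' } }

# Attach the encoded trace under extensions.ftv1 without touching the data.
result['extensions'] ||= {}
result['extensions']['ftv1'] = encoded
```

A consumer can recover the original bytes with `result['extensions']['ftv1'].unpack1('m0')` before decoding the message.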
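I used the Apollo Server TypeScript code as reference:

* https://github.com/apollographql/apollo-server/blob/master/packages/apollo-engine-reporting/src/federatedExtension.ts
* https://github.com/apollographql/apollo-server/blob/master/packages/apollo-engine-reporting/src/treeBuilder.ts

As well as an unfinished fork of apollo-tracing-ruby:

* https://github.com/salsify/apollo-tracing-ruby/blob/feature/new-apollo-api/lib/apollo_tracing/tracer.rb
* https://github.com/salsify/apollo-tracing-ruby/blob/feature/new-apollo-api/lib/apollo_tracing/trace_tree.rb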
Federated tracing documentation: https://www.apollographql.com/docs/apollo-server/federation/metrics/
Addresses #14
cc @rylanc @zionts @glasser