Fix graphql.resolve span durations #4387
Overall package size: self size 6.62 MB.
This report was automatically generated by heaviest-objects-in-the-universe.
Updated from a56594d to fa362db.
Benchmarks
Benchmark execution time: 2024-06-07 14:21:37
Comparing candidate commit ff62ffd in PR branch. Found 1 performance improvement and 0 performance regressions! Performance is the same for 257 metrics, 8 unstable metrics.
scenario:plugin-graphql-with-depth-off-18
…on for a synchronous resolver as well
Updated from 029aa81 to ff62ffd.
Codecov Report: all modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@              Coverage Diff              @@
##            master    #4387       +/-   ##
===========================================
+ Coverage    69.19%   80.42%   +11.23%
===========================================
  Files            1        3        +2
  Lines          198      373      +175
  Branches        33       33
===========================================
+ Hits           137      300      +163
- Misses          61       73       +12
* Fix graphql.resolve span duration
  Co-authored-by: Scott Bezek <[email protected]>
Thanks @crysmags for getting the fix from #3925 merged (I was on paternity leave, so apologies for suddenly going AWOL on that PR)! I'm a little confused by the changes to the tests though:
I'm still not sure why the mocked time on my PR worked locally but not in CI, but for the sake of test coverage on …
What does this PR do?
This pull request closes #3925
Motivation
From PR #3925:
What does this PR do?
Fix a bug in the graphql plugin that causes all graphql.resolve spans to have incorrect durations.
Motivation
When looking at APM traces for graphql requests, we noticed that the graphql.resolve spans all seem to extend until the end of the parent graphql.execute span, rather than ending when that field's resolver actually completes. This makes it hard to identify which specific field resolvers are the performance bottlenecks of a graphql query.
Example 1: note that all graphql.resolve spans extend to roughly the same end time, even for simple fields that should be trivially fast to resolve:
Example 2: on another, much more complex graphql query, it's even more apparent that something is causing all spans to extend until the entire graphql.execute span finishes. I would expect most of the purple bars to end earlier, with only one or a few long-tail field resolver spans extending to the end.
[Screenshot 2024-01-03 at 11:58:34 PM]
Additional Notes
I believe this was a regression introduced in PR #3177, which refactored TracingPlugin subclasses to move the common span.finish() calls into the TracingPlugin base class.
However, that PR's change to the graphql plugin's resolve operation stopped passing the finishTime argument from the plugin's finish method through to span.finish(). The base TracingPlugin.finish method does not accept any arguments and calls span.finish() with none, so span.finish() is now called without a finish time and falls back to using the current time as the finish time.
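To make the regression concrete, here is a minimal JavaScript sketch of the two behaviours. It does not use the real dd-trace classes: `Span`, `BaseTracingPlugin`, `RegressedResolvePlugin`, `FixedResolvePlugin`, and `finishResolveSpan` are hypothetical stand-ins that only model what is described above, namely a base `finish()` that ignores its arguments versus a subclass that forwards `finishTime` to the span.

```javascript
// Hypothetical stand-in for a tracer span (not the dd-trace API): finish() falls back
// to the current time when no finish time is passed.
class Span {
  constructor (startTime) {
    this.startTime = startTime
    this.finishTime = undefined
  }

  finish (finishTime) {
    this.finishTime = finishTime === undefined ? Date.now() : finishTime
  }

  get duration () {
    return this.finishTime - this.startTime
  }
}

// Hypothetical base class modelled on the description of TracingPlugin.finish after
// the refactor: it takes no arguments and calls span.finish() with none.
class BaseTracingPlugin {
  constructor (span) {
    this.activeSpan = span
  }

  finish () {
    this.activeSpan.finish()
  }
}

// Regressed resolve plugin: inherits finish(), so any finishTime the caller passes is dropped.
class RegressedResolvePlugin extends BaseTracingPlugin {}

// Fixed resolve plugin: overrides finish() and forwards the recorded finishTime to the span.
class FixedResolvePlugin extends BaseTracingPlugin {
  finish (finishTime) {
    this.activeSpan.finish(finishTime)
  }
}

// Finish a span that started at startTime, passing the field's recorded finish time,
// and return the duration the span ends up with.
function finishResolveSpan (PluginClass, startTime, recordedFinishTime) {
  const plugin = new PluginClass(new Span(startTime))
  plugin.finish(recordedFinishTime)
  return plugin.activeSpan.duration
}

const start = Date.now() - 5000 // pretend the resolver started 5 s ago
console.log(finishResolveSpan(FixedResolvePlugin, start, start + 100)) // 100
console.log(finishResolveSpan(RegressedResolvePlugin, start, start + 100)) // at least 5000
```

The fixed variant reports the recorded 100 ms, while the regressed one reports however long ago the resolver started, because the span's `finish()` silently fell back to `Date.now()`.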
Why does this result in all resolve spans finishing at the end of the execute span? If I'm understanding the code correctly, the graphql plugin defers finishing the resolve operations until the entire execute operation has completed. Although this code still publishes each field's recorded finishTime to the finish channel, that time is no longer passed through to the span (as discussed above), so every resolve span is marked as finished at the current time, after the overall execute operation has completed.
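The deferral explains why the wrong durations all line up at the same end time. The sketch below assumes a simplified model: resolvers run one after another on an integer clock, and execute does 50 more time units of work after the last resolver returns. `simulateExecute` and its `forwardFinishTime` flag are hypothetical names for illustration, not plugin code.

```javascript
// Simulates the deferred-finish pattern with an explicit integer clock. Each field records
// its own start and finish time as it resolves, but spans are only finished once execute is done.
function simulateExecute (fields, forwardFinishTime) {
  let now = 0
  const pending = []

  for (const { name, duration } of fields) {
    const startTime = now
    now += duration // resolver runs (sequentially here, to keep the sketch simple)
    pending.push({ name, startTime, finishTime: now })
  }

  now += 50 // execute still has work to do after the last resolver returns

  // Deferred finish: runs once, after execute has completed.
  const spans = {}
  for (const field of pending) {
    // Fixed behaviour uses the recorded time; the regression effectively uses "now".
    const end = forwardFinishTime ? field.finishTime : now
    spans[field.name] = { end, duration: end - field.startTime }
  }
  return spans
}

const fields = [{ name: 'fast', duration: 10 }, { name: 'slow', duration: 200 }]
console.log(simulateExecute(fields, true)) // fast ends at 10, slow ends at 210
console.log(simulateExecute(fields, false)) // both end at 260
```

With `forwardFinishTime` set to `false`, both spans end at 260 regardless of how quickly each field resolved, which is the same pattern described in Example 1.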
Testing
The existing graphql plugin tests didn't appear to explicitly validate the span durations (other than a basic greater-than-zero check), so I made some guesses at how best to validate span durations within the existing test framework in order to add a unit test for this fix.
The new test introduces some new graphql field resolvers that intentionally take a while (via fake timers) to resolve:
* Two async fields use setTimeout with different delays, to validate the durations recorded for concurrent graphql resolver promises.
* A third new field, a synchronous resolver function, ticks the clock directly, to validate the duration recorded for a synchronous resolver.
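The test strategy above can be sketched without a tracer at all. This is an illustration under stated assumptions, not the PR's test: it uses a hypothetical hand-rolled `FakeClock` in place of a fake-timer library, the field names and delays (100 and 500 for the async fields, 50 for the synchronous one) are made up, and the hypothetical `timed` helper simply records each resolver's duration the way a correct resolve span should.

```javascript
// Hypothetical hand-rolled fake clock (the real test may use a fake-timer library instead).
class FakeClock {
  constructor () {
    this.now = 0
    this.timers = []
  }

  setTimeout (fn, ms) {
    this.timers.push({ fn, at: this.now + ms })
  }

  // Synchronous advance, used by a synchronous resolver to "take time".
  tick (ms) {
    this.now += ms
  }

  // Advance time, firing due timers in order and letting promise callbacks run after each.
  async tickAsync (ms) {
    const target = this.now + ms
    for (;;) {
      this.timers.sort((a, b) => a.at - b.at)
      const next = this.timers[0]
      if (!next || next.at > target) break
      this.timers.shift()
      this.now = Math.max(this.now, next.at) // never move the clock backwards
      next.fn()
      await new Promise(resolve => setImmediate(resolve)) // drain pending microtasks
    }
    this.now = target
  }
}

// Record each resolver's duration: synchronous results finish immediately,
// promises finish when they settle.
function timed (clock, durations, name, resolver) {
  const start = clock.now
  const result = resolver()
  if (result && typeof result.then === 'function') {
    return result.then(value => {
      durations[name] = clock.now - start
      return value
    })
  }
  durations[name] = clock.now - start
  return result
}

async function runScenario () {
  const clock = new FakeClock()
  const durations = {}
  const sleep = (ms, value) => new Promise(resolve => clock.setTimeout(() => resolve(value), ms))

  // Illustrative resolvers: two concurrent async fields with different delays, one sync field.
  const pending = [
    timed(clock, durations, 'fastAsync', () => sleep(100, 'fast')),
    timed(clock, durations, 'slowAsync', () => sleep(500, 'slow')),
    timed(clock, durations, 'syncField', () => { clock.tick(50); return 'sync' })
  ]

  await clock.tickAsync(1000)
  await Promise.all(pending)
  return durations
}

runScenario().then(durations => console.log(durations))
```

Yielding to `setImmediate` after each timer fires lets promise callbacks run while the fake clock still reads that timer's time, so the fast field is charged 100 rather than the full simulated run.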
New test: fails as expected when run without the fix in resolve.js. The faster async resolver's span is incorrectly recorded as taking 1234 milliseconds instead of the expected 100:
[Screenshot 2024-01-05 at 10:38:37 PM]
New test: passes with the changes to resolve.js:
[Screenshot 2024-01-05 at 10:38:53 PM]
Plugin Checklist
Additional Notes