Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove catch_unwind wrapper around the rust query planner #6397

Merged
merged 7 commits into from
Dec 6, 2024

Conversation

lrlna
Copy link
Member

@lrlna lrlna commented Dec 4, 2024

This PR removes catch_unwind around the invocation of the rust query planner. As part of this change, I have changed all applicable panic!()s in apollo-federation, as well as remove adding a backtrace capture to SingleFederationErrors.

Details

panic()! at the apollo-federation

There were several panic!() marcos still in use in apollo-federation. Instead of panicking, these functions now return the expected FederationError.

If you're searching in the code, you'll still see a few panics. They are in tests, in macros or function annotated with #[cfg(test)], or in [unused] composition part of the codebase. More specifically:

Removing backtrace capture

We were adding backtrace captures to SingleFederationErrors in order to help with debugging unexpected errors while were still developing the rust query planner. Capturing backtraces is quite often a slow and memory intensive process, so we want to remove this additional functionality for rust query planner's GA. Since we now have a lot of confidence in this implementation and have not faced a panic across millions of operations that were tested and compared, this is a safe thing to remove.

Note: The compliance CI is failing at the moment, and will be fixed after #6395 is merged


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation Changeset 2 completed
  • Performance impact assessed and acceptable
  • Tests added and passing3
    • Unit Tests
    • Integration Tests
    • Manual Tests

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

lrlna added 2 commits December 4, 2024 15:16
There were several panic!() marcos still in use in `apollo-federation`. Instead of panicking, these functions now return the expected FederationError.

If you're searching in the code, you'll still see a few panics. They are in tests, in macros or function annotated with `#[cfg(test)]`, or in [unused] composition part of the codebase. More specifically:

* in internally used [CLI](https://github.com/apollographql/router/blob/85f99f19e2a2907bd5597687773d68d63644a0c6/apollo-federation/cli/src/main.rs#L214)
* in tests:
  * [operation tests](https://github.com/apollographql/router/blob/85f99f19e2a2907bd5597687773d68d63644a0c6/apollo-federation/src/operation/tests/mod.rs)
  * [build query graph tests](https://github.com/apollographql/router/blob/85f99f19e2a2907bd5597687773d68d63644a0c6/apollo-federation/src/query_graph/build_query_graph.rs#L2448)
  * [api schema tests](https://github.com/apollographql/router/blob/85f99f19e2a2907bd5597687773d68d63644a0c6/apollo-federation/tests/api_schema.rs)
  * [build query plan support tests](https://github.com/apollographql/router/blob/85f99f19e2a2907bd5597687773d68d63644a0c6/apollo-federation/tests/query_plan/build_query_plan_support.rs)
  * [context build query plan test](https://github.com/apollographql/router/blob/85f99f19e2a2907bd5597687773d68d63644a0c6/apollo-federation/tests/query_plan/build_query_plan_tests/context.rs)
  * [fetch operation name tests](https://github.com/apollographql/router/blob/85f99f19e2a2907bd5597687773d68d63644a0c6/apollo-federation/tests/query_plan/build_query_plan_tests/fetch_operation_names.rs)
* test helpers annotated with `#[cfg(test)]`:
  * [connectors' json selection test helper](https://github.com/apollographql/router/blob/85f99f19e2a2907bd5597687773d68d63644a0c6/apollo-federation/src/sources/connect/json_selection/apply_to.rs#L116)
  * [connectors selection! macro](https://github.com/apollographql/router/blob/85f99f19e2a2907bd5597687773d68d63644a0c6/apollo-federation/src/sources/connect/json_selection/helpers.rs#L10)
* [argument merge strategies](https://github.com/apollographql/router/blob/85f99f19e2a2907bd5597687773d68d63644a0c6/apollo-federation/src/schema/argument_composition_strategies.rs) used solely in composition
We were adding backtrace captures to SingleFederationErrors in order to help with debugging unexpected errors while were still developing the rust query planner. Capturing backtraces is quite often [a slow and memory intensive process](https://doc.rust-lang.org/stable/std/backtrace/struct.Backtrace.html#method.capture), so we want to remove this additional functionality for rust query planner's GA. Since we now have a lot of confidence in this implementation and have not faced a panic across millions of operations that were tested and compared, this is a safe thing to remove.
@svc-apollo-docs
Copy link
Collaborator

svc-apollo-docs commented Dec 4, 2024

✅ Docs Preview Ready

No new or changed pages found.

@router-perf
Copy link

router-perf bot commented Dec 4, 2024

CI performance tests

  • connectors-const - Connectors stress test that runs with a constant number of users
  • const - Basic stress test that runs with a constant number of users
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled
  • enhanced-signature - Enhanced signature enabled
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • extended-reference-mode - Extended reference mode enabled
  • large-request - Stress test with a 1 MB request payload
  • no-tracing - Basic stress test, no tracing
  • reload - Reload test over a long period of time at a constant rate of users
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • step-local-metrics - Field stats that are generated from the router rather than FTV1
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • step - Basic stress test that steps up the number of users over time
  • xlarge-request - Stress test with 10 MB request payload
  • xxlarge-request - Stress test with 100 MB request payload

Since we are now confident in rust query planner's implementation, we no longer need to catch_unwind when planning an operation in Rust.
@lrlna
Copy link
Member Author

lrlna commented Dec 4, 2024

@TylerBloom I'd like your review on this, since you're working on tidying up the federation errors and I want to make sure this aligns with that you're doing.

@SimonSapin I'd also like your review on this, since you initially set up the catch_unwind and backtrace capture.

@lrlna lrlna removed the request for review from benjamn December 4, 2024 14:23
Copy link
Contributor

@TylerBloom TylerBloom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Federation side LGTM. Added a few suggestion to use bail! and internal_error! for consistency.

apollo-federation/src/query_plan/fetch_dependency_graph.rs Outdated Show resolved Hide resolved
apollo-federation/src/query_plan/query_planner.rs Outdated Show resolved Hide resolved
apollo-federation/src/query_plan/fetch_dependency_graph.rs Outdated Show resolved Hide resolved
apollo-federation/src/query_plan/fetch_dependency_graph.rs Outdated Show resolved Hide resolved
@TylerBloom
Copy link
Contributor

@TylerBloom I'd like your review on this, since you're working on tidying up the federation errors and I want to make sure this aligns with that you're doing.

Ya, this is inline with what I've been working on.

Copy link
Contributor

@SimonSapin SimonSapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with one minor change

Comment on lines -755 to -756
// Once we've panic'ed the behaviour of the router is non-deterministic
// We've logged out the panic details. Terminate with an error code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if condition should be indeed removed but the comment is still relevant and should be kept

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooops, thank you for catching!

@lrlna lrlna merged commit d169b2f into dev Dec 6, 2024
13 checks passed
@lrlna lrlna deleted the lrlna/ROUTER-909 branch December 6, 2024 14:46
@BrynCooke BrynCooke mentioned this pull request Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants