Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove catch_unwind wrapper around the rust query planner #6397

Merged
merged 7 commits into from
Dec 6, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions .changesets/maint_lrlna_remove_catch_unwind.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@

### Remove catch_unwind wrapper around the native query planner ([PR #6397](https://github.com/apollographql/router/pull/6397))

As part of internal maintenance of the query planner, we are removing the
catch_unwind wrapper around the native query planner. This wrapper was used as
an extra safeguard for potential panics the native planner could produce. The
native query planner no longer has any code paths that could panic. We have also
not witnessed a panic in the last four months, having processed 560 million real
user operations through the native planner.

This maintenance work also removes backtrace capture for federation errors which
was used for debugging and is no longer necessary as we have the confidence in
the native planner's implementation.

By [@lrlna](https://github.com/lrlna) in https://github.com/apollographql/router/pull/6397
28 changes: 5 additions & 23 deletions apollo-federation/src/error/mod.rs
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
use std::backtrace::Backtrace;
use std::cmp::Ordering;
use std::fmt::Display;
use std::fmt::Formatter;
Expand Down Expand Up @@ -523,8 +522,8 @@ pub struct MultipleFederationErrors {
impl MultipleFederationErrors {
pub fn push(&mut self, error: FederationError) {
match error {
FederationError::SingleFederationError { inner, .. } => {
self.errors.push(inner);
FederationError::SingleFederationError(error) => {
self.errors.push(error);
}
FederationError::MultipleFederationErrors(errors) => {
self.errors.extend(errors.errors);
Expand Down Expand Up @@ -591,22 +590,14 @@ impl Display for AggregateFederationError {
}
}

/// Work around thiserror, which when an error field has a type named `Backtrace`
/// "helpfully" implements `Error::provides` even though that API is not stable yet:
/// <https://github.com/rust-lang/rust/issues/99301>
type ThiserrorTrustMeThisIsTotallyNotABacktrace = Backtrace;

// PORT_NOTE: Often times, JS functions would either throw/return a GraphQLError, return a vector
// of GraphQLErrors, or take a vector of GraphQLErrors and group them together under an
// AggregateGraphQLError which itself would have a specific error message and code, and throw that.
// We represent all these cases with an enum, and delegate to the members.
#[derive(thiserror::Error)]
pub enum FederationError {
#[error("{inner}")]
SingleFederationError {
inner: SingleFederationError,
trace: ThiserrorTrustMeThisIsTotallyNotABacktrace,
},
#[error(transparent)]
SingleFederationError(#[from] SingleFederationError),
#[error(transparent)]
MultipleFederationErrors(#[from] MultipleFederationErrors),
#[error(transparent)]
Expand All @@ -616,22 +607,13 @@ pub enum FederationError {
impl std::fmt::Debug for FederationError {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
match self {
Self::SingleFederationError { inner, trace } => write!(f, "{inner}\n{trace}"),
Self::SingleFederationError(inner) => std::fmt::Debug::fmt(inner, f),
Self::MultipleFederationErrors(inner) => std::fmt::Debug::fmt(inner, f),
Self::AggregateFederationError(inner) => std::fmt::Debug::fmt(inner, f),
}
}
}

impl From<SingleFederationError> for FederationError {
fn from(inner: SingleFederationError) -> Self {
Self::SingleFederationError {
inner,
trace: Backtrace::capture(),
}
}
}

impl From<DiagnosticList> for FederationError {
fn from(value: DiagnosticList) -> Self {
SingleFederationError::from(value).into()
Expand Down
7 changes: 3 additions & 4 deletions apollo-federation/src/operation/rebase.rs
Original file line number Diff line number Diff line change
Expand Up @@ -150,10 +150,9 @@ impl FederationError {
fn is_rebase_error(&self) -> bool {
matches!(
self,
crate::error::FederationError::SingleFederationError {
inner: crate::error::SingleFederationError::InternalRebaseError(_),
..
}
crate::error::FederationError::SingleFederationError(
crate::error::SingleFederationError::InternalRebaseError(_)
)
)
}
}
Expand Down
16 changes: 9 additions & 7 deletions apollo-federation/src/query_plan/fetch_dependency_graph.rs
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ use crate::bail;
use crate::display_helpers::DisplayOption;
use crate::error::FederationError;
use crate::error::SingleFederationError;
use crate::internal_error;
use crate::link::graphql_definition::DeferDirectiveArguments;
use crate::operation::ArgumentList;
use crate::operation::ContainmentOptions;
Expand Down Expand Up @@ -1718,12 +1719,12 @@ impl FetchDependencyGraph {
children.push(child_index);
} else {
let Some(child_defer_ref) = &child.defer_ref else {
panic!(
return Err(internal_error!(
"{} has defer_ref `{}`, so its child {} cannot have a top-level defer_ref.",
node.display(node_index),
DisplayOption(node.defer_ref.as_ref()),
child.display(child_index),
);
));
lrlna marked this conversation as resolved.
Show resolved Hide resolved
};

if !node.selection_set.selection_set.selections.is_empty() {
Expand Down Expand Up @@ -2676,10 +2677,9 @@ impl FetchDependencyGraphNode {
let operation =
operation_compression.compress(&self.subgraph_name, subgraph_schema, operation)?;
let operation_document = operation.try_into().map_err(|err| match err {
FederationError::SingleFederationError {
inner: SingleFederationError::InvalidGraphQL { diagnostics },
..
} => FederationError::internal(format!(
FederationError::SingleFederationError(SingleFederationError::InvalidGraphQL {
diagnostics,
}) => FederationError::internal(format!(
lrlna marked this conversation as resolved.
Show resolved Hide resolved
"Query planning produced an invalid subgraph operation.\n{diagnostics}"
)),
_ => err,
Expand Down Expand Up @@ -3402,7 +3402,9 @@ impl DeferTracking {

if let Some(parent_ref) = &defer_context.current_defer_ref {
let Some(parent_info) = self.deferred.get_mut(parent_ref) else {
panic!("Cannot find info for parent {parent_ref} or {label}");
return Err(internal_error!(
"Cannot find info for parent {parent_ref} or {label}"
));
lrlna marked this conversation as resolved.
Show resolved Hide resolved
};

parent_info.deferred.insert(label.clone());
Expand Down
5 changes: 3 additions & 2 deletions apollo-federation/src/query_plan/query_planner.rs
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ use super::fetch_dependency_graph::FetchIdGenerator;
use super::ConditionNode;
use crate::error::FederationError;
use crate::error::SingleFederationError;
use crate::internal_error;
use crate::operation::normalize_operation;
use crate::operation::NamedFragments;
use crate::operation::NormalizedDefer;
Expand Down Expand Up @@ -443,10 +444,10 @@ impl QueryPlanner {
.root_kinds_to_nodes()?
.get(&normalized_operation.root_kind)
else {
panic!(
return Err(internal_error!(
"Shouldn't have a {0} operation if the subgraphs don't have a {0} root",
normalized_operation.root_kind
);
));
lrlna marked this conversation as resolved.
Show resolved Hide resolved
};

let operation_compression = if self.config.generate_query_fragments {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,8 @@ impl ApplyToError {
}))
}

// This macro is useful for tests, but it absolutely should never be used with
// dynamic input at runtime, since it panics for any input that's not JSON.
#[cfg(test)]
fn from_json(json: &JSON) -> Self {
if let JSON::Object(error) = json {
Expand Down
13 changes: 1 addition & 12 deletions apollo-router/src/executable.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
//! Main entry point for CLI command to start server.

use std::cell::Cell;
use std::env;
use std::fmt::Debug;
use std::net::SocketAddr;
Expand Down Expand Up @@ -751,20 +750,10 @@ fn setup_panic_handler() {
} else {
tracing::error!("{}", e)
}
if !USING_CATCH_UNWIND.get() {
// Once we've panic'ed the behaviour of the router is non-deterministic
// We've logged out the panic details. Terminate with an error code
Comment on lines -755 to -756
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if condition should be indeed removed but the comment is still relevant and should be kept

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooops, thank you for catching!

std::process::exit(1);
}
std::process::exit(1);
}));
}

// TODO: once the Rust query planner does not use `todo!()` anymore,
// remove this and the use of `catch_unwind` to call it.
thread_local! {
pub(crate) static USING_CATCH_UNWIND: Cell<bool> = const { Cell::new(false) };
}

static COPIED: AtomicBool = AtomicBool::new(false);

fn copy_args_to_env() {
Expand Down
5 changes: 1 addition & 4 deletions apollo-router/src/query_planner/bridge_query_planner.rs
Original file line number Diff line number Diff line change
Expand Up @@ -183,10 +183,7 @@ impl PlannerMode {
let result = QueryPlanner::new(schema.federation_supergraph(), config);

match &result {
Err(FederationError::SingleFederationError {
inner: error,
trace: _,
}) => match error {
Err(FederationError::SingleFederationError(error)) => match error {
SingleFederationError::UnsupportedFederationVersion { .. } => {
metric_rust_qp_init(Some(UNSUPPORTED_FED1));
}
Expand Down
58 changes: 17 additions & 41 deletions apollo-router/src/query_planner/dual_query_planner.rs
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ use apollo_compiler::validation::Valid;
use apollo_compiler::ExecutableDocument;
use apollo_compiler::Name;
use apollo_compiler::Node;
use apollo_federation::error::FederationError;
use apollo_federation::query_plan::query_planner::QueryPlanOptions;
use apollo_federation::query_plan::query_planner::QueryPlanner;
use apollo_federation::query_plan::QueryPlan;
Expand All @@ -24,7 +25,6 @@ use super::fetch::SubgraphOperation;
use super::subscription::SubscriptionNode;
use super::FlattenNode;
use crate::error::format_bridge_errors;
use crate::executable::USING_CATCH_UNWIND;
use crate::query_planner::bridge_query_planner::metric_query_planning_plan_duration;
use crate::query_planner::bridge_query_planner::JS_QP_MODE;
use crate::query_planner::bridge_query_planner::RUST_QP_MODE;
Expand Down Expand Up @@ -80,47 +80,23 @@ impl BothModeComparisonJob {
}

fn execute(self) {
// TODO: once the Rust query planner does not use `todo!()` anymore,
// remove `USING_CATCH_UNWIND` and this use of `catch_unwind`.
let rust_result = std::panic::catch_unwind(|| {
let name = self
.operation_name
.clone()
.map(Name::try_from)
.transpose()?;
USING_CATCH_UNWIND.set(true);

let start = Instant::now();

// No question mark operator or macro from here …
let result =
let start = Instant::now();

let rust_result = self
.operation_name
.as_deref()
.map(|n| Name::new(n).map_err(FederationError::from))
.transpose()
.and_then(|operation| {
self.rust_planner
.build_query_plan(&self.document, name, self.plan_options);

let elapsed = start.elapsed().as_secs_f64();
metric_query_planning_plan_duration(RUST_QP_MODE, elapsed);

metric_query_planning_plan_both_comparison_duration(RUST_QP_MODE, elapsed);
metric_query_planning_plan_both_comparison_duration(JS_QP_MODE, self.js_duration);

// … to here, so the thread can only eiher reach here or panic.
// We unset USING_CATCH_UNWIND in both cases.
USING_CATCH_UNWIND.set(false);
result
})
.unwrap_or_else(|panic| {
USING_CATCH_UNWIND.set(false);
Err(apollo_federation::error::FederationError::internal(
format!(
"query planner panicked: {}",
panic
.downcast_ref::<String>()
.map(|s| s.as_str())
.or_else(|| panic.downcast_ref::<&str>().copied())
.unwrap_or_default()
),
))
});
.build_query_plan(&self.document, operation, self.plan_options)
});

let elapsed = start.elapsed().as_secs_f64();
metric_query_planning_plan_duration(RUST_QP_MODE, elapsed);

metric_query_planning_plan_both_comparison_duration(RUST_QP_MODE, elapsed);
metric_query_planning_plan_both_comparison_duration(JS_QP_MODE, self.js_duration);

let name = self.operation_name.as_deref();
let operation_desc = if let Ok(operation) = self.document.operations.get(name) {
Expand Down
Loading