Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add backtrace option to errors #151

Merged
merged 5 commits into from
Apr 3, 2024
Merged

Conversation

nicklan
Copy link
Collaborator

@nicklan nicklan commented Mar 26, 2024

When debugging it can be difficult to know where an error occurred if it is created in numerous places. For example, today if there's an error in expression eval and you run dump-table, you might simply see:

Error(Arrow(InvalidArgumentError("Incorrect number of arrays for StructArray fields, expected 2 got 3"))

But where in the numerous places arrow errors occur did this happen? Tracking this down can be tricky.

It turns out thiserror (the crate we use to create our Error class) does have support for backtrace capture (see the fourth code-block here). However, it relies on a nightly only api (provide which relies on error_generic_member_access), so we can't use it.

This PR adds manual backtrace capture to our error type. We do this by:

  1. Adding a method to wrap an existing error in a Backtraced variant, which captures the error and a backtrace
  2. Manually implementing the From trait for errors where we want the backtrace.
  3. Right now this is only done for Arrow errors, which are all over the place in our client code. But in the future, when debugging an issue, it will be easy to move other error types to do this as needed.

This means that for errors we add this manual From, a backtrace will be captured at the point they are converted into a deltakernel::Error. This is usually via a ? somewhere in the kernel, so it is useful for debugging.

We also ensure that we only do this wrapping if backtracing is enabled (usually via RUST_BACKTRACE=1). Otherwise the wrapping method is a no-op, so this should have minimal perf impact.

Finally, we augment the dump-table command to pretty-print any errors it gets, which makes the backtrace more readable.

Now, we can get a nice error like this:

$ RUST_BACKTRACE=1 cargo run -- -- file:///home/nick/databricks/delta-kernel-rs/kernel/tests/data/basic_partitioned/
     Running `/home/nick/databricks/delta-kernel-rs/target/debug/dump-table -- 'file:///home/nick/databricks/delta-kernel-rs/kernel/tests/data/basic_partitioned/'`
Reading file:///home/nick/databricks/delta-kernel-rs/kernel/tests/data/basic_partitioned/
Backtraced {
    source: Arrow(
        InvalidArgumentError(
            "Incorrect number of arrays for StructArray fields, expected 2 got 3",
        ),
    ),
    backtrace: Backtrace [
        { fn: "deltakernel::error::Error::with_backtrace", file: "/home/nick/databricks/delta-kernel-rs/kernel/src/error.rs", line: 132 },
        { fn: "<deltakernel::error::Error as core::convert::From<arrow_schema::error::ArrowError>>::from", file: "/home/nick/databricks/delta-kernel-rs/kernel/src/error.rs", line: 146 },
        { fn: "<core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual", file: "/rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/result.rs", line: 1963 },
        { fn: "deltakernel::client::arrow_expression::evaluate_expression", file: "/home/nick/databricks/delta-kernel-rs/kernel/src/client/arrow_expression.rs", line: 170 },
        { fn: "<deltakernel::client::arrow_expression::DefaultExpressionEvaluator as deltakernel::ExpressionEvaluator>::evaluate", file: "/home/nick/databricks/delta-kernel-rs/kernel/src/client/arrow_expression.rs", line: 271 },
        { fn: "deltakernel::scan::Scan::execute", file: "/home/nick/databricks/delta-kernel-rs/kernel/src/scan/mod.rs", line: 249 },
        { fn: "dump_table::try_main", file: "./src/main.rs", line: 72 },
        { fn: "dump_table::main", file: "./src/main.rs", line: 41 },
        { fn: "core::ops::function::FnOnce::call_once", file: "/rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs", line: 250 },
        { fn: "std::sys_common::backtrace::__rust_begin_short_backtrace", file: "/rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs", line: 154 },
        { fn: "std::rt::lang_start::{{closure}}", file: "/rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs", line: 167 },
        { fn: "core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once", file: "/rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs", line: 284 },
        { fn: "std::panicking::try::do_call", file: "/rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs", line: 552 },
        { fn: "std::panicking::try", file: "/rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs", line: 516 },
        { fn: "std::panic::catch_unwind", file: "/rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs", line: 142 },
        { fn: "std::rt::lang_start_internal::{{closure}}", file: "/rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs", line: 148 },
        { fn: "std::panicking::try::do_call", file: "/rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs", line: 552 },
        { fn: "std::panicking::try", file: "/rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs", line: 516 },
        { fn: "std::panic::catch_unwind", file: "/rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs", line: 142 },
        { fn: "std::rt::lang_start_internal", file: "/rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs", line: 148 },
        { fn: "std::rt::lang_start", file: "/rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs", line: 166 },
        { fn: "main" },
        { fn: "__libc_start_main" },
        { fn: "_start" },
    ],
}

We should be able to revert back to automatic backtracing via thiserror when error_generic_member_access stabilizes.

@@ -163,8 +163,11 @@ fn evaluate_expression(
.iter()
.zip(schema.fields())
.map(|(expr, field)| evaluate_expression(expr, batch, Some(field.data_type())));
let mut fields = output_schema.all_fields();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code obviously won't merge. was just a way to test via inducing an error, and I put it here in case anyone wants to play with this PR locally as part of reviewing.

Copy link
Collaborator

@zachschuermann zachschuermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks awesome - i ran into the same issues when debugging myself! so should we have a macro do a bunch of the From<ErrorType> for Error so we get backtraces for all our error types?

@nicklan
Copy link
Collaborator Author

nicklan commented Mar 26, 2024

this looks awesome - i ran into the same issues when debugging myself! so should we have a macro do a bunch of the From<ErrorType> for Error so we get backtraces for all our error types?

So, I think we'll have to undo some of the work when thiserror just allows us to do this, so I was sort of leaning to just doing the ones we need. it also requires changes in the error enum, so i was going to limit it just to the ones that are raised in lots of places, and/or those we encounter as we go and want to add.

Copy link
Collaborator

@zachschuermann zachschuermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - i like that we have a good plan for removing this once error_generic_member_access stabilizes. One nit: maybe we should open an issue to track that and just call it blocked until stabilization?

@nicklan
Copy link
Collaborator Author

nicklan commented Mar 29, 2024

LGTM - i like that we have a good plan for removing this once error_generic_member_access stabilizes. One nit: maybe we should open an issue to track that and just call it blocked until stabilization?

Good call: #157

}
}

macro_rules! from_with_backtrace(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this more code with the macro given you are only implementing for 2 error types?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. the idea is that it's really easy in the future to add more if you need it to debug something

@nicklan nicklan merged commit 02acc9c into delta-io:main Apr 3, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants