opt: fix panic recovery for error handling #38570
So I think this PR is misguided. When I wrote the code I intended to catch runtime.Error panics and let them flow through. The reason is that runtime.Error panics are recoverable, and there is no reason to let a cluster go down when they occur.
FYI I even went through the Go source code to validate the following:
- runtime.Error is only emitted for "soft" errors like out-of-bound accesses, assertion failures, etc.
- for "serious" internal errors, e.g. in the scheduler, bad goroutine state, allocator problems, etc., the runtime throws a string which does not implement error and thus will not be captured here.
So, can you explain a little better why you thought this PR was a good idea?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @justinj, @knz, and @rytaft)
Today if you are working on a change that results in a nil dereference or out-of-bound access, you get a one line error with no stack trace. Good luck debugging that. IMO that is not acceptable, both for development workflow and customer support (what will we do when we get a report from a customer which just says "out of bounds" with no other context?) When we agreed to catch assertion errors thrown by the optimizer, it was with the condition that we will still always get stack traces for them. The discussion was mostly focused on assertions generated by our code, I don't think we specifically discussed catching runtime errors (at least not to my knowledge). I am ok catching them but only if we don't lose the stack trace. |
Oh I see. If you do errors.WithStack(err) when returning the recovered panic, you'll get the panic stack trace captured with the error.
RaduBerinde <[email protected]> wrote on 29 June 2019 15:52:07 CEST:
…Today if you are working on a change that results in a nil dereference
or out-of-bound access, you get a one line error with no stack trace.
Good luck debugging that. IMO that is not acceptable, both for
development workflow and customer support (what will we do when we get
a report from a customer which just says "out of bounds" with no other
context?)
When we agreed to catch assertion errors thrown by the optimizer, it
was with the condition that we will still always get stack traces for
them. The discussion was mostly focused on assertions generated by our
code, I don't think we specifically discussed catching runtime errors
(at least not to my knowledge). I am ok catching them but only if we
don't lose the stack trace.
|
It doesn't work. The stack trace isn't shown in important cases: In
In an opt test:
|
I put the patch which I ran above in https://github.com/RaduBerinde/cockroach/tree/opt-err-fix-2 |
Maybe I should try |
oh yes, absolutely. I hadn't thought of that but indeed it's the best way to ensure we get telemetry, etc. |
Just leaving a note with the status of this PR - converting to |
38710: errors: fix the formatting with %+v r=knz a=knz

(found by @RaduBerinde; needed to complete #38570)

The new library `github.com/cockroachdb/errors` was not implementing `%+v` formatting properly for assertion and unimplemented errors. The wrong implementation was hiding the details of the cause of these two error types from the formatting logic.

Fixing this bug comprehensively required completing the investigation of the Go 2 / `xerrors` error proposal. This revealed that the implementation of `fmt.Formatter` for wrapper errors (a `Format()` method) is required in all cases, at least until Go's stdlib learns about `errors.Formatter`. More details at golang/go#29934 and this commit message: cockroachdb/errors@78b6caa.

This patch bumps the dependency `github.com/cockroachdb/errors` to pick up the fixes to assertion failures and unimplemented errors. The new definition of `errors.FormatError()` subsequently required re-implementing `Format()` for `pgerror.withCandidateCode`, which is also done here.

Finally, this patch also picks up `errors.As()` and the new streamlined `fmt.Formatter` / `errors.Formatter` interaction, so this patch also simplifies a few custom error types in CockroachDB accordingly.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Updated, using |
54b69fb
to
525b9ff
Reviewed 19 of 19 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @justinj, and @rytaft)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @justinj, @RaduBerinde, and @rytaft)
pkg/util/errorutil/catch.go, line 29 at r1 (raw file):
// Convert runtime errors to internal errors, which display the stack and
// get reported to Sentry.
err = errors.NewAssertionErrorWithWrappedErrf(err, "")
That's what's creating the surprising result.
Until I fix this you can make the surprising errors with safe detail
disappear (and also introduce a clarification about where the runtime error comes from) as follows:
err = errors.HandledWithMessage(err, "Go runtime error")
err = errors.WithAssertionFailure(err)
err = errors.WithStack(err)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @justinj, @RaduBerinde, and @rytaft)
pkg/util/errorutil/catch.go, line 29 at r1 (raw file):
Previously, knz (kena) wrote…
That's what's creating the surprising result.
Until I fix this you can make the surprising errors with safe detail
disappear (and also introduce a clarification about where the runtime error comes from) as follows:
err = errors.HandledWithMessage(err, "Go runtime error")
err = errors.WithAssertionFailure(err)
err = errors.WithStack(err)

see https://github.com/cockroachdb/errors/pull/3
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @justinj, @RaduBerinde, and @rytaft)
pkg/util/errorutil/catch.go, line 29 at r1 (raw file):
Previously, knz (kena) wrote…
Then you can use err = errors.HandleAsAssertionFailure(err)
instead of the 3 lines I listed above.
The major entry points in the optimizer catch all panics that throw an error and convert them to errors. Unfortunately, this also catches runtime errors (in which case we convert them to errors and lose the stack trace).

This change adds a `ShouldCatch` helper which determines if we should return a thrown object as an error. If the object is a `runtime.Error`, it gets wrapped by an AssertionFailed error which will cause correct error handling (stack trace, Sentry reporting, etc).

As part of this change, we are also removing wrappers like `builderError`, which are no longer useful. We fix the opt tester to fail with the full error information (using `%+v`) for assertion errors.

Release note: None
Bumped the dep and switched to |
Reviewed 3 of 3 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @justinj, @RaduBerinde, and @rytaft)
TFTR! bors r+ |
38570: opt: fix panic recovery for error handling r=RaduBerinde a=RaduBerinde

The major entry points in the optimizer catch all panics that throw an error and convert them to errors. Unfortunately, this also catches runtime errors (in which case we convert them to errors and lose the stack trace).

This change adds a `ShouldCatch` helper which determines if we should return a thrown object as an error. If the object is a `runtime.Error`, it gets wrapped by an AssertionFailed error which will cause correct error handling (stack trace, Sentry reporting, etc).

As part of this change, we are also removing wrappers like `builderError`, which are no longer useful. We fix the opt tester to fail with the full error information (using `%+v`) for assertion errors.

Release note: None

38660: opt: push limit into offset r=ridwanmsharif a=ridwanmsharif

This change pushes the limit into an offset whenever possible. This shouldn't worsen any plan but does allow the `GetLimitedScans` rule to fire in more scenarios.

Fixes #30416.

~~This is currently blocked on #38659.~~

Release note: None

38743: roachtest: skip jepsen/multi-register r=god a=nvanbenschoten

There's no use running this every night until #36431 is fixed.

Release note: None

38746: roachtest: don't reuse clusters after test failure r=andreimatei a=andreimatei

We've had a case where a cluster got messed up somehow and then a bunch of tests that tried to reuse it failed. This patch employs a big hammer and makes it so that we don't reuse a cluster after a test failure (whether or not the failure is cluster-related).

Release note: None

38766: scripts/release-notes.py: help the user with --from/--until r=lhirata a=knz

Requested by @lhirata

Release note: None

Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Ridwan Sharif <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Build succeeded |