-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
schemachanger: improve panic- and error handling #91411
Conversation
I'm keen for some suggestions on how to improve this. Here's what the log looks like now after crashing the node by doing
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 9 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @postamar)
pkg/sql/schemachanger/scerrors/errors.go
line 34 at r1 (raw file):
// logging and wrapping errors, for convenience. func HandleErrorOrPanic( ctx context.Context, err *error, wrapMsgFmt string, wrapMsgArgs ...interface{},
I feel like we have an obnoxious linter that's not going to like this. Maybe just add a wrap function that's like func(error) error
to which you can pass the message in closure scope? Then just log the error as opposed to this?
I told the linter not to annoy us with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 6 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @postamar)
pkg/sql/schemachanger/scerrors/errors.go
line 28 at r2 (raw file):
) // HandleErrorOrPanic generates a closure, intended to be deferred, to handle
I don't really get why this function unconditionally logs twice. I feel like it's doing too much. Can we have it either only log on the error path?
Fair enough, we don't need to log twice in most cases. I'll remove the "done" logs unless there's an error or expensive logging is enabled. The result will still be a bit more chatty than what we had before but that's a good thing IMHO. |
Thanks for the feedback. Things are now in a state where I'm happy with them. Sample:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @postamar)
pkg/sql/schemachanger/scerrors/errors.go
line 34 at r1 (raw file):
Previously, ajwerner wrote…
I feel like we have an obnoxious linter that's not going to like this. Maybe just add a wrap function that's like
func(error) error
to which you can pass the message in closure scope? Then just log the error as opposed to this?
I don't know, it seems pretty crazy to have a thing called HandleErrorOnPanic
unconditionally log a message when called that has nothing to do with any possible error. I was absolutely fine with the message being logged in the case of an error, but I'm pretty opposed to the choice you've made without really making it clear in the function.
If you renamed the method to something clearer and then made the message a part of the description of the function, I'd be more amenable.
pkg/sql/schemachanger/scerrors/errors.go
line 60 at r3 (raw file):
msgFmt = "failed " + msgFmt + " with error: %v" args = append(args[:len(args):len(args)], unwrappedErr) log.InfofDepth(ctx, logDepth, msgFmt, args...)
should this be Warning
? Also, should we also check and see if the context is canceled?
Agreed,
I'm keen to avoid further back and forth here, please don't hesitate to be more explicit about what I should do.
I don't know the answer to either of these questions. Please tell me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, let's do Warning
if there was an error and the context was not canceled.
As for nameing, this is a little convoluted, but I think it reads nicely:
type EventLogger struct {
format string
args []interface
}
func StartEvent(format string, args ...interface) EventLogger { /* ... */ }
func (el EventLogger) HandlePanicAndLogError(err *error) { /*...*/ }
Then this thing would be called like:
defer scerrors.LogStart(
"building declarative schema changer plan in %s (rollback=%v) for %s",
redact.Safe(params.ExecutionPhase),
redact.Safe(params.InRollback),
redact.Safe(initial.StatementTags()),
).RecoverPanicAndLogError(&err)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @postamar)
Thanks! Much appreciated. I like this a lot. Will do 👍 |
Previously, the declarative schema changer would only recover from runtime errors in certain subsystems like building the targets or planning the execution of operations. Consequently an implementation bug leading to a runtime error in the execution layer would trigger a panic which would not be recovered and which would cause the whole process to crash. This commit fixes this by introducing a common error handler in the form of scerrors.EventLogger, which recovers from panics, wraps errors, and prints informative log messages in a uniform way, to be used at the top of (or near the top of) the declarative schema changer call stack. Fixes cockroachdb#91400. Release note (bug fix): fixed a bug in which panics triggered by certain DDL statements were not properly recovered, leading to the cluster node crashing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @postamar)
Thanks for the review! bors r+ |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 93e620e to blathers/backport-release-22.1-91411: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
Previously, the declarative schema changer would only recover from runtime errors in certain subsystems like building the targets or planning the execution of operations. Consequently an implementation bug leading to a runtime error in the execution layer would trigger a panic which would not be recovered and which would cause the whole process to crash.
This commit fixes this by introducing a common error handler in the form of scerrors.HandleErrorOrPanic, which recovers from panics, wraps errors, and prints informative log messages in a uniform way, to be used at the top of (or near the top of) the declarative schema changer call stack.
Fixes #91400.
Release note (bug fix): fixed a bug in which panics triggered by certain DDL statements were not properly recovered, leading to the cluster node crashing.