-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfcs: proposal for error handling #36987
Conversation
2c22ee9
to
191ec10
Compare
af05e2b
to
2bf40ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks good to me. I like how you're not using a "god error" and the code over in https://github.com/cockroachdb/cockroach/pull/36023/files gives me a good high-level overview over the wire encoding.
What's missing to my understanding is how the version migration works. 19.1 won't understand anything about these new errors, which is probably fine as long as it didn't try to deconstruct them in the first place. But what about the roachpb.Error
boundary, for example? Are these just left alone for now or is a roachpb.NewError
replaced by some errors.X
call that at the same time results in something that the receiver (19.1 across the wire) can treat as a *roachpb.Error
and get the detail from (seems unlikely)?
You've alluded to having set things up in a way that doesn't require a "forklift". It'd be interesting to know how the implementation would work. Could you just land the errors package without any changes to auxiliary code, and then pick a contained area of the code in which to integrate them? What would that area be?
docs/RFCS/20190318_error_handling.md
Outdated
|
||
These situations are largely out of scope for this RFC, except for the | ||
following: **we should aim to audit the code and ensure that no | ||
barrier errors is never dropped** and instead let to flow where it can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/never/ever/
docs/RFCS/20190318_error_handling.md
Outdated
|
||
## Too much diversity | ||
|
||
There are currently 5 different error handling "protocols" inside |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would help to list them briefly.
docs/RFCS/20190318_error_handling.md
Outdated
|
||
1. `s1 := err.Error()` | ||
2. `s2 := err.Cause().Error()` | ||
3. `prefix_message := s1[:len(s2)-len(s1)]` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, this may also break because who's to say that the error was created with errors.Wrap
? Anyone can implement causer
.
docs/RFCS/20190318_error_handling.md
Outdated
|
||
To add a code useful to PostgreSQL clients, one can use | ||
e.g. `errors.WithDefaultCode(err, pgerror.CodeSyntaxError)`. | ||
The new code is only used if the original error did not provide a code already. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these the right semantics? I suppose morally you want to not wrap an error with an error code twice. How about a boolean return or chaining with a marker error (so that you keep both codes, you can then decide if you want the inner our outer)?
docs/RFCS/20190318_error_handling.md
Outdated
|
||
Throughout the SQL package (and presumably over time throughout | ||
CockroachDB) errors can be annotated with "telemetry keys" to be | ||
increment when the error flows out of a server towards a client. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
incremented
docs/RFCS/20190318_error_handling.md
Outdated
The following APIs are provided (upward-compatible from existing `pgerror` client code): | ||
|
||
```go | ||
func UnimplementedWithIssue(issue int, format string, args ...interface{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is missing either a receiver or return type, possibly both.
docs/RFCS/20190318_error_handling.md
Outdated
- `errors.WithDetail(err, detail)` returns `&withDetail{cause: err, detail: detail}` | ||
- `errors.Wrap(err, msg)` returns `&withMessage{cause: &withStack{cause: err, stack: callers()}, message: msg}` | ||
|
||
We use multiple elementary type instead of a single "god type" with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
types
docs/RFCS/20190318_error_handling.md
Outdated
|
||
```go | ||
// EncodeError converts a Go error to an encodable error. | ||
func EncodeError(err error) AnyErrorContainer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elaborate on AnyErrorContainer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's missing to my understanding is how the version migration works.
Oh right, I forgot to write about that. Will add something tonight or tomorrow.
You've alluded to having set things up in a way that doesn't require a "forklift". It'd be interesting to know how the implementation would work. Could you just land the errors package without any changes to auxiliary code, and then pick a contained area of the code in which to integrate them? What would that area be?
There are two mechanisms:
-
replacing the
import
at the top fromerrors
orgithub.com/pkg/errors
togithub.com/cockroachdb/cockroach/pkg/errors
. This will "just work" already. Additional details (via pg code etc) can be added incrementally after that. -
as to actually using the new mechanism (Markers, etc). I plan do do this change throughout
sql
andsqlbase
to start (where they are most needed) then review the other SQL packages which I know best. As I learn more about core I plan to gradually "do things" there too.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @danhhz, and @tbg)
docs/RFCS/20190318_error_handling.md, line 83 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Would help to list them briefly.
Done.
docs/RFCS/20190318_error_handling.md, line 169 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Oof, this may also break because who's to say that the error was created with
errors.Wrap
? Anyone can implementcauser
.
Clarified in the example - this is only valid if s2
is suffix of s1
.
docs/RFCS/20190318_error_handling.md, line 421 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
s/never/ever/
Done.
docs/RFCS/20190318_error_handling.md, line 541 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Are these the right semantics? I suppose morally you want to not wrap an error with an error code twice. How about a boolean return or chaining with a marker error (so that you keep both codes, you can then decide if you want the inner our outer)?
Clarified that this is an educated choice, and clarified that we can do different later.
docs/RFCS/20190318_error_handling.md, line 633 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
incremented
Done.
docs/RFCS/20190318_error_handling.md, line 640 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This is missing either a receiver or return type, possibly both.
Done.
docs/RFCS/20190318_error_handling.md, line 1093 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
types
Done.
docs/RFCS/20190318_error_handling.md, line 1188 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Elaborate on
AnyErrorContainer
.
Done. - but what is needed to know?
e1a2974
to
c97a582
Compare
docs/RFCS/20190318_error_handling.md
Outdated
|
||
### Adding `context` | ||
|
||
When wrapping an error with `errors.WithCtx(err, ctx)`, the logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This scares me - context lifetimes are already complicated enough - we really don't want them to escape into errors if we can avoid it.
docs/RFCS/20190318_error_handling.md
Outdated
"hello")`, although the form `Wrapx()` is preferred (see next section). | ||
|
||
To add a code useful to PostgreSQL clients, one can use | ||
e.g. `errors.WithDefaultCode(err, pgerror.CodeSyntaxError)`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this - what makes it a "default" code?
docs/RFCS/20190318_error_handling.md
Outdated
the barrier error will mask the original code and replace it | ||
with `XX000/CodeInternalError`. | ||
|
||
- if the original pg error code was present with the `XX` prefix, then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we were not supposed to inspect pg error codes?
docs/RFCS/20190318_error_handling.md
Outdated
### Promotion of `XX` codes to barrier errors | ||
|
||
When a `XX` code is provided via `Wrapx`, `WithDefaultCode`, etc, the | ||
result becomes a barrier and the original error, if any, is masked as an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems "magic" and therefore best avoided. Don't you think it would be better to force explicit construction of barrier errors?
docs/RFCS/20190318_error_handling.md
Outdated
e.g. `errors.Wrapf(err1, "while handling %v", err2)`. | ||
|
||
The proposed library extends this behavior and makes it possible to | ||
preserve multiple causes using `WithOtherCause()`, for example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it has to be potential to be overly complex. Is it really that important to preserve the other cause? What's wrong with the current behavior in practice?
|
||
## Problematic error use cases | ||
|
||
### Suspicious comparisons of the error object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing to io.EOF
isn't suspicious most of the time - that's how Go recommends that you check for error conditions from its library calls, which I'm sure is done in these code samples, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @danhhz, @jordanlewis, @knz, and @tbg)
docs/RFCS/20190318_error_handling.md, line 340 at r1 (raw file):
```go // Try an INSERT. if err := txn.Exec(txn1); err != nil {
s/txn([12])/stmt\1/g
docs/RFCS/20190318_error_handling.md, line 359 at r1 (raw file):
if origErr := txn.Exec(txn1); origErr != nil { if sqlbase.IsDuplicateInsertError(origErr) { if newErr := txn.Exec(txn2); err != nil {
s/err/newErr/
docs/RFCS/20190318_error_handling.md, line 382 at r1 (raw file):
information loss in the current source code. **In general, patterns of code like if-error-do-something-else need
I'd argue that in this particular case, the original code is not so bad, because IsDuplicateInsertError should have consumed all that is interesting about the original error. There might be cases where this is important, but a tree of error causes sounds unnecessarily complex (and without successful precedent to my knowledge).
Also note that this specific example doesn't work at the sql client level - once you've hit an error, you have to roll back the transaction. You have to use INSERT ON CONFLICT DO UPDATE.
docs/RFCS/20190318_error_handling.md, line 404 at r1 (raw file):
are all defective in the same way. **In general, if-error-do-something-else-that-is-not-an-error is code smell
I agree with this, and would extend it to the previous section too. So why was the previous section used to justify a complex new requirement of a tree of error causes while this aspect is declared out of scope?
docs/RFCS/20190318_error_handling.md, line 427 at r1 (raw file):
## Motivation for a new error type: summary The requirements on error objects have grown over time.
roachpb.Error was our original plan for structured error handling, and yet as requirements grew we added new error types (distsqlpb.Error and pgerror.Error) rather than extending roachpb.Error. I think it's worth discussing why this happened and what this means for the feasibility of one unifying error type.
Are there conflicting requirements at different levels? (I'm thinking about how we originally had a global "retryable" flag, but it turned out that different levels could retry different things). Are there backwards-compatibility requirements that make it difficult to change or replace roachpb.Error? Was everything forced through a bottleneck somewhere that turns things into strings or go 1 error
? Is it just tech debt (or Conway's law) that we created separate error types instead of unifying?
docs/RFCS/20190318_error_handling.md, line 442 at r1 (raw file):
We want to report important errors to telemetry (Sentry) for further inspection. However the report must be stripped of PII. We want error objects that preserve the "safe" part of details available when the error was produced or wrapped.
Could we meet this requirement by reporting errors to sentry at the point where the errors are generated instead of requiring that they be preserved through the whole stack? Or is the idea that the reportability of an error is depends on non-local conditions?
docs/RFCS/20190318_error_handling.md, line 444 at r1 (raw file):
of details available when the error was produced or wrapped. - **pg error codes.**
If this is a requirement, then the answer will have to be some bespoke cockroach-specific type. But is it really a requirement for the basic error type? As long as there is structure to the errors, we can map structured errors to pg errors at the top level (like we do now).
docs/RFCS/20190318_error_handling.md, line 498 at r1 (raw file):
# Guide-level explanation The package is `github.com/cockroachdb/cockroach/pkg/errors`.
Does errors.Error
implement error
? If so, we can run into the nil interface problem and it's harder to access the structured features (and easy for them to get stripped away); if not it's intrusive to change signatures everywhere. We chose not to implement error
for roachpb.Error
, and I still think that's probably the right decision, but the dichotomy between the two error types has proved very annoying.
Ergonomics nit: Unless the plan is to prohibit all use of github.com/pkg/errors as soon as this package is introduced, please give it a different name to avoid confusion.
docs/RFCS/20190318_error_handling.md, line 531 at r1 (raw file):
The library is compatible with existing protobuf error objects, so instantiating, for example, with `err :=
This is trivially valid because err
has inferred type *roachpb.RangeRetryError
. Did you mean to say that errors.Error
is an interface type and we the existing protobuf errors can implement that interface?
docs/RFCS/20190318_error_handling.md, line 539 at r1 (raw file):
"hello")`, although the form `Wrapx()` is preferred (see next section). To add a code useful to PostgreSQL clients, one can use
This level of specificity smells bad to me. It seems better to either have an extensible system (like context.Context
?) for attaching metadata to an error or to defer the mapping of errors to pgerror codes until higher in the stack. I don't think it's a good idea (or if it's even possible) for code in core to care what error codes we'll ultimately return in pgwire.
Concretely, the MSO team is currently building a distributed system that does not speak pgwire, but has many of the other requirements regarding structured, wire-encodable errors. This system might want to use HTTP error codes in place of pg errors. It would be a shame if we either had to fork the error package for each project or build in special support for each kind of error annotation.
docs/RFCS/20190318_error_handling.md, line 687 at r1 (raw file):
intermediate message prefixes added via `github.com/pkg/errors.Wrap()`. ### Barrier errors
Nit: My initial read of "barrier errors" was to think of the threading concept of barriers and to expect something related to ErrGroup. I don't have a good alternative name, though. OpaqueError? ShieldedError? Or AssertionError since that's what you used in the factory functions.
docs/RFCS/20190318_error_handling.md, line 733 at r1 (raw file):
"internal cause" invisible via `causer`/`Wrapper` interfaces. This makes it possible to e.g. wrap with `CodeCCLRequired` (`XXC01`)
Why would CodeCCLRequired ever be a wrapper instead of a leaf error?
docs/RFCS/20190318_error_handling.md, line 1169 at r1 (raw file):
// Compatibility with github.com/pkg/errors. func Errorf(format string, args ...interface{}) Error func WithStack(err error) error
Is the mix of Error
and error
here deliberate?
docs/RFCS/20190318_error_handling.md, line 1477 at r1 (raw file):
Suspicious comparisons of the error message
Looking through this list, most of these are inspecting errors that are generated outside our code from packages that don't provide any introspection interfaces. So those will still be around no matter what we do with our own errors. There are some things we should be doing better, but I think the volume of code included here paints an unnecessarily dark picture.
docs/RFCS/20190318_error_handling.md, line 737 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
This seems "magic" and therefore best avoided. Don't you think it would be better to force explicit construction of barrier errors?
Yeah, it seems to me that barrier errors shouldn't be so special. They're just errors without a cause, and a separately inspectable (optional!) piece of metadata that happens to be another error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Andrei RFA(small)L
Here are the latest changes:
- I moved/removed text away from the "guide-level guide" and integrated the details in the "reference-level guide" instead, which makes the text flow better and the doc easier to navigate
- I improved/extended the motivation for the 3 base modules (backbone, barriers, markers) in response to your comments (and also offline comments by Tobias)
What I haven't done yet: revisit/rewrite/simplify the two "subsumed errors" sections in the motivation section, based on your comments. That's why I haven't responded to this yet, but I will. I will do this later today.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @danhhz, @jordanlewis, @knz, @nvanbenschoten, @tbg, and @vivekmenezes)
docs/RFCS/20190318_error_handling.md, line 1266 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
shouldn't this be expressed as an interface that error type can implement, instead of this need to "register" error types?
Created a new section "Discussion: callback registration vs. interfaces" below to motivate further.
docs/RFCS/20190318_error_handling.md, line 1458 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
we discussed using just the types for markers here and hoping for the best
I thought about it further and discussed with Tobias and came to the conclusion that it's going to be messages after all. Added a discussion section below to weigh both sides.
docs/RFCS/20190318_error_handling.md, line 1792 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
As we were discussing, my name is used here but I'm not particularly sure I want to advocate for actually doing this.
Let's see what Tobi says too, but maybe move this away to annex.
Tobias suggests leaving it there and let more people chime in about whether they like it.
docs/RFCS/20190318_error_handling.md, line 1817 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I'd call out here the fact that the
EnsureNotInDomain()
function does usually return barrier errors.
Done.
docs/RFCS/20190318_error_handling.md, line 1817 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
This requirement was about
ContextDeadlineExceeded
errors, right? I'd note that.
Done.
docs/RFCS/20190318_error_handling.md, line 2022 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
so
WithAssertionFailure
errors are supposed to wrapbarrierError
errors, right? Can't we flatten them - can't we makeerrutil.AssertionFailed()
return anAssertionError
that is not wrapping anything (or more exactly is not implementingcauser
)? I understand that generally you like to build small composable types, but here I don't think the composition gives you anything. And allowing assertions to wrap arbitrary errors seems confusing to me. It seems to me that it'd be better to be more prescriptive here.
Added a discussion section underneath to clarify.
docs/RFCS/20190318_error_handling.md, line 2276 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
wrappde
Done.
docs/RFCS/20190318_error_handling.md, line 2554 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
one thing that's not clear to me is how are we going to tell the "old client" cases, here and in the "Core RPCs" case below? Are we going to introduce a client version field in the requests?
I would think to produce the two error variants side-by-side (until the next release where we can delete the old path), like is done here:
cockroach/pkg/sql/distsqlpb/data.go
Lines 141 to 157 in 421483f
func NewError(err error) *Error { | |
resErr := &Error{} | |
// Encode the full error to the best of our ability. | |
// This field is ignored by pre-19.1 nodes. | |
fullError := sqlerror.EncodeError(err) | |
resErr.FullError = &fullError | |
// Now populate compatibility fields for pre-19.1 nodes. | |
if retryErr, ok := errors.Cause(err).(*roachpb.UnhandledRetryableError); ok { | |
resErr.Detail = &Error_RetryableTxnError{RetryableTxnError: retryErr} | |
} else { | |
pgErr, _, _ := pgerror.Flatten(err) | |
resErr.Detail = &Error_PGError{PGError: pgErr} | |
} | |
return resErr | |
} |
1de7553
to
9a21121
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFA(generalized)L.
I think all comments so far have been addressed.
Fixed/removed the dubious text on subsumed errors towards he beginning.
Also added an "implementation strategy" section towards the end.
Added cross-references between guide and reference-level sections.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @danhhz, @jordanlewis, @knz, @nvanbenschoten, @tbg, and @vivekmenezes)
docs/RFCS/20190318_error_handling.md, line 382 at r1 (raw file):
Previously, knz (kena) wrote…
Also note that this specific example doesn't work at the sql client level - once you've hit an error, you have to roll back the transaction. You have to use INSERT ON CONFLICT DO UPDATE.
Alas, there is code that does exactly this. Consider from
schema_changer.go
:// Run through mutation state machine and backfill. err = sc.runStateMachineAndBackfill(ctx, &lease, evalCtx) // Purge the mutations if the application of the mutations failed due to // a permanent error. All other errors are transient errors that are // resolved by retrying the backfill. if isPermanentSchemaChangeError(err) { if err := sc.rollbackSchemaChange(ctx, err, &lease, evalCtx); err != nil { return err } } return errinside
rollbackSchemaChange
, theerr
given as argument evaporates. If anything "interesting" was present inerr
, nobody would ever know.Do you think I should use that example directly instead of using a synthetic example?
a tree of error causes sounds unnecessarily complex (and without successful precedent to my knowledge).
The emphasis is on capturing the precise causes and context for the purpose of troubleshooting and post-mortem analysis, not to have downstream code do anything useful with the tree of causes. In fact if you follow along the API definition, all the logic emphasizes the "direct" chain of causes and the "other" causes are only traversed to produce synthetic reports (the pg error object or the sentry packet).
(I have added a sentence to clarify the goal.)
Replaced the example.
docs/RFCS/20190318_error_handling.md, line 386 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: this example is not very good because neither the go's sql package nor our internal client package let one continue with a transaction after an error like this. At either level,
txn.Exec(stmt)
won't work.
You're right, I have replaced the example.
docs/RFCS/20190318_error_handling.md, line 439 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Did you mean to write "assertion error" here instead of "barrier error"? Cause otherwise I for one don't have a problem with letting barrier errors be "dropped".
My suggestion is to drop this "Errors subsumed by non-errors" section. I don't quite agree with it, and even if I did, it does not seem to contribute anything to the RFC besides the "we should aim" sentence, which which I also disagree (unless you change barrier->assertion).
The general problem is that "if-error-do-something-else-that-is-not-an-error" loses information.
It does lose information, but so what? Everything loses information - every time you update a variable you lose information about what the old state was. We don't particularly worry about this in other situations - so why would we worry about it when we handle errors?
I don't (or at least no longer) have a problem with your proposal to added fairly opaque "secondary causes" to errors as you suggest in other sections cause in that case there's an error flowing around and so it's clear what to attach this extra information to, but when there's no error, what even is your suggestion? You say that the protocol should be different and the original error shouldn't have been an error is someone can handle it, but in that case we would "lose information" just the same.In general, if-error-do-something-else-that-is-not-an-error is code smell
that should be avoided. It suggests that the original error object
was actually something else than error, for example it should have
been part of the function protocol.In my opinion, this is a very strong and prescriptive statement and in the general case I don't particularly agree. All errors that don't lead to a crash are handled at some level and thus are transformed into no error...
The philosophy of Go is pretty clear - errors are values like any other, not particularly exceptional. For example, I thinkio.EOF
is a fine thing to return from IO routines.
I agree that some things should not be represented as error, but it's case by case. For example, I agree thatCPut
s should not have returned errors when the condition fails. The reason why I agree is because for the KV layer in particular, errors mean that you can't continue with the transaction, and it would have been beneficial to be able to continue after a CPut error (and there's no particular reason why you couldn't).
On the other hand, I disagree that retriable errors should not have been errors. In those cases, you generally indeed cannot just continue with the transaction, and so errors seem appropriate to me. Also consider how much code would be polluted by having some other return value for signaling these retriable conditions.I'm curious to hear other opinions, though. @bdarnell for one seemed to agree with a previous phrasing of this section, and I wonder what he thinks.
I removed the section altogether and replaced it with a simpler section with a much narrower scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎆 looking fwd to all of this
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @danhhz, @jordanlewis, @knz, @nvanbenschoten, @tbg, and @vivekmenezes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't read all of the proposal in detail but I did read the rationale and guide level explanation carefully. Looking forward to all of this happening.
Reviewed 1 of 2 files at r5, 1 of 1 files at r6.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @danhhz, @jordanlewis, @knz, @nvanbenschoten, and @vivekmenezes)
docs/RFCS/20190318_error_handling.md, line 609 at r6 (raw file):
In the proposed library, the postgres details can be added with e.g. `errors.WithDetail(err, "some detail")`.
Is this (detail and hints) specific to PG errors or could we also adopt this elsewhere? I'm asking a) because that might be useful and b) if it's specific to PG WithDetail sounds too general. Perhaps WithPG{Detail,Hint}
. Or we just accept this special casing.
docs/RFCS/20190318_error_handling.md, line 702 at r6 (raw file):
// Was: // // if err == io.EOF { ...
Consider giving an anti-example. For example, if errors.Is(err, &SomeError{}) { ... }
would never match, right? That is, Is
is only for sentinel errors.
docs/RFCS/20190318_error_handling.md, line 780 at r6 (raw file):
printing
docs/RFCS/20190318_error_handling.md, line 850 at r6 (raw file):
// Try with an UPDATE instead. if newErr := txn.Exec(stmt2); err != nil { err := errors.Wrap(newErr, "while updating")
Mighi be worth mentioning which error gets to go first. In this example, it's the innermost error. Is that always the case? Also, what if there is more than one secondary error? Can we errors.WithSecondary
an error more than once?
docs/RFCS/20190318_error_handling.md, line 1004 at r6 (raw file):
> special support for each kind of error annotation. We can extend the system to adopt HTTP error codes as follows:
Is there a reason the PGWire-specific functionality wasn't implemented as an extension (or was it)?
docs/RFCS/20190318_error_handling.md, line 1028 at r6 (raw file):
```go func GetHTTPCode(err error) int { for ; err != nil; err = errors.UnwrapOnce(err) {
Interesting, I would've expected there to be some sugar for this operation. Why not errors.If
? Hmm, looks like that's just more and more awkward code. Ok, nevermind, this is good.
docs/RFCS/20190318_error_handling.md, line 1045 at r6 (raw file):
For this we can work as follows: - add a new wrapper type `withLogTags{cause error, tags ...}`. Make
Generally this types should have error
in their name somewhere, don't you think? I.e. withLogTagsError
or something like that. Or did you deliberately not do that because these are just wrapper errors that don't contain a message?
docs/RFCS/20190318_error_handling.md, line 1056 at r6 (raw file):
- ensure that `withLogTags{}` implements the `Format` method so that the logging tags are included when the entire error chain is rendered via `%+v`.
The tags would also be printed from .Error()
, right? Would be nice to have them in the logs. I think this is what you were planning, but maybe make it explicit somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's already a lot of cooks in this kitchen, so I didn't do a line-by-line review. Approach this is extraordinary work
Reviewable status: complete! 3 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @danhhz, @jordanlewis, @knz, @nvanbenschoten, and @vivekmenezes)
514c377
to
ddd3bf4
Compare
106b1d7
to
02cab13
Compare
37121: errors: introduce a general-purpose modular error library with good properties r=knz a=knz This PR introduces an error library implemented after the principles laid out in #36987. It subsumes the following two PRs: - #38127 *: demonstrate uses of the errors library - review approval: #38127 (comment) - #37813 sql, *: simplify remove dependencies on pgerror, util/log - review approval: #37813 (comment) Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]>
Release note: None
completed in #37121 bors r+ |
36987: rfcs: proposal for error handling r=knz a=knz RFC text: https://github.com/knz/cockroach/blob/20190421-errors-rfc/docs/RFCS/20190318_error_handling.md This RFC explains how our requirements for error handling have grown over time and how the various code patterns currently in use in CockroachDB are inadequate. It then proposes a new library of error types. This library is compatible with the error interface, including the upcoming Go 2 semantics. Co-authored-by: Raphael 'kena' Poss <[email protected]>
Build succeeded |
RFC text: https://github.com/knz/cockroach/blob/20190421-errors-rfc/docs/RFCS/20190318_error_handling.md
This RFC explains how our requirements for error handling have grown over time and how the various code patterns currently in use in CockroachDB are inadequate.
It then proposes a new library of error types. This library is compatible with the error interface, including the upcoming Go 2 semantics.