support multi-cause errors in go 1.20 #112
Thanks for this.
The main shortcoming I see in the current approach is that you strip the "shell" go type of a multierror when encoding.
It should be preserved.
This is exemplified by the following additional unit tests (I didn't touch your code, just added tests): dhartunian#1
Reviewed 1 of 16 files at r1.
Reviewable status: 1 of 16 files reviewed, all discussions resolved
There's another desirable property of the library we should work to preserve:
- I encode a multierror on node n1, version X+1, built with go 1.20, which supports multierrors. The error object is built via `errors.Join(A, io.EOF)`.
- I decode the error on node n2, version X, built with go 1.19, which doesn't support multierrors.
- I re-encode the error on node n2, version X, built with go 1.19.
- I decode the error on node n3, version X+1, built with go 1.20.
- In that case, after decoding on node n3, I want `errors.Is(myErr, io.EOF)` to return true.
There are tests that exercise a similar scenario for non-multierrors in the file errbase/migrations_test.go, scenarios 4 and 5.
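For illustration, here is a minimal sketch of the property itself, using only the Go 1.20 standard library. It does not model the encode/decode hops between nodes; it only shows the check that should still hold after the round trip. The error `A` is a hypothetical stand-in.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// joinedHasEOF builds the multi-cause error from the scenario and reports
// whether io.EOF is reachable through errors.Is.
func joinedHasEOF() bool {
	a := errors.New("A") // hypothetical stand-in for the error called A
	myErr := errors.Join(a, io.EOF)
	return errors.Is(myErr, io.EOF)
}

func main() {
	fmt.Println(joinedHasEOF())
}
```

The goal stated above is that the same check keeps returning true on n3 after the error has passed through the older node n2.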
Reviewable status: 1 of 16 files reviewed, all discussions resolved
There will also be a design problem with no straightforward answer, here:
- I encode a multierror on node n1, version X+1, built with go 1.20, which supports multierrors. The error object is built via `errors.Join(A, io.EOF)` (or with the opposite order).
- I decode the error on node n2, version X, built with go 1.19, which doesn't support multierrors.
- On n2, I run `errors.Is(myErr, io.EOF)`.
What result should I see?
From a dev UX perspective, we'd really want to observe "true" here. But we can't really. (And I won't push for us to make it work. It'd be unreasonable.)
Now, what does this mean in e.g. CockroachDB?
Say, I have a cluster currently going through an upgrade from v23.1 to v23.2. There are multiple versions side-by-side. Only the v23.2 code knows about multierror.
Now I run a SQL client against the v23.1 node. Its SQL query, through distSQL, goes to do some work on a v23.2 node. Then an error happens on the v23.2 node. Say it's a KV retriable error.
Now imagine what happens if that KV error is wrapped with `errors.Join`.
Now, the encoded result is sent back to the v23.1 node, which can't do anything valuable with it.
Then the test for the "retriable" property of the KV error fails - that cause is invisible.
This results in a user-visible bug in CockroachDB.
(Sad face emoji 😭 )
How can we avoid this? I am scratching my head very hard about this.
One thing I can think of is to ban all uses of `errors.Join`, `fmt.Errorf` with multiple `%w` causes, and any middleware that may use go 1.20-style multierrors in v23.2, and wait until v23.3 (or v24.1) to begin using them. By then, all the previous-version nodes will be running the new version of the errors lib and will be able to handle the multierrors.
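As a rough sketch of which constructions would fall under such a ban, the standard-library-only snippet below checks for the Go 1.20 `Unwrap() []error` method. The helper names `isMultiCause` and `classify` are hypothetical and are not part of the errors library.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// isMultiCause reports whether err exposes the Go 1.20 multi-cause method.
func isMultiCause(err error) bool {
	_, ok := err.(interface{ Unwrap() []error })
	return ok
}

// classify checks three constructions in order: a single %w verb,
// two %w verbs, and errors.Join.
func classify() [3]bool {
	base := errors.New("retriable") // hypothetical stand-in for a KV error
	return [3]bool{
		isMultiCause(fmt.Errorf("ctx: %w", base)),
		isMultiCause(fmt.Errorf("ctx: %w, %w", base, io.EOF)),
		isMultiCause(errors.Join(base, io.EOF)),
	}
}

func main() {
	fmt.Println(classify())
}
```

Only the last two produce multi-cause errors; the single `%w` form keeps the pre-1.20 single-cause `Unwrap() error` shape.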
Reviewable status: 1 of 16 files reviewed, all discussions resolved
One alternative to the latter point is to create a version of the errors lib that doesn't require go 1.20 to build, but recognizes multierrors when decoding.
I.e. make the current version of the library implement its own notion of multierror with an API fully compatible with go 1.20.
Then ship it in crdb 23.1 with a backport.
This way by the time 23.2 comes around, the 23.1 code will be able to decode the multierrors and make sense of them.
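To make the idea concrete, here is a minimal sketch of a join-style multierror that builds without any go 1.20 API. The names `joinError`, `Join` and `demo` are hypothetical; the nil handling and newline-separated message follow the documented behavior of the standard `errors.Join`, but this is not the library's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// joinError is a hypothetical multi-cause error type that needs no Go 1.20 API.
type joinError struct{ errs []error }

// Error concatenates the messages of all causes, one per line.
func (e *joinError) Error() string {
	msgs := make([]string, len(e.errs))
	for i, err := range e.errs {
		msgs[i] = err.Error()
	}
	return strings.Join(msgs, "\n")
}

// Unwrap has the same shape as the Go 1.20 multi-cause method.
func (e *joinError) Unwrap() []error { return e.errs }

// Join drops nil arguments and returns nil when nothing remains.
func Join(errs ...error) error {
	var kept []error
	for _, err := range errs {
		if err != nil {
			kept = append(kept, err)
		}
	}
	if len(kept) == 0 {
		return nil
	}
	return &joinError{errs: kept}
}

// demo joins two errors (and one nil) and returns the combined message.
func demo() string {
	return Join(errors.New("a"), nil, errors.New("b")).Error()
}

func main() {
	fmt.Printf("%q\n", demo())
}
```

Because the type only relies on method shapes, code built with an older toolchain could still recognize and traverse it.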
Reviewable status: 1 of 16 files reviewed, all discussions resolved
This makes me think that the protobuf encoding may have no choice but to stuff multierrors in the existing LeafEncodedError type somehow. Perhaps with additional protobuf fields. Otherwise, when the |
@knz this is tricky and I want to check with you before going down some weird reflection path...the Furthermore, we currently don't support the |
Could you explain this point a bit more?
If I modify your test example to look like this (single
I still get an error on
That case has been supported in the go stdlib since before 1.20 and it constructs a single-cause |
Aw, now I understand. That looks like a baseline bug (independent from the current project).
I'm fine with including the fix as part of this work since it's certainly related. Let me see if I can write a POC implementation of |
The part that surprises me is that it's already supposed to work. The code is there ( |
It looks like an encoding problem. The code tries to extract a "prefix" from the wrapper and writes a blank one in this case which breaks the equality check on I wonder if the prefix should be set to the full interpolated string from the wrapper, and then the inner messages should be ignored in the case where the wrapper original type is |
Just pushed a new commit on top of your tests @knz. Everything now passes! I think there are a few rough edges to clean up but I had missed a few things about the |
This is progress! I like it.
I do think, however, that it would help the work here to extract the fix for the single-cause `%w` into a separate commit, independent from the rest of the multierror work.
Reviewed 1 of 5 files at r3.
Reviewable status: 2 of 18 files reviewed, 1 unresolved discussion (waiting on @dhartunian)
errbase/opaque.go
line 71 at r3 (raw file):
return e.cause.Error() } // TODO(davidh): cleaner way to do this?
Let's think about this differently, also independently from `wrapError` / `%w`. I think you have discovered a design shortcoming.
The way this code works (and it is paired together with the `extractPrefix` function) is to support these two possible wrapper definitions:
type myWrapper1 struct { cause error }
func (e *myWrapper1) Error() string { return "prefix: " + e.cause.Error() }
func (e *myWrapper1) Unwrap() error { return e.cause }
...
type myWrapper2 struct { cause error }
func (e *myWrapper2) Error() string { return "completely unrelated" }
func (e *myWrapper2) Unwrap() error { return e.cause }
...
err := &myWrapper1{errors.New("boo")}
err2 := decode(encode(err))
assert(err.Error() == err2.Error())
err = &myWrapper2{errors.New("boo")}
err2 = decode(encode(err))
assert(err.Error() == err2.Error())
i.e. the `prefix` field is only populated if it was, in fact, a prefix of the cause with a colon.
If `extractPrefix` is unable to find a prefix, it assumes the wrapper carries no error string of its own and fully delegates the responsibility for the message to the cause.
This was a reasonable choice because at the time most wrapper types were like that.
(I'm not sure if there is a test case for this already, if there is not we should make one.)
Now the problem you have discovered here is this case:
type myWrapper3 struct { cause error }
func (e *myWrapper3) Error() string { return e.cause.Error() + ": unrelated" }
func (e *myWrapper3) Unwrap() error { return e.cause }
...
err := &myWrapper3{errors.New("boo")}
err2 := decode(encode(err))
assert(err.Error() == err2.Error())
This currently fails: `err2.Error()` returns `boo` and not `boo: unrelated`.
That's because the logic currently simply "forgets" the part of the error string "owned" by the wrapper during encoding -- the `extractPrefix` function and the opaque wrapper type are "too simple".
IMHO we should also extract the fix for this to a different commit. (I think this might be the same as the fix for `%w` but I'm not sure). The problem needs to be solved for any custom wrapper type, not just the type returned by `fmt.Errorf`.
I think the way forward here is to extend `EncodedWrapper` and `opaqueWrapper` with some extra metadata that explains to the `Error()` method how to build the result string.
Maybe you have a different idea?
If I were to do it with that idea I would keep it simple and use just a boolean, `wrapper_owns_error_string`, so that `Error()` would work like that:
func (e *opaqueWrapper) Error() string {
if e.wrapper_owns_error_string { return e.prefix /* may be empty - that's ok */ }
if e.prefix == "" { return e.cause.Error() }
return fmt.Sprintf("%s: %s", e.prefix, e.cause)
}
Regardless of the solution, we'll also need to think a bit about what happens in mixed-version deployments.
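A minimal runnable sketch of that proposal, under stated assumptions: the struct layout and the Go field name `ownsErrorString` are hypothetical simplifications of the real `opaqueWrapper`, and `messages` is only a demo helper.

```go
package main

import (
	"errors"
	"fmt"
)

// opaqueWrapper is a simplified, hypothetical version of the decoded wrapper.
type opaqueWrapper struct {
	cause           error
	prefix          string
	ownsErrorString bool // true when prefix already holds the full message
}

func (e *opaqueWrapper) Error() string {
	if e.ownsErrorString {
		return e.prefix // may be empty, which is acceptable
	}
	if e.prefix == "" {
		return e.cause.Error()
	}
	return fmt.Sprintf("%s: %s", e.prefix, e.cause)
}

func (e *opaqueWrapper) Unwrap() error { return e.cause }

// messages renders three cases: a plain prefix, no prefix at all,
// and a wrapper that owns the complete error string.
func messages() [3]string {
	cause := errors.New("boo")
	return [3]string{
		(&opaqueWrapper{cause: cause, prefix: "prefix"}).Error(),
		(&opaqueWrapper{cause: cause}).Error(),
		(&opaqueWrapper{cause: cause, prefix: "boo: unrelated", ownsErrorString: true}).Error(),
	}
}

func main() {
	for _, m := range messages() {
		fmt.Println(m)
	}
}
```

The third case is the one that the prefix-only logic cannot represent: the wrapper's text comes after the cause's text, so the full string has to be stored.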
68f6a54 to 90962d0
I've looked at the new version after the rebase. See my comments below; I would also be interested in your reaction to my earlier review comments from two weeks ago.
Reviewed 1 of 35 files at r4, 26 of 27 files at r5, all commit messages.
Reviewable status: 27 of 42 files reviewed, 10 unresolved discussions (waiting on @dhartunian)
errbase/decode.go
line 32 at r5 (raw file):
return decodeWrapper(ctx, w) } if w := enc.GetMultiWrapper(); w != nil {
As I have written previously, cross-version compatibility may force us to encode multierrors using the pre-existing Wrapper protobuf (otherwise a previous-version DecodeError would panic, not knowing what to do with the new type).
errbase/encode.go
line 185 at r5 (raw file):
} else { // No encoder. // In that case, we assume parent error owns error string
I would still encourage you to use the encodePrefix function here.
errbase/opaque.go
line 79 at r5 (raw file):
} // TODO(davidh): probably shouldn't create a join obj here
Correct. You shouldn't need it.
errbase/opaque.go
line 161 at r5 (raw file):
} } return nil
Something about making this return conditional.
errbase/unwrap.go
line 31 at r5 (raw file):
// Go 2 error proposal). // // UnwrapOnce treats multi-errors (those implementing the
In addition to this paragraph, maybe refer to the std lib doc that errors.Unwrap is meant to return nil for multierrors anyway.
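For reference, a small standard-library check of that behavior (requires go 1.20 for `errors.Join`). The helper name `unwrapOfJoinIsNil` is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// unwrapOfJoinIsNil reports whether errors.Unwrap returns nil for an error
// built with errors.Join. errors.Unwrap only calls a method of the form
// Unwrap() error, which joined errors do not provide.
func unwrapOfJoinIsNil() bool {
	joined := errors.Join(io.EOF, errors.New("other"))
	return errors.Unwrap(joined) == nil
}

func main() {
	fmt.Println(unwrapOfJoinIsNil())
}
```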
errorspb/errors.proto
line 91 at r5 (raw file):
// EncodedWrapperMulti is the wire-encodable representation // of an error wrapper with multiple causes. message EncodedWrapperMulti {
As discussed previously - I'm not sure we can afford a new protobuf message type.
errutil/as.go
line 63 at r5 (raw file):
} if me, ok := err.(interface{ Unwrap() []error }); ok {
Maybe `for _, e := range UnwrapMulti(err)`?
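A sketch of what such a helper could look like, assuming it simply wraps the interface assertion from the diff. The lowercase `unwrapMulti` here is a hypothetical stand-in and not the library's `UnwrapMulti` itself.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// unwrapMulti returns the causes of a multi-cause error, or nil when err
// does not implement Unwrap() []error.
func unwrapMulti(err error) []error {
	if me, ok := err.(interface{ Unwrap() []error }); ok {
		return me.Unwrap()
	}
	return nil
}

// countCauses returns the number of causes seen for a joined error and for
// a plain sentinel error.
func countCauses() (int, int) {
	joined := errors.Join(io.EOF, io.ErrUnexpectedEOF)
	return len(unwrapMulti(joined)), len(unwrapMulti(io.EOF))
}

func main() {
	fmt.Println(countCauses())
}
```

Ranging over a nil slice is a no-op, so callers can write `for _, e := range unwrapMulti(err)` without a separate type check.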
errutil/as_test.go
line 64 at r5 (raw file):
// Check that it works even if hidden in multi-error multiWrapErr = fmt.Errorf("error: %w and %w", errors.Wrap(refwErr, "hidden"), errors.New("world"))
Pls also add tests with custom multierror types, not just that from `fmt`.
markers/markers.go
line 72 at r5 (raw file):
// Recursively try multi-error causes, if applicable. if me, ok := err.(interface{ Unwrap() []error }); ok {
Maybe use `errbase.UnwrapMulti`.
9ad03c5 to 8514096
Reviewable status: 5 of 45 files reviewed, 10 unresolved discussions (waiting on @knz)
errbase/decode.go
line 32 at r5 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
As I have written previously, we may be stuck by cross-version compatibility to make the multierrors encoded using the pre-existing Wrapper protobuf (otherwise a previous-version DecodeError would panic not knowing what to do with the new type).
Done. Using EncodedLeaf now.
errbase/encode.go
line 185 at r5 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I would still encourage you to use the encodePrefix function here.
Now that the encoding is in `encodeLeaf`, we always take the entire `.Error()` output. I still believe in that approach because multi-cause errors will never have a "prefix".
errbase/opaque.go
line 71 at r3 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Let's think about this differently, also independently from `wrapError` / `%w`. I think you have discovered a design shortcoming.
The way this code works (and it is paired together with the `extractPrefix` function) is to support these two possible wrapper definitions:
type myWrapper1 struct { cause error }
func (e *myWrapper1) Error() string { return "prefix: " + e.cause.Error() }
func (e *myWrapper1) Unwrap() error { return e.cause }
...
type myWrapper2 struct { cause error }
func (e *myWrapper2) Error() string { return "completely unrelated" }
func (e *myWrapper2) Unwrap() error { return e.cause }
...
err := &myWrapper1{errors.New("boo")}
err2 := decode(encode(err))
assert(err.Error() == err2.Error())
err = &myWrapper2{errors.New("boo")}
err2 = decode(encode(err))
assert(err.Error() == err2.Error())
i.e. the `prefix` field is only populated if it was, in fact, a prefix of the cause with a colon.
If `extractPrefix` is unable to find a prefix, it assumes the wrapper carries no error string of its own and fully delegates the responsibility for the message to the cause.
This was a reasonable choice because at the time most wrapper types were like that.
(I'm not sure if there is a test case for this already, if there is not we should make one.)
Now the problem you have discovered here is this case:
type myWrapper3 struct { cause error }
func (e *myWrapper3) Error() string { return e.cause.Error() + ": unrelated" }
func (e *myWrapper3) Unwrap() error { return e.cause }
...
err := &myWrapper3{errors.New("boo")}
err2 := decode(encode(err))
assert(err.Error() == err2.Error())
This currently fails: `err2.Error()` returns `boo` and not `boo: unrelated`.
That's because the logic currently simply "forgets" the part of the error string "owned" by the wrapper during encoding -- the `extractPrefix` function and the opaque wrapper type are "too simple".
IMHO we should also extract the fix for this to a different commit. (I think this might be the same as the fix for `%w` but I'm not sure). The problem needs to be solved for any custom wrapper type, not just the type returned by `fmt.Errorf`.
I think the way forward here is to extend `EncodedWrapper` and `opaqueWrapper` with some extra metadata that explains to the `Error()` method how to build the result string.
Maybe you have a different idea?
If I were to do it with that idea I would keep it simple and use just a boolean, `wrapper_owns_error_string`, so that `Error()` would work like that:
func (e *opaqueWrapper) Error() string {
if e.wrapper_owns_error_string { return e.prefix /* may be empty - that's ok */ }
if e.prefix == "" { return e.cause.Error() }
return fmt.Sprintf("%s: %s", e.prefix, e.cause)
}
Regardless of the solution, we'll also need to think a bit about what happens in mixed-version deployments.
I think the `extractPrefix` problem is taken care of by the earlier PR.
errbase/opaque.go
line 79 at r5 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Correct. You shouldn't need it.
Code removed, no longer relevant.
errbase/opaque.go
line 161 at r5 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Something about making this return conditional.
This is no longer relevant because we output using the method on `*opaqueLeaf`.
errbase/unwrap.go
line 31 at r5 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
In addition to this paragraph, maybe refer to the std lib doc that errors.Unwrap is meant to return nil for multierrors anyway.
Done.
errorspb/errors.proto
line 91 at r5 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
As discussed previously - i'm not sure we can afford a new protobuf message type.
Done.
errutil/as.go
line 63 at r5 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Maybe `for _, e := range UnwrapMulti(err)`?
Done.
errutil/as_test.go
line 64 at r5 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Pls also add tests with custom multierror types, not just that from `fmt`.
Done. This was helpful in finding a bug in the `As` implementation where only a top-level multi-cause chain would get explored.
markers/markers.go
line 72 at r5 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Maybe use `errbase.UnwrapMulti`.
Done. This also needed to be moved into the loop above to ensure that it's not just attempted at the top level. Added tests to check that path. Same with the `Is` implementation.
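To illustrate why the multi-cause check has to sit inside the unwrap loop, here is a simplified sketch of a traversal that looks for a target at every level of the tree. It compares by identity only and assumes comparable error values; `contains` and `demoNested` are hypothetical names, and this is not the library's `Is` implementation.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// contains reports whether target appears anywhere in err's causal tree.
// The multi-cause branch runs at every step of the single-cause chain,
// not only for the top-level error.
func contains(err, target error) bool {
	for c := err; c != nil; c = errors.Unwrap(c) {
		if c == target { // assumes comparable error values
			return true
		}
		if me, ok := c.(interface{ Unwrap() []error }); ok {
			for _, sub := range me.Unwrap() {
				if contains(sub, target) {
					return true
				}
			}
		}
	}
	return false
}

// demoNested hides io.EOF below a single-cause wrapper, then a join,
// then another single-cause wrapper.
func demoNested() bool {
	inner := fmt.Errorf("inner: %w", io.EOF)
	joined := errors.Join(errors.New("x"), inner)
	outer := fmt.Errorf("outer: %w", joined)
	return contains(outer, io.EOF)
}

func main() {
	fmt.Println(demoNested())
}
```

If the multi-cause check ran only on the outermost error, the join nested under `outer` would never be explored.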
This is very good. Just a few nits and a hiccup with the (internal) opaque types. This should be quick to resolve.
Reviewed 7 of 36 files at r6, 5 of 5 files at r7, all commit messages.
Reviewable status: 17 of 45 files reviewed, 4 unresolved discussions (waiting on @dhartunian)
errbase/encode.go
line 40 at r6 (raw file):
} // Not a causer. return encodeLeaf(ctx, err, nil)
simpler: return encodeLeaf(ctx, err, UnwrapMulti(err))
errbase/encode.go
line 80 at r7 (raw file):
} cs := make([]*EncodedError, len(causes))
nit:
var cs []*EncodedError
if len(causes) > 0 { ... }
this way we avoid the heap allocation of a 0-sized slice via `make` in the common case where there's no multi-cause.
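A small sketch of the suggested pattern, with strings standing in for `*EncodedError` to keep it self-contained. `encodeCauses` and `demo` are hypothetical names; the sketch shows only the nil-slice shape and does not measure allocations.

```go
package main

import (
	"errors"
	"fmt"
)

// encodeCauses leaves the result nil when there is nothing to encode and
// only calls make when at least one cause is present.
func encodeCauses(causes []error) []string {
	var cs []string // stays nil in the common no-multi-cause case
	if len(causes) > 0 {
		cs = make([]string, len(causes))
		for i, c := range causes {
			cs[i] = c.Error()
		}
	}
	return cs
}

// demo reports whether the empty case yields a nil slice, and the length
// of the result for two causes.
func demo() (bool, int) {
	two := []error{errors.New("a"), errors.New("b")}
	return encodeCauses(nil) == nil, len(encodeCauses(two))
}

func main() {
	fmt.Println(demo())
}
```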
errbase/encode.go
line 133 at r7 (raw file):
msg, ownError = extractPrefix(err, cause) details = e.details ownError = e.ownsErrorString
Why is `ownError` assigned two times here?
Maybe `extractPrefix` doesn't need to be called anymore in this case? (unsure)
errbase/opaque.go
line 78 at r7 (raw file):
// opaque leaf can be a multi-wrapper func (e *opaqueLeaf) Unwrap() []error { return e.causes }
This sadly is a problem waiting to happen. Even though this library is OK with error types that announce multiple causes but provide none, it's not the expected semantics of the go API. I think it's a problem for other error handling code if this can return zero causes.
This tells me overall we may be best served with two different opaque types. One for real leafs and one for multiwrappers.
(But use just 1 protobuf message type for both)
It's possible that the implementation of the one can use the other so you can have the methods implemented just once for both (`type opaqueMultiWrapper struct { opaqueLeaf }`, then just add an `Unwrap` method to it).
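A minimal sketch of that embedding idea, assuming heavily simplified types: the real opaque types carry more fields, and the `causes` field placement and the `check` helper are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// opaqueLeaf is a simplified leaf: it has a message and no Unwrap method.
type opaqueLeaf struct{ msg string }

func (e *opaqueLeaf) Error() string { return e.msg }

// opaqueMultiWrapper reuses the leaf's methods through embedding and adds
// the multi-cause Unwrap on top.
type opaqueMultiWrapper struct {
	opaqueLeaf
	causes []error
}

func (e *opaqueMultiWrapper) Unwrap() []error { return e.causes }

// isMulti reports whether err announces multiple causes.
func isMulti(err error) bool {
	_, ok := err.(interface{ Unwrap() []error })
	return ok
}

// check returns whether each type announces multiple causes, plus the
// message the wrapper inherits from the embedded leaf.
func check() (bool, bool, string) {
	leaf := &opaqueLeaf{msg: "leaf"}
	multi := &opaqueMultiWrapper{
		opaqueLeaf: opaqueLeaf{msg: "a\nb"},
		causes:     []error{errors.New("a"), errors.New("b")},
	}
	return isMulti(leaf), isMulti(multi), multi.Error()
}

func main() {
	fmt.Println(check())
}
```

With this split, a real leaf never advertises `Unwrap() []error`, so other error-handling code does not see a multi-cause error with zero causes.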
8514096 to 36fd263
Reviewable status: 14 of 46 files reviewed, 3 unresolved discussions (waiting on @knz)
errbase/encode.go
line 40 at r6 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
simpler: `return encodeLeaf(ctx, err, UnwrapMulti(err))`
done.
errbase/encode.go
line 80 at r7 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
nit:
var cs []*EncodedError
if len(causes) > 0 { ... }
this way we avoid the heap allocation of a 0-sized slice via `make` in the common case there's no multi-cause.
Done.
errbase/encode.go
line 133 at r7 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
why is `ownError` assigned two times here?
Maybe `extractPrefix` doesn't need to be called anymore in this case? (unsure)
Multiple assignment removed. This is clarified upon rebase of earlier commit in #113.
I think we do still need to call `extractPrefix` in this case because if the `opaqueWrapper` does not set `ownsErrorString` to true we need to infer the value of `ownError` by attempting to extract the prefix.
errbase/opaque.go
line 78 at r7 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
This sadly is a problem waiting to happen. Even though this library is OK with error types that announce multiple causes but provide none, it's not the expected semantics of the go API. I think it's a problem for other error handling code if this can return zero causes.
This tells me overall we may be best served with two different opaque types. One for real leafs and one for multiwrappers.
(But use just 1 protobuf message type for both)
It's possible that the implementation of the one can use the other so you can have the methods implemented just once for both (`type opaqueMultiWrapper struct { opaqueLeaf }`, then just add an `Unwrap` method to it).
Ah good catch. Thanks. Done.
I have no further comment here. 💯
I'll wait until we know where #113 lands before approving this.
Reviewed 1 of 16 files at r1, 16 of 36 files at r6, 15 of 15 files at r8, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @dhartunian)
errbase/encode.go
line 133 at r7 (raw file):
Previously, dhartunian (David Hartunian) wrote…
Multiple assignment removed. This is clarified upon rebase of earlier commit in #113.
I think we do still need to call `extractPrefix` in this case because if the `opaqueWrapper` does not set `ownsErrorString` to true we need to infer the value of `ownError` by attempting to extract the prefix.
Let's continue to iterate on this in #113.
One thing I would reiterate is that I'd like us to have a variant of this PR (and #113) on a branch that's compatible with go 1.18, which we can then import in crdb 23.1.
Agreed. My thinking was that once we've merged the outstanding PRs for 1.20, I'll backport into a 1.18 PR a version that omits the code that requires 1.20, probably just tests that use the new APIs. Seemed like that would be easiest to manage.
Yes, good thinking.
36fd263 to 6e367c3
I forgot to mention - we also need this patch to include the various new exported APIs from the std "errors" package. This includes |
perhaps include PR #106 in here. |
293a3c8 to fe9cc0f
Reviewed 21 of 21 files at r10, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @dhartunian)
errorspb/errors.proto
line 30 at r10 (raw file):
EncodedErrorDetails details = 2 [(gogoproto.nullable) = false]; // causes is a list of errors that contain the causal tree
For the sake of future readers of this code:
It's a little bit confusing here to see a `causes` field where one would naively expect that only EncodedErrorWrapper could possibly have causes.
Two suggestions:
- acknowledge this confusion in an explanatory comment.
- possibly (your call) rename `causes` to `multierror_causes` to distinguish it from the "simple" cause in EncodedErrorWrapper.
fe9cc0f to d48b551
Reviewable status: 43 of 47 files reviewed, 1 unresolved discussion (waiting on @knz)
errorspb/errors.proto
line 30 at r10 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
For the sake of future readers of this code:
It's a little bit confusing here to see a `causes` field where one would naively expect that only EncodedErrorWrapper could possibly have causes.
Two suggestions:
- acknowledge this confusion in an explanatory comment.
- possibly (your call) rename `causes` to `multierror_causes` to distinguish it from the "simple" cause in EncodedErrorWrapper.
done.
Go 1.20 introduces the idea of an error with multiple causes instead of a single chain. This commit updates the errors library to properly encode, decode, and format these error types. For encoding and decoding we use the existing `EncodedLeaf` type and embellish it with a `causes` field. This is done in order to keep the encoding/decoding backwards compatible. Earlier versions that decode `EncodedLeaf` types containing multiple causes will simply see an opaque leaf with a message inside. The reason the `EncodedWrapper` is not used here is because the wrapper already contains a mandatory single `cause` field that we cannot fill with the multi-errors. A new type cannot be used because it would not be decodable by older versions of this library.
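As a rough model of the compatibility argument in this commit message, the sketch below uses plain Go structs in place of the protobuf types. Every name in it (`encodedLeaf`, `decodeOld`, `decodeNew`, `countCauses`, `sample`) is hypothetical; it only shows why a decoder that ignores the extra field still yields a usable leaf.

```go
package main

import "fmt"

// encodedLeaf is a simplified stand-in for the wire type.
type encodedLeaf struct {
	Message string
	Causes  []*encodedLeaf // the added field; unknown to older decoders
}

type leafError struct{ msg string }

func (e *leafError) Error() string { return e.msg }

type multiError struct {
	leafError
	causes []error
}

func (e *multiError) Unwrap() []error { return e.causes }

// decodeOld models a decoder that predates the causes field: it keeps
// only the message.
func decodeOld(enc *encodedLeaf) error { return &leafError{msg: enc.Message} }

// decodeNew rebuilds the causes when they are present.
func decodeNew(enc *encodedLeaf) error {
	if len(enc.Causes) == 0 {
		return &leafError{msg: enc.Message}
	}
	causes := make([]error, len(enc.Causes))
	for i, c := range enc.Causes {
		causes[i] = decodeNew(c)
	}
	return &multiError{leafError: leafError{msg: enc.Message}, causes: causes}
}

// countCauses returns how many causes err exposes via Unwrap() []error.
func countCauses(err error) int {
	if me, ok := err.(interface{ Unwrap() []error }); ok {
		return len(me.Unwrap())
	}
	return 0
}

// sample models the encoded form of a join of two errors.
func sample() *encodedLeaf {
	return &encodedLeaf{
		Message: "A\nEOF",
		Causes:  []*encodedLeaf{{Message: "A"}, {Message: "EOF"}},
	}
}

func main() {
	enc := sample()
	fmt.Println(countCauses(decodeOld(enc)), countCauses(decodeNew(enc)))
}
```

In this model the older decoder loses the causal tree but keeps the full message, which matches the "opaque leaf with a message inside" behavior described above.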
d48b551 to 8a8366e
Added build-and-test jobs on |
Go 1.20 introduces the idea of an error with multiple causes instead
of a single chain. This commit updates the errors library to properly
encode, decode, and format these error types.
For encoding and decoding we use the existing `EncodedLeaf` type and
embellish it with a `causes` field. This is done in order to keep the
encoding/decoding backwards compatible. `EncodedLeaf` types containing
multiple causes when decoded by earlier versions will simply see an
opaque leaf with a message inside. The reason the `EncodedWrapper`
is not used here is because the wrapper already contains a mandatory
single `cause` field that we cannot fill with the multi-errors. A
new type cannot be used because it would not be decodable by older
versions of this library.
Note for reviewers: First commit is from #113