Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachpb: improve AmbiguousResultError #65714

Closed
wants to merge 2 commits into from

Conversation

tbg
Copy link
Member

@tbg tbg commented May 26, 2021

This error type held on to a string message, which is unsuitable for
redaction. In practice, this error opaquely wraps an error, so make this
transparent, which we can now do thanks to cockroachdb/errors.

It turned out that the proto already had an unused field that was (once,
but no longer) used in testing, so we can use that in production now.

I chose to print the root error's file:line in the error as we typically
don't print errors via %+v which is otherwise how these would be
exposed. I expect that to be mildly useful to diagnose problems such as
#65701.

I think there is room for a more general mechanism that exposes the root
cause's location in general, though I am unsure how to best introduce it
given that we usually observe errors through SQL, and on many possible
paths especially for rare flakes such as the above, so no steps towards
that are taken at this point.

Release note: None

@tbg tbg requested review from a team, pbardea and knz and removed request for a team May 26, 2021 13:05
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@tbg tbg removed the request for review from pbardea May 26, 2021 13:15
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @knz and @tbg)


pkg/roachpb/errors.go, line 656 at r1 (raw file):

	s.Print(cause)
	if f, l, _, ok := errors.GetOneLineSource(cause); ok {
		s.Printf("; originated at %s:%d", redact.SafeString(f), redact.Safe(l))

This is the sort of line that I keep fighting with @knz about, so I'm curious if he likes it (I do).
Rafa, would you say that errors.GetOneLineSource() should return a SafeString and an int type that implements SafeValue instread of (string, int) ?

@tbg tbg force-pushed the ambiguous-improved branch from e1eaa40 to f45ec2c Compare May 26, 2021 13:26
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 13 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @knz, and @tbg)


pkg/ccl/backupccl/restore_data_processor_test.go, line 246 at r1 (raw file):

					return nil
				}
				return roachpb.NewError(roachpb.NewAmbiguousResultErrorf(strconv.Itoa(int(r))))

Errorf("%v", int(r))

Generally using a computed string as format argument is bad style, and I think we have a linter against it now.


pkg/roachpb/errors.go, line 656 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

This is the sort of line that I keep fighting with @knz about, so I'm curious if he likes it (I do).
Rafa, would you say that errors.GetOneLineSource() should return a SafeString and an int type that implements SafeValue instread of (string, int) ?

This entire SafeFormat method makes me sad, but I'm also realizing this is not explained well enough elsewhere.

The logic to format and print errors is best done using a different interface, the errors.SafeFormatter interface (not redact.SafeFormatter). It has a different structure (because also a different purpose).

This means a SafeFormatError method, and next to it for backward-compat with the fmt package a method that redirects Format to SafeFormatError like this:

func (e *AmbiguousResultError) Format(s fmt.State, verb rune)    { errors.FormatError(e, s, verb) }

I can help you build the SafeFormatError method but tehre are a few examples in the errors library already and perhaps in cockroachdb (and we'll need to document it.)

@tbg tbg requested review from andreimatei and knz May 26, 2021 13:33
Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @knz, and @tbg)


pkg/roachpb/errors.go, line 656 at r1 (raw file):

Previously, knz (kena) wrote…

This entire SafeFormat method makes me sad, but I'm also realizing this is not explained well enough elsewhere.

The logic to format and print errors is best done using a different interface, the errors.SafeFormatter interface (not redact.SafeFormatter). It has a different structure (because also a different purpose).

This means a SafeFormatError method, and next to it for backward-compat with the fmt package a method that redirects Format to SafeFormatError like this:

func (e *AmbiguousResultError) Format(s fmt.State, verb rune)    { errors.FormatError(e, s, verb) }

I can help you build the SafeFormatError method but tehre are a few examples in the errors library already and perhaps in cockroachdb (and we'll need to document it.)

I'll give this a try, thanks Raphael. Will ping when done.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @knz, and @tbg)


pkg/roachpb/errors.go, line 656 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I'll give this a try, thanks Raphael. Will ping when done.

we can also sit together and I could watch over your shoulder, would help me write a doc on how to do it for other folk afterwards.

tbg added 2 commits May 31, 2021 11:18
This error type held on to a string message, which is unsuitable for
redaction. In practice, this error opaquely wraps an error, so make this
transparent, which we can now do thanks to `cockroachdb/errors`.

It turned out that the proto already had an unused field that was (once,
but no longer) used in testing, so we can use that in production now.

I chose to print the root error's file:line in the error as we typically
don't print errors via `%+v` which is otherwise how these would be
exposed. I expect that to be mildly useful to diagnose problems such as
\cockroachdb#65701.

I think there is room for a more general mechanism that exposes the root
cause's location in general, though I am unsure how to best introduce it
given that we usually observe errors through SQL, and on many possible
paths especially for rare flakes such as the above, so no steps towards
that are taken at this point.

Release note: None
This is more idiomatic.

Release note: None
@tbg tbg force-pushed the ambiguous-improved branch from f45ec2c to a20f810 Compare May 31, 2021 09:34
@tbg tbg requested a review from knz May 31, 2021 09:38
Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @knz, and @tbg)


pkg/roachpb/errors.go, line 656 at r1 (raw file):

Previously, knz (kena) wrote…

we can also sit together and I could watch over your shoulder, would help me write a doc on how to do it for other folk afterwards.

I tried this just now. I think it works mostly fine, but I'm confused that the wrapped errors in my unit test are no longer printed. This isn't what I had expected, though maybe it's the right behavior.
I recorded myself making this change in case it's helpful for doc writing: https://drive.google.com/file/d/16AAq38A5KeqShEAq5tQ08o4_cNByMG5G/view?usp=sharing (unfortunately the quality is heinous...)
Honestly just looking at an example of how this was done was probably all I needed.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @tbg)


pkg/roachpb/errors.go, line 656 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I tried this just now. I think it works mostly fine, but I'm confused that the wrapped errors in my unit test are no longer printed. This isn't what I had expected, though maybe it's the right behavior.
I recorded myself making this change in case it's helpful for doc writing: https://drive.google.com/file/d/16AAq38A5KeqShEAq5tQ08o4_cNByMG5G/view?usp=sharing (unfortunately the quality is heinous...)
Honestly just looking at an example of how this was done was probably all I needed.

  1. why doesn't AmbiguousResultError not behave as a regular error wrapper? i.e. implement an Unwrap and let that return the cause, and let the error formatting code do its thing?

  2. if this was a proper wrapper, you wouldn't need to useGetOneLineSource at all here (and bother printing the "originated" thing). That comes for free when the full error is printed / reported to sentry.

Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @tbg)


pkg/roachpb/errors.go, line 656 at r1 (raw file):

Previously, knz (kena) wrote…
  1. why doesn't AmbiguousResultError not behave as a regular error wrapper? i.e. implement an Unwrap and let that return the cause, and let the error formatting code do its thing?

  2. if this was a proper wrapper, you wouldn't need to useGetOneLineSource at all here (and bother printing the "originated" thing). That comes for free when the full error is printed / reported to sentry.

  1. AmbiguousResultError is intentionally opaquely wrapping the error. For example, if, hypothetically, an error marked as retriable gets wrapped in an AmbiguousResultError, we need to make sure that nobody "unwraps" past the ambiguous result with something like errors.Is(ambErr, someMarkerErr). At the same time, AmbiguousResultError is also an ErrorDetail, so we can't just by convention wrap it with errors.Opaque (at least not with the way we currently do *roachpb.Error, I think). Hmm. Is the right thing to make sure that we populate the e.EncodedErr field with an barriers.Handled(err) when we populate it? Then the ambiguous result would unwrap to barriers.Handled(err) which is fine. But not sure what the point of that is.

re 2:, I'm not sure I understand how this all works. Look at this example:

type myErr struct {
	cause error
}

func (err *myErr) Unwrap() error {
	return err.cause
}

func (err *myErr) Error() string {
	return "myErr"
}

func TestFoo(t *testing.T) {
	err := &myErr{
		cause: &myErr{
			cause: errors.New("i'm ambiguous"),
		},
	}
	t.Error(err)
	t.Errorf("%+v", err)
	t.Errorf("%+v", errors.Wrap(err, "asd"))
}

This prints:

   stopper_test.go:677: myErr
    stopper_test.go:678: myErr
    stopper_test.go:679: asd: i'm ambiguous
        (1) attached stack trace
          -- stack trace:
          | github.com/cockroachdb/cockroach/pkg/util/stop_test.TestFoo
          | 	/home/tobias/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper_test.go:679
          | [...repeated from below...]
        Wraps: (2) asd
        Wraps: (3)
        Wraps: (4)
        Wraps: (5) attached stack trace
          -- stack trace:
          | github.com/cockroachdb/cockroach/pkg/util/stop_test.TestFoo
          | 	/home/tobias/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper_test.go:674
          | testing.tRunner
          | 	/usr/local/go/src/testing/testing.go:1123
          | runtime.goexit
          | 	/usr/local/go/src/runtime/asm_amd64.s:1374
        Wraps: (6) i'm ambiguous
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *stop_test.myErr (4) *stop_test.myErr (5) *withstack.withStack (6) *errutil.leafError

This looks like unless I use a cockroachdb/errors supplied wrapper I don't get any nice formatted output for %+v, which makes sense since I didn't implement any special interface for formatting there.
Besides, I am looking for something here that I can return with the SQL error - when an ambiguous result bubbles up to the client, I would like to, for debugging purposes, log the innermost wrapped error's file:line because without that it's anyone's guess where that came from (and I don't think the output of %+v is concise enough given that it's client-facing). This is similar to our woes around context canceled, where unfortunately the innermost wrapper is a singleton & it's therefore harder to do this.

My hope is that you can tell me what I actually want here, and then I'll go and do that. :-)

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @tbg)


pkg/roachpb/errors.go, line 656 at r1 (raw file):
Regarding point 1:

AmbiguousResultError is intentionally opaquely wrapping the error.

ok I did not understand this (or perhaps I had forgotten it) and it explains a lot about the remainder of your approach.

Is the right thing to make sure that we populate the e.EncodedErr field with an barriers.Handled(err) when we populate it? Then the ambiguous result would unwrap to barriers.Handled(err) which is fine. But not sure what the point of that is.

I don't know whether it would be "the right thing" but it would ensure that the error "under" the barrier would be rendered fully under %+v.

Regarding point 2:

unless I use a cockroachdb/errors supplied wrapper I don't get any nice formatted output for %+v

you can either implement a FormatError() method as discussed previously, or (e *yourType) Format(f fmt.Printer, r rune) { return errors.FormatError(e, f, r) } and that will get you there (if that is not properly documented, we can talk)

I would like to, for debugging purposes, log the innermost wrapped error's file:line because without that it's anyone's guess where that came from (and I don't think the output of %+v is concise enough given that it's client-facing)

To be honest, this is more or less the first time that I've heard the requirement that we want to include a reference to the source code in the concise user-facing message (i.e. the one returned by Error()).
If that is truly a requirement, then at the same time I hardly see how that requirement would be unique to AmbiguousError - it feels to me that if you have a compelling reason why this is useful in this case, it's likely to also be useful in many other cases.

Then if it's really a requirement, then I also think that the behavior should be consistent across error types, and that the errors library should take ownership for doing this somehow.

Then I'd like to understand what is truly required:

  • does this need to be present in the string returned by Error(), which is going to be embarked in many places besides log messages; or
  • are we looking for a new formatting verb/flag, e.g. %#v or similar, which would print a short reference to the source code in addition to the error message but without details?

Let me know what you think.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey Tobi let's get this show back on the road, cause I want it. Actually, what I want is for AmbiguousResultError to implement causer so I can check whether it was caused by ctx deadline exceeded or canceled.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @tbg)


pkg/kv/kvclient/kvcoord/dist_sender.go, line 2032 at r2 (raw file):

			default:
				if ambiguousError != nil {
					return nil, roachpb.NewAmbiguousResultError(errors.Wrap(ambiguousError, "propagate"))

I think the commits got mixed up and this won't compile without the 2nd commit

@tbg
Copy link
Member Author

tbg commented Aug 28, 2021

Mind taking this over @andreimatei? I'm happy to review but there's a pile of triage on my plate right now and I don't want to get sidetracked (even more) with actually writing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants