-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cloud/gcp: add custom retryer for gcs storage, retry on stream INTERNAL_ERROR #85024
Conversation
14d7dcf
to
f4f6450
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this'll solve a bunch of failing roachtests 🤞 LGTM, it's a bummer they haven't exposed ShouldRetry
yet but keeping the logic in a different file is a good idea.
} | ||
} | ||
|
||
if e := (errors.Wrapper)(nil); errors.As(err, &e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks correct but is it more intuitive to write if ok := e.(errors.Wrapper); ok { ... }
? @knz is there a more canonical way to check if an error is wrapped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I had to write it like this because of a checker that we have. This pattern is the suggested one:
https://github.com/cockroachdb/cockroach/blob/master/pkg/testutils/lint/passes/errcmp/errcmp.go#L97
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OH! never mind me then 👍
Before merging can you also add all the failing roachtests on master as |
if defaultShouldRetry(err) { | ||
return true | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a small comment on why these specific cases are outside google's default retries, and why we retry them? Linking the google sdk issues is probably sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
52cca37
to
4b84443
Compare
…AL_ERROR Currently, errors like `stream error: stream ID <x>; INTERNAL_ERROR; received from peer` are not being retried. Create a custom retryer to retry these errors as suggested by: googleapis/google-cloud-go#3735 googleapis/google-cloud-go#784 Fixes: cockroachdb#85217, cockroachdb#85216, cockroachdb#85204, cockroachdb#84162 Release note: None
bors r+ |
Build succeeded: |
blathers backport 22.1 |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from cb92673 to blathers/backport-release-22.1-85024: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.1 failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
…o v1.21.0 This commit bumps the `cloud.google.com/go/storage` vendor to include the ability to inject custom retry functions when reading and writing from the underlying SDK - https://pkg.go.dev/cloud.google.com/go/storage#Client.SetRetry. The motivation for this change is to combat the high rate of restores we are seeing fail due to an internal http2 stream error that is being surfaced by the SDK in our roachtests. As seen in cockroachdb#85024 we would like to wrap the default retry logic with our custom retry handling for this particular error. This is the recommended solution as per: googleapis/google-cloud-go#3735 googleapis/google-cloud-go#784 Note, the dependencies have been bumped to the version that we have been running on master since the 22.1 branch was cut. Release note (general change): bump cloud.google.com/go/storage from v18.2.0 to v1.21.0 to allow for injection of custom retry logic in the SDK
…o v1.21.0 This commit bumps the `cloud.google.com/go/storage` vendor to include the ability to inject custom retry functions when reading and writing from the underlying SDK - https://pkg.go.dev/cloud.google.com/go/storage#Client.SetRetry. The motivation for this change is to combat the high rate of restores we are seeing fail due to an internal http2 stream error that is being surfaced by the SDK in our roachtests. As seen in cockroachdb#85024 we would like to wrap the default retry logic with our custom retry handling for this particular error. This is the recommended solution as per: googleapis/google-cloud-go#3735 googleapis/google-cloud-go#784 Note, the dependencies have been bumped to the version that we have been running on master since the 22.1 branch was cut. Release note (general change): bump cloud.google.com/go/storage from v18.2.0 to v1.21.0 to allow for injection of custom retry logic in the SDK
Currently, errors like
stream error: stream ID <x>; INTERNAL_ERROR; received from peer
are not being retried. Create a custom retryer to retry these errors as
suggested by:
googleapis/google-cloud-go#3735
googleapis/google-cloud-go#784
Fixes: #85217, #85216, #85204, #84162
Release note: None