cloud/gcp: add custom retryer for gcs storage, retry on stream INTERNAL_ERROR #85024

Merged
merged 1 commit into from
Jul 29, 2022

@rhu713 rhu713 commented Jul 25, 2022

Currently, errors like
stream error: stream ID <x>; INTERNAL_ERROR; received from peer
are not being retried. Create a custom retryer to retry these errors as
suggested by:

googleapis/google-cloud-go#3735
googleapis/google-cloud-go#784

Fixes: #85217, #85216, #85204, #84162

Release note: None
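
A minimal sketch of the kind of retry predicate described above, using only the Go standard library. `streamError`, `errTransient`, and `defaultShouldRetry` are hypothetical stand-ins introduced for illustration; they are not the SDK's types, and the merged change may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
)

// streamError is a hypothetical stand-in for the HTTP/2 stream error
// surfaced by the SDK. It is NOT the real type.
type streamError struct {
	StreamID uint32
	Code     string
}

func (e streamError) Error() string {
	return fmt.Sprintf("stream error: stream ID %d; %s; received from peer", e.StreamID, e.Code)
}

// errTransient and defaultShouldRetry are hypothetical placeholders for
// whatever the SDK already retries by default.
var errTransient = errors.New("transient failure")

func defaultShouldRetry(err error) bool {
	return errors.Is(err, errTransient)
}

// shouldRetry defers to the default predicate first, then additionally
// retries stream errors that carry the INTERNAL_ERROR code.
func shouldRetry(err error) bool {
	if err == nil {
		return false
	}
	if defaultShouldRetry(err) {
		return true
	}
	var se streamError
	if errors.As(err, &se) && se.Code == "INTERNAL_ERROR" {
		return true
	}
	return false
}

func main() {
	wrapped := fmt.Errorf("reading object: %w", streamError{StreamID: 7, Code: "INTERNAL_ERROR"})
	fmt.Println(shouldRetry(wrapped))                          // true
	fmt.Println(shouldRetry(errors.New("permission denied"))) // false
}
```

Checking the default predicate first keeps the existing behavior intact and only widens the set of retried errors.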

@rhu713 rhu713 force-pushed the stream-err branch 5 times, most recently from 14d7dcf to f4f6450 Compare July 29, 2022 15:10
@rhu713 rhu713 changed the title from "invoke-debug" to "cloud/gcp: add custom retryer for gcs storage, retry on stream INTERNAL_ERROR" Jul 29, 2022
@rhu713 rhu713 marked this pull request as ready for review July 29, 2022 15:11
@rhu713 rhu713 requested a review from a team July 29, 2022 15:11
@rhu713 rhu713 requested a review from a team as a code owner July 29, 2022 15:11
@rhu713 rhu713 requested review from msbutler and adityamaru July 29, 2022 15:11
@adityamaru adityamaru left a comment

Nice, this'll solve a bunch of failing roachtests 🤞 LGTM. It's a bummer they haven't exposed ShouldRetry yet, but keeping the logic in a different file is a good idea.

}
}

if e := (errors.Wrapper)(nil); errors.As(err, &e) {
Contributor

This looks correct, but is it more intuitive to write if _, ok := err.(errors.Wrapper); ok { ... }? @knz is there a more canonical way to check if an error is wrapped?

Contributor Author

FYI I had to write it like this because of a checker that we have. This pattern is the suggested one:
https://github.com/cockroachdb/cockroach/blob/master/pkg/testutils/lint/passes/errcmp/errcmp.go#L97

Contributor

OH! never mind me then 👍
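
One likely reason a lint check would prefer `errors.As` over a direct type assertion is that an assertion only inspects the outermost error value, while `errors.As` walks the `Unwrap` chain. The sketch below illustrates the difference with the standard library `errors` package; `codedError` and `wrap` are hypothetical helpers made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// codedError is a hypothetical error type used only for this illustration.
type codedError struct{ Code string }

func (e codedError) Error() string { return "coded error: " + e.Code }

// wrap adds context to err while keeping it reachable through Unwrap.
func wrap(err error) error {
	return fmt.Errorf("open reader: %w", err)
}

// matchByAssertion inspects only the outermost error value.
func matchByAssertion(err error) bool {
	_, ok := err.(codedError)
	return ok
}

// matchByAs walks the Unwrap chain looking for a codedError.
func matchByAs(err error) bool {
	var ce codedError
	return errors.As(err, &ce)
}

func main() {
	err := wrap(codedError{Code: "INTERNAL_ERROR"})
	fmt.Println(matchByAssertion(err)) // false: the outer value is the wrapper
	fmt.Println(matchByAs(err))        // true: found one level down the chain
}
```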

@adityamaru

Before merging can you also add all the failing roachtests on master as Fixes so they auto-close?

if defaultShouldRetry(err) {
	return true
}

@msbutler msbutler Jul 29, 2022

Maybe add a small comment on why these specific cases fall outside Google's default retries, and why we retry them? Linking the Google SDK issues is probably sufficient.

Contributor Author

done

@rhu713 rhu713 force-pushed the stream-err branch 2 times, most recently from 52cca37 to 4b84443 Compare July 29, 2022 17:08

rhu713 commented Jul 29, 2022

bors r+

@craig craig bot merged commit 7e2df69 into cockroachdb:master Jul 29, 2022

craig bot commented Jul 29, 2022

Build succeeded:

rhu713 commented Aug 2, 2022

blathers backport 22.1

blathers-crl bot commented Aug 2, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from cb92673 to blathers/backport-release-22.1-85024: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

adityamaru added a commit to adityamaru/cockroach that referenced this pull request Aug 8, 2022
…o v1.21.0

This commit bumps the `cloud.google.com/go/storage` vendor to
include the ability to inject custom retry functions when reading
and writing from the underlying SDK -
https://pkg.go.dev/cloud.google.com/go/storage#Client.SetRetry.

The motivation for this change is to combat the high rate of
restores we are seeing fail due to an internal http2 stream error
that is being surfaced by the SDK in our roachtests. As seen in
cockroachdb#85024 we would like
to wrap the default retry logic with our custom retry handling for this
particular error. This is the recommended solution as per:
googleapis/google-cloud-go#3735
googleapis/google-cloud-go#784

Note, the dependencies have been bumped to the version that we have
been running on master since the 22.1 branch was cut.

Release note (general change): bump cloud.google.com/go/storage from
v18.2.0 to v1.21.0 to allow for injection of custom retry logic in the
SDK
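
To illustrate what injecting a custom retry function means in practice, here is a small standard-library sketch of a retry loop that takes its retry predicate as a parameter. It does not use the SDK: `retryOptions`, `run`, `errStreamInternal`, and `retryStreamInternal` are hypothetical names chosen for this example, and backoff between attempts is omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// retryOptions is a hypothetical stand-in for SDK retry configuration:
// the caller supplies the predicate that decides whether to retry.
type retryOptions struct {
	maxAttempts int
	shouldRetry func(error) bool
}

// errStreamInternal is a hypothetical sentinel standing in for the
// HTTP/2 stream INTERNAL_ERROR discussed above.
var errStreamInternal = errors.New("stream error: INTERNAL_ERROR; received from peer")

// retryStreamInternal is a custom predicate that retries only the sentinel.
func retryStreamInternal(err error) bool {
	return errors.Is(err, errStreamInternal)
}

// run calls op until it succeeds, the predicate declines the error,
// or maxAttempts is reached. It reports how many attempts were made.
func run(opts retryOptions, op func() error) (attempts int, err error) {
	for attempts = 1; ; attempts++ {
		err = op()
		if err == nil || attempts >= opts.maxAttempts || !opts.shouldRetry(err) {
			return attempts, err
		}
	}
}

func main() {
	calls := 0
	op := func() error {
		calls++
		if calls < 3 {
			return errStreamInternal // fail the first two attempts
		}
		return nil
	}
	opts := retryOptions{maxAttempts: 5, shouldRetry: retryStreamInternal}
	attempts, err := run(opts, op)
	fmt.Println(attempts, err) // 3 <nil>
}
```

Passing the predicate in as a value is what lets a caller wrap default retry logic with extra cases without changing the loop itself.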
Successfully merging this pull request may close these issues.

roachtest: restore2TB/nodes=6/cpus=8/pd-volume=2500GB failed
4 participants