fix(bigtable): Retry on RST_STREAM error #9673
Do not merge until the release freeze ends in mid-April.
bigtable/bigtable.go (outdated)
func (r *bigtableRetryer) Retry(err error) (time.Duration, bool) {
	if status.Code(err) == codes.Internal &&
		strings.Contains(err.Error(), "stream terminated by RST_STREAM") {
nit: add a comment with context for why this is a special case, with a link back to this bug
Added
bigtable/bigtable.go (outdated)
func (r *bigtableRetryer) Retry(err error) (time.Duration, bool) {
	if status.Code(err) == codes.Internal &&
		strings.Contains(err.Error(), "stream terminated by RST_STREAM") {
Is there a way to make this more flexible, for example by taking a list of strings and/or a regex describing the retryable error messages?
Done
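A minimal sketch of the reviewer's suggestion, assuming the retryable messages are configured as regular expressions grouped by status code. `Code`, `messageMatcher`, `newMessageMatcher`, and `matches` are hypothetical names; `Code` stands in for gRPC's `codes.Code` so the example needs only the standard library. This is an illustration of the idea, not the API the PR actually adopted.

```go
package main

import "regexp"

// Code is a hypothetical stand-in for gRPC's codes.Code, defined locally so
// the sketch builds with the standard library only.
type Code int

const (
	Internal Code = iota + 1
	Unavailable
)

// messageMatcher holds, per status code, regular expressions describing
// error messages that are considered safe to retry (hypothetical type).
type messageMatcher struct {
	patterns map[Code][]*regexp.Regexp
}

// newMessageMatcher compiles the configured patterns. regexp.MustCompile
// panics on an invalid expression, so bad configuration fails fast.
func newMessageMatcher(raw map[Code][]string) *messageMatcher {
	m := &messageMatcher{patterns: make(map[Code][]*regexp.Regexp)}
	for code, exprs := range raw {
		for _, e := range exprs {
			m.patterns[code] = append(m.patterns[code], regexp.MustCompile(e))
		}
	}
	return m
}

// matches reports whether msg matches any pattern registered for code.
// Codes with no registered patterns never match.
func (m *messageMatcher) matches(code Code, msg string) bool {
	for _, re := range m.patterns[code] {
		if re.MatchString(msg) {
			return true
		}
	}
	return false
}
```

Keying the patterns by status code keeps the code check from the original condition: an Internal error is retried only when its message matches, and the same message under a different code is not treated as retryable.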
bigtable/bigtable.go (outdated)
}

func (r *bigtableRetryer) Retry(err error) (time.Duration, bool) {
	if status.Code(err) == codes.Internal &&
I assume it's not feasible to retry all Internal errors?
Have you discussed this with Igor or Mattie? They might have thoughts about which errors can be retried safely. I wonder whether we should consider this for other clients as well.
@igorbernstein2, this should be done for all the other clients too, right?
Also, should all Internal errors be retried?
Checked with Igor about which errors need to be retried and shared the document with the team. Will update the PR.
Created a separate issue to track work on the rest of the errors: #10207
Fixes: #6476
Issue: When an operation fails with the error "stream terminated by RST_STREAM with error code: INTERNAL_ERROR", it is not retried, even though the Spanner client retries on the same error.
Fix: Currently, calls made through gax.Invoke are retried only on DeadlineExceeded, Unavailable and Aborted. This PR also retries on Internal errors whose message contains "stream terminated by RST_STREAM".