Skip to content

Commit

Permalink
amazon: add custom retryer to retry on read: connection reset
Browse files Browse the repository at this point in the history
This change implements a custom retryer that we use when
initializing a new s3 client for interaction with the external
storage sink. This change was motivated by the increased number
of backup job failures we were observing with a `read: connection reset`
error being thrown by s3.

A read connection reset error is thrown when the SDK is unable to read
the response of an underlying API request due to a connection reset. The
DefaultRetryer in the AWS SDK does not treat this error as a retryable error
since the SDK does not have knowledge about the idempotence of the request,
and whether it is safe to retry -
aws/aws-sdk-go#2926 (comment).
In CRDB all operations with s3 (read, write, list) are considered idempotent,
and so we can treat the read connection reset error as retryable too.

Release note (bug fix): Retry s3 operations when they error out with a
read connection reset error instead of failing the top level job.
  • Loading branch information
adityamaru committed Jun 28, 2022
1 parent e19d98d commit e41f20b
Show file tree
Hide file tree
Showing 2 changed files with 44 additions and 0 deletions.
2 changes: 2 additions & 0 deletions pkg/cloud/amazon/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -22,8 +22,10 @@ go_library(
"//pkg/util/tracing",
"@com_github_aws_aws_sdk_go//aws",
"@com_github_aws_aws_sdk_go//aws/awserr",
"@com_github_aws_aws_sdk_go//aws/client",
"@com_github_aws_aws_sdk_go//aws/credentials",
"@com_github_aws_aws_sdk_go//aws/credentials/stscreds",
"@com_github_aws_aws_sdk_go//aws/request",
"@com_github_aws_aws_sdk_go//aws/session",
"@com_github_aws_aws_sdk_go//service/kms",
"@com_github_aws_aws_sdk_go//service/s3",
Expand Down
42 changes: 42 additions & 0 deletions pkg/cloud/amazon/s3_storage.go
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,10 @@ import (

"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/awserr"
"github.com/aws/aws-sdk-go/aws/client"
"github.com/aws/aws-sdk-go/aws/credentials"
"github.com/aws/aws-sdk-go/aws/credentials/stscreds"
"github.com/aws/aws-sdk-go/aws/request"
"github.com/aws/aws-sdk-go/aws/session"
"github.com/aws/aws-sdk-go/service/s3"
"github.com/aws/aws-sdk-go/service/s3/s3manager"
Expand Down Expand Up @@ -86,6 +88,39 @@ type s3Storage struct {
cached *s3Client
}

var _ request.Retryer = &customRetryer{}

// customRetryer implements the `request.Retryer` interface and allows for
// customization of the retry behaviour of an AWS client.
type customRetryer struct {
client.DefaultRetryer
}

// isErrReadConnectionReset returns true if the underlying error is a read
// connection reset error.
//
// NB: A read connection reset error is thrown when the SDK is unable to read
// the response of an underlying API request due to a connection reset. The
// DefaultRetryer in the AWS SDK does not treat this error as a retryable error
// since the SDK does not have knowledge about the idempotence of the request,
// and whether it is safe to retry -
// https://github.com/aws/aws-sdk-go/pull/2926#issuecomment-553637658.
//
// In CRDB all operations with s3 (read, write, list) are considered idempotent,
// and so we can treat the read connection reset error as retryable too.
func isErrReadConnectionReset(err error) bool {
// The error string must match the one in
// github.com/aws/aws-sdk-go/aws/request/connection_reset_error.go. This is
// unfortunate but the only solution until the SDK exposes a specialized error
// code or type for this class of errors.
return err != nil && strings.Contains(err.Error(), "read: connection reset")
}

// ShouldRetry implements the request.Retryer interface.
func (sr *customRetryer) ShouldRetry(r *request.Request) bool {
return sr.DefaultRetryer.ShouldRetry(r) || isErrReadConnectionReset(r.Error)
}

// s3Client wraps an SDK client and uploader for a given session.
type s3Client struct {
client *s3.S3
Expand Down Expand Up @@ -359,6 +394,13 @@ func newClient(
opts.Config.LogLevel = aws.LogLevel(aws.LogDebugWithRequestRetries | aws.LogDebugWithRequestErrors)
}

retryer := &customRetryer{
DefaultRetryer: client.DefaultRetryer{
NumMaxRetries: *opts.Config.MaxRetries,
},
}
opts.Config.Retryer = retryer

var sess *session.Session
var err error

Expand Down

0 comments on commit e41f20b

Please sign in to comment.