Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: CURLE_RECV_ERROR (56) should retry #983

Closed
oberman opened this issue May 4, 2016 · 7 comments
Closed

Proposal: CURLE_RECV_ERROR (56) should retry #983

oberman opened this issue May 4, 2016 · 7 comments

Comments

@oberman
Copy link

oberman commented May 4, 2016

The root issue: Guzzle considers CURLE_RECV_ERROR to be RequestException instead of a ConnectException which causes the AWS SDK to throw an service exception (like SqsException). An example of Curl debugging when this happens to me:

* Hostname sqs.us-east-1.amazonaws.com was found in DNS cache
*   Trying 54.239.27.144...
* Connected to sqs.us-east-1.amazonaws.com (54.239.27.144) port 443 (#1)
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*       subject: CN=queue.amazonaws.com,O="Amazon.com, Inc.",L=Seattle,ST=Washington,C=US
*       start date: Sep 25 00:00:00 2015 GMT
*       expire date: Dec 23 23:59:59 2016 GMT
*       common name: queue.amazonaws.com
*       issuer: CN=Symantec Class 3 Secure Server CA - G4,OU=Symantec Trust Network,O=Symantec Corporation,C=US
> POST XXX HTTP/1.1
Host: sqs.us-east-1.amazonaws.com
Content-Type: application/x-www-form-urlencoded
X-Amz-Date: 20160504T115214Z
Authorization: AWS4-HMAC-SHA256 Credential=[KEY]/20160504/us-east-1/sqs/aws4_request, SignedHeaders=host;x-amz-date, Signature=[SIGNATURE]
User-Agent: aws-sdk-php/3.15.2 GuzzleHttp/6.2.0 curl/7.40.0 PHP/5.5.33
Content-Length: 889

* upload completely sent off: 889 out of 889 bytes
* SSL read: errno -5961 (PR_CONNECT_RESET_ERROR)
* TCP connection reset by peer
* Closing connection 1

It appears that issue #767 was resolved by patching S3Client to retry CURLE_RECV_ERROR (2f985ba). I agree with the patch, but feel any API request with this error should be retried.

@GrahamCampbell
Copy link
Contributor

You can probably setup guzzle 6's retry middleware.

@oberman
Copy link
Author

oberman commented May 4, 2016

While researching my options (which including messing with retry middleware) I found #767. It sounds like this is a "rare but persistent" issue that happens to multiple SDK consumers, so I'm proposing the SDK handle it for everyone by default rather than by custom configuration.

@jeskew
Copy link
Contributor

jeskew commented May 4, 2016

#767 was added because the idempotency characteristics of S3's API are well-defined and well-understood. That's generally not true of other services, where idempotency depends on factors unknown to the SDK when it's executing a command. Whether a ReceiveMessage command is idempotent depends on whether dead letter queues with automatic redrive policies have been configured; GetObject, however, is always idempotent.

Instances of ConnectException are also always idempotent, as Guzzle reserves that exception type for cases when it is unable to establish any connection with the remote server. CURL_RECV_ERROR, however, is used when an error occurs while receiving a response. If you encounter this error while putting a message in an SQS queue, you cannot know if your message was received and fully processed.

@oberman
Copy link
Author

oberman commented May 4, 2016

This is great information! I agree it's clearly not appropriate for the SDK to blindly retry CURL_RECV_ERROR.

I'm trying to figure out if there is a generic way of detecting which exceptions to retry if I know the call is idempotent. I see that AwsException provides getAwsErrorType(). I think the possible values are null, "client" and "server" (strings) and I'd want to retry "server". What about null? It's hard to puzzle through from the code but I think it's null on connection errors. I'm not sure what other conditions lead to null though...

@jeskew
Copy link
Contributor

jeskew commented May 4, 2016

null would be encountered if there is no response from the server. type is calculated from the HTTP status code as $code[0] == '4' ? 'client' : 'server'.

If you encounter a server error when executing a SendMessage command and retry it, you might end up with duplicate messages in the queue. I think you can be more cavalier with the retry decider if you make a few assumptions about the commands you're executing, but it's hard to say without clear context. Which command is leading to this error?

@oberman
Copy link
Author

oberman commented May 4, 2016

My backend processing can handle duplicate SQS messages. I believe everyone is in the same boat (needing to ensure SQS processing is idempotent) because of the "at least once" guarantee of ReceiveMessage.

I believe I've also seen this particular error in DynamoDB API calls, but I'd have to dig through old logging to be sure. The only two APIs I use at high volume are SQS and DynamoDB, so it makes sense that a rare event like this would pop out on those two for me.

I pretty much try to make sure all my AWS API calls either are either idempotent or at least don't cause chaos if applied multiple times due to these kinds of networking glitch possibilities. What I haven't been great at is knowing all of the places in configuration or integration of the SDK to properly retry requests to maximize the probability they are applied "at least once".

@oberman oberman closed this as completed May 5, 2016
@jeskew
Copy link
Contributor

jeskew commented May 5, 2016

@oberman The retry middleware's internal logic can be overridden when it's not appropriate for a given use case. The middleware's constructor takes a callable decider function, which by default takes the general case into account. If you know that your DynamoDB commands have a ConditionExpression that makes them effectively idempotent or that your backend can handle duplicate SQS messages, then you can provider a more opinionated decider function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants