Proposal: CURLE_RECV_ERROR (56) should retry #983

oberman · 2016-05-04T16:12:05Z

The root issue: Guzzle considers CURLE_RECV_ERROR to be RequestException instead of a ConnectException which causes the AWS SDK to throw an service exception (like SqsException). An example of Curl debugging when this happens to me:

* Hostname sqs.us-east-1.amazonaws.com was found in DNS cache
*   Trying 54.239.27.144...
* Connected to sqs.us-east-1.amazonaws.com (54.239.27.144) port 443 (#1)
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*       subject: CN=queue.amazonaws.com,O="Amazon.com, Inc.",L=Seattle,ST=Washington,C=US
*       start date: Sep 25 00:00:00 2015 GMT
*       expire date: Dec 23 23:59:59 2016 GMT
*       common name: queue.amazonaws.com
*       issuer: CN=Symantec Class 3 Secure Server CA - G4,OU=Symantec Trust Network,O=Symantec Corporation,C=US
> POST XXX HTTP/1.1
Host: sqs.us-east-1.amazonaws.com
Content-Type: application/x-www-form-urlencoded
X-Amz-Date: 20160504T115214Z
Authorization: AWS4-HMAC-SHA256 Credential=[KEY]/20160504/us-east-1/sqs/aws4_request, SignedHeaders=host;x-amz-date, Signature=[SIGNATURE]
User-Agent: aws-sdk-php/3.15.2 GuzzleHttp/6.2.0 curl/7.40.0 PHP/5.5.33
Content-Length: 889

* upload completely sent off: 889 out of 889 bytes
* SSL read: errno -5961 (PR_CONNECT_RESET_ERROR)
* TCP connection reset by peer
* Closing connection 1

It appears that issue #767 was resolved by patching S3Client to retry CURLE_RECV_ERROR (2f985ba). I agree with the patch, but feel any API request with this error should be retried.

The text was updated successfully, but these errors were encountered:

GrahamCampbell · 2016-05-04T16:15:43Z

You can probably setup guzzle 6's retry middleware.

oberman · 2016-05-04T16:24:58Z

While researching my options (which including messing with retry middleware) I found #767. It sounds like this is a "rare but persistent" issue that happens to multiple SDK consumers, so I'm proposing the SDK handle it for everyone by default rather than by custom configuration.

jeskew · 2016-05-04T16:50:47Z

#767 was added because the idempotency characteristics of S3's API are well-defined and well-understood. That's generally not true of other services, where idempotency depends on factors unknown to the SDK when it's executing a command. Whether a ReceiveMessage command is idempotent depends on whether dead letter queues with automatic redrive policies have been configured; GetObject, however, is always idempotent.

Instances of ConnectException are also always idempotent, as Guzzle reserves that exception type for cases when it is unable to establish any connection with the remote server. CURL_RECV_ERROR, however, is used when an error occurs while receiving a response. If you encounter this error while putting a message in an SQS queue, you cannot know if your message was received and fully processed.

oberman · 2016-05-04T17:46:43Z

This is great information! I agree it's clearly not appropriate for the SDK to blindly retry CURL_RECV_ERROR.

I'm trying to figure out if there is a generic way of detecting which exceptions to retry if I know the call is idempotent. I see that AwsException provides getAwsErrorType(). I think the possible values are null, "client" and "server" (strings) and I'd want to retry "server". What about null? It's hard to puzzle through from the code but I think it's null on connection errors. I'm not sure what other conditions lead to null though...

jeskew · 2016-05-04T17:58:05Z

null would be encountered if there is no response from the server. type is calculated from the HTTP status code as $code[0] == '4' ? 'client' : 'server'.

If you encounter a server error when executing a SendMessage command and retry it, you might end up with duplicate messages in the queue. I think you can be more cavalier with the retry decider if you make a few assumptions about the commands you're executing, but it's hard to say without clear context. Which command is leading to this error?

oberman · 2016-05-04T19:31:32Z

My backend processing can handle duplicate SQS messages. I believe everyone is in the same boat (needing to ensure SQS processing is idempotent) because of the "at least once" guarantee of ReceiveMessage.

I believe I've also seen this particular error in DynamoDB API calls, but I'd have to dig through old logging to be sure. The only two APIs I use at high volume are SQS and DynamoDB, so it makes sense that a rare event like this would pop out on those two for me.

I pretty much try to make sure all my AWS API calls either are either idempotent or at least don't cause chaos if applied multiple times due to these kinds of networking glitch possibilities. What I haven't been great at is knowing all of the places in configuration or integration of the SDK to properly retry requests to maximize the probability they are applied "at least once".

jeskew · 2016-05-05T17:47:30Z

@oberman The retry middleware's internal logic can be overridden when it's not appropriate for a given use case. The middleware's constructor takes a callable decider function, which by default takes the general case into account. If you know that your DynamoDB commands have a ConditionExpression that makes them effectively idempotent or that your backend can handle duplicate SQS messages, then you can provider a more opinionated decider function.

oberman closed this as completed May 5, 2016

oberman mentioned this issue Jun 19, 2017

random cURL errors on HTTPS requests to SWF and DynamoDB #924

Closed

dagnir mentioned this issue Jun 7, 2018

Rapid concurrent S3 uploads sometimes produce a java.io.IOException: Server failed to send complete response aws/aws-sdk-java-v2#452

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: CURLE_RECV_ERROR (56) should retry #983

Proposal: CURLE_RECV_ERROR (56) should retry #983

oberman commented May 4, 2016 •

edited

Loading

GrahamCampbell commented May 4, 2016

oberman commented May 4, 2016

jeskew commented May 4, 2016

oberman commented May 4, 2016

jeskew commented May 4, 2016

oberman commented May 4, 2016

jeskew commented May 5, 2016

Proposal: CURLE_RECV_ERROR (56) should retry #983

Proposal: CURLE_RECV_ERROR (56) should retry #983

Comments

oberman commented May 4, 2016 • edited Loading

GrahamCampbell commented May 4, 2016

oberman commented May 4, 2016

jeskew commented May 4, 2016

oberman commented May 4, 2016

jeskew commented May 4, 2016

oberman commented May 4, 2016

jeskew commented May 5, 2016

oberman commented May 4, 2016 •

edited

Loading