random cURL errors on HTTPS requests to SWF and DynamoDB #924
Comments
It looks like the connection is being reset while OpenSSL is attempting to decrypt data, which can be caused by the client and server being unable to agree on a TLS protocol. Which version of OpenSSL are you using? OpenSSL emits error 104 (ECONNRESET) when the connection is reset by a peer host, and this error is happening while cURL is attempting to read data. cURL is therefore surfacing this as a read error (cURL error 56 - CURLE_RECV_ERROR). I'll look into why the SDK's behavior in response to this event would differ from v2 to v3. |
Hey @jeskew , thanks for the quick reply! What I have tried so far:
or by directly using the value of the constant:
It's a ZF2 application, so this code is in an aws.local.php file in my application configuration. I'll try to put it directly in the creation of the SwfClient. So far I have tried OpenSSL 1.0.1f and 1.0.2g, both with the same result. Thanks for your help! |
I also tried using the CA cert bundle from the curl website by specifying it in the php.ini with openssl.cafile, no luck :-/ |
A few additional details:
and here's the output of a connection attempt to the SWF endpoint:
When I run instead:
I get a return code 0:
|
btw, the problem appears both on us-east-1 and us-west-2 |
Is the CA bundle at |
@jeskew yes, it is. |
I currently have a few machines running an older version of our application that uses v2 of the SDK, to double-check that the exception doesn't appear on this version. I'll keep you updated! |
@jeskew The result is as expected. The same machines with the old application version (sdk v2.8.18, guzzle 3.7.1) don't produce the exceptions. |
v2 of the SDK uses Guzzle 3, which includes its own certificate authority bundle. v3 uses Guzzle 5/6, which rely on an external bundle. Can you set the |
@jeskew without specifying the
and after specifying
I'm currently using PHP 5.6.18. I already tested this configuration, with the CA bundle specified in the php.ini. The exceptions keep appearing :-/ |
Is it possible that the processes that terminate following an OpenSSL error are either setting the
v2 of the SDK uses Guzzle 3, which will use a vendored CA bundle by default, whereas v3 (via Guzzle 5/6) will use PHP, OS, and OpenSSL configuration as its default. You can override this on a client by specifying the verify HTTP option. For example:
$dynamoClient = new \Aws\DynamoDb\DynamoDbClient([
'version' => 'latest',
'region' => 'us-west-2',
'http' => [
'verify' => '/tmp/cacert.pem',
],
]); |
We never handled the I'll give it a try with the Thank you, Jonathan, for your time! |
I have now specified the CA bundle manually in the PHP code:
The region, domain, etc. are loaded via the ZF2 config files. |
One last question: are you connecting to AWS through any kind of network proxy? A similar error message was reported on the Docker repository: moby/moby#2011 They mention that this can happen to traffic from a Docker container, and one comment mentions that they saw a similar error when using a VPN. |
Hmm, we are using simple EC2 instances with Ubuntu in a VPC on AWS. The outgoing traffic passes through a NAT (http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_NAT_Instance.html). But I'm not sure whether connections to AWS services like SWF remain in the internal network... I'll check that with a colleague and get back to you. Thanks! |
@jeskew I confirm that connections to SWF go through the NAT |
@jeskew The |
Since neither specifying an Could you try forcing TLS 1.0? In v3, you would do so like this:
$ddb = new \Aws\DynamoDb\DynamoDbClient([
...
'http' => [
'curl' => [
CURLOPT_SSLVERSION => CURL_SSLVERSION_TLSv1_0,
]
]
]);
OpenSSL is working on a fix but it has not yet been accepted and released. |
it's in place:
I'll keep you updated! Thanks! |
This sounds like a very serious lead. It would explain the fact that the error is completely random. |
I got an exception again, but I'm now trying with:
I think I've seen that somewhere in the docs... |
|
Yes I still get them :-/ Sent from my Phone
|
@jeskew I just scanned the SWF endpoint in us-east-1 multiple times: https://www.ssllabs.com/ssltest/analyze.html?d=swf.us-east-1.amazonaws.com On different IPs, the result always seems to be the same: TLS 1.2 is not supported, but TLS 1.0 is. |
I just tried to force TLS 1.2, and indeed, I get the following:
|
The root cause might be something else that's supported by one server and not another. I'm still unable to reproduce the issue, so if you can capture any more context about a failing request that would be very helpful. In the meantime, could you verify that this is related to sharing cURL handles? You can disable handle sharing by creating a custom Guzzle client like so:
use Aws\Handler\GuzzleV6\GuzzleHandler;
use Aws\Swf\SwfClient;
use GuzzleHttp\Client;
use GuzzleHttp\Handler\CurlFactory;
use GuzzleHttp\Handler\CurlHandler;
use GuzzleHttp\Handler\CurlMultiHandler;
use GuzzleHttp\Handler\Proxy;
use GuzzleHttp\HandlerStack;
// Create a Guzzle client that will not share cURL handles
$guzzleClient = new Client([
'handler' => HandlerStack::create(Proxy::wrapSync(
new CurlMultiHandler(['handle_factory' => new CurlFactory(0)]),
new CurlHandler(['handle_factory' => new CurlFactory(0)])
))
]);
// Use the no-shared handle client to create an AWS client
$swfClient = new SwfClient([
'region' => 'us-east-1',
'version' => 'latest',
'http_handler' => new GuzzleHandler($guzzleClient),
]); |
I'm still getting the exceptions, but I'll check if I can get more verbose output from curl about this. Thanks! |
I just added:
I'll get back to you in a few hours! Thanks! |
In my web service, it shows an error like the one below: |
@srinivasudadi9000 it seems to me that there is a package missing on your system? Did you install the php-curl package? |
Hi ppaulis, please guide me: what is that package? $fields = array $headers = array $ch = curl_init();
|
I wrote code for sending push notifications for Android. On my local server (192...) it works fine, but on the AWS EC2 server it shows a fatal error. How can I resolve it? If possible, can you send me the package related to curl? |
How do I install the php-curl package on AWS? ppaulis, can you send me any reference? |
@srinivasudadi9000 Well, it depends on your machine... Sometimes it's php-curl, or php5-curl, etc. The command to install it depends on your operating system. Because your questions are clearly not related to the original topic of this issue, I suggest that you ask them on a site like Stack Overflow instead. Greetings, |
@ppaulis Is there any update on the issue? Are you all getting random curl errors? |
Yes and no. Internal (to the SDK) curl errors with retries are unavoidable. But if you adjust the SDK settings, you can reduce the number of times you run out of retries, which is what causes an error to bubble up. The settings are here: My goal was to maximize API call success while minimizing time spent waiting on a bad remote server (or servers), so I globally changed: I also added instrumentation to my code to peek into the SDK's internal retries. See the graph of my last week. |
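To illustrate the kind of tuning described above, here is a minimal sketch of an SDK v3 client configured with a retry count and HTTP timeouts. The values, the region, and the variable name are illustrative assumptions and are not the settings used by the commenter; `retries` and the `connect_timeout` / `timeout` keys under `http` are client constructor options of the AWS SDK for PHP v3, and the example assumes the SDK is installed via Composer.

```php
<?php
// Assumes aws/aws-sdk-php is installed through Composer.
require 'vendor/autoload.php';

use Aws\DynamoDb\DynamoDbClient;

// All numbers below are placeholder values for illustration only.
$dynamoClient = new DynamoDbClient([
    'version' => 'latest',
    'region'  => 'us-east-1',       // assumed region
    'retries' => 5,                 // maximum number of SDK-level retries
    'http'    => [
        'connect_timeout' => 1,     // seconds to wait while connecting
        'timeout'         => 5,     // seconds to wait for the whole request
    ],
]);
```

Shorter timeouts give up on a slow server sooner, and a higher retry count gives the SDK more chances before the error reaches application code. The right balance depends on the workload.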
We are currently unable to reproduce the issue and are opening new discussions to explore any possible leads. As part of trying to get to the bottom of this issue, we would like to reaffirm what your broken states are.
Additionally, if you're still experiencing errors, can you update us with your current state on the following information:
We know this is a lot of information to ask for, but it's all in hopes of driving this to a resolution. Thank you! |
This is a long thread now and I've contributed in two ways:
1.) cURL error 56: Searching my logs, I see this exception is exceedingly rare, but it still happens. By "exceedingly rare" I mean 4 times in 2017 out of what I guess are billions of calls to AWS services. My info: The most recent failure was on 2017-06-07 10:27:04 GMT-4. The stack trace: This error is a bummer because the SDK doesn't try to retry it. I think it might be the only error like that?
2.) The general issue of failures at the curl level. I'm able to control the time a call takes and the success rate of the call by configuring timeouts and number of retries (see my previous post). In my log file processing I break internal retries into the following categories:
capacity = DynamoDB specific and is generally zero.
error = {5xx, PR_CONNECT_RESET_ERROR, PR_NOT_CONNECTED_ERROR}. Generally has been decreasing over time. A year ago I could have measured this category at the hour level (e.g. at least 1 event/hour), but recently I would have to expand to day (and even then I have days without an error).
timeout = {timed out before SSL handshake, Connection timed out after, Resolving timed out after, Operation timed out}. This is steady and I see it happen at the "10 minute level" (e.g. at least 1 event every 10 minutes).
Again, as previously noted, I have extremely aggressive timeout settings and a generous number of retries. |
@oberman Thanks for all the updated information. cURL error 56 is retried, so you must be running over the specified number of retries. You mentioned that there are fewer of these errors now than before. What, if anything, has changed in that time? Is there any correlation between this cURL error surfacing and your outbound traffic? It's possible there is an issue with how many TLS connections are running simultaneously from your application. |
curl error 56: fewer errors: |
Re: cURL 56 - Apologies, I was looking at outdated information for that portion of the response. Are the cURL 56 errors you are receiving specific to an AWS service, or is it across multiple services? You've mentioned both SQS and DynamoDB in this thread, but with a decrease in DynamoDB issues as well. |
I think we're mixing things up by talking about different issues at the same time: curl 56 = I see no upward or downward trend. It's extremely rare (a small # per month). I see it in SQS and DDB, but that makes sense since those are my two highest volume APIs. I dug up the latest time it happened and will paste it at the bottom. AWS API errors = I've seen two plateaus. One before August 2016 and one after. It was higher before and lower now. I believe before was networking errors + 5xx errors. After is just 5xx errors. The last curl 56 error I saw:
|
@kstich sorry for the late reply! Here is some of the information you asked for:
I will try to provide you with the detailed error output asap. Thanks! |
I finally upgraded from v2 to v3 of the SDK a couple of weeks ago and have been seeing lots of these errors. For me they only occur on query requests to DynamoDB using
I've read this entire thread but still have no idea how to go about solving it. Has anyone had any luck getting to the bottom of this? If so, care to share? PHP: 7.1.10 Thanks! |
Also occurring with the following setup: |
Root Cause
Guzzle v5 and Guzzle v6 do not trigger automatic retry of
But why?
At this point in the request/response lifecycle, there is no guarantee that retrying on that error code would be free of side-effects.
Now what?
If you are receiving this error, and believe the operation is safe to retry in your situation (no destructive or unintended side-effects, etc.), you may wish to retry the error in your own code.
Middlewares Solution
Create an additional |
After some further discussion based around the operations being handled, it is believed we can retry this at the SDK level since it's not safe to do at the Guzzle level. Support is being added for automatic retry of |
Thanks guys! That was not an easy one to find :-) |
This was included in the 3.52.0 release. |
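For SDK versions older than that release, the idea of retrying in your own code can be sketched roughly as follows. This is a hedged illustration, not the SDK's implementation: `retryOnCurlRecvError` is a hypothetical helper name, matching on the exception message text is an assumption chosen for simplicity, and the sketch assumes PHP 7 or later. Only wrap calls that you believe are safe to repeat.

```php
<?php
/**
 * Hypothetical helper (not part of the AWS SDK): run $operation and retry it
 * when the thrown exception's message mentions "cURL error 56".
 * Any other exception, or the last failed attempt, is rethrown unchanged.
 */
function retryOnCurlRecvError(callable $operation, int $maxAttempts = 3, int $baseDelayMs = 100)
{
    $attempt = 0;
    while (true) {
        $attempt++;
        try {
            return $operation();
        } catch (\Exception $e) {
            // Assumption: the error is recognizable from the message text.
            $isRecvError = strpos($e->getMessage(), 'cURL error 56') !== false;
            if (!$isRecvError || $attempt >= $maxAttempts) {
                throw $e;
            }
            // Exponential backoff: base delay, then 2x, 4x, ...
            usleep($baseDelayMs * 1000 * (2 ** ($attempt - 1)));
        }
    }
}

// Demo with a fake operation that fails twice before succeeding.
$calls = 0;
$result = retryOnCurlRecvError(function () use (&$calls) {
    $calls++;
    if ($calls < 3) {
        throw new \RuntimeException('cURL error 56: SSL read: errno 104');
    }
    return 'ok';
}, 3, 1);
```

In real use, the closure would wrap a single SDK call on an existing client, and the caller would decide per operation whether a repeat is free of side effects.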
OK, with aws.phar 3.54.3 I'm having this performance using it on Amazon S3. |
Hello!
We recently switched our EC2 instances from v2 to v3 (the most recent version) of the PHP SDK. We use SWF, DynamoDB and S3. Since we switched to v3, we have been facing cURL errors that seem to appear completely at random:
The application has thrown an exception!
Aws\Swf\Exception\SwfException
Error executing "PollForActivityTask" on "https://swf.us-east-1.amazonaws.com"; AWS HTTP error: cURL error 56: SSL read: error:00000000:lib(0):func(0):reason(0), errno 104 (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)
Sometimes the exception appears twice per hour, sometimes once in three hours... there's no pattern.
We already switched back to an older AMI and updated cURL from 7.35.0 (there is a problem with chunked upload in the 7.35) to 7.36.0 but nothing helps. We are polling SWF for activity tasks with long polling enabled, so there shouldn't be too many requests.
I googled this of course before opening an issue, but the only topics I found either date from 2012-2013 or were related to a broken load balancer on AWS. And that would probably be too much of a coincidence in our case...
Does anyone know about this problem..? I can't even say if it's a problem in the SDK or something else. So I'm grateful for every hint you can give me!
Thanks a lot!
Pascal