Rapid S3 downloads cause failure #1122
Comments
Thanks for reporting! We are aware of the issue; it is caused by the fact that the service can close the connection after a certain time, and we are actively working on a fix. |
Thanks for the quick response, @zoewangg. I'm actively looking for workarounds because this is causing issues in customer-facing code for us. When you say "the service can close the connection after a certain time", do you mean this is a server-side S3 bug? I was considering replacing our use of the AWS SDK with a plain HTTP client, but if this issue is server-side that obviously won't help. Is there anything we can do to work around this? I tried a few obvious things like putting short delays between file downloads, but that didn't help. Any short-term workaround advice would be greatly appreciated. A very rough timeline would also be great: I know you can't promise a date, but if you're working on this now and think you'll have a fix in days, it doesn't make sense for me to spend a lot of time on workarounds. OTOH, if nobody is going to start working on this bug for a few weeks, I need to do something. |
No, it's a bug in the SDK. Would the v2 sync clients work for you? I think this exception would get retried and should succeed on the next attempt; is that not the case for you? |
We could spawn threads and use sync clients. A bit less efficient but would be OK.
What we see is that with retries enabled this bug is less common but still appears quite often (e.g. every 250 files instead of every 100). Even if retries did work, we couldn't use them. Our use case is to create a zip file containing all the data from all of these photos in a streaming way, without holding it all in RAM, so a retry would corrupt the archive: the zip would contain the partial data from the attempt that failed part way through, and then the same data again from the retry. |
Thanks for the information. Please switch to the v2 sync clients if that's an easy workaround for you. We don't have a clear date for the fix, but fixing it is one of our top priorities right now. |
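For readers weighing the sync-client fallback discussed above, here is a minimal sketch of the sequential streaming-zip approach, assuming a bucket name, key list, and output stream supplied by the caller; it illustrates the pattern described in this thread rather than reproducing anyone's actual code.

import java.io.OutputStream;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

public class ZipFromS3 {
    // Placeholder sketch: streams each object straight into a zip entry,
    // so the archive never has to sit in RAM.
    public static void writeZip(S3Client s3, String bucket, List<String> keys, OutputStream out) throws Exception {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String key : keys) {
                GetObjectRequest request = GetObjectRequest.builder().bucket(bucket).key(key).build();
                try (ResponseInputStream<GetObjectResponse> object = s3.getObject(request)) {
                    zip.putNextEntry(new ZipEntry(key));
                    object.transferTo(zip); // requires Java 9+
                    zip.closeEntry();
                }
            }
        }
    }
}

Because each object is fully streamed into its entry before the next request starts, a mid-download failure surfaces as an exception rather than silently duplicating bytes in the archive.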
Update: unfortunately, it will take longer than expected to fix this issue. UPDATE: Please note that using HTTP is not as secure as HTTPS and is vulnerable to attacks (https://en.wikipedia.org/wiki/HTTPS#Difference_from_HTTP). We do not recommend using it. |
We are also experiencing this issue with
Our use case is a job downloading around 20 files per execution, using the S3AsyncClient. We get this issue around every 3rd to 5th job execution. Thanks for the temporary workaround, @zoewangg |
Any update? We are hitting this issue too. |
We tried implementing the workaround suggested. How do we create the override URI for different regions? |
Update: we reached out to the service team and we are still working on finding the right solution for it. Please note that using HTTP is not as secure as HTTPS and is vulnerable to security attacks https://en.wikipedia.org/wiki/HTTPS#Difference_from_HTTP. We do not recommend using it. @vvellanki You can find all S3 endpoints in https://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region |
The options I see are:
How about if we restrict the number of concurrent calls to a fixed number? Is that a workaround? |
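As an aside, capping in-flight calls usually looks something like the sketch below, with an assumed client, bucket, and flat key list; nothing in this thread confirms it avoids the bug, and the reply that follows points at connectionMaxIdleTime instead.

import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.Semaphore;
import software.amazon.awssdk.core.async.AsyncResponseTransformer;
import software.amazon.awssdk.services.s3.S3AsyncClient;

public class BoundedDownloads {
    // Placeholder sketch: at most "permits" getObject calls are in flight at once.
    public static void downloadAll(S3AsyncClient s3, String bucket, List<String> keys, int permits) throws InterruptedException {
        Semaphore inFlight = new Semaphore(permits);
        for (String key : keys) {
            inFlight.acquire(); // block until a slot frees up
            Path target = Paths.get("/tmp", key); // assumes flat key names
            s3.getObject(r -> r.bucket(bucket).key(key), AsyncResponseTransformer.toFile(target))
                    .whenComplete((response, error) -> inFlight.release());
        }
        inFlight.acquire(permits); // wait for the remaining downloads to finish
    }
}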
Summary: reducing connectionMaxIdleTime should reduce the chances of this issue occurring. @vvellanki Can you try updating connectionMaxIdleTime to a lower value, such as 5 seconds, and see if it helps?
|
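For anyone looking for the concrete shape of that workaround, a minimal sketch is below, assuming an S3AsyncClient built directly with the SDK's Netty HTTP client builder; the region is a placeholder and the five-second value is the one suggested in this thread.

import java.time.Duration;
import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;

public class ShortIdleTimeClient {
    public static S3AsyncClient build() {
        // Drop pooled connections after 5 seconds of idleness so the SDK does not
        // reuse a connection the service has already decided to close.
        return S3AsyncClient.builder()
                .region(Region.US_EAST_1) // placeholder region
                .httpClientBuilder(NettyNioAsyncHttpClient.builder()
                        .connectionMaxIdleTime(Duration.ofSeconds(5)))
                .build();
    }
}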
We are experiencing a similar issue, but with DynamoDB:
The application is a worker consuming from Kinesis using the KCL, and reading from and writing to DynamoDB.
The stack trace when using aws-sdk-java-v2:
Are these issues related? I tried to use
I think we've experienced this same behavior before with S3, but we switched to using S3 from AWS SDK v1 and the issue went away. Now we are experiencing the issue with DynamoDB. Environment:
|
I've experienced this issue as well. The change proposed by @zoewangg does seem to mitigate it. |
This issue seems fairly bad. The following code fails after a few iterations:

try (S3AsyncClient client = S3AsyncClient.create()) {
    // Upload a small test object once.
    client.putObject(r -> r.bucket("tmp.millems").key("test"), AsyncRequestBody.fromBytes("".getBytes())).join();
    // Download it repeatedly, one request at a time.
    for (int i = 0; i < 1000; ++i) {
        System.out.println("Iteration " + i);
        Path outputPath = Paths.get("/tmp/foo");
        Files.deleteIfExists(outputPath);
        client.getObject(r -> r.bucket("tmp.millems").key("test"), AsyncResponseTransformer.toFile(outputPath)).join();
    }
}

SDK retries don't seem to help fix the problem. |
Failure logs:
|
It looks like the SDK might not be properly finalizing the response, because the debug log shows content-length: 16 as well as 16 bytes being returned.
Update 1:
Update 2: On a good call, where LastHttpContent triggers the onComplete, the onNext is invoked in this way:
On a bad call where LastHttpContent does not trigger the onComplete, the onNext is invoked in this way:
Update 3: It looks like we're dropping downstream publisher demand in the checksum validator. |
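To unpack what "dropping downstream publisher demand" means, below is a generic Reactive Streams illustration using the JDK Flow API; it is not SDK code, just a hypothetical pass-through stage that forwards data but never propagates its subscriber's request(n) calls upstream, so the upstream stops emitting and onComplete never arrives.

import java.util.concurrent.Flow;

public final class DemandDroppingStage<T> implements Flow.Processor<T, T> {
    // Hypothetical illustration only; this is not code from the SDK.
    private Flow.Subscription upstream;
    private Flow.Subscriber<? super T> downstream;

    @Override
    public void subscribe(Flow.Subscriber<? super T> subscriber) {
        this.downstream = subscriber;
        subscriber.onSubscribe(new Flow.Subscription() {
            @Override
            public void request(long n) {
                // The bug being illustrated: downstream demand is swallowed here.
                // A correct pass-through stage would call upstream.request(n).
            }

            @Override
            public void cancel() {
                upstream.cancel();
            }
        });
    }

    @Override
    public void onSubscribe(Flow.Subscription subscription) {
        this.upstream = subscription;
    }

    @Override
    public void onNext(T item) {
        downstream.onNext(item);
    }

    @Override
    public void onError(Throwable throwable) {
        downstream.onError(throwable);
    }

    @Override
    public void onComplete() {
        downstream.onComplete();
    }
}

A stall like that matches the symptom reported above: the response bytes arrive, but the completion signal never reaches the transformer, so the call never finishes cleanly.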
Created #1210, which will help with some of these issues with |
An improvement for this issue will go out with tomorrow's release. Please test it out and see if it fixes the problem for you. |
A change has gone out. Can you please update to the latest client version and see if you're still seeing the problem? |
Tried with the latest build 2.5.29. Still seeing the same issue.
Caused By (java.io.IOException) Server failed to send complete response |
@vvellanki Thanks for checking. Were you able to update connectionMaxIdleTime to 5 seconds to see if it helps? We want to see if making this the default will fix the problem for customers with access patterns like yours. |
@millems I still face the issue after reducing connectionMaxIdleTime to 5 seconds on the latest version, 2.5.29. |
@Kumarnitin2001 Thanks for confirming. |
@millems Is there any news regarding this issue? We would rather not restart our app once a day (if possible) |
We haven't been able to reproduce this specific issue with |
@zoewangg I tried upgrading to 2.7.5, but now I'm getting
Edit: this only occurs sometimes, but it didn't happen that often on 2.5.29.
trace
|
@Eyal-Shalev |
Guys, I'm still experiencing the same issue as the other devs. In my case it affects my Kinesis producer, which is ingesting a couple of thousand messages per second. It's worth mentioning that I'm using
I'm using JDK 11 with the following libraries:
properties:
# VERSIONS
version.kotlin: 1.3.40
version.jackson: 2.9.9
version.aws: 2.7.15
version.aws-kinesis: 2.2.1
Here is the stack trace.
|
We are also facing the same issue with the async client, version 2.7.13. Our use case involves making a few calls to S3 to serve UI requests, and we wanted to use the async client. We are getting this error quite frequently, around the same rate as @dazito mentioned, and it is becoming difficult for us to constantly monitor and restart the service. @zoewangg In case there is no immediate fix planned for an upcoming version, we might have to try out the S3 sync client and, if it works, do a performance impact analysis. Do you have any suggestions on the way forward? |
@miere The suggested setting
@ybhatnagar To be clear, are you having the issue with the S3 sync client or the async client? Have you tried to set |
@zoewangg It seems like our issue with the S3 connection was indeed resolved. |
@zoewangg We haven't disabled retries. Are they enabled by default? |
@Eyal-Shalev Glad to hear the issue was resolved for the S3 client. Retries are enabled by default, and the default maximum number of retries is 3. |
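For reference, making that default explicit (or tuning it) looks roughly like the sketch below; the value 3 mirrors the statement above, and everything else is a placeholder rather than a recommendation.

import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
import software.amazon.awssdk.core.retry.RetryPolicy;
import software.amazon.awssdk.services.s3.S3AsyncClient;

public class ClientWithExplicitRetries {
    public static S3AsyncClient build() {
        // Spells out the default behaviour: up to 3 retries for retryable failures.
        return S3AsyncClient.builder()
                .overrideConfiguration(ClientOverrideConfiguration.builder()
                        .retryPolicy(RetryPolicy.builder().numRetries(3).build())
                        .build())
                .build();
    }
}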
Closing the issue due to no recent response. Feel free to open a new issue if you continue to see the error. |
I'm facing this issue with version 2.5.61, and I found this discussion. |
We just merged the change to update the default connectionMaxIdleTime. Before that change is released, you'd still need to set connectionMaxIdleTime(Duration.ofSeconds(5)) yourself.
I'm seeing this issue in the latest version of the SDK. I'm reading from SQS via Alpakka using the async client. I tried setting the
|
Hi @jaylynstoesz, I see you're using SQS, so could you please open a new GitHub issue with sample code we can use to reproduce? Setting |
For SNS/SQS I have created this ticket |
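For the SQS report above, the same Netty option can be applied to the SQS async client; a sketch with placeholder values follows, though nothing in this thread confirms it resolves the Alpakka case.

import java.time.Duration;
import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient;
import software.amazon.awssdk.services.sqs.SqsAsyncClient;

public class SqsClientWithShortIdleTime {
    public static SqsAsyncClient build() {
        // Same idea as the S3 workaround: close idle pooled connections early.
        return SqsAsyncClient.builder()
                .httpClientBuilder(NettyNioAsyncHttpClient.builder()
                        .connectionMaxIdleTime(Duration.ofSeconds(5)))
                .build();
    }
}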
@millems @debora-ito @zoewangg Hi.
Line 80 in 51d067e
[S3 Configuration]
|
Hi @pkgonan. For error messages that include |
I have code using the Java SDK to download a few hundred photos from an S3 bucket. The first bunch work, but things start to fail around photo number 100. It does not always fail on the same file. It seems very similar to #452, though that issue is closed. Note that I'm using S3AsyncClient but downloading the files sequentially, never downloading more than one concurrently. The photos are about 500KB each.
Expected Behavior
I should be able to download all the objects in my bucket
Current Behavior
I am able to download many files, but after about 100 requests they start to fail. The exception is:
With full debug logs enabled there's a ton of output, but the Netty network log output from right before things crash is:
Steps to Reproduce (for bugs)
Here is a slightly modified version of the Kotlin code that produces the above (modified to remove company-sensitive details):
Context
We provide photos to our customers and are trying to provide a "download all" function. To do this we need to stream the data from S3 through some code that incrementally adds it to a zip file, which is provided to the customer. Since there's so much data, we can't hold it all in RAM, so we need to do this async streaming processing. We have our own AsyncResponseTransformer which does this, but we wanted to provide an example with as little of our code as possible. AsyncResponseTransformer.toFile and our code experience the same error.
Your Environment