Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rapid S3 downloads causes failure #1122

Closed
oliverdain opened this issue Mar 8, 2019 · 50 comments
Closed

Rapid S3 downloads causes failure #1122

oliverdain opened this issue Mar 8, 2019 · 50 comments
Labels
bug This issue is a bug. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days.

Comments

@oliverdain
Copy link

I have code using the Java SDK to download a few hundred photos from an S3 bucket. The first bunch work but things start to fail around photo number 100. It does not always fail on the same file. It seems very similar to #452 though that issue is closed. Note that I'm using S3AsyncClient but downloading the file sequentially, never downloading more than one concurrently.

The photos are about 500KB each.

Expected Behavior

I should be able to download all the objects in my bucket

Current Behavior

Am able to download many files but after about 100 requests star to fail. The exception is:

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException
	at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:61)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:50)
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryErrorIfNeeded(AsyncRetryableStage.java:167)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:118)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:103)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:241)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.lambda$notifyIfResponseNotCompleted$3(ResponseHandler.java:398)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.runAndLogError(ResponseHandler.java:179)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:398)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelInactive(ResponseHandler.java:148)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.handler.logging.LoggingHandler.channelInactive(LoggingHandler.java:167)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive(CombinedChannelDuplexHandler.java:420)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
	at io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive(HttpClientCodec.java:282)
	at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:223)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
	at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1054)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1403)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912)
	at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:826)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
	at java.lang.Thread.run(Thread.java:748)
Caused by: software.amazon.awssdk.core.exception.SdkClientException: null
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
	at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:117)
	... 63 common frames omitted
Caused by: java.io.IOException: Server failed to send complete response
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:397)
	... 54 common frames omitted

With full debug logs enabled there's a ton of output but the response from the Netty network logs from right before things crash is:

11:16:18.311 [aws-java-sdk-NettyEventLoop-1-3] DEBUG i.n.handler.logging.LoggingHandler - [id: 0xffce7082, L:/192.168.0.165:59528 - R:com-revl-revlx.s3.us-west-2.amazonaws.com/52.218.224.248:443] READ: DefaultLastHttpContent(data: PooledSlicedByteBuf(ridx: 0, widx: 254, cap: 254/254, unwrapped: PooledUnsafeDirectByteBuf(ridx: 8446, widx: 8446, cap: 8475)), decoderResult: success), 254B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 5d 52 e1 13 91 3f b5 10 51 7b 34 f6 79 a0 39 aa |]R...?..Q{4.y.9.|
|00000010| 3a 4d fc ce 3c 68 3b 15 b5 ad 02 63 d4 f7 a0 78 |:M..<h;....c...x|
|00000020| 92 29 39 d5 39 01 a9 44 b9 c6 33 2a 8d 7e 91 f3 |.)9.9..D..3*.~..|
|00000030| 28 ea 71 58 a6 09 25 58 2f a8 4e 94 f1 24 02 65 |(.qX..%X/.N..$.e|
|00000040| a7 b5 48 11 7c 6a 95 05 d2 a8 b0 93 a1 02 d8 57 |..H.|j.........W|
|00000050| 11 04 21 ad 32 b4 2d 94 16 99 af 0e f4 00 41 2a |..!.2.-.......A*|
|00000060| 80 04 55 04 32 94 41 38 93 40 01 11 25 02 94 89 |..U.2.A8.@..%...|
|00000070| 1a 39 2a 2d 84 b0 04 12 d7 2a cd 22 60 a7 34 27 |.9*-.....*."`.4'|
|00000080| 40 94 0f 0d 0d 53 0d 85 38 5d bb 24 ea af 94 32 |@....S..8].$...2|
|00000090| 84 80 24 ad 3a 63 0c e8 f8 77 85 44 8a a0 14 1a |..$.:c...w.D....|
|000000a0| 02 74 27 51 ab e6 25 02 c6 4e a8 ed 4e 64 68 4a |.t'Q..%..N..NdhJ|
|000000b0| 4e 84 09 cf 8c 7a 0e f4 65 4c f8 2b 00 28 14 f1 |N....z..eL.+.(..|
|000000c0| 98 f0 54 0a a2 a0 24 9b 48 42 2e ed 20 ad ae 53 |..T...$.HB.. ..S|
|000000d0| a8 f0 12 a9 80 9a 05 11 a3 53 2a 72 4f 3d 70 ae |.........S*rO=p.|
|000000e0| 07 9a 61 20 27 e7 8c 27 27 2d 6a a3 ff d9 f2 d3 |..a '..''-j.....|
|000000f0| 22 9e 33 37 d2 2e 06 03 e0 06 a6 fc 72 02       |".37........r.  |
+--------+-------------------------------------------------+----------------+
11:16:18.312 [aws-java-sdk-NettyEventLoop-1-3] DEBUG i.n.handler.logging.LoggingHandler - [id: 0xffce7082, L:/192.168.0.165:59528 ! R:com-revl-revlx.s3.us-west-2.amazonaws.com/52.218.224.248:443] READ COMPLETE
11:16:18.312 [aws-java-sdk-NettyEventLoop-1-3] DEBUG i.n.handler.logging.LoggingHandler - [id: 0xffce7082, L:/192.168.0.165:59528 ! R:com-revl-revlx.s3.us-west-2.amazonaws.com/52.218.224.248:443] INACTIVE
11:16:18.316 [aws-java-sdk-NettyEventLoop-1-3] DEBUG i.n.handler.logging.LoggingHandler - [id: 0xffce7082, L:/192.168.0.165:59528 ! R:com-revl-revlx.s3.us-west-2.amazonaws.com/52.218.224.248:443] CLOSE
11:16:18.316 [aws-java-sdk-NettyEventLoop-1-3] DEBUG i.n.handler.logging.LoggingHandler - [id: 0xffce7082, L:/192.168.0.165:59528 ! R:com-revl-revlx.s3.us-west-2.amazonaws.com/52.218.224.248:443] UNREGISTERED

Steps to Reproduce (for bugs)

Here is a slightly modified version of the Kotlin code that produces the above (modified to remove company sensitive stuff):

object Main {
    val log = getLogger { }

    val s3AsyncClient = S3AsyncClient.builder()
        .credentialsProvider(DefaultCredentialsProvider.create())
        .region(Region.US_WEST_2)
        // retries disable here because we can't retry in our situation - the real use case is to stream
        // data from a server to a customer. Note that you get the same failures even with retries -- it
        // just takes longer
        .overrideConfiguration { config ->
            config.retryPolicy(RetryPolicy.none()).build()
        }
        .build()

    val daBucket = "com-revl-revlx"

    fun doDownload(key: String) {
        val outFile = Paths.get("/tmp/experiment/").resolve(key)
        outFile.parent.toFile().mkdirs()
        if (outFile.toFile().exists()) {
            outFile.toFile().delete()
        }
        log.info("Downloading via toFile: {}", key)
        val request = GetObjectRequest.builder()
            .bucket(daBucket)
            .key(key)
            .build()

        val future = s3AsyncClient.getObject(request, AsyncResponseTransformer.toFile(outFile))

        try {
            future.join()
            log.info("Download of {} succeeded", key)
        } catch (t: Throwable) {
            log.error("Download of {} failed with:", key, t)
            throw t
        }
    }

    @JvmStatic
    fun main(argv: Array<String>) {
        ThreadUtils.setLoggingDefaultExceptionHandler()
        val keys = (1..240).toList()

        var numDone = 1
        for (i in keys) {
            log.info("Process file number {}", numDone)
            ++numDone
            val key = String.format("path/to/photo_%05d.jpg", i)
            doDownload(key)
        }
    }
}

Context

We provide photos to our customers and are trying to provide a "download all" function. To do this we need to stream the data from S3 through some code that incrementally adds it to a zip file which is provided to the customer. Since there's so much data we can't hold it all in RAM so we need to do this async-streaming processing. We have our own AsyncResponseTransformer which does this but wanted to provide an example with as little of our code as possible. AsyncResponseTransformer.toFile and our code experience the same error.

Your Environment

  • AWS Java SDK version used: 2.5.4
  • JDK version used: Kotlin 1.2.71 running on Java 8 JVM
  • Operating System and version: have reproduced on Debian Linux Stretch and OSX
@zoewangg
Copy link
Contributor

zoewangg commented Mar 8, 2019

Thanks for reporting! We are aware of the issue and it is caused by the fact that service can close connection after certain time and we are actively working on a fix.

@zoewangg zoewangg added the bug This issue is a bug. label Mar 8, 2019
@oliverdain
Copy link
Author

oliverdain commented Mar 8, 2019

Thanks for the quick response @zoewangg . I'm actively looking for workarounds because this is causing issues in customer facing code for us. When you say, "service can close connection after certain time" do you mean this is a server-side S3 bug? I was considering replacing our use of the AWS SDK with just a plain HTTP client but if this issue is server side that obviously won't help.

Is there anything we can do to work around this? I tried a few obvious things like putting short delays between file downloads but that didn't help.

Any short-term workaround advice would be greatly appreciated.

Also, a very rough timeline would also be great. I know you can't promise a date, but if you're working on this now and think you'll have a fix in days it doesn't make sense for me to spend a lot of time on workarounds. OTOH, if nobody is going to start working on this bug for a few weeks I need to do something.

@zoewangg
Copy link
Contributor

zoewangg commented Mar 8, 2019

No, it's a bug in the SDK. Would v2 sync clients work for you?

I think this exception would get retried and should succeed in the next attempt, is it not the case for you?

@oliverdain
Copy link
Author

Would v2 sync clients work for you?

We could spawn threads and use sync clients. A bit less efficient but would be OK.

I think this exception would get retried and should succeed in the next attempt, is it not the case for you?

What we see is that with retries enabled this bug is less common but still appears quite often (e.g. every 250 files instead of every 100).

Even if retries did work we can't use them though. Our use case is to create a zip file containing all the data from all of these photos in a streaming way without holding it all in RAM so retries aren't an option for us; they'd cause a corrupt file because the zip archive would contain part of the file (the part that failed part way through) and then on retry that part of the file would be repeated.

@zoewangg
Copy link
Contributor

zoewangg commented Mar 8, 2019

Thanks for the information. Please switch to v2 sync clients if that's easy workaround for you. We don't have a clear date for the fix but fixing it is one of our top priorities right now.

@zoewangg
Copy link
Contributor

zoewangg commented Mar 13, 2019

Update: unfortunately, it will take longer time than expected to fix this issue.

UPDATE: Please note that using HTTP is not as secure as HTTPS and is vulnerable to security attacks https://en.wikipedia.org/wiki/HTTPS#Difference_from_HTTP. We do not recommend using it.

@dazito
Copy link

dazito commented Mar 13, 2019

We are also experiencing this issue with

 <dependency>
     <groupId>software.amazon.awssdk</groupId>
     <artifactId>s3</artifactId>
     <version>2.5.6</version>
 </dependency>

Our use case is a job downloading around 20 files per execution, using the S3AsyncClient. We get this issue around every 3rd to 5th job execution.

Thanks for the temporary workaround, @zoewangg

@vvellanki
Copy link

Any update. We are hitting this issue too.

@vvellanki
Copy link

We tried implementing the workaround suggested. How do we create the override URI for different regions?

@zoewangg
Copy link
Contributor

Update: we reached out to the service team and we are still working on finding the right solution for it.

Please note that using HTTP is not as secure as HTTPS and is vulnerable to security attacks https://en.wikipedia.org/wiki/HTTPS#Difference_from_HTTP. We do not recommend using it.

@vvellanki You can find all S3 endpoints in https://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

@vvellanki
Copy link

The options I see are:

  1. Use http instead of https - does not work for us
  2. Use sync v2 APIs instead of async - in our case, the change doesn't appear trivial due to dependencies. We hit some issues due to using older httpclient jar
  3. Wait for a fix for this issue

How about if we restrict the number of concurrent calls to a fixed number? Is that a workaround?

@zoewangg
Copy link
Contributor

zoewangg commented Apr 1, 2019

Summary
The root cause is that service might close idle connections for a certain amount of time and there might be a race condition where a soon-stale connection gets picked up by SDK and the connection gets closed in the same time SDK sends out the request.

Reducing connectionMaxIdleTime should reduce the chances of occurrences of this issue.

@vvellanki Can you try to update connectionMaxIdleTime to 5 seconds to see if it helps?

        s3Async = S3AsyncClient.builder()
                               .httpClientBuilder(NettyNioAsyncHttpClient.builder()
                                                  .connectionMaxIdleTime(Duration.ofSeconds(5)))
                               .build();

@fhsgoncalves
Copy link

fhsgoncalves commented Apr 9, 2019

We are experiencing a similar issue, but with DynamoDB:

  • we execute concurrently about 1000 writes & reads /second to a dynamo table, per instance of the application.

The application is a worker, consuming from kinesis, using KCL, and reading and writing to dynamo.

  • Note: we tried with KCL v1 and v2, and we don't know why, but the bug happens more quickly when we use the KCL v2.
    The application is written in kotlin 1.3.21 and uses coroutines 1.1.1. We adapt the reactive-streams from aws-sdk-java-v2 using the default kotlin-coroutines adapter.

The stacktrace when using aws-sdk-java-v2 2.5.11:
Note: when we tried the version aws sdk 2.5.25, the exception is not thrown, but the code is stuck forever trying to read/write from dynamo (we call .await() in a GlobalScope.async {}, waiting for dynamo complete).

software.amazon.awssdk.core.exception.SdkClientException: null
    at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
    at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:123)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:105)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:241)
    at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.handleFailure(NettyRequestExecutor.java:237)
    at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.lambda$writeRequest$6(NettyRequestExecutor.java:191)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
    at io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
    at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:57)
    at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:31)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:991)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:873)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1365)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
    at io.netty.handler.timeout.WriteTimeoutHandler.write(WriteTimeoutHandler.java:113)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
    at io.netty.handler.ssl.SslHandler.finishWrap(SslHandler.java:894)
    at io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:880)
    at io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:797)
    at io.netty.handler.ssl.SslHandler.flush(SslHandler.java:778)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
    at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
    at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.flush(CombinedChannelDuplexHandler.java:533)
    at io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
    at io.netty.channel.CombinedChannelDuplexHandler.flush(CombinedChannelDuplexHandler.java:358)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
    at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
    at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
    at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
    at io.netty.handler.logging.LoggingHandler.flush(LoggingHandler.java:265)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:802)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:814)
    at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794)
    at com.typesafe.netty.http.HttpStreamsHandler.completeBody(HttpStreamsHandler.java:298)
    at com.typesafe.netty.http.HttpStreamsHandler.access$400(HttpStreamsHandler.java:14)
    at com.typesafe.netty.http.HttpStreamsHandler$3$1.run(HttpStreamsHandler.java:276)
    at com.typesafe.netty.http.HttpStreamsHandler.executeInEventLoop(HttpStreamsHandler.java:342)
    at com.typesafe.netty.http.HttpStreamsHandler.access$300(HttpStreamsHandler.java:14)
    at com.typesafe.netty.http.HttpStreamsHandler$3.complete(HttpStreamsHandler.java:273)
    at com.typesafe.netty.HandlerSubscriber$3.operationComplete(HandlerSubscriber.java:244)
    at com.typesafe.netty.HandlerSubscriber$3.operationComplete(HandlerSubscriber.java:241)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
    at io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
    at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:57)
    at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:31)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
    at io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
    at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:57)
    at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:31)
    at io.netty.channel.AbstractCoalescingBufferQueue.releaseAndCompleteAll(AbstractCoalescingBufferQueue.java:340)
    at io.netty.channel.AbstractCoalescingBufferQueue.releaseAndFailAll(AbstractCoalescingBufferQueue.java:207)
    at io.netty.handler.ssl.SslHandler.releaseAndFailAll(SslHandler.java:1591)
    at io.netty.handler.ssl.SslHandler.access$900(SslHandler.java:166)
    at io.netty.handler.ssl.SslHandler$5.run(SslHandler.java:1785)
    at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
    at java.lang.Thread.run(Thread.java:748)
java.nio.channels.ClosedChannelException: null
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)

Are these issues related?
I'm thinking in use this lib: https://github.com/bradynpoulsen/aws-sdk-kotlin-coroutines/, that is a wrapper for the aws-sdk-java v1, but with support to coroutines out of the box. It should be less efficient, but it may work.

I tried to use http as the protocol in dynamo endpoint and also set connectionMaxIdleTime to 5 seconds, without success. The problem happens after some minutes that the application is running when using with KCL v2, and after a greater period when using KCL v1.

I think we've experienced this same behavior before, with s3, but we change to use s3 from aws v1, and the issue has gone away. Now, we are experiencing the issue with dynamo.

Environment:

  • Java 8, kotlin 1.3.21, coroutines 1.1.1.
  • Linux debian (from openjdk:8u162-jdk-slim docker image)
  • aws-sdk-java-v2: 2.5.11 & 2.5.25 (the later without the exception, but the code is stuck forever).

@Eyal-Shalev
Copy link

I've experienced this issue as well.
We are downloading the same list of files (about 60 small json files) every 5 seconds (to check for changes).

The change proposed by @zoewangg does seem to mitigate this issue.

@millems
Copy link
Contributor

millems commented Apr 16, 2019

This issue seems fairly bad. The following code fails after a few iterations:

try (S3AsyncClient client = S3AsyncClient.create()) {
    client.putObject(r -> r.bucket("tmp.millems").key("test"), AsyncRequestBody.fromBytes("".getBytes())).join();

    for (int i = 0; i < 1000; ++i) {
        System.out.println("Iteration " + i);
        Path outputPath = Paths.get("/tmp/foo");
        Files.deleteIfExists(outputPath);
        client.getObject(r -> r.bucket("tmp.millems").key("test"), AsyncResponseTransformer.toFile(outputPath)).join();
    }
}

SDK retries don't seem to help fix the problem.

@millems
Copy link
Contributor

millems commented Apr 16, 2019

Failure logs:

2019-04-16 16:34:21,476 [aws-java-sdk-NettyEventLoop-1-4] DEBUG software.amazon.awssdk.request - Retryable error detected, will retry in 311ms, attempt number 3
2019-04-16 16:34:21,477 [aws-java-sdk-NettyEventLoop-1-4] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0xf04bce0b, L:/10.94.46.143:56881 ! R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] CLOSE
2019-04-16 16:34:21,477 [aws-java-sdk-NettyEventLoop-1-4] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0xf04bce0b, L:/10.94.46.143:56881 ! R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] UNREGISTERED
2019-04-16 16:34:21,788 [sdk-ScheduledExecutor-2-1] DEBUG software.amazon.awssdk.request - Retrying Request: DefaultSdkHttpFullRequest(httpMethod=GET, protocol=https, host=s3.eu-west-1.amazonaws.com, port=443, encodedPath=/tmp.millems/test, headers=[amz-sdk-invocation-id, User-Agent, x-amz-te], queryParameters=[])
2019-04-16 16:34:21,791 [sdk-ScheduledExecutor-2-1] DEBUG software.amazon.awssdk.auth.signer.Aws4Signer - AWS4 String to sign: AWS4-HMAC-SHA256
20190416T233421Z
20190416/eu-west-1/s3/aws4_request
7f129f3fe741eebd1b6aba5a36006198c2cb6275da89198fd117ce9730c2eeae
2019-04-16 16:34:21,797 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979] REGISTERED
2019-04-16 16:34:21,798 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979] CONNECT: s3.eu-west-1.amazonaws.com/52.218.24.211:443
2019-04-16 16:34:21,965 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] ACTIVE
2019-04-16 16:34:21,982 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] WRITE: software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor$StreamedRequest(DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
GET https://s3.eu-west-1.amazonaws.com/tmp.millems/test HTTP/1.1
amz-sdk-invocation-id: 4126936f-2cb5-a0e6-ae52-e5f1e122ff4a
amz-sdk-retry: 3/311/485
Authorization: xxx
Host: s3.eu-west-1.amazonaws.com
User-Agent: aws-sdk-java/2.5.24-SNAPSHOT Mac_OS_X/10.12.6 Java_HotSpot_TM__64-Bit_Server_VM/25.202-b08 Java/1.8.0_202 vendor/Oracle_Corporation
x-amz-content-sha256: UNSIGNED-PAYLOAD
X-Amz-Date: 20190416T233421Z
x-amz-te: append-md5)
2019-04-16 16:34:21,982 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] FLUSH
2019-04-16 16:34:21,983 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] FLUSH
2019-04-16 16:34:21,983 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] WRITE, DefaultHttpContent(data: EmptyByteBufBE, decoderResult: success), 0B
2019-04-16 16:34:21,983 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] FLUSH
2019-04-16 16:34:22,154 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,168 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,174 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,184 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,192 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,359 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.ssl.SslHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] HANDSHAKEN: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
2019-04-16 16:34:22,360 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] USER_EVENT: SslHandshakeCompletionEvent(SUCCESS)
2019-04-16 16:34:22,364 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] WRITE, EmptyLastHttpContent, 0B
2019-04-16 16:34:22,367 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] FLUSH
2019-04-16 16:34:22,369 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,599 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ: DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
HTTP/1.1 200 OK
x-amz-id-2: SkrTAHIrukTmY1Pm6f6fMJqTzAGJrtfbl6khnC/LWoPjd4lyxQlhjtzsOTeEpqZ130INC2fp9PE=
x-amz-request-id: FE32414B3EFE3E86
Date: Tue, 16 Apr 2019 23:34:23 GMT
Last-Modified: Tue, 16 Apr 2019 23:34:02 GMT
ETag: "d41d8cd98f00b204e9800998ecf8427e"
x-amz-transfer-encoding: append-md5
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Content-Length: 16
Server: AmazonS3
2019-04-16 16:34:22,600 [aws-java-sdk-NettyEventLoop-1-2] DEBUG software.amazon.awssdk.request - Received successful response: 200
2019-04-16 16:34:22,601 [aws-java-sdk-NettyEventLoop-1-2] DEBUG software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain - Interceptor 'software.amazon.awssdk.services.s3.internal.handlers.EnableTrailingChecksumInterceptor@74294adb' modified the message with its modifyResponse method.
2019-04-16 16:34:22,602 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ: DefaultLastHttpContent(data: PooledSlicedByteBuf(ridx: 0, widx: 16, cap: 16/16, unwrapped: PooledUnsafeDirectByteBuf(ridx: 16, widx: 16, cap: 45)), decoderResult: success), 16B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| d4 1d 8c d9 8f 00 b2 04 e9 80 09 98 ec f8 42 7e |..............B~|
+--------+-------------------------------------------------+----------------+
2019-04-16 16:34:22,602 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:28,643 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] USER_EVENT: SslCloseCompletionEvent(SUCCESS)
2019-04-16 16:34:28,644 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:29,065 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:29,067 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 ! R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] INACTIVE
2019-04-16 16:34:29,184 [aws-java-sdk-NettyEventLoop-1-4] DEBUG io.netty.buffer.PoolThreadCache - Freed 19 thread-local buffer(s) from thread: aws-java-sdk-NettyEventLoop-1-4
2019-04-16 16:34:29,184 [aws-java-sdk-NettyEventLoop-1-3] DEBUG io.netty.buffer.PoolThreadCache - Freed 22 thread-local buffer(s) from thread: aws-java-sdk-NettyEventLoop-1-3
2019-04-16 16:34:29,192 [aws-java-sdk-NettyEventLoop-1-1] DEBUG io.netty.buffer.PoolThreadCache - Freed 18 thread-local buffer(s) from thread: aws-java-sdk-NettyEventLoop-1-1
2019-04-16 16:34:29,193 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 ! R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] CLOSE
2019-04-16 16:34:29,194 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 ! R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] UNREGISTERED
2019-04-16 16:34:31,607 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.buffer.PoolThreadCache - Freed 19 thread-local buffer(s) from thread: aws-java-sdk-NettyEventLoop-1-2

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException

	at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException
...

@millems
Copy link
Contributor

millems commented Apr 16, 2019

It looks like the SDK might not properly be finalizing the response, because the debug log shows content-length: 16 as well as 16 bytes being returned.

Update 1: ResponseHandler > PublisherAdapter > subscribe() > new Subscriber > onComplete() is not being invoked in the case that an exception is being thrown. It should be invoked. We can tell it's the end of the stream based on the fact that ResponseHandler > PublisherAdapter > subscribe() > new Subscriber > onNext(HttpContent) was invoked with a LastHttpContent.

Update 2: On a good call where LastHttpContent triggers the onComplete, the onNext is invoked in this way:

java.lang.Exception: Stack trace
	at java.lang.Thread.dumpStack(Thread.java:1336)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onNext(ResponseHandler.java:268)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onNext(ResponseHandler.java:222)
	at com.typesafe.netty.HandlerPublisher.publishMessage(HandlerPublisher.java:367)
	at com.typesafe.netty.HandlerPublisher.channelRead(HandlerPublisher.java:345)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at com.typesafe.netty.http.HttpStreamsHandler.handleReadHttpContent(HttpStreamsHandler.java:184)
	at com.typesafe.netty.http.HttpStreamsHandler.channelRead(HttpStreamsHandler.java:165)
	at com.typesafe.netty.http.HttpStreamsClientHandler.channelRead(HttpStreamsClientHandler.java:148)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
...

On a bad call where LastHttpContent does not trigger the onComplete, the onNext is invoked in this way:

java.lang.Exception: Stack trace
	at java.lang.Thread.dumpStack(Thread.java:1336)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onNext(ResponseHandler.java:268)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onNext(ResponseHandler.java:222)
	at com.typesafe.netty.HandlerPublisher.publishMessage(HandlerPublisher.java:367)
	at com.typesafe.netty.HandlerPublisher.flushBuffer(HandlerPublisher.java:304)
	at com.typesafe.netty.HandlerPublisher.receivedDemand(HandlerPublisher.java:258)
	at com.typesafe.netty.HandlerPublisher.access$200(HandlerPublisher.java:41)
	at com.typesafe.netty.HandlerPublisher$ChannelSubscription$1.run(HandlerPublisher.java:452)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
	at java.lang.Thread.run(Thread.java:748)

Update 3: It looks like we're dropping downstream publisher demand in the checksum validator.

@millems
Copy link
Contributor

millems commented Apr 17, 2019

Created #1210, which will help with some of these issues with S3AsyncClient.getObject.

@millems
Copy link
Contributor

millems commented Apr 17, 2019

An improvement for this issue will go out with tomorrow's release. Please test it out and see if it fixes the problem for you.

@millems
Copy link
Contributor

millems commented Apr 19, 2019

A change has gone out. Can you please update to the latest client version and see if you're still seeing the problem?

@vvellanki
Copy link

Tried with the latest build 2.5.29. Still seeing the same issue.

Caused By (java.io.IOException) Server failed to send complete response
software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted():398
software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelInactive():148
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
io.netty.channel.ChannelInboundHandlerAdapter.channelInactive():75
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
io.netty.handler.logging.LoggingHandler.channelInactive():167
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
io.netty.channel.ChannelInboundHandlerAdapter.channelInactive():75
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
io.netty.channel.ChannelInboundHandlerAdapter.channelInactive():75
io.netty.handler.timeout.IdleStateHandler.channelInactive():277
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive():420
io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed():377
io.netty.handler.codec.ByteToMessageDecoder.channelInactive():342
io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive():282
io.netty.channel.CombinedChannelDuplexHandler.channelInactive():223
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
io.netty.channel.ChannelInboundHandlerAdapter.channelInactive():75
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed():377
io.netty.handler.codec.ByteToMessageDecoder.channelInactive():342
io.netty.handler.ssl.SslHandler.channelInactive():1010
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
io.netty.channel.ChannelInboundHandlerAdapter.channelInactive():75
io.netty.handler.timeout.IdleStateHandler.channelInactive():277
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive():1409
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
io.netty.channel.DefaultChannelPipeline.fireChannelInactive():927
io.netty.channel.AbstractChannel$AbstractUnsafe$8.run():822
io.netty.util.concurrent.AbstractEventExecutor.safeExecute():163
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks():404
io.netty.channel.nio.NioEventLoop.run():463
io.netty.util.concurrent.SingleThreadEventExecutor$5.run():886
java.lang.Thread.run():748

@millems
Copy link
Contributor

millems commented Apr 22, 2019

@vvellanki Thanks for checking. Were you able to update connectionMaxIdleTime to 5 seconds to see if it helps?

We want to see if making this the default will fix the problem for customers with access patterns like yours.

@Kumarnitin2001
Copy link

@millems I still face the issue reducing the connectionMaxIdleTime to 5 seconds and the latest version 2.5.29
S3AsyncClient.builder().httpClientBuilder(NettyNioAsyncHttpClient.builder() .writeTimeout(Duration.ofSeconds(100l)) .readTimeout(Duration.ofSeconds(100l)) .connectionMaxIdleTime(Duration.ofSeconds(5))).build()

@millems
Copy link
Contributor

millems commented Apr 26, 2019

@Kumarnitin2001 Thanks for confirmation.

@Eyal-Shalev
Copy link

Eyal-Shalev commented May 27, 2019

@millems Is there any news regarding this issue?

We would rather not restart our app once a day (if possible)

@millems
Copy link
Contributor

millems commented May 28, 2019

We haven't been able to reproduce this specific issue with connectionMaxIdleTime(Duration.ofSeconds(5)) set. That's fixed it for all of our scenarios. We're still trying to reproduce it. If you have a repro case, that would be fantastic.

@Eyal-Shalev
Copy link

Eyal-Shalev commented Jul 14, 2019

@zoewangg I tried upgrading to 2.7.5, but now I'm getting Server failed to send complete response on ElasticLoadBalancingAsyncClient.registerInstancesWithLoadBalancer

Edit: this only occurs sometimes, but didn't happen that often on 2.5.29

trace

  [software.amazon.awssdk.utils.CompletableFutureUtils errorAsCompletionException "CompletableFutureUtils.java" 61]
  [software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl build "SdkClientException.java" 97]
  [software.amazon.awssdk.http.nio.netty.internal.ResponseHandler notifyIfResponseNotCompleted "ResponseHandler.java" 396]
  [software.amazon.awssdk.http.nio.netty.internal.ResponseHandler channelInactive "ResponseHandler.java" 148]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.channel.ChannelInboundHandlerAdapter channelInactive "ChannelInboundHandlerAdapter.java" 75]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.handler.logging.LoggingHandler channelInactive "LoggingHandler.java" 167]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.channel.ChannelInboundHandlerAdapter channelInactive "ChannelInboundHandlerAdapter.java" 75]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.channel.ChannelInboundHandlerAdapter channelInactive "ChannelInboundHandlerAdapter.java" 75]
  [io.netty.handler.timeout.IdleStateHandler channelInactive "IdleStateHandler.java" 277]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext fireChannelInactive "CombinedChannelDuplexHandler.java" 420]
  [io.netty.handler.codec.ByteToMessageDecoder channelInputClosed "ByteToMessageDecoder.java" 390]
  [io.netty.handler.codec.ByteToMessageDecoder channelInactive "ByteToMessageDecoder.java" 355]
  [io.netty.handler.codec.http.HttpClientCodec$Decoder channelInactive "HttpClientCodec.java" 282]
  [io.netty.channel.CombinedChannelDuplexHandler channelInactive "CombinedChannelDuplexHandler.java" 223]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.channel.ChannelInboundHandlerAdapter channelInactive "ChannelInboundHandlerAdapter.java" 75]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.handler.codec.ByteToMessageDecoder channelInputClosed "ByteToMessageDecoder.java" 390]
  [io.netty.handler.codec.ByteToMessageDecoder channelInactive "ByteToMessageDecoder.java" 355]
  [io.netty.handler.ssl.SslHandler channelInactive "SslHandler.java" 1054]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.channel.ChannelInboundHandlerAdapter channelInactive "ChannelInboundHandlerAdapter.java" 75]
  [io.netty.handler.timeout.IdleStateHandler channelInactive "IdleStateHandler.java" 277]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.channel.DefaultChannelPipeline$HeadContext channelInactive "DefaultChannelPipeline.java" 1403]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.DefaultChannelPipeline fireChannelInactive "DefaultChannelPipeline.java" 912]
  [io.netty.channel.AbstractChannel$AbstractUnsafe$8 run "AbstractChannel.java" 826]
  [io.netty.util.concurrent.AbstractEventExecutor safeExecute "AbstractEventExecutor.java" 163]
  [io.netty.util.concurrent.SingleThreadEventExecutor runAllTasks "SingleThreadEventExecutor.java" 404]
  [io.netty.channel.nio.NioEventLoop run "NioEventLoop.java" 495]
  [io.netty.util.concurrent.SingleThreadEventExecutor$5 run "SingleThreadEventExecutor.java" 905]
  [java.lang.Thread run "Thread.java" 834]

@zoewangg
Copy link
Contributor

zoewangg commented Jul 15, 2019

@Eyal-Shalev Server failed to send complete response can happen when the service closes the connection mid request and this is valid per HTTP spec. Normally retry would work; is there any chance retry is disabled on ElasticLoadBalancingAsyncClient? If not, reducing connectionMaxIdleTime would help by closing idle connections more frequently. Are you still experiencing issue with s3 async client?

@zoewangg zoewangg added the closing-soon This issue will close in 4 days unless further comments are made. label Jul 23, 2019
@miere
Copy link

miere commented Aug 1, 2019

Guys, I'm still experiencing the same issue as the other devs. In my case it affects my Kinesis producer which is ingesting a couple of thousands of messages per second.

It worth mention that I'm using connectionMaxIdleTime of 5 seconds, as suggested earlier on this thread.

I'm using JDK 11 with the following libraries:

properties:
  # VERSIONS
  version.kotlin: 1.3.40
  version.jackson: 2.9.9
  version.aws: 2.7.15
  version.aws-kinesis: 2.2.1

Here is the stack trace.

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException
at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:61)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51)
at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:75)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryErrorIfNeeded(AsyncRetryableStage.java:175)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:126)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:107)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:249)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.lambda$notifyIfResponseNotCompleted$2(ResponseHandler.java:397)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.runAndLogError(ResponseHandler.java:180)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:397)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelInactive(ResponseHandler.java:148)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.handler.logging.LoggingHandler.channelInactive(LoggingHandler.java:167)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive(CombinedChannelDuplexHandler.java:420)
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
at io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive(HttpClientCodec.java:282)
at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:223)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1076)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1403)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912)
at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:827)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: software.amazon.awssdk.core.exception.SdkClientException: null
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:125)
... 63 common frames omitted
Caused by: java.io.IOException: Server failed to send complete response
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:396)
... 54 common frames omitted

@ybhatnagar
Copy link

ybhatnagar commented Aug 5, 2019

We are also facing the same issue with the async client version 2.7.13. Our use case involves making a few calls to s3 to serve UI requests, and we wanted to use the async client. We are getting this error quite frequently, around the same rate as @dazito mentioned, and is becoming difficult for us to constantly monitor and restart the service. @zoewangg In case if there is not an immediate fix planned in an upcoming version, we might have to do try out s3 sync client, and in case it works, performance impact analysis. Do you have any suggestions on what should be the way forward?

@zoewangg
Copy link
Contributor

zoewangg commented Aug 5, 2019

@miere The suggested setting connectionMaxIdleTim = 5 seconds only applies to s3 client. This github issue tracks s3 specific issues, could you create a new issue for it (with repro case if possible)? We will look into it.

@ybhatnagar To be clear, are you having the issue with s3 sync client or async client? Have you tried to set connectionMaxIdleTime to 5 seconds? What APIs are you using? Is retry disabled? (retry is enabled by default).

@zoewangg zoewangg removed the closing-soon This issue will close in 4 days unless further comments are made. label Aug 5, 2019
@Eyal-Shalev
Copy link

Eyal-Shalev commented Aug 18, 2019

@zoewangg It seems like our issue with S3 connection was indeed resolved.
We are implementing retry ourselves (with exponential backoff), and to mitigate the issue with registerInstancesWithLoadBalancer we had to set up to 10 retries (5 wasn't enough).

@Eyal-Shalev
Copy link

@zoewangg We haven't disabled the retry, is it enabled by default?

@zoewangg
Copy link
Contributor

@Eyal-Shalev Glad to hear the issue was resolved for s3 client. Retry is enabled be default and the default max number of retries is 3.

@zoewangg
Copy link
Contributor

zoewangg commented Sep 4, 2019

Closing the issue due to no recent response. Feel free to open a new issue if you continue to see the error.

@seanzxx
Copy link

seanzxx commented Oct 3, 2019

I'm facing this issue with version 2.5.61 and I find this discussion.
I have one question, do I need to set connectionMaxIdleTime(Duration.ofSeconds(5)) after I update the dependency to version 2.9.11?

@zoewangg
Copy link
Contributor

zoewangg commented Oct 4, 2019

We just merged the change to updated the default connectionMaxIdleTime for s3 client via #1450 and it will be included in the next release.

Before that, you'd still need to set connectionMaxIdleTime(Duration.ofSeconds(5))

@jaylynstoesz
Copy link

jaylynstoesz commented Oct 9, 2019

I'm seeing this issue in the latest version of the SDK. I'm reading from SQS via Alpakka using the async client. I tried setting the connectionMaxIdleTime to 5 seconds and I'm still getting the same error.
Kotlin: 1.3.41
Netty & SQS Java SDK: 2.9.15

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException
	at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:61)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51)
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:75)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryErrorIfNeeded(AsyncRetryableStage.java:175)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:126)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:107)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:249)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.lambda$notifyIfResponseNotCompleted$2(ResponseHandler.java:397)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.runAndLogError(ResponseHandler.java:180)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:397)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelInactive(ResponseHandler.java:148)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
	at io.netty.handler.logging.LoggingHandler.channelInactive(LoggingHandler.java:167)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)	
at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:223)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
	at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1078)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
	at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:278)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1403)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912)
	at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:827)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:405)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.lang.Thread.run(Thread.java:748)
Caused by: software.amazon.awssdk.core.exception.SdkClientException
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
	at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:125)
	... 52 more
Caused by: java.io.IOException: Server failed to send complete response
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:396)
	... 43 more

@debora-ito
Copy link
Member

Hi @jaylynstoesz, I see you're using SQS so can you please open a new github issue, with a sample code we can use to repro? Setting connectionMaxIdleTime to 5 seconds is a suggestion tailored for S3 clients.

@alejandrod
Copy link

For SNS/SQS I have created this ticket
#1520
@jaylynstoesz

@pkgonan
Copy link

pkgonan commented Dec 12, 2021

@millems @debora-ito @zoewangg

Hi.
Reducing CONNECTION_MAX_IDLE_TIMEOUT to 5 seconds is not a workaround. I'm still having the following problem: Any solution?

software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Channel was closed before it could be written to.

.put(SdkHttpConfigurationOption.CONNECTION_MAX_IDLE_TIMEOUT, Duration.ofSeconds(5))

[S3 Configuration]
Please let me know if there are any settings that need to be changed because they are too low or too high.

@Configuration
@EnableConfigurationProperties(S3ClientConfig.S3ClientConfigurationProperties::class)
open class S3ClientConfig(
    private val properties: S3ClientConfigurationProperties
) {

    companion object {
        private const val HTTP_CLIENT_MAX_CONCURRENCY = 1000
        private const val MAX_PENDING_CONNECTION_ACQUIRE = 10000

        private val LOG = LoggerFactory.getLogger(this::class.java)
        private val CONNECTION_TIMEOUT = Duration.ofSeconds(2)
        private val CONNECTION_ACQUISITION_TIMEOUT = Duration.ofSeconds(2)
        private val CONNECTION_MAX_IDLE_TIME = Duration.ofSeconds(5)
        private val WRITE_TIMEOUT = Duration.ofSeconds(30)
        private val READ_TIMEOUT = Duration.ofSeconds(30)
    }

    @Bean
    open fun s3AsyncClient(): S3AsyncClient {
        val clientBuilder = S3AsyncClient.builder()
            .httpClient(getHttpClient())
            .serviceConfiguration(getServiceConfiguration())
            .region(getS3Region())
            .dualstackEnabled(true)
            .credentialsProvider(getS3CredentialProvider())
            .overrideConfiguration { it.retryPolicy(RetryPolicy.none()) }
            .asyncConfiguration { it.advancedOption(SdkAdvancedAsyncClientOption.FUTURE_COMPLETION_EXECUTOR, Dispatchers.IO.asExecutor()) }

        if (properties.getEndpoint() != null) {
            clientBuilder.endpointOverride(properties.getEndpoint())
        }

        return clientBuilder.build()
    }

    private fun getHttpClient(): SdkAsyncHttpClient {
        return NettyNioAsyncHttpClient.builder()
            .protocol(Protocol.HTTP1_1)
            .tcpKeepAlive(true)
            .writeTimeout(WRITE_TIMEOUT)
            .readTimeout(READ_TIMEOUT)
            .connectionTimeout(CONNECTION_TIMEOUT)
            .connectionAcquisitionTimeout(CONNECTION_ACQUISITION_TIMEOUT)
            .connectionMaxIdleTime(CONNECTION_MAX_IDLE_TIME)
            .maxConcurrency(HTTP_CLIENT_MAX_CONCURRENCY)
            .maxPendingConnectionAcquires(MAX_PENDING_CONNECTION_ACQUIRE)
            .useIdleConnectionReaper(true)
            .build()
    }

    private fun getServiceConfiguration(): S3Configuration {
        return S3Configuration.builder()
            .checksumValidationEnabled(false)
            .chunkedEncodingEnabled(true)
            .accelerateModeEnabled(false)
            .build()
    }

    private fun getS3Region(): Region {
        return Region.of(properties.getRegion())
    }

    private fun getS3CredentialProvider(): AwsCredentialsProvider {
        try {
            return DefaultCredentialsProvider.create()
        } catch (ex: Exception) {
            LOG.error("Invalid S3 Credential Configuration Properties")
            throw InvalidAWSCredentialException("Invalid AWS Credential detected")
        }
    }

    @ConstructorBinding
    @ConfigurationProperties("content.storage.aws.s3")
    @Validated
    class S3ClientConfigurationProperties(
        @field:NotBlank private val region: String,
        private val endpoint: URI?
    ) {

        internal fun getRegion() = region
        internal fun getEndpoint() = endpoint

        override fun toString(): String {
            return "S3ClientConfigurationProperties(" +
                    "region=${getRegion()}, " +
                    "endpoint=${getEndpoint()}, " +
                    ")"
        }
    }
}

@Bennett-Lynch
Copy link
Contributor

Hi @pkgonan. For error messages that include Channel was closed before it could be written to specifically, it is likely related to a pending fix that is currently being worked on in #2883 (and I believe this is unrelated to the issue that this thread was originally created for). Please follow that PR and let us know if it resolves your issue once it is released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days.
Projects
None yet
Development

No branches or pull requests