CloudWatch putMetricData sometimes produce a java.io.IOException: Server failed to send complete response #1380

oripwk · 2019-08-11T14:22:31Z

This issue is exactly as #452, but instead of S3 it happens in CloudWatch.

Expected Behavior

putMetricData(…) should not throw an exception.

Current Behavior

java.io.IOException: Server failed to send complete response is thrown several times a minute. Most of the requests seem to succeed.

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException
	at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:61)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51)
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:75)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryErrorIfNeeded(AsyncRetryableStage.java:175)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:126)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:107)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:249)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.lambda$notifyIfResponseNotCompleted$2(ResponseHandler.java:397)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.runAndLogError(ResponseHandler.java:180)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:397)
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelInactive(ResponseHandler.java:148)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.handler.logging.LoggingHandler.channelInactive(LoggingHandler.java:167)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive(CombinedChannelDuplexHandler.java:420)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
	at io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive(HttpClientCodec.java:282)
	at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:223)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
	at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1054)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1403)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912)
	at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:826)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
	at java.lang.Thread.run(Thread.java:748)
Caused by: software.amazon.awssdk.core.exception.SdkClientException: null
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
	at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:125)
	... 63 common frames omitted
Caused by: java.io.IOException: Server failed to send complete response
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:396)
	... 54 common frames omitted

Possible Solution

I have no idea.

Steps to Reproduce (for bugs)

Create a CloudWatch client that sends putMetricData requests about 100 times a minute, where each request has 20 metrics, and each metric contains 150 values.

Context

Monitoring real-time streaming application.

Your Environment

AWS Java SDK version used: 2.7.15
JDK version used: 1.8
Operating System and version: Linux 4.14.62-65.117.amzn1.x86_64 x86_64 GNU/Linux

The text was updated successfully, but these errors were encountered:

sschepens · 2019-08-27T20:05:58Z

observed the same using getMetricData

varunnvs92 · 2019-08-29T19:12:44Z

How often are you seeing this issue?
Do you have test code to consistently repo the issue?

As discussed in #452, we made some fixes which reduced this error occurrence a lot. But there is still small chance of seeing this error. Code to repo will help us debug the issue further. I would also suggest you to try with the latest version and see if there is any improvement.

slingala-scout24 · 2019-08-30T08:53:17Z

We have the same issue as well. We upgraded the aws sdk to the latest version 2.7.32 in the hope that the issue will be fixed, but unfortunately it didn't help. This issue happens when the application runs for a long time without a restart. But as soon as there is an application restart, we don't see this issue anymore until the next prolonged run. It usually happens after about 24 hours of the application start. Unfortunately I do not have a reproducible code, but we just use the default metrics publishing configuration.

oripwk · 2019-09-03T07:25:29Z

@varunnvs92

As I said, this issue happens regularly, almost once a second. It never stops throwing that exception.
The issue happens during production workload. In order to write code that reproduces this, I'll have to artificially create the same conditions. It might take a while. I'll keep you posted.

I think that because it's reproducing for other people and it's very similar to #452, it should deserve more attention.

trautonen · 2019-09-17T07:20:03Z

We've found that only solution for now to this issue and #452 is to use the sync clients which use apache http client instead of netty.

zoewangg · 2019-09-24T21:46:40Z

Apologies for the delayed response. Unfortunately, I'm not able to reproduce the issue. Can you provide more details of how to reproduce the issue? For example, the apis you are using, and the number of requests per second

I can see Server failed to send complete response errors with retry disabled because it's possible that the connection gets closed remotely mid request, and normally the next retry should pick up a new connection and succeeds. Is there any chance you disabled retry on the SDK client? Can you provide code snippet of how you create the client?

trautonen · 2019-09-25T08:02:19Z

We are using with default config, default connection parameters.

CloudWatchAsyncClient.builder().build()

And it's not single or random requests that fail. Like @oripwk said, it's constant failure, all requests fail. We are not triggering it very heavily either. Burst of like 5 reqs in sec once in a minute. And it's surely related to the HTTP client as there are zero problems with Apache HTTP client.

zoewangg · 2019-09-25T22:32:14Z

@trautonen Thank you for sharing the info. Unfortunately, still no luck reproducing the issue.

I tried sending 5 concurrent requests every minute for a total of 20 minutes and all requests succeeded; I did notice that some requests failed at the first attempt but succeeded in the next retry.

Is there anyway you can provide debug logs? Just wanted to check if all retry attempts failed of the same reason in your use case.

        StabilityTestRunner.newRunner()
                           .testName("CloudWatchAsyncStabilityTest." + testName)
                           .futureFactory(cloudWatchAsyncClient.putMetricData(b -> b.namespace(namespace)
                                                                                    .metricData(metrics)))
                           .totalRuns(20)
                           .requestCountPerRun(5)
                           .delaysBetweenEachRun(Duration.ofMinutes(1))
                           .run();

Could you try setting connectionMaxIdleTimeout to 5 seconds? In my tests, reducing the max idle timeout largely reduced the chances of the idle connection getting closed mid request on the service side.

        CloudWatchAsyncClient.builder()
                             .httpClientBuilder(NettyNioAsyncHttpClient.builder()
                                                                       .connectionMaxIdleTime(Duration.ofSeconds(5)))
                            .build();

zoewangg · 2019-10-04T20:09:57Z

We have pushed out the change to update the default connectionMaxIdleTime for cloudwatch client via #1450. The change has been released as part of 2.9.14.

sycyhy · 2019-10-07T08:05:07Z

I got similar issue (exactly the same exception) and I wonder if it's also gonna be fixed with this.

I create a Cloudwatch client like this

    private static final AwsCredentialsProvider CREDENTIALS_PROVIDER = AwsCredentialsProviderChain.of(
            ProfileCredentialsProvider.create(),
            EnvironmentVariableCredentialsProvider.create(),
            ContainerCredentialsProvider.builder().asyncCredentialUpdateEnabled(true).build(),
            InstanceProfileCredentialsProvider.create()
    );

    private static CloudWatchAsyncClient initializeCloudWatchClient() {
        return CloudWatchAsyncClient.builder()
                .credentialsProvider(CREDENTIALS_PROVIDER)
                .build();
    }

My metrics are send properly for a few days and then suddenly in the middle of the night most of the requests will start failing with the same exception as the author mentioned. The traffic levels are actually lower during the night so it seems to be a bit different scenario.

I will try the new version either way, but if you see anything obvious that I'm missing let me know.

zoewangg · 2019-10-09T22:55:04Z

Hi @sycyhy, have you tried the new version? Are you still experiencing the issue?

sycyhy · 2019-10-09T23:16:23Z

Hi @zoewangg - yep it's on Production right now, but I will be able to confirm after a weekend because during the week there is number of deployments and the issue pops up when you leave the app without any restarts for a few days :)

kaffepanna · 2019-10-15T05:51:19Z

Seeing the same issue when using with KCL under no particular load

zoewangg · 2019-10-15T20:57:26Z

@kaffepanna Are you still experiencing the issue with the latest version?

kaffepanna · 2019-10-16T05:30:12Z

@zoewangg using version 2.9.20

sycyhy · 2019-10-16T17:01:43Z

Yep, my issue is still present - updating the library to 2.9.14 sadly did not help

zoewangg · 2019-10-16T17:08:17Z

@sycyhy Thank you for reporting back. Looks like the issue seems to occur after letting the app running for a while. Did it ever recover after the errors or all subsequent requests failed?

I'll continue investigating.

sycyhy · 2019-10-16T17:11:11Z

@zoewangg nope it does not recover, the app needs to be restarted. I also just noticed that the IOException slightly changed with the new version ->

Caused by: s.a.a.c.e.SdkClientException: null
at s.a.a.c.e.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
at s.a.a.c.i.u.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
at s.a.a.c.i.h.p.s.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:125)
... 35 common frames omitted
Caused by: java.io.IOException: Channel was closed before it could be written to.
... 28 common frames omitted"

zoewangg · 2019-10-16T17:19:09Z

Were all errors "IOException: Channel was closed before it could be written to" or just some of them?

They were caused by the same reason - the server has closed the connection remotely in the middle of the request, but for this error, it seems the connection has been closed before the SDK was able to write the request.

sycyhy · 2019-10-16T17:42:23Z

yep you are right, only some of them (actually only 2 of those for hundreds of the Server failed to send complete response)

zoewangg · 2019-10-17T01:10:48Z

Still not able to reproduce the exact all requests failure issue, but found out one issue in the SDK. There are certain cases when conneciton=close is present in the response header, the SDK is not closing the connection as expected and will try to reuse that connection for other requests. Thus if the next request picks up that bad connection right before service closes the connection, IOException would be thrown once the connection is closed remotely.

…f a channel before picking it up in the pool. See #1380

zoewangg · 2019-10-18T01:18:18Z

#1476 has been merged to address the issue with unresusable connections getting reused and the change will be included in the next release.

robinspollak · 2019-10-18T08:37:28Z

@sycyhy, thanks for raising this we're seeing a very similar issue and thanks to all who are working to resolve it. Lukasz, I'm wondering whether you are seeing any associated CPU spikes when this problem arises?

zoewangg · 2019-10-18T20:46:26Z

Hi @sycyhy @robinspollak @kaffepanna,

The fix has been released as part of 2.9.24. Could you try with the latest version and see if the issue is resolved for your use case? I was not able to reproduce the exact issues you were seeing, so it would be great if you could help verify the fix.

robinspollak · 2019-10-19T15:51:59Z

@zoewangg, 2.9.24 is in production now and I'll report back! Thanks very much

robinspollak · 2019-10-19T18:31:02Z

@zoewangg, unfortunately we're still seeing the following error:

java.lang.RuntimeException: software.amazon.awssdk.core.exception.SdkClientException
 	at
software.amazon.kinesis.retrieval.AWSExceptionManager.apply(AWSExceptionManager.java:65) 	
at software.amazon.kinesis.metrics.CloudWatchMetricsPublisher.blockingExecute(CloudWatchMetricsPublisher.java:89) 	
at software.amazon.kinesis.metrics.CloudWatchMetricsPublisher.publishMetrics(CloudWatchMetricsPublisher.java:74) 	
at software.amazon.kinesis.metrics.CloudWatchPublisherRunnable.runOnce(CloudWatchPublisherRunnable.java:138) 	
at software.amazon.kinesis.metrics.CloudWatchPublisherRunnable.run(CloudWatchPublisherRunnable.java:84) 	
at java.lang.Thread.run(Thread.java:748) Caused by: software.amazon.awssdk.core.exception.SdkClientException: null 	
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97) 	
at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98) 	
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:125) 	
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:107) 	
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) 	
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) 	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 	
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) 	
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:249) 	
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.lambda$notifyIfResponseNotCompleted$2(ResponseHandler.java:391) 	
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.runAndLogError(ResponseHandler.java:174) 	
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:391) 	
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelInactive(ResponseHandler.java:142) 	
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257) 	
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243) 	
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
--
.
.
.

It is the same as the one we were seeing before the new release, let me know if a full stack trace or a sample of how we are setting up our kinesis client would help.

zoewangg · 2019-10-19T19:02:44Z

@robinspollak Thanks for reporting back. Yes, it'd be helpful if you could share the sample code of how you set up the cloudwatch client. Did all requests fail or just some requests failed? Are you running it on an EC2 instance? If so, could you share what type of that instance is?

robinspollak · 2019-10-21T11:03:55Z

@zoewangg We set up our (scala) KCL integration as follows:

class KinesisConfigurationProvider @Inject()(
    configuration: Configuration,
    recordProcessorFactory: ShardRecordProcessorFactory
) {

  def apply(streamName: String): ConfigsBuilder = {
    val workerId = InetAddress.getLocalHost.getCanonicalHostName + ":" + UUID
      .randomUUID()
    val kinesisAsyncClient = KinesisAsyncClient.builder().region(Region.EU_WEST_1).build()
    val dynamoDBClient = DynamoDbAsyncClient.builder().region(Region.EU_WEST_1).build()
    val cloudWatchClient = CloudWatchAsyncClient.builder().region(Region.EU_WEST_1).build()
    new ConfigsBuilder(
      streamName,
      s"app-$streamName-processor",
      kinesisAsyncClient,
      dynamoDBClient,
      cloudWatchClient,
      workerId,
      recordProcessorFactory
    )
  }
}

It seems that only some requests fail. Upon deploy or reboot we won't see the log lines for about 3 hours. They then escalate in frequency for around 24 hours and, eventually, the server stops responding and needs to be rebooted. Our servers run on an m4.xlarge EC2 instance. Hope this helps and I'm happy to provide more information

zoewangg · 2019-10-21T17:45:49Z

@robinspollak Thanks for the info. A few follow up questions: did you see any other non cloudwatch errors before the server stopped responding? When the server stopped responding, did all request fail of "Server failed to send complete response " or it just got stuck somehow? What KCL version are you using?

@sycyhy are you also using KCL or just using cloudwatch async client?

robinspollak · 2019-10-24T10:47:25Z

Hi @zoewangg, sorry for the delayed response. We don't really see any other non-cloudwatch errors but we're also trying our best to determine with certainty that cloudwatch is causing the problems. Thus far, we haven't been able to rule out another source of the problem.

In terms of the server behavior when it stops responding... the entire server just ceases to respond to any requests and there are no responses to any requests and thus no errors in response to requests. We're using KCL 2.2.4.

Thanks again for all your help.

sycyhy · 2019-10-24T12:26:54Z

@zoewangg I use it via micrometer and not KCL which is a layer of abstraction of the async client -> which I passed manually when I configure the reporter.

btw -> I needed to revert to the legacy client due this (I could not wake up more people during the night sadly haha).

zoewangg · 2019-10-26T00:09:32Z

Hi @robinspollak, for the server not responding issue, my guess is that it might not be related to the cloudwatch errors. I'd recommend taking a thread dump or heap dump to see if there's a blocking thread or increased resource usage causing the server to stop responding.

We have pushed out a couple of fixes #1481 and #1479 (available in 2.10.0) that could cause the server failed to complete the response errors. KCL has also released a new version 2.2.5. Could you try the new versions?

@sycyhy Sorry to hear that you had to revert the client code 😞 We are currently working on setting up more tests scenarios that cover more real-world use cases to catch the issues.

zoewangg · 2019-12-10T00:57:50Z

Closing the issue due to lack of activity. Please create a new issue if you continue to see this error with the latest SDK.

cgallom · 2024-05-02T19:13:52Z

@oripwk Hi, I have read this post and others like it and I notice that you never could check if the updates on aws-sdk-java-v2 worked to fix your bug. I am currently using version 2.10.41 and the error persists in this version. Did you find out how to fix this bug ?

varunnvs92 added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. needs-response and removed response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. labels Aug 29, 2019

zoewangg mentioned this issue Oct 2, 2019

Add specific http configurations for cloudwatch and s3 #1450

Merged

13 tasks

zoewangg added the closing-soon This issue will close in 4 days unless further comments are made. label Oct 4, 2019

danotorrey mentioned this issue Oct 10, 2019

Kinesis error: Unknown exception while publishing 18 datums to CloudWatch Graylog2/graylog-plugin-integrations#293

Closed

zoewangg added investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed closing-soon This issue will close in 4 days unless further comments are made. labels Oct 16, 2019

zoewangg added a commit that referenced this issue Oct 17, 2019

Update Netty HealthCheckedChannelPool to check KEEP_ALIVE attribute o…

494f26d

…f a channel before picking it up in the pool. See #1380

zoewangg mentioned this issue Oct 17, 2019

Update HealthCheckedChannelPool to check KEEP_ALIVE attribute #1476

Merged

13 tasks

zoewangg added a commit that referenced this issue Oct 18, 2019

Update Netty HealthCheckedChannelPool to check KEEP_ALIVE attribute o…

2553931

…f a channel before picking it up in the pool. See #1380

anilkumarmyla mentioned this issue Nov 14, 2019

Discontinuous graphs for counters after switching to aws sdkv2 azagniotov/codahale-aggregated-metrics-cloudwatch-reporter#29

Closed

zoewangg closed this as completed Dec 10, 2019

debora-ito added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. and removed needs-response labels Feb 24, 2020

ankurgupta3-globallogic mentioned this issue Aug 19, 2021

Getting CloudWatch java.io.IOException: Server failed to send complete response while using cloudwatch putMetricData API #2668

Closed

Bennett-Lynch mentioned this issue Dec 15, 2021

[netty-nio-client] Ensure in-use channels are not incorrectly closed #2883

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CloudWatch putMetricData sometimes produce a java.io.IOException: Server failed to send complete response #1380

CloudWatch putMetricData sometimes produce a java.io.IOException: Server failed to send complete response #1380

oripwk commented Aug 11, 2019 •

edited

Loading

sschepens commented Aug 27, 2019

varunnvs92 commented Aug 29, 2019

slingala-scout24 commented Aug 30, 2019

oripwk commented Sep 3, 2019

trautonen commented Sep 17, 2019

zoewangg commented Sep 24, 2019

trautonen commented Sep 25, 2019

zoewangg commented Sep 25, 2019

zoewangg commented Oct 4, 2019 •

edited

Loading

sycyhy commented Oct 7, 2019

zoewangg commented Oct 9, 2019

sycyhy commented Oct 9, 2019

kaffepanna commented Oct 15, 2019

zoewangg commented Oct 15, 2019

kaffepanna commented Oct 16, 2019

sycyhy commented Oct 16, 2019

zoewangg commented Oct 16, 2019

sycyhy commented Oct 16, 2019

zoewangg commented Oct 16, 2019

sycyhy commented Oct 16, 2019

zoewangg commented Oct 17, 2019

zoewangg commented Oct 18, 2019

robinspollak commented Oct 18, 2019

zoewangg commented Oct 18, 2019

robinspollak commented Oct 19, 2019 •

edited

Loading

robinspollak commented Oct 19, 2019 •

edited

Loading

zoewangg commented Oct 19, 2019

robinspollak commented Oct 21, 2019 •

edited

Loading

zoewangg commented Oct 21, 2019

robinspollak commented Oct 24, 2019

sycyhy commented Oct 24, 2019 •

edited

Loading

zoewangg commented Oct 26, 2019

zoewangg commented Dec 10, 2019

cgallom commented May 2, 2024

CloudWatch putMetricData sometimes produce a java.io.IOException: Server failed to send complete response #1380

CloudWatch putMetricData sometimes produce a java.io.IOException: Server failed to send complete response #1380

Comments

oripwk commented Aug 11, 2019 • edited Loading

Expected Behavior

Current Behavior

Possible Solution

Steps to Reproduce (for bugs)

Context

Your Environment

sschepens commented Aug 27, 2019

varunnvs92 commented Aug 29, 2019

slingala-scout24 commented Aug 30, 2019

oripwk commented Sep 3, 2019

trautonen commented Sep 17, 2019

zoewangg commented Sep 24, 2019

trautonen commented Sep 25, 2019

zoewangg commented Sep 25, 2019

zoewangg commented Oct 4, 2019 • edited Loading

sycyhy commented Oct 7, 2019

zoewangg commented Oct 9, 2019

sycyhy commented Oct 9, 2019

kaffepanna commented Oct 15, 2019

zoewangg commented Oct 15, 2019

kaffepanna commented Oct 16, 2019

sycyhy commented Oct 16, 2019

zoewangg commented Oct 16, 2019

sycyhy commented Oct 16, 2019

zoewangg commented Oct 16, 2019

sycyhy commented Oct 16, 2019

zoewangg commented Oct 17, 2019

zoewangg commented Oct 18, 2019

robinspollak commented Oct 18, 2019

zoewangg commented Oct 18, 2019

robinspollak commented Oct 19, 2019 • edited Loading

robinspollak commented Oct 19, 2019 • edited Loading

zoewangg commented Oct 19, 2019

robinspollak commented Oct 21, 2019 • edited Loading

zoewangg commented Oct 21, 2019

robinspollak commented Oct 24, 2019

sycyhy commented Oct 24, 2019 • edited Loading

zoewangg commented Oct 26, 2019

zoewangg commented Dec 10, 2019

cgallom commented May 2, 2024

oripwk commented Aug 11, 2019 •

edited

Loading

zoewangg commented Oct 4, 2019 •

edited

Loading

robinspollak commented Oct 19, 2019 •

edited

Loading

robinspollak commented Oct 19, 2019 •

edited

Loading

robinspollak commented Oct 21, 2019 •

edited

Loading

sycyhy commented Oct 24, 2019 •

edited

Loading