
"IOException: Connection was closed" when retrying non-WebApplicationExceptions #37323

Closed
gefloh opened this issue Nov 26, 2023 · 19 comments · Fixed by #40385
Labels: area/rest-client, kind/bug

Comments

@gefloh
Contributor

gefloh commented Nov 26, 2023

Describe the bug

Disclaimer: I already asked about this here, because I am not sure whether it is a bug. At least the behavior seems strange and illogical to me.

I have two local Quarkus applications: one, which I will call the server, offers a simple REST endpoint; the other uses a RESTEasy Reactive REST Client to access this endpoint and has a retry configured in case of errors.
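
For illustration, a minimal sketch of such a setup (all names are hypothetical and not taken from the reproducer): a server resource that can fail either way, and a REST Client interface with a MicroProfile Fault Tolerance retry.

```java
import jakarta.ws.rs.GET;
import jakarta.ws.rs.InternalServerErrorException;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.QueryParam;

import org.eclipse.microprofile.faulttolerance.Retry;
import org.eclipse.microprofile.rest.client.inject.RegisterRestClient;

// Server side: both branches end up as HTTP 500 on the wire.
@Path("/hello")
class HelloResource {

    @GET
    public String hello(@QueryParam("fail") String fail) {
        if ("wae".equals(fail)) {
            // WebApplicationException subtype: the client retries as expected.
            throw new InternalServerErrorException("boom");
        }
        // Any other exception: also an HTTP 500, but it triggers the behavior described below.
        throw new RuntimeException("boom");
    }
}

// Client side: REST Client with a retry configured for errors.
@Path("/hello")
@RegisterRestClient(configKey = "hello-api")
interface HelloClient {

    @GET
    @Retry(maxRetries = 3)
    String hello(@QueryParam("fail") String fail);
}
```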

When I throw a WebApplicationException or a subtype of it in the server endpoint, the client retries as expected.
When I throw any other exception, e.g. a plain RuntimeException, the client instead produces a series of

2023-11-26 23:46:17,252 DEBUG [org.jbo.res.rea.com.cor.AbstractResteasyReactiveContext] (vert.x-eventloop-thread-0)
Restarting handler chain for exception exception: jakarta.ws.rs.ProcessingException:
java.io.IOException: Connection was closed
	at org.jboss.resteasy.reactive.client.handlers.ClientSendRequestHandler$4.handle(ClientSendRequestHandler.java:366)
	at org.jboss.resteasy.reactive.client.handlers.ClientSendRequestHandler$4.handle(ClientSendRequestHandler.java:357)
(...)
2023-11-26 23:46:17,256 DEBUG [org.jbo.res.rea.com.cor.AbstractResteasyReactiveContext] (vert.x-eventloop-thread-0) 
Attempting to handle unrecoverable exception: jakarta.ws.rs.ProcessingException:
java.io.IOException: Connection was closed
	at org.jboss.resteasy.reactive.client.handlers.ClientSendRequestHandler$4.handle(ClientSendRequestHandler.java:366)
	at org.jboss.resteasy.reactive.client.handlers.ClientSendRequestHandler$4.handle(ClientSendRequestHandler.java:357)

on the first retry attempt. No further retries are carried out; it simply stops.

Expected behavior

I would expect the client to retry in all HTTP 500 error cases, because I cannot guarantee that only WebApplicationExceptions are thrown on the server side.

Actual behavior

The client receives an HTTP 500 in both cases (though the body differs), but it is not able to retry on exceptions that are not WebApplicationExceptions.

How to Reproduce?

I created a reproducer here. Follow the README.md to start the two applications.

Output of uname -a or ver

Linux xxx 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux

Output of java -version

openjdk version "17.0.7" 2023-04-18
OpenJDK Runtime Environment Temurin-17.0.7+7 (build 17.0.7+7)
OpenJDK 64-Bit Server VM Temurin-17.0.7+7 (build 17.0.7+7, mixed mode, sharing)

Quarkus version or git rev

3.5.3

Build tool (i.e. output of mvnw --version or gradlew --version)

mvnw 3.8.8

Additional information

No response

gefloh added the kind/bug label Nov 26, 2023

quarkus-bot commented Nov 29, 2023

/cc @FroMage (resteasy-reactive), @stuartwdouglas (resteasy-reactive)

@geoand
Contributor

geoand commented Apr 16, 2024

Is this still an issue with the latest Quarkus versions?

geoand added the triage/needs-feedback label Apr 16, 2024
@gefloh
Contributor Author

gefloh commented Apr 16, 2024

@geoand I just tried it with Quarkus 3.9.3 and the behavior doesn't seem to have changed.

@geoand
Contributor

geoand commented Apr 16, 2024

Thanks for checking

geoand removed the triage/needs-feedback label Apr 16, 2024
@madocx

madocx commented Apr 18, 2024

Just want to add to this, I'm having similar issues with a container health check implementation utilizing a number of resteasy reactive rest clients. Despite completely avoiding any kind of retry logic, after a number of requests the health check fails and all subsequent rest client requests start failing with the same IOException: Connection was closed error.

java.util.concurrent.CompletionException: java.io.IOException: Connection was closed
	at io.smallrye.mutiny.operators.uni.UniBlockingAwait.await(UniBlockingAwait.java:79)
	at io.smallrye.mutiny.groups.UniAwait.atMost(UniAwait.java:65)
	at io.smallrye.mutiny.groups.UniAwait.indefinitely(UniAwait.java:46)
	at org.company.common.rest.client.AbstractRestClientHealthCheck.healthCheck(AbstractRestClientHealthCheck.java:18)
	at org.company.common.health.AbstractHealthCheck.call(AbstractHealthCheck.java:26)
	at org.company.consumer.auth.health.DemographicsApiHealthCheck_ClientProxy.call(Unknown Source)
	at io.smallrye.context.impl.wrappers.SlowContextualSupplier.get(SlowContextualSupplier.java:21)
	at io.smallrye.mutiny.operators.uni.builders.UniCreateFromItemSupplier.subscribe(UniCreateFromItemSupplier.java:28)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.operators.uni.UniOnFailureFlatMap.subscribe(UniOnFailureFlatMap.java:31)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.operators.uni.UniOnItemTransform.subscribe(UniOnItemTransform.java:22)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.operators.uni.UniRunSubscribeOn.lambda$subscribe$0(UniRunSubscribeOn.java:27)
	at io.smallrye.mutiny.vertx.MutinyHelper.lambda$blockingExecutor$6(MutinyHelper.java:62)
	at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$1(ContextImpl.java:190)
	at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:276)
	at io.vertx.core.impl.ContextImpl.lambda$internalExecuteBlocking$2(ContextImpl.java:209)
	at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:582)
	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1512)
	at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
	at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at [email protected]/java.lang.Thread.runWith(Thread.java:1596)
	at [email protected]/java.lang.Thread.run(Thread.java:1583)
	at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:833)
	at org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixPlatformThreads.pthreadStartRoutine(PosixPlatformThreads.java:211)
Caused by: java.io.IOException: Connection was closed
	at org.jboss.resteasy.reactive.client.handlers.ClientSendRequestHandler$4.handle(ClientSendRequestHandler.java:380)
	at org.jboss.resteasy.reactive.client.handlers.ClientSendRequestHandler$4.handle(ClientSendRequestHandler.java:371)
	at io.vertx.core.impl.future.FutureImpl$2.onFailure(FutureImpl.java:117)
	at io.vertx.core.impl.future.FutureImpl$ListenerArray.onFailure(FutureImpl.java:303)
	at io.vertx.core.impl.future.FutureBase.lambda$emitFailure$1(FutureBase.java:75)
	at io.vertx.core.impl.ContextImpl.execute(ContextImpl.java:305)
	at io.vertx.core.impl.DuplicatedContext.execute(DuplicatedContext.java:181)
	at io.vertx.core.impl.future.FutureBase.emitFailure(FutureBase.java:72)
	at io.vertx.core.impl.future.FutureImpl.tryFail(FutureImpl.java:265)
	at io.vertx.core.http.impl.HttpClientRequestBase.fail(HttpClientRequestBase.java:182)
	at io.vertx.core.http.impl.HttpClientRequestBase.handleException(HttpClientRequestBase.java:177)
	at io.vertx.core.http.impl.HttpClientRequestImpl.handleException(HttpClientRequestImpl.java:90)
	at io.vertx.core.http.impl.Http1xClientConnection$StreamImpl.handleException(Http1xClientConnection.java:711)
	at io.vertx.core.http.impl.Http1xClientConnection$StreamImpl.handleClosed(Http1xClientConnection.java:717)
	at io.vertx.core.http.impl.Http1xClientConnection.lambda$handleClosed$12(Http1xClientConnection.java:1219)
	at io.vertx.core.impl.ContextImpl.execute(ContextImpl.java:320)
	at io.vertx.core.impl.DuplicatedContext.execute(DuplicatedContext.java:171)
	at io.vertx.core.http.impl.Http1xClientConnection.handleClosed(Http1xClientConnection.java:1219)
	at io.vertx.core.net.impl.VertxHandler.channelInactive(VertxHandler.java:143)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:303)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive(CombinedChannelDuplexHandler.java:418)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:411)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:376)
	at io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive(HttpClientCodec.java:410)
	at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:221)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:303)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:411)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:376)
	at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1174)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:303)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:301)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
	at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:566)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	... 5 more
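
For context, a rough sketch of the kind of health check involved, assuming a reactive REST Client that is awaited synchronously; the names are hypothetical and only loosely mirror the frames in the trace above.

```java
import io.smallrye.mutiny.Uni;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;

import org.eclipse.microprofile.health.HealthCheck;
import org.eclipse.microprofile.health.HealthCheckResponse;
import org.eclipse.microprofile.health.Readiness;
import org.eclipse.microprofile.rest.client.inject.RegisterRestClient;
import org.eclipse.microprofile.rest.client.inject.RestClient;

// Reactive REST Client for the downstream service's readiness endpoint.
@Path("/q/health/ready")
@RegisterRestClient(configKey = "downstream-api")
interface DownstreamApi {

    @GET
    Uni<String> ready();
}

// Health check that blocks on the reactive call; this is roughly where the
// CompletionException wrapping "IOException: Connection was closed" surfaces.
@Readiness
@ApplicationScoped
class DownstreamApiHealthCheck implements HealthCheck {

    @RestClient
    DownstreamApi downstream;

    @Override
    public HealthCheckResponse call() {
        try {
            downstream.ready().await().indefinitely();
            return HealthCheckResponse.up("downstream-api");
        } catch (RuntimeException e) {
            return HealthCheckResponse.down("downstream-api");
        }
    }
}
```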

@madocx

madocx commented Apr 18, 2024

Setting quarkus.rest-client.keep-alive-enabled=false seems to resolve the issue, but it obviously has a significant performance impact and isn't really viable for production.
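
For reference, a minimal application.properties sketch of that workaround (the per-client variant is an assumption based on the usual configKey-scoped properties):

```properties
# Workaround, not a fix: disables HTTP keep-alive so every request opens a
# fresh connection, at the performance cost mentioned above.
quarkus.rest-client.keep-alive-enabled=false

# Presumably also available per client, scoped by its configKey:
# quarkus.rest-client."hello-api".keep-alive-enabled=false
```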

@geoand
Contributor

geoand commented Apr 22, 2024

> I'm having similar issues with a container health check implementation utilizing a number of resteasy reactive rest clients.

Just to make sure I understand, you are using the REST Client in order to call the health check of other services in the cluster?

@madocx

madocx commented Apr 22, 2024

> I'm having similar issues with a container health check implementation utilizing a number of resteasy reactive rest clients.
>
> Just to make sure I understand, you are using the REST Client in order to call the health check of other services in the cluster?

Yup, exactly. Not sure if it's a factor, but the downstream APIs are using the experimental "wellness check" of the Quarkus SmallRye Health extension. I will be doing some testing today exclusively with MicroProfile Health endpoints to see if it makes any difference.

@madocx

madocx commented Apr 23, 2024

> I'm having similar issues with a container health check implementation utilizing a number of resteasy reactive rest clients.
>
> Just to make sure I understand, you are using the REST Client in order to call the health check of other services in the cluster?
>
> Yup, exactly. Not sure if it's a factor, but the downstream APIs are using the experimental "wellness check" of the Quarkus SmallRye Health extension. I will be doing some testing today exclusively with MicroProfile Health endpoints to see if it makes any difference.

Made no difference.

@geoand
Contributor

geoand commented Apr 24, 2024

@vietj @cescoffier I assume the Vert.x HTTP client should remove closed connections from the pool, right?

@cescoffier
Member

It should, but let's wait for @vietj's response.

@vietj

vietj commented Apr 30, 2024

@geoand I confirm that closed client connections are evicted from the pool

@geoand
Contributor

geoand commented Apr 30, 2024

Thanks @vietj

@madocx

madocx commented Apr 30, 2024

So what does this mean for this bug? Is it likely related to the RESTEasy Reactive REST Client extension, if the underlying Vert.x HTTP client is not the root cause?

@geoand
Contributor

geoand commented Apr 30, 2024

It's unclear and needs further investigation

stuartwdouglas added a commit to stuartwdouglas/quarkus that referenced this issue May 1, 2024
stuartwdouglas added a commit to stuartwdouglas/quarkus that referenced this issue May 1, 2024
@stuartwdouglas
Member

So there is an issue where the server was closing the connection when it shouldn't have, resulting in the exception.

The client should be able to handle a connection close, though. It's just that in this case, because the close was sent after the response (instead of the FIN being on the last packet of the response), there is basically a race condition: the client does not yet know the connection is closed, and the retry uses the same connection that is about to close.

There is not really anything that can be done about this on the client side; pooled HTTP connections are inherently racy if the server decides to close them in between requests. In this case, adding a small back-off delay to the retry would have fixed it from the client side.
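
For illustration, a minimal sketch of such a back-off on the client side, using MicroProfile Fault Tolerance's @Retry on the hypothetical client interface from the earlier sketch; the delay and jitter values are arbitrary examples, not a recommendation from this thread.

```java
import java.time.temporal.ChronoUnit;

import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.QueryParam;

import org.eclipse.microprofile.faulttolerance.Retry;
import org.eclipse.microprofile.rest.client.inject.RegisterRestClient;

@Path("/hello")
@RegisterRestClient(configKey = "hello-api")
interface HelloClient {

    @GET
    // A small delay (with jitter) between attempts gives the pool time to notice
    // and evict the connection the server just closed, avoiding the race above.
    @Retry(maxRetries = 3, delay = 200, delayUnit = ChronoUnit.MILLIS, jitter = 100)
    String hello(@QueryParam("fail") String fail);
}
```

With MicroProfile Fault Tolerance, delay and jitter default to milliseconds, so the explicit delayUnit is only there for clarity.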

quarkus-bot added this to the 3.11 - main milestone May 1, 2024
@madocx

madocx commented May 1, 2024

Thanks all. Will this be added to the LTS since it's a bugfix?

@geoand
Contributor

geoand commented May 1, 2024

That's the plan

@gefloh
Contributor Author

gefloh commented May 1, 2024

Thanks guys, highly appreciated!

gsmet modified the milestones: 3.11 - main, 3.10.1 May 10, 2024
gsmet pushed a commit to gsmet/quarkus that referenced this issue May 10, 2024
gsmet modified the milestones: 3.10.1, 3.8.5 May 22, 2024
gsmet pushed a commit to gsmet/quarkus that referenced this issue May 22, 2024
gsmet pushed a commit to gsmet/quarkus that referenced this issue May 22, 2024
gsmet pushed a commit to gsmet/quarkus that referenced this issue May 23, 2024
gsmet pushed a commit to gsmet/quarkus that referenced this issue Jun 4, 2024
michalvavrik pushed a commit to quarkus-qe/quarkus-test-suite that referenced this issue Jun 7, 2024
holly-cummins pushed a commit to holly-cummins/quarkus that referenced this issue Jul 31, 2024