Response writing fails to complete with WebFlux on Tomcat #26434

danielra · 2021-01-23T03:05:18Z

Affects: \v5.3.3

The symptom of this issue is similar to previous issue #23096 (response handling never completes) though the underlying cause is different.

I've reproduced this issue with the small demo spring-boot project which uses WebFlux on Tomcat, seen here: https://github.com/danielra/webflux_race_investigation
The project can be built with ./gradlew clean build and then run via ./start.sh.
To reproduce the problem I've been successful using wrk (https://github.com/wg/wrk) to throw some load at the service, stop and then throw some more etc.. For example:

for i in {1..100}; do   ./wrk -t 2 -c 1000 -d 10s --latency --timeout 9s --script ./myPost.lua http://localhost:8080/post; done

Where ./myPost.lua looks like:

wrk.method = "POST"
wrk.body   = "[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]"
wrk.headers["Content-Type"] = "application/json"

(I don't think this being a POST request really ended up being relevant, but I started here because I was trying to start the more isolated repro somewhere somewhat similar to the real system that was experiencing the problem.)

This will result in a lot of trace level logging under a logs directory in the demo project. This can be searched through to find any occurrences like:

find . -name \*.gz -print0 | xargs -0 zgrep "Response timeout"

And then another search for the logPrefix found in resulting log entries can be performed to gain more context on the processing for the relevant request.

This is an example set of log entries for a repro case (with irrelevant line prefixes ommitted):

12:31:47,007 INFO c.e.d.f.DemoWebFilter [http-nio-8080-exec-368] [33d6780] Incoming request observed.
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] UNSUBSCRIBED -> SUBSCRIBING
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] 9223372036854775807 requested
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] SUBSCRIBING -> DEMAND
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] onDataAvailable
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] DEMAND -> READING
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] Read 8092 bytes
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] Publishing data read
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] Read 8192 bytes
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] Publishing data read
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] Read 4067 bytes
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] Publishing data read
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] No more data to read
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] READING -> DEMAND
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] onAllDataRead
12:31:47,007 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-368] [33d6780] DEMAND -> COMPLETED
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [parallel-2] [33d6780] UNSUBSCRIBED -> REQUESTED
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [parallel-2] [33d6780] Received onNext publisher
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [parallel-2] [33d6780] REQUESTED -> RECEIVED
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] UNSUBSCRIBED -> REQUESTED
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] Item to write
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] REQUESTED -> RECEIVED
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] onWritePossible
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] RECEIVED -> WRITING
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] Wrote 12931 of 12931 bytes
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] WRITING -> REQUESTED
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] No more items to write
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] REQUESTED -> RECEIVED
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [33d6780] isWritePossible: false
12:31:47,780 TRACE _.s.h.s.r.WriteResultPublisher [parallel-2] [33d6780] UNSUBSCRIBED subscribe: org.springframework.http.server.reactive.AbstractListenerWriteFlushProcessor$State$WriteResultSubscriber@7a996bbf
12:31:47,780 TRACE _.s.h.s.r.WriteResultPublisher [parallel-2] [33d6780] SUBSCRIBING request: 9223372036854775807
12:31:47,780 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [parallel-2] [33d6780] Received onComplete
12:31:47,780 TRACE _.s.h.s.r.WriteResultPublisher [parallel-2] [33d6780] UNSUBSCRIBED subscribe: reactor.core.publisher.MonoNext$NextSubscriber@53a4d00d
12:31:47,780 TRACE _.s.h.s.r.WriteResultPublisher [parallel-2] [33d6780] SUBSCRIBING request: 9223372036854775807
12:31:47,781 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [http-nio-8080-exec-348] [33d6780] onWritePossible
12:31:47,781 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [http-nio-8080-exec-348] [33d6780] RECEIVED -> COMPLETED
12:31:47,781 TRACE _.s.h.s.r.WriteResultPublisher [http-nio-8080-exec-348] [33d6780] SUBSCRIBED publishComplete
12:31:47,781 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [http-nio-8080-exec-348] [33d6780] RECEIVED writeComplete
12:31:47,781 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [http-nio-8080-exec-348] [33d6780] Flush attempt
12:31:47,781 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [http-nio-8080-exec-348] [33d6780] RECEIVED flushingFailed
12:31:59,588 TRACE _.s.h.s.r.AbstractListenerReadPublisher [parallel-4] [33d6780] Cancellation
12:31:59,588 TRACE _.s.h.s.r.WriteResultPublisher [parallel-4] [33d6780] SUBSCRIBED cancel
12:31:59,588 ERROR c.e.d.f.DemoWebFilter [parallel-4] [33d6780] Response timeout after 12581 milliseconds for null request with uri 'http://localhost:8080/post'. Response status code was already committed: '200 OK'.                                                                                                                                                                                                               12:31:59,588 TRACE o.s.h.s.r.ServletHttpHandlerAdapter [parallel-4] [33d6780] Handling completed

Here the line 12:31:47,781 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [http-nio-8080-exec-348] [33d6780] RECEIVED flushingFailed comes from a modification I made locally to the spring-framework to add an additional trace log line here: https://github.com/spring-projects/spring-framework/blob/v5.3.3/spring-web/src/main/java/org/springframework/http/server/reactive/AbstractListenerWriteFlushProcessor.java#L194-L195
like so:

if (rsWriteFlushLogger.isTraceEnabled()) {
            rsWriteFlushLogger.trace(this.logPrefix + this.state + " flushingFailed", t);
}

For reference, the relevant Throwable logged at these lines in these reproductions with this demo project looks like:

org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipe
        at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:309) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:272) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:118) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.springframework.http.server.reactive.ServletServerHttpResponse.flush(ServletServerHttpResponse.java:198) ~[spring-web-5.3.3.jar!/:5.3.3]
        at org.springframework.http.server.reactive.ServletServerHttpResponse.access$500(ServletServerHttpResponse.java:50) ~[spring-web-5.3.3.jar!/:5.3.3]
        at org.springframework.http.server.reactive.ServletServerHttpResponse$ResponseBodyFlushProcessor.flush(ServletServerHttpResponse.java:316) ~[spring-web-5.3.3.jar!/:5.3.3]
        at org.springframework.http.server.reactive.AbstractListenerWriteFlushProcessor$State$3.writeComplete(AbstractListenerWriteFlushProcessor.java:291) ~[spring-web-5.3.3.jar!/:5.3.3]
        at org.springframework.http.server.reactive.AbstractListenerWriteFlushProcessor$State$WriteResultSubscriber.onComplete(AbstractListenerWriteFlushProcessor.java:437) ~[spring-web-5.3.3.jar!/:5.3.3]
        at org.springframework.http.server.reactive.WriteResultPublisher$State.publishComplete(WriteResultPublisher.java:256) ~[spring-web-5.3.3.jar!/:5.3.3]
        at org.springframework.http.server.reactive.WriteResultPublisher.publishComplete(WriteResultPublisher.java:84) ~[spring-web-5.3.3.jar!/:5.3.3]
        at org.springframework.http.server.reactive.AbstractListenerWriteProcessor.changeStateToComplete(AbstractListenerWriteProcessor.java:280) ~[spring-web-5.3.3.jar!/:5.3.3]
        at org.springframework.http.server.reactive.AbstractListenerWriteProcessor.access$300(AbstractListenerWriteProcessor.java:46) ~[spring-web-5.3.3.jar!/:5.3.3]
        at org.springframework.http.server.reactive.AbstractListenerWriteProcessor$State$3.onWritePossible(AbstractListenerWriteProcessor.java:368) ~[spring-web-5.3.3.jar!/:5.3.3]
        at org.springframework.http.server.reactive.AbstractListenerWriteProcessor.onWritePossible(AbstractListenerWriteProcessor.java:153) ~[spring-web-5.3.3.jar!/:5.3.3]
        at org.springframework.http.server.reactive.ServletServerHttpResponse$ResponseBodyWriteListener.onWritePossible(ServletServerHttpResponse.java:270) ~[spring-web-5.3.3.jar!/:5.3.3]
        at org.apache.coyote.Response.onWritePossible(Response.java:762) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.catalina.connector.CoyoteAdapter.asyncDispatch(CoyoteAdapter.java:188) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.coyote.AbstractProcessor.dispatch(AbstractProcessor.java:241) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:59) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:888) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_282]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_282]
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:1.8.0_282]
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[?:1.8.0_282]
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[?:1.8.0_282]
        at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[?:1.8.0_282]
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470) ~[?:1.8.0_282]
        at org.apache.tomcat.util.net.NioChannel.write(NioChannel.java:138) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite(NioEndpoint.java:1269) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.tomcat.util.net.SocketWrapperBase.doWrite(SocketWrapperBase.java:764) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.tomcat.util.net.SocketWrapperBase.flushNonBlocking(SocketWrapperBase.java:735) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.tomcat.util.net.SocketWrapperBase.flush(SocketWrapperBase.java:709) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.flush(Http11OutputBuffer.java:572) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.coyote.http11.filters.IdentityOutputFilter.flush(IdentityOutputFilter.java:117) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.coyote.http11.Http11OutputBuffer.flush(Http11OutputBuffer.java:220) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.coyote.http11.Http11Processor.flush(Http11Processor.java:1195) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.coyote.AbstractProcessor.action(AbstractProcessor.java:402) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.coyote.Response.action(Response.java:209) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:305) ~[tomcat-embed-core-9.0.41.jar!/:9.0.41]
        ... 25 more

And appears to me to be a result of the client hanging up the connection before response handling is complete on the server.

After this flushingFailed log line, we can see a relatively large (~12 second) gap in time before the next log line which was triggered by the response timeout being triggered which is setup in the DemoWebFilter in the demo project here: https://github.com/danielra/webflux_race_investigation/blob/main/src/main/java/com/example/demo/filter/DemoWebFilter.java#L38 And only after this timeout does ServletHttpHandlerAdapter log a line indicating that handling is complete.

I note that the flushingFailed method where I added the extra log line above has a comment which reads:

Invoked when an error happens while flushing. Sub-classes may choose
to ignore this if they know the underlying API will provide an error
notification in a container thread.
Defaults to no-op.

Based on this description and the observed behavior, it seems to me that perhaps a notification of the error was expected elsewhere, but it was never actually received so handling of the response is never completed (without a timeout present). I don't know enough yet to say whether that was due to a bug in the code that should have emitted the error notification or on the listening side - or if there should actually be a real non-no-op operation implemented of flushingFailed in the relevant subclass in the Tomcat case. However, my expectation is that regardless of when a client hangs up the connection the server should proceed with completing response handling.

Please let me know if any additional information would be helpful. Thank you for your time!

The text was updated successfully, but these errors were encountered:

rstoyanchev · 2021-01-28T17:06:20Z

Thanks for the detailed report.

We're currently not handling AbstractListenerWriteFlushProcessor#flushingFailed for Servlet containers because the expectation is the Servlet container will call our ServletServerHttpResponse$ResponseAsyncListener#onError which in turn will cancel the writing publisher and send an onError for the result of handling. However this is not happening in this case probably because in this case we're already inside a container thread (via onWritePossible) and Tomcat expects us to propagate the error which would then be handled in CoyoteAdapter.asyncDispatch by calling the AsyncListener.

I've made a change in flushingFailed to cancel the write publisher and send an onError downstream. The sample no longer produces those "Response timeout" messages but if you can verify in your own code to confirm that would be great!

danielra · 2021-01-31T19:42:59Z

Thanks for the explanation! I ran the demo scenario again, and I no longer see response timeouts associated with the flushingFailed case - but I have noticed a new timeout case. The trace logs look like this:

14:25:08,314 INFO c.e.d.f.DemoWebFilter [http-nio-8080-exec-14] [47665100] Incoming request observed.
14:25:08,314 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-14] [47665100] UNSUBSCRIBED -> SUBSCRIBING
14:25:08,314 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-14] [47665100] 9223372036854775807 requested
14:25:08,314 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-14] [47665100] SUBSCRIBING -> DEMAND
14:25:08,314 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-14] [47665100] onDataAvailable
14:25:08,314 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-14] [47665100] DEMAND -> READING
14:25:08,314 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-14] [47665100] Read 8092 bytes
14:25:08,314 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-14] [47665100] Publishing data read
14:25:08,314 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-14] [47665100] Read 8192 bytes
14:25:08,314 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-14] [47665100] Publishing data read
14:25:08,314 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-14] [47665100] Read 4067 bytes
14:25:08,314 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-14] [47665100] Publishing data read
14:25:08,314 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-14] [47665100] No more data to read
14:25:08,314 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-14] [47665100] READING -> DEMAND
14:25:08,314 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-14] [47665100] onAllDataRead
14:25:08,314 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-14] [47665100] DEMAND -> COMPLETED
14:25:11,256 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [parallel-1] [47665100] UNSUBSCRIBED -> REQUESTED
14:25:11,256 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [parallel-1] [47665100] Received onNext publisher
14:25:11,256 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [parallel-1] [47665100] REQUESTED -> RECEIVED
14:25:11,256 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-1] [47665100] UNSUBSCRIBED -> REQUESTED
14:25:11,256 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-1] [47665100] Item to write
14:25:11,256 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-1] [47665100] REQUESTED -> RECEIVED
14:25:11,256 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-1] [47665100] onWritePossible
14:25:11,256 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-1] [47665100] RECEIVED -> WRITING
14:25:11,262 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-1] [47665100] Wrote 468993 of 468993 bytes
14:25:11,262 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-1] [47665100] WRITING -> REQUESTED
14:25:11,262 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-1] [47665100] No more items to write
14:25:11,262 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-1] [47665100] REQUESTED -> RECEIVED
14:25:11,262 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-1] [47665100] isWritePossible: false
14:25:11,262 TRACE _.s.h.s.r.WriteResultPublisher [parallel-1] [47665100] UNSUBSCRIBED subscribe: org.springframework.http.server.reactive.AbstractListenerWriteFlushProcessor$State$WriteResultSubscriber@768a373c
14:25:11,262 TRACE _.s.h.s.r.WriteResultPublisher [parallel-1] [47665100] SUBSCRIBING request: 9223372036854775807
14:25:11,262 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [parallel-1] [47665100] Received onComplete
14:25:11,262 TRACE _.s.h.s.r.WriteResultPublisher [parallel-1] [47665100] UNSUBSCRIBED subscribe: reactor.core.publisher.MonoNext$NextSubscriber@2455b45e
14:25:11,262 TRACE _.s.h.s.r.WriteResultPublisher [parallel-1] [47665100] SUBSCRIBING request: 9223372036854775807
14:25:19,015 TRACE _.s.h.s.r.AbstractListenerReadPublisher [parallel-4] [47665100] Cancellation
14:25:19,015 TRACE _.s.h.s.r.WriteResultPublisher [parallel-4] [47665100] SUBSCRIBED cancel
14:25:19,015 ERROR c.e.d.f.DemoWebFilter [parallel-4] [47665100] Response timeout after 10701 milliseconds for POST request with uri 'http://localhost:8080/post'. Response status code was already committed: '200 OK'.
14:25:19,015 TRACE o.s.h.s.r.ServletHttpHandlerAdapter [parallel-4] [47665100] Handling completed
14:25:19,040 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-95] [47665100] onAllDataRead
14:25:19,040 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [http-nio-8080-exec-95] [47665100] Received request to cancel
14:25:19,040 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [http-nio-8080-exec-95] [47665100] Received onComplete
14:25:19,040 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [http-nio-8080-exec-95] [47665100] Cancellation
14:25:19,040 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [http-nio-8080-exec-95] [47665100] No more items to write

In this case, what I note is that this line occurs:

14:25:11,262 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [parallel-1] [47665100] Received onComplete

Aparently while in the RECEIVED state, indicating that processor.subscriberCompleted would be set to true (here: https://github.com/spring-projects/spring-framework/blob/v5.3.3/spring-web/src/main/java/org/springframework/http/server/reactive/AbstractListenerWriteFlushProcessor.java#L306 ) but then there doesn't seem to be an expected writeComplete call to trigger the call to handleSubscriberCompleted that would occur here: https://github.com/spring-projects/spring-framework/blob/v5.3.3/spring-web/src/main/java/org/springframework/http/server/reactive/AbstractListenerWriteFlushProcessor.java#L295-L296 . I haven't had time to dig in further yet. Please let me know if you think opening a new issue is merited, since this is no longer the original flush failure case.

rstoyanchev · 2021-02-02T08:29:27Z

Thanks for testing. I'll have another look.

rstoyanchev · 2021-02-10T15:09:33Z

but then there doesn't seem to be an expected writeComplete call to trigger the call to handleSubscriberCompleted

Indeed and that call should be made when the "current" AbstractListenerWriteProcessor notifies completion via publishComplete() to its WriteResultPublisher.

From the output of the current write processor it looks like it writes its single item and requests another, then receives onComplete and switches back to RECEIVED but gets "false" from ServletOutputStream#isReady():

14:25:11,262 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-1] [47665100] No more items to write
14:25:11,262 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-1] [47665100] REQUESTED -> RECEIVED
14:25:11,262 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-1] [47665100] isWritePossible: false

So at this point it's waiting on a WriterListener#onWritePossible from the container which would make it advance to COMPLETED but that doesn't happen before the timeout.

Any tips on how to reproduce this one?

danielra · 2021-02-11T01:00:33Z

I seem to be able to reliably get some reproductions of this on my machine when running the steps described in my original report on this issue. With the one difference being that I have an the following includeBuild clause in my settings.gradle of the demo project:

includeBuild '/home/daniel/webflux_race_investigation/spring-framework/'

Which is directory with a local clone of spring-framework checked out at tag v5.3.3 with this diff applied representing your prior fix (and an additional log line):

git diff HEAD
diff --git a/spring-web/src/main/java/org/springframework/http/server/reactive/AbstractListenerWriteFlushProcessor.java b/spring-web/src/main/java/org/springframework/http/server/reactive/AbstractListenerWriteFlushProcessor.java
index 81c8098d32..d124192c7a 100644
--- a/spring-web/src/main/java/org/springframework/http/server/reactive/AbstractListenerWriteFlushProcessor.java
+++ b/spring-web/src/main/java/org/springframework/http/server/reactive/AbstractListenerWriteFlushProcessor.java
@@ -192,6 +192,11 @@ public abstract class AbstractListenerWriteFlushProcessor<T> implements Processo
         * <p>Defaults to no-op.
         */
        protected void flushingFailed(Throwable t) {
+               if (rsWriteFlushLogger.isTraceEnabled()) {
+                       rsWriteFlushLogger.trace(this.logPrefix + this.state + " flushingFailed", t);
+               }
+               cancel();
+               onError(t);
        }

Assuming that it is not reproducing as easily for you, is there anywhere in particular that you think it would be interesting for me to add additional trace logging to gather more information? I should have a bit of time tomorrow to look at this again.

rstoyanchev · 2021-02-11T12:47:18Z

After the last fix I wasn't seeing timeouts any more and in your latest log I saw a 3 second delay for the handling, so I thought there might be some different instructions.

Today I could get several timeouts waiting on a container callback, sometimes its the "current processor" waiting for onWritePossible and other times it makes it to the final flush and hangs there:

2021-02-11 11:42:41,960 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [http-nio-8080-exec-187] [6970d84f] RECEIVED writeComplete
2021-02-11 11:42:41,960 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [http-nio-8080-exec-187] [6970d84f] Flush attempt
2021-02-11 11:42:50,232 TRACE _.s.h.s.r.AbstractListenerReadPublisher [parallel-3] [6970d84f] Cancellation

If the flush had completed we should see RECEIVED -> REQUESTED immediately after, or a cancellation/error if the flush failed but we don't so I think it is stuck trying to flush.

It occurred to me it might be related to the load, so I increased the timeout to 30 seconds but still got the timeouts. The other thing is that ThreadLocalRandom can be CPU intensive and block which may exhaust the container threads which runs with a small number of threads for non-blocking I/O. So I tried removing those with fixed numbers:

// final int delayMillis = ThreadLocalRandom.current().nextInt(10) + latencyBase;
// final int intListSize = ThreadLocalRandom.current().nextInt(MAX_INT_LIST_SIZE);
final int delayMillis = 10 + latencyBase;
final int intListSize = MAX_INT_LIST_SIZE;

Now I can no longer reproduce it.

See gh-26434

danielra · 2021-02-13T21:01:14Z

Yeah, I too was starting to wonder about the possibility of just unlucky scheduling causing occasional highly latent responses under load. However, I've done a bunch more experimenting and I do still think there is some issue in play here. I similarly hard-coded the delay and response size (though the intent of the -Djava.security.egd=file:/dev/urandom option in the demo start script was to avoid the random generation potentially blocking), increased the response timeout to 60 seconds, and additionally added a 10 second sleep between the wrk runs to allow CPU to drop back to near zero for a period (intending to give plenty of opportunity for catch up) - and I still am able to reproduce the response timeouts. The repros are a bit less frequent without the random timing/sizing though. For example:

12:25:25,726 INFO c.e.d.f.DemoWebFilter [http-nio-8080-exec-111] [5ac5ec14] Incoming request observed.
12:25:25,726 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-111] [5ac5ec14] UNSUBSCRIBED -> SUBSCRIBING                                                                                        
12:25:25,726 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-111] [5ac5ec14] 9223372036854775807 requested                                                                                      
12:25:25,726 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-111] [5ac5ec14] SUBSCRIBING -> DEMAND                                                                                              
12:25:25,726 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-111] [5ac5ec14] onDataAvailable                                                                                                    
12:25:25,726 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-111] [5ac5ec14] DEMAND -> READING                                                                                                  
12:25:25,726 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-111] [5ac5ec14] Read 8092 bytes                                                                                                    
12:25:25,726 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-111] [5ac5ec14] Publishing data read                                                                                               
12:25:25,726 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-111] [5ac5ec14] Read 8192 bytes                                                                                                    
12:25:25,726 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-111] [5ac5ec14] Publishing data read                                                                                               
12:25:25,726 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-111] [5ac5ec14] Read 4067 bytes                                                                                                    
12:25:25,726 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-111] [5ac5ec14] Publishing data read                                                                                               
12:25:25,726 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-111] [5ac5ec14] No more data to read                                                                                               
12:25:25,726 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-111] [5ac5ec14] READING -> DEMAND                                                                                                  
12:25:25,726 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-111] [5ac5ec14] onAllDataRead                                                                                                      
12:25:25,726 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-111] [5ac5ec14] DEMAND -> COMPLETED                                                                                                
12:25:25,726 INFO c.e.d.c.DemoController [http-nio-8080-exec-111] [5ac5ec14] Controller called. Delay: 6. Size: 30519                                                                                             
12:25:25,795 INFO c.e.d.c.DemoController [parallel-2] [5ac5ec14] Mono returned from Controller doOnSuccess called.                                                                                                
12:25:25,795 INFO c.e.d.c.DemoController [parallel-2] [5ac5ec14] Mono returned from Controller doOnTerminate called.                                                                                              
12:25:25,796 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [parallel-2] [5ac5ec14] UNSUBSCRIBED -> REQUESTED                                                                                                
12:25:25,796 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [parallel-2] [5ac5ec14] Received onNext publisher                                                                                                
12:25:25,796 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [parallel-2] [5ac5ec14] REQUESTED -> RECEIVED                                                                                                    
12:25:25,796 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [5ac5ec14] UNSUBSCRIBED -> REQUESTED                                                                                                     
12:25:25,796 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [5ac5ec14] Item to write
12:25:25,796 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [5ac5ec14] REQUESTED -> RECEIVED                                                                                                         
12:25:25,796 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [5ac5ec14] isWritePossible: false                                                                                                        
12:25:25,796 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [parallel-2] [5ac5ec14] No more items to write                                                                                                        
12:25:25,796 TRACE _.s.h.s.r.WriteResultPublisher [parallel-2] [5ac5ec14] UNSUBSCRIBED subscribe: org.springframework.http.server.reactive.AbstractListenerWriteFlushProcessor$State$WriteResultSubscriber@7c2c4f9b
12:25:25,796 TRACE _.s.h.s.r.WriteResultPublisher [parallel-2] [5ac5ec14] SUBSCRIBING request: 9223372036854775807                                                                                                
12:25:25,796 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [parallel-2] [5ac5ec14] Received onComplete                                                                                                      
12:25:25,796 TRACE _.s.h.s.r.WriteResultPublisher [parallel-2] [5ac5ec14] UNSUBSCRIBED subscribe: reactor.core.publisher.MonoNext$NextSubscriber@79b742d0                                                         
12:25:25,796 TRACE _.s.h.s.r.WriteResultPublisher [parallel-2] [5ac5ec14] SUBSCRIBING request: 9223372036854775807                                                                                                
12:26:25,730 TRACE _.s.h.s.r.AbstractListenerReadPublisher [parallel-1] [5ac5ec14] Cancellation
12:26:25,730 TRACE _.s.h.s.r.WriteResultPublisher [parallel-1] [5ac5ec14] SUBSCRIBED cancel
12:26:25,731 ERROR c.e.d.f.DemoWebFilter [parallel-1] [5ac5ec14] Response timeout after 60004 milliseconds for POST request with uri 'http://localhost:8080/post'. Response status code was already committed: '200 OK'.
12:26:25,731 TRACE o.s.h.s.r.ServletHttpHandlerAdapter [parallel-1] [5ac5ec14] Handling completed
12:26:25,731 TRACE _.s.h.s.r.AbstractListenerReadPublisher [http-nio-8080-exec-140] [5ac5ec14] onAllDataRead                                                                                                      
12:26:25,731 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [http-nio-8080-exec-140] [5ac5ec14] Received request to cancel                                                                                   
12:26:25,731 TRACE _.s.h.s.r.AbstractListenerWriteFlushProcessor [http-nio-8080-exec-140] [5ac5ec14] Received onComplete                                                                                          
12:26:25,731 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [http-nio-8080-exec-140] [5ac5ec14] Cancellation                                                                                                      
12:26:25,731 TRACE _.s.h.s.r.AbstractListenerWriteProcessor [http-nio-8080-exec-140] [5ac5ec14] No more items to write

This is trace was obtained running the demo as in this feature branch (but also specifying the local spring-framework project with your original fix applied in settings.gradle as mentioned previously): https://github.com/danielra/webflux_race_investigation/tree/static-delay-and-response-size

With the wrk load and sleeps run as follows:

for i in {1..100}; do   ./wrk -t 2 -c 1000 -d 10s --latency --timeout 9s --script ./myPost.lua http://localhost:8080/post; sleep 10; done

I did see some errors like the following in the logs as well, but they are apparently not time correlated with the response timeouts - so I don't think they are related:

12:25:15,592 ERROR o.a.c.h.Http11NioProtocol [http-nio-8080-exec-142] Error reading request, ignored
java.lang.IllegalStateException: Calling [asyncPostProcess()] is not valid for a request with Async state [ERROR]
12:24:35,284 ERROR o.a.c.h.Http11NioProtocol [http-nio-8080-exec-115] Error reading request, ignored
java.lang.IllegalStateException: Calling [asyncPostProcess()] is not valid for a request with Async state [ERROR]

rstoyanchev · 2021-02-15T18:41:12Z

I've changed to the exact same settings from your latest commit. I can see timeouts but all are waiting on a write callback from Tomcat or a flush that's stuck.

wrk is called with --timeout 9s still so some connections are probably getting closed from the client side. Furthermore, running wrk for 10 second intervals probably means more connections get closed when wrk closes.

If I remove the --timeout parameter from the wrk command, use -d 5m instead of looping for 10 seconds on each iteration, I no longer see timeouts. If remove only the --timeout and increase the server timeout to 120 seconds, I see timeouts and virtually all of them are stuck trying to flush.

So perhaps connections getting closed from the client side lead under load. From what I can see though this is in Tomcat and I'm not sure what we can do other than set a timeout perhaps.

@markt-asf, I'm wondering if you have any comments on this one? Basically WebFlux on Tomcat as in #26407, under load with some connections getting closed from the client side due to a timeout. We're seeing that either onWritePossible never comes or a flush never completes.

markt-asf · 2021-02-22T10:18:32Z

Possibly an instance of https://bz.apache.org/bugzilla/show_bug.cgi?id=65001 ? Updating to Spring Boot 2.4.3 should pick up Tomcat 9.0.43 and the fix for that issue. Better would be to pick up the latest Tomcat 9.0.x from the ASF snapshot repo and test that as there is a potential fix for #26407 in the current 9.0.x dev code.

rstoyanchev · 2021-02-22T12:42:53Z

I have tried with 9.0-SNAPSHOT and it didn't help. It's something else probably.

The case here is also related to clients disconnecting. When the wrk client is set to run in a loop for 10 seconds in each iteration, it's relative easy to demonstrate. If it's set to run continuously, e.g. for 5 minutes (and without any --timeout) then it doesn't happen. In all cases I've seen, the input is read fully and after that, either outputStream.isReady() returns false after the first (and final) write and before the final flush, or it hangs in the final flush with no further log messages that should otherwise appear immediately after we come out from the flush.

markt-asf · 2021-02-22T12:45:21Z

Do you have a WAR file I can use to reproduce this on Tomcat?

rstoyanchev · 2021-02-22T14:43:09Z

This is the sample. You can add id 'war' around here and then ./gradlew bootWar will create a war.

For the load:

for i in {1..100}; do   ${pathToWrk}/wrk -t 2 -c 1000 -d 10s --latency --timeout 9s --script ./myPost.lua http://localhost:8080/post; done

and the myPost.lua script:

wrk.method = "POST"
wrk.body   = "[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]"
wrk.headers["Content-Type"] = "application/json"

The use of ThreadLocalRandom in the controller seems to make matters worse. Arguably code that can block shouldn't be used in non-blocking handling. Here is a branch where this is replaced with non-random, fixed values.

markt-asf · 2021-02-22T19:07:27Z

All I get if I do that is 404 responses. Looking in JMX it does not appear that any application Servlets or Filters are configured.

rstoyanchev · 2021-02-23T17:04:09Z

Sorry, my instructions were incomplete. I don't often run Boot apps as a war and I'm having trouble with it. Let me ask, would it work for you to run it in embedded mode, via ./gradlew bootRun or within the IDE just starting DemoApplication as a regular Java class? Or do you need it to be a war archive?

markt-asf · 2021-02-23T17:26:40Z

I need to be able to run the test case using my local Tomcat build and to be able to connect my IDE to perform remote debugging. Running as a Java class in the IDE should allow me to do both of those (with a few tweaks to the project configuration) if generating a WAR is proving tricky.

bclozel · 2021-02-23T18:03:39Z

@rstoyanchev Sorry for misleading you earlier today - Spring Boot doesn't support WAR packaging for WebFlux applications. So this is not a typo or misconfiguration on your side, this is just something we don't support.

markt-asf · 2021-02-23T19:45:17Z

I have this running locally now. Not much luck reproducing any errors though. Some idea of how often I should expect to see errors for a given load command would be helpful.

danielra · 2021-02-23T19:53:30Z

FWIW, for me on my machine I was typically able to reproduce the issue multiple times within a minute or so on the main branch of the demo project which has random delays / response sizes. With the static values in the static-delay-and-response-size branch I only saw one repro over my entire run of:

for i in {1..100}; do   ./wrk -t 2 -c 1000 -d 10s --latency --timeout 9s --script ./myPost.lua http://localhost:8080/post; sleep 10; done

for which I posted the trace above.

thomaskiesl · 2022-01-02T12:44:39Z

In NioEndpoint a condition was added. The following IOException is thrown to detect this issue:

2021-12-25 08:38:31,250 ERROR | https-jsse-nio-18443-Acceptor |
org.apache.tomcat.util.net.Acceptor                     | Socket accept failed java.io.IOException: Duplicate accept detected. This is a known OS bug. Please consider reporting that you are affected:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1924298
   at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:545)
   at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:78)
   at org.apache.tomcat.util.net.Acceptor.run(Acceptor.java:129)
   at java.base/java.lang.Thread.run(Thread.java:833)

My issue respectively worries are the following:

I receive this message ~20 times a day on my productive environment (CentOS 7, Java 17). Since the update one week ago I receive this issue. Currently there is only less workload on the system but I am worried that next week (after christmas) this bug becomes a big issue for me.

I do not understand excatly the problem of it. I have a Vaadin web application (Version 21) and the following questions:

When exactly does this error occure?
Is an error shown in the frontend (e.g. something like lost connection, manual refresh of the browser necessary, ...?) or is it only "under the hood" of my spring application?
I assume the bug occured already before your fix in November. What happend in case of this situation, before you added this check?

Thanks in advance for the information!

markt-asf · 2022-01-02T13:07:17Z

The error occurs when the operating system reports the same client connection being accepted twice. This typically happens under load but, since the root cause is not known, we have to assume it could happen with any accept.

With the new error handling in place, the front-end should not see any side-effects of this. The first accept is handled as normal and the second is simply ignored. This avoids the sort of problems described previously in this issue.

Evidence collected so far indicates that the bug dates back at least 3 years and has probably been around longer than that.

If you see this issue on Ubuntu then, as the log message indicates, please update the Ubuntu bug. The more people that report they are affected, the more likely the bug is to get some attention.

If you see this issue on a different OS then:

Build and run the pure C test case attached to the Java bug reoprt
If you see the error, raise a bug for your operating system
It would be helpful if you provided a link for any OS bug created

thomaskiesl · 2022-01-02T13:29:21Z

Thanks for the explanation!

Bug report for CentOS: https://bugs.centos.org/view.php?id=18383

ettavolt · 2022-01-27T20:10:26Z

This is probably not the best place to put my ideas, but I'm hesitant to reg on JDK bug system and LaunchPad, sorry. ☺

As I see it, the wrk test is trying to hold -c 1000 HTTP over TCP connections from each of the -t 2 threads. Some people run it twice or thrice in parallel, approaching 4000 or 6000 open TCP connections on localhost.

Now we have net.ipv4.ip_local_port_range which is, say, on my [email protected] is 32768 60999, i.e. 28232 ports available for client connections.

I don't see it as surprising that when the kernel has 6000 of 28232 options consumed it sometimes decides to reuse a recently freed port number. If it does choose a free port number uniformly random, phenomenon should occur once per 22232 requests made. Which isn't too rare.

P.S. @markt-asf, I came to this thread after my colleague has found that when configured to listen on UDS, Tomcat rejects all but the first curl request via it, because UDSA#equals compares path field, which is, obviously, never changes.

Shrinking the net.ipv4.ip_local_port_range (probably in a container to avoid browser/services noise) might be a good way to simplify testing Tomcat (or JDK) in a condition when port number is reused for the subsequent request.

markt-asf · 2022-01-27T20:17:17Z

That might be a way to trigger a false positive but it misses the point. When the real issue occurs you get:

the two sockets returned from serversocket.accept() have the same client IP/port
the two sockets are returned from consecutive calls to accept()
the two sockets have different (consecutive) file descriptors
you can write to both sockets successfully

This isn't a clash with a recently freed port.

illidan80 · 2022-02-01T09:15:23Z

Getting this error constantly on Alpine, started right after we migrated from Spring 2.3.7 to 2.6.2

MattWeiler · 2022-02-15T13:52:28Z

We found that this issue occurred for our projects when we upgraded from Spring Boot version 1.5.9-RELEASE to 2.6.2.

Does anyone know what the most recent version of Spring Boot is that is not affected by this?

nesrinyesilkaya · 2022-02-22T09:35:36Z

The latest spring boot version(2.6.3) is still using tomcat 9.0.56, if you upgrade the tomcat version externally, the problem seems to be fixed.
In your build.gradle file, add the following line.
ext['tomcat.version'] = '9.0.58'

Update: tomcat upgrade is not working, after a while, the same error was occurred again

MattWeiler · 2022-02-23T14:43:33Z

I just tried Spring Boot 2.3.7.RELEASE and the same issue arose during load testing.
It's becoming a hard sell to business to let us try a new build with the latest version of tomcat.

Does anyone actually know 100% what the issue is?
As it stands, it seems that each team (apache/tomcat, Spring, Linux/Ubuntu) is passing the buck.

But if the above comment by nesrinyesilkaya is correct and upgrading to the latest version of Tomcat fixes the issue, it doesn't sound like an OS issue (unless the latest Tomcat code has made changes to not use certain OS APIs).

markt-asf · 2022-02-23T15:02:21Z

To summarise the current status...

The original issue had multiple root causes. Bugs were fixed in Spring and Tomcat. There was also an OS (Linux) bug identified that remains unfixed.

If you see this issue reported in the Tomcat logs then the best thing you can do is:

Use the test case described in the Ubuntu bug report to confirm that you are seeing the Linux bug
Report it via your support channel for the OS and push for a fix

As of Tomcat 10.1.0-M8, 10.0.14, 9.0.56 and 8.5.74 Tomcat will both log a warning if the issue is detected and take steps to ensure that the bug does not impact the running of the application.

The Tomcat team is aware of some scenarios that will falsely trigger the detection of this bug. The current Tomcat release round (10.1.0-M11, 10.0.17, 9.0.59 and 8.5.76) will include improved detection that avoids those false positives. Hence why it is important to confirm you are seeing this issue with the test case attached to the Ubuntu bug report before reporting it to your OS vendor.

It is possible that other issues will have similar symptoms. As usual, test with the latest stable releases of Spring and Tomcat and, if you still see the problem, open an issue and provide a test case that demonstrates it and it will get investigated.

See spring-projectsgh-26434

This commit ensures handling is cancelled in case of onError/Timeout callback from the Servlet container. Separately we detect the same in ServletServerHttpRequest and ServletServerHttpResponse, which signal onError to the read publisher and cancel writing, but if the onError/Timeout arrives after reading is done and before writing has started (e.g. longer handling), then neither will reach handling. See spring-projectsgh-26434, spring-projectsgh-26407

Previously we registered 3 AsyncListener's from the request, from the response, and from the Servlet adapter. After this change, only the Servlet adapter registers a listener and the others are delegated to. This consolidates the handling of AsyncListener events so that it's easier to discover, trace, and enforce the order of handling. See spring-projectsgh-26434

See spring-projectsgh-26434

See gh-spring-projectsgh-26434

rui-zhou-phx · 2022-03-31T02:58:44Z

with tomcat 9.0.58 I still saw this error

btpnlsl · 2022-04-22T17:12:55Z

I believe the OS bug is fixed starting in kernel 5.10 (starting in 5.10-rc6). The bug repros in kernel 5.10-rc4 but not 5.10-rc6.

I'm fairly sure that the bug fix comes from this commit which describes a race condition in the Linux TCP stack where two duplicate sockets will be created in the established connections hashtable.

brendsanchez · 2022-11-28T17:09:34Z

with spring 2.5.8 and tomcat 9.0.59 works to me

llody55 · 2024-11-22T01:56:53Z

I also encountered this problem in ubantu22.04

java.io.IOException: Duplicate accept detected. This is a known OS bug. Please consider reporting that you are affected: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1924298
	at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:545) ~[tomcat-embed-core-9.0.56.jar!/:na]
	at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:78) ~[tomcat-embed-core-9.0.56.jar!/:na]
	at org.apache.tomcat.util.net.Acceptor.run(Acceptor.java:129) ~[tomcat-embed-core-9.0.56.jar!/:na]
	at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]

root@k8s-master1:~# uname -a
Linux k8s-master1 5.15.0-125-generic #135-Ubuntu SMP Fri Sep 27 13:56:10 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

root@k8s-master1:~# cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Jan 23, 2021

rstoyanchev self-assigned this Jan 28, 2021

rstoyanchev added in: web Issues in web modules (web, webmvc, webflux, websocket) type: bug A general bug labels Jan 28, 2021

rstoyanchev added this to the 5.3.4 milestone Jan 28, 2021

rstoyanchev closed this as completed in 5e5d8e4 Jan 28, 2021

rstoyanchev mentioned this issue Jan 29, 2021

InMemoryWebSessionStore indirectly causing infinite loop inside tomcat-native OpenSSL under load #26407

Closed

rstoyanchev reopened this Feb 2, 2021

rstoyanchev removed the status: waiting-for-triage An issue we've not yet triaged or decided on label Feb 2, 2021

rstoyanchev added a commit that referenced this issue Feb 11, 2021

Polishing

1c6bab2

See gh-26434

rstoyanchev modified the milestones: 5.3.4, 5.3.5 Feb 15, 2021

bclozel mentioned this issue Feb 1, 2022

Tomcat IOException: Duplicate accept detected. This is a known OS bug. Please consider reporting that you are affected: spring-projects/spring-boot#29609

Closed

lxbzmy pushed a commit to lxbzmy/spring-framework that referenced this issue Mar 26, 2022

Polishing

b696f82

See spring-projectsgh-26434

lxbzmy pushed a commit to lxbzmy/spring-framework that referenced this issue Mar 26, 2022

Improve logging for Servlet / Reactive Streams adapters

c8396cb

See spring-projectsgh-26434

lxbzmy pushed a commit to lxbzmy/spring-framework that referenced this issue Mar 26, 2022

Consistent handling on Servlet non-blocking error callbacks

fe4de9c

See spring-projectsgh-26434

lxbzmy pushed a commit to lxbzmy/spring-framework that referenced this issue Mar 26, 2022

Fix log message

7eeb4b0

See gh-spring-projectsgh-26434

ChrisKujawa mentioned this issue Mar 29, 2022

IOException: Duplicate accept detected. camunda/camunda#9016

Closed

bclozel mentioned this issue Sep 5, 2022

SEVERE [https-jsse-nio-8443-Acceptor] org.apache.tomcat.util.net.Acceptor.run Socket accept failed #29073

Closed

This comment was marked as duplicate.

Sign in to view

gouravkrosx mentioned this issue Dec 21, 2023

[bug]: unable to complete subsequent requests after first request in some java spring boot applications. keploy/keploy#1233

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Response writing fails to complete with WebFlux on Tomcat #26434

Response writing fails to complete with WebFlux on Tomcat #26434

danielra commented Jan 23, 2021

rstoyanchev commented Jan 28, 2021

danielra commented Jan 31, 2021 •

edited

Loading

rstoyanchev commented Feb 2, 2021

rstoyanchev commented Feb 10, 2021 •

edited

Loading

danielra commented Feb 11, 2021

rstoyanchev commented Feb 11, 2021 •

edited

Loading

danielra commented Feb 13, 2021

rstoyanchev commented Feb 15, 2021 •

edited

Loading

markt-asf commented Feb 22, 2021

rstoyanchev commented Feb 22, 2021 •

edited

Loading

markt-asf commented Feb 22, 2021

rstoyanchev commented Feb 22, 2021

markt-asf commented Feb 22, 2021

rstoyanchev commented Feb 23, 2021

markt-asf commented Feb 23, 2021

bclozel commented Feb 23, 2021

markt-asf commented Feb 23, 2021

danielra commented Feb 23, 2021

thomaskiesl commented Jan 2, 2022

markt-asf commented Jan 2, 2022

thomaskiesl commented Jan 2, 2022

ettavolt commented Jan 27, 2022

markt-asf commented Jan 27, 2022

illidan80 commented Feb 1, 2022

MattWeiler commented Feb 15, 2022

nesrinyesilkaya commented Feb 22, 2022 •

edited

Loading

MattWeiler commented Feb 23, 2022

markt-asf commented Feb 23, 2022

rui-zhou-phx commented Mar 31, 2022

btpnlsl commented Apr 22, 2022

This comment was marked as duplicate.

brendsanchez commented Nov 28, 2022

llody55 commented Nov 22, 2024 •

edited

Loading

Response writing fails to complete with WebFlux on Tomcat #26434

Response writing fails to complete with WebFlux on Tomcat #26434

Comments

danielra commented Jan 23, 2021

rstoyanchev commented Jan 28, 2021

danielra commented Jan 31, 2021 • edited Loading

rstoyanchev commented Feb 2, 2021

rstoyanchev commented Feb 10, 2021 • edited Loading

danielra commented Feb 11, 2021

rstoyanchev commented Feb 11, 2021 • edited Loading

danielra commented Feb 13, 2021

rstoyanchev commented Feb 15, 2021 • edited Loading

markt-asf commented Feb 22, 2021

rstoyanchev commented Feb 22, 2021 • edited Loading

markt-asf commented Feb 22, 2021

rstoyanchev commented Feb 22, 2021

markt-asf commented Feb 22, 2021

rstoyanchev commented Feb 23, 2021

markt-asf commented Feb 23, 2021

bclozel commented Feb 23, 2021

markt-asf commented Feb 23, 2021

danielra commented Feb 23, 2021

thomaskiesl commented Jan 2, 2022

markt-asf commented Jan 2, 2022

thomaskiesl commented Jan 2, 2022

ettavolt commented Jan 27, 2022

markt-asf commented Jan 27, 2022

illidan80 commented Feb 1, 2022

MattWeiler commented Feb 15, 2022

nesrinyesilkaya commented Feb 22, 2022 • edited Loading

MattWeiler commented Feb 23, 2022

markt-asf commented Feb 23, 2022

rui-zhou-phx commented Mar 31, 2022

btpnlsl commented Apr 22, 2022

This comment was marked as duplicate.

brendsanchez commented Nov 28, 2022

llody55 commented Nov 22, 2024 • edited Loading

danielra commented Jan 31, 2021 •

edited

Loading

rstoyanchev commented Feb 10, 2021 •

edited

Loading

rstoyanchev commented Feb 11, 2021 •

edited

Loading

rstoyanchev commented Feb 15, 2021 •

edited

Loading

rstoyanchev commented Feb 22, 2021 •

edited

Loading

nesrinyesilkaya commented Feb 22, 2022 •

edited

Loading

llody55 commented Nov 22, 2024 •

edited

Loading