Idle timeout occured sometimes on HTTP/2 client with `InputStreamResponseListener` #8558

clardeur · 2022-09-08T09:50:51Z

Jetty version(s)
11.0.8

Java version/vendor
openjdk version "11.0.16" 2022-07-19
OpenJDK Runtime Environment (build 11.0.16+8-post-Ubuntu-0ubuntu118.04)
OpenJDK 64-Bit Server VM (build 11.0.16+8-post-Ubuntu-0ubuntu118.04, mixed mode, sharing)

OS type/version
Linux 5.4.0-125-generic 141~18.04.1-Ubuntu x86_64 x86_64 x86_64 GNU/Linux

Description
We have switched in production the response listener of the HTTP/2 client, before we used the FutureResponseListener and now the InputStreamResponseListener, since this change we are observing regularly failures with timeouts and it happens randomly.

This is very similar in symptoms to the issue #7192

How to reproduce?
The problem is difficult to reproduce due to its randomness. As described in the comment of the similar issue (#7192), when we increase the log level to DEBUG, the problem is even less reproducible.

I still managed to reproduce the problem thanks to a test in the jetty project that triggers the timeout almost every time.

Test case - Code

public class HttpClientTransportOverHTTP2Test extends AbstractTest
{

    private static final Logger LOG = LoggerFactory.getLogger(HttpClientTransportOverHTTP2Test.class);

    @Test
    public void testInputStreamResponseListener() throws Exception
    {
        var bytes = 100_000;
        start(new ServerSessionListener.Adapter()
        {
            @Override
            public Stream.Listener onNewStream(Stream stream, HeadersFrame frame)
            {
                int streamId = stream.getId();
                MetaData.Response response = new MetaData.Response(HttpVersion.HTTP_2, HttpStatus.OK_200, HttpFields.EMPTY);
                HeadersFrame responseFrame = new HeadersFrame(streamId, response, null, false);
                Callback.Completable callback = new Callback.Completable();
                stream.headers(responseFrame, callback);
                callback.thenRun(() -> stream.data(new DataFrame(streamId, ByteBuffer.wrap(new byte[bytes]), true), Callback.NOOP));
                return null;
            }
        });

        var nbRequests = 100_000;
        var progress = new AtomicInteger(0);
        IntStream.range(0, nbRequests).forEach(i -> {
            try {
                if (progress.incrementAndGet() % 1000 == 0) {
                    LOG.info("progress {}/{}", progress.get(), nbRequests);
                }
                InputStreamResponseListener listener = new InputStreamResponseListener();
                client.newRequest("localhost", connector.getLocalPort()).headers(httpFields -> httpFields.put("X-Request-Id", Integer.toString(i))).send(listener);
                Response response = listener.get(60, TimeUnit.SECONDS);
                assertEquals(HttpStatus.OK_200, response.getStatus());
                assertEquals(bytes, listener.getInputStream().readAllBytes().length);
            } catch (Exception e) {
                LOG.error("request {} failed: {}", i, e.getMessage());
                fail(e);
            }
        });
    }

}

Test case - Failure Stacktrace

org.opentest4j.AssertionFailedError
	at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:47)
	at org.junit.jupiter.api.Assertions.fail(Assertions.java:157)
	at org.eclipse.jetty.http2.http.client.transport/org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2Test.lambda$testInputStreamResponseListener$1(HttpClientTransportOverHTTP2Test.java:77)
	at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
	at java.base/java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:593)
	at org.eclipse.jetty.http2.http.client.transport/org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2Test.testInputStreamResponseListener(HttpClientTransportOverHTTP2Test.java:65)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
	at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
	at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
	at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:66)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:35)
	at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
	at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:54)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:52)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:114)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:86)
	at org.junit.platform.launcher.core.DefaultLauncherSession$DelegatingLauncher.execute(DefaultLauncherSession.java:86)
	at org.junit.platform.launcher.core.SessionPerRequestLauncher.execute(SessionPerRequestLauncher.java:53)
	at com.intellij.junit5.JUnit5IdeaTestRunner.startRunnerWithArgs(JUnit5IdeaTestRunner.java:71)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
	at com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:235)
	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)
Caused by: java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 5000 ms
	at org.eclipse.jetty.client/org.eclipse.jetty.client.util.InputStreamResponseListener$Input.toIOException(InputStreamResponseListener.java:342)
	at org.eclipse.jetty.client/org.eclipse.jetty.client.util.InputStreamResponseListener$Input.read(InputStreamResponseListener.java:310)
	at java.base/java.io.InputStream.readNBytes(InputStream.java:409)
	at java.base/java.io.InputStream.readAllBytes(InputStream.java:346)
	at org.eclipse.jetty.http2.http.client.transport/org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2Test.lambda$testInputStreamResponseListener$1(HttpClientTransportOverHTTP2Test.java:74)
	... 72 more
Caused by: java.util.concurrent.TimeoutException: Idle timeout expired: 5000 ms
	at org.eclipse.jetty.http2.http.client.transport/org.eclipse.jetty.http2.client.http.HTTPSessionListenerPromise.onIdleTimeout(HTTPSessionListenerPromise.java:103)
	at org.eclipse.jetty.http2.common/org.eclipse.jetty.http2.HTTP2Session.notifyIdleTimeout(HTTP2Session.java:1165)
	at org.eclipse.jetty.http2.common/org.eclipse.jetty.http2.HTTP2Session$StreamsState.onIdleTimeout(HTTP2Session.java:1916)
	at org.eclipse.jetty.http2.common/org.eclipse.jetty.http2.HTTP2Session.onIdleTimeout(HTTP2Session.java:1006)
	at org.eclipse.jetty.http2.common/org.eclipse.jetty.http2.HTTP2Connection.onIdleExpired(HTTP2Connection.java:180)
	at org.eclipse.jetty.io/org.eclipse.jetty.io.AbstractEndPoint.onIdleExpired(AbstractEndPoint.java:407)
	at org.eclipse.jetty.io/org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
	at org.eclipse.jetty.io/org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:108)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at java.base/java.lang.Thread.run(Thread.java:832)

Test case - Output

Full output

jetty_test_case.txt.tar.gz

Interesting snippets

The failure happened after that the ContentNotifier will be marked as inactive and stalled in this code path :

            // Process only if there is demand.
            try (AutoLock l = lock.lock())
            {
                if (!resume && demand() <= 0)
                {
                    if (LOG.isDebugEnabled())
                        LOG.debug("Stalling processing, content available but no demand");
                    active = false;
                    stalled = true;
                    return;
                }
            }

After that the content notifier does not resume the processing and therefore does not notify the InputStreamResponseListener, hence the final timeout.

2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-27: [null][ContentNotifier@405c852b] Resuming(false) processing(true) of content [stalled=false]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-27: [ContentNotifier@405c852b] Stalling processing, content available but no demand
2022-09-08 11:33:03.461:INFO :oejc.HttpReceiver:client-29: [HttpReceiverOverHTTP2@52c3b6de(rsp=TRANSIENT,failure=null)] Response demand=1/1, resume=false [stalled=false]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@503c7006[DataFrame@2960b028#8493{length:16375,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@54b2c4c6[DataFrame@79facf75#8493{length:9,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@4af53c0c[DataFrame@6638348a#8493{length:16366,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@246e487f[DataFrame@b40aae0#8493{length:18,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@6836f2d8[DataFrame@4f8d1730#8493{length:16357,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@19f6930e[DataFrame@176f3d7a#8493{length:27,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@4890ad7d[DataFrame@6f0307c8#8493{length:16348,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@424097c5[DataFrame@2caae61a#8493{length:36,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@619c4db7[DataFrame@16563b0b#8493{length:16339,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@506e68bb[DataFrame@2b0fce9a#8493{length:45,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@287ddfb9[DataFrame@fd437de#8493{length:16330,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@2cdcad2f[DataFrame@7972cd30#8493{length:54,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@3f3445b0[DataFrame@11f0ef4f#8493{length:1696,end=true}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:08.469:INFO :oejc.HttpReceiver:HttpClient@13e344d-scheduler-1: cleanup HttpReceiverOverHTTP2@52c3b6de(rsp=FAILURE,failure=java.util.concurrent.TimeoutException: Idle timeout expired: 5000 ms)
2022-09-08 11:33:08.469:ERROR:oejhch.HttpClientTransportOverHTTP2Test:main: request 4246 failed: java.util.concurrent.TimeoutException: Idle timeout expired: 5000 ms

After that my understanding of the code and the design of the components is not sufficient to go further in the analysis. I don't know if I'm completely on the wrong track or not, WDYT ?

The text was updated successfully, but these errors were encountered:

sbordet · 2022-09-12T09:39:17Z

@clardeur thanks for the detailed report!

I can definitely see a race condition, we're working to analyze it and fix it.

…istener. The issue was that HttpReceiverOverHTTP2.ContentNotifier.offer() was racy, as a network thread could have offered a DATA frame, but not yet called process() -- yet an application thread could have stolen the DATA frame completed the response and started another response, causing the network thread to interact with the wrong response. The implementation has been changed so that HttpReceiverOverHTTP2.ContentNotifier does not have a queue anymore and it demands DATA frames to the Stream only when the application demands more -- a simpler model that just forwards the demand. Signed-off-by: Simone Bordet <[email protected]>

sbordet · 2022-09-14T21:23:28Z

@clardeur would you be able to try branch jetty-10.0.x-8858-idle-timeout-http2-inputstreamresponselistener and see if it solves the problem for you?

clardeur · 2022-09-15T07:58:15Z

@clardeur would you be able to try branch jetty-10.0.x-8858-idle-timeout-http2-inputstreamresponselistener and see if it solves the problem for you?

Hi @sbordet, could you backport the commit on the 11.0.x branch ? We have dependencies like websocket-jakarta-server that does not exist on the branch 10.0.x, it's easier to test for us on our production stack with the same major version.

If it's not possible, I can backport the changes myself locally just for testing purpose. But in any case, we need the fix on the 11.0.x version. Thanks for your work.

…istener. Updates after review. Signed-off-by: Simone Bordet <[email protected]>

sbordet · 2022-09-15T08:22:04Z

@clardeur I just merged the changes to Jetty 11 too, so you can build locally branch jetty-11.0.x and test.
Let us know how it goes. Thanks!

…istener. Fixed override of reset(). Signed-off-by: Simone Bordet <[email protected]>

clardeur · 2022-09-15T14:33:16Z

Let us know how it goes.

I have re-run a job that crash every time due to this issue and it goes well 👍

My 2 cents
It could be benefit for the project to add more client/server chaotic "production" load test like the HttpClientLoadTest with some parametrized scenarios that mix different implementations like Listener.Adapter or ConnectionPool to detect race condition or just functionnal regressions. @sbordet WDYT ?

sbordet · 2022-09-15T14:56:21Z

@clardeur we have added your test to our suite, if that's what you meant.

The problem is that we can have a million edge cases, with 7 different transports, with many different ways of doing the same thing, and while we try, we cannot possibly cover all of them.

Your case only happens with HTTP/2, only with InputStreamResponseListener, only with more than 1 request sent iteratively, and only with a very particular thread scheduling 😅

I'm open to suggestions, but the number of permutations (and some of them like thread scheduling can only be stochastic) is typically too large to have a perfect coverage.

clardeur · 2022-09-15T15:56:04Z

@clardeur we have added your test to our suite, if that's what you meant.

IMO my test case is very specific and related to this issue. I'm not sure if it's a good test to add.

Your case only happens with HTTP/2, only with InputStreamResponseListener, only with more than 1 request sent iteratively, and only with a very particular thread scheduling

Hard to find, indeed 😅

The problem is that we can have a million edge cases, with 7 different transports, with many different ways of doing the same thing, and while we try, we cannot possibly cover all of them.

I'm open to suggestions, but the number of permutations (and some of them like thread scheduling can only be stochastic) is typically too large to have a perfect coverage.

Yeah I know, a complex problem that requires time and expertise in testing. Since we have been using Jetty in production for 7 years now, some issues (#7107) (486829) that have been reported previously could have been avoided with some "basic" scenario. It's likely that some issues are more related to the client part of the project which is maybe less used in production compared to the server part.

The first step might be to add a nominal test case for each transport layer and parametrize only one major component at a time to avoid an exponential explosion, like the connection pool, which is a core component in my opinion. And in a second step, adding some "chaos" like randomness, latency, etc.

I know it's far from being a perfect coverage but it's like a safety net that does not require too much effort.

In any case, @sbordet, it's always a pleasure to take time to explore the code, reproduce the problem, understand how it's works to submit a proper issue, open source maintainers deserve it !

dsmiley · 2022-11-04T22:12:09Z

The problem is that we can have a million edge cases, with 7 different transports, with many different ways of doing the same thing, and while we try, we cannot possibly cover all of them.

A "randomized testing" philosophy is intended to address this. Given enough randomized executions, you may well cover all of the scenarios :-). Apache Lucene and Solr embrace this philosophy. We randomize the JDKs, GCs, TimeZones, and within Lucene & Solr, arbitrary choices in implementations or various things specific to a test. Like randomly picking some data to use to test on. The main requirement is an ability to have repeatable randomness so that there is a greater chance of being able to reproduce a problem that is discovered. Lucene & Solr use a testing utility (JUnit runners, etc.) with an apt name for this -- randomizedtesting. It's not all wonderful. Debugging a failure that does not reproduce, despite the seed, is a challenge. Sometimes the outcome is the test needing to be less random or guard against something that can occur based on the randomness -- thus not a real success. But it does find obscure bugs!

clardeur added the Bug For general bugs on Jetty side label Sep 8, 2022

joakime added this to Jetty 10.0.12 / 11.0.12 FROZEN and Jetty 9.4.49 FROZEN Sep 14, 2022

joakime moved this to To do in Jetty 10.0.12 / 11.0.12 FROZEN Sep 14, 2022

joakime moved this to To do in Jetty 9.4.49 FROZEN Sep 14, 2022

joakime assigned sbordet Sep 14, 2022

joakime moved this from To do to In progress in Jetty 10.0.12 / 11.0.12 FROZEN Sep 14, 2022

joakime mentioned this issue Sep 14, 2022

InputStreamResponseListener occasionally hangs waiting for response body chunk #7192

Closed

sbordet added a commit that referenced this issue Sep 15, 2022

Fixes #8558 - Idle timeout occurs on HTTP/2 with InputStreamResponseL…

c5c5588

…istener. Updates after review. Signed-off-by: Simone Bordet <[email protected]>

sbordet closed this as completed in dab4fe6 Sep 15, 2022

Repository owner moved this from In progress to Done in Jetty 10.0.12 / 11.0.12 FROZEN Sep 15, 2022

Repository owner moved this from To do to Done in Jetty 9.4.49 FROZEN Sep 15, 2022

sbordet added the Outstanding Bug Report! The bug report was outstanding! label Sep 15, 2022

sbordet added a commit that referenced this issue Sep 15, 2022

Fixes #8558 - Idle timeout occurs on HTTP/2 with InputStreamResponseL…

8d2b182

…istener. Fixed override of reset(). Signed-off-by: Simone Bordet <[email protected]>

sbordet linked a pull request Oct 4, 2022 that will close this issue

Fixes #8558 - Idle timeout occurs on HTTP/2 with InputStreamResponseL… #8585

Merged

sbordet removed this from Jetty 9.4.49 FROZEN Oct 4, 2022

sbordet added this to Jetty 10.0.13 / 11.0.13 - FROZEN Oct 4, 2022

sbordet removed this from Jetty 10.0.12 / 11.0.12 FROZEN Oct 4, 2022

risdenk mentioned this issue Nov 3, 2022

SOLR-15955: Update Jetty dependency to 10 apache/solr#585

Merged

joakime moved this to ✅ Done in Jetty 10.0.13 / 11.0.13 - FROZEN Nov 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Idle timeout occured sometimes on HTTP/2 client with `InputStreamResponseListener` #8558

Idle timeout occured sometimes on HTTP/2 client with `InputStreamResponseListener` #8558

clardeur commented Sep 8, 2022

Full output

Interesting snippets

sbordet commented Sep 12, 2022

sbordet commented Sep 14, 2022

clardeur commented Sep 15, 2022

sbordet commented Sep 15, 2022

clardeur commented Sep 15, 2022

sbordet commented Sep 15, 2022

clardeur commented Sep 15, 2022

dsmiley commented Nov 4, 2022

Idle timeout occured sometimes on HTTP/2 client with InputStreamResponseListener #8558

Idle timeout occured sometimes on HTTP/2 client with InputStreamResponseListener #8558

Comments

clardeur commented Sep 8, 2022

Full output

Interesting snippets

sbordet commented Sep 12, 2022

sbordet commented Sep 14, 2022

clardeur commented Sep 15, 2022

sbordet commented Sep 15, 2022

clardeur commented Sep 15, 2022

sbordet commented Sep 15, 2022

clardeur commented Sep 15, 2022

dsmiley commented Nov 4, 2022

Idle timeout occured sometimes on HTTP/2 client with `InputStreamResponseListener` #8558

Idle timeout occured sometimes on HTTP/2 client with `InputStreamResponseListener` #8558