`ClientRequestContext#cancel` cancels the associated request immediately #5800

jrhee17 · 2024-07-05T08:47:33Z

Motivation:

In order to handle #4591, I propose that we first define an API which allows users to cancel a request. Currently, ClientRequestContext#cancel is invoked once CancellationScheduler#start is invoked. I propose to change the behavior of ClientRequestContext#cancel such that the associated request is aborted immediately. Once this API is available, it would be trivial to implement ResponseTimeoutMode by adjusting where to call CancellationScheduler#start. Additionally, users may easily implement their own timeout mechanism if they would like.

Modifications:

CancellationScheduler related changes

DefaultCancellationScheduler is refactored to be lock based instead of event loop based. The reasoning for this change was for the scenario where the request execution didn't reach the event loop yet. i.e. If a user calls ctx.cancel in the decorator, a connection shouldn't be opened.

e.g.

armeria/core/src/test/java/com/linecorp/armeria/client/ContextCancellationTest.java

Lines 90 to 116 in 59aa40a

    
           final Throwable t = new Throwable(); 
        
           final CountingConnectionPoolListener connListener = new CountingConnectionPoolListener(); 
        
           final AtomicReference<ClientRequestContext> ctxRef = new AtomicReference<>(); 
        
           try (ClientFactory cf = ClientFactory 
        
                   .builder() 
        
                   .connectionPoolListener(connListener) 
        
                   .workerGroup(eventLoopGroup.get(), false) 
        
                   .build()) { 
        
               final HttpResponse res = server.webClient(cb -> { 
        
                   cb.decorator((delegate, ctx, req) -> { 
        
                       ctx.cancel(t); 
        
                       ctxRef.set(ctx); 
        
                       return delegate.execute(ctx, req); 
        
                   }); 
        
                   cb.decorator(TestInfoHeaderDecorator.newDecorator(testInfo)); 
        
                   cb.decorator(AttachCallbacksDecorator.newDecorator()); 
        
                   cb.factory(cf); 
        
               }).get("/"); 
        
               assertThatThrownBy(() -> res.aggregate().join()) 
        
                       .isInstanceOf(CompletionException.class) 
        
                       .hasCauseInstanceOf(UnprocessedRequestException.class) 
        
                       .hasRootCause(t); 
        
               assertThat(connListener.opened()).isEqualTo(0); 
        
               assertThat(requests).doesNotContain(testInfo.getDisplayName()); 
        
               // don't validate the thread since we haven't started with event loop scheduling yet 
        
               validateCallbackChecks(null); 
        
           }

CancellationScheduler#updateTask is introduced. This API updates the cancellation task if the scheduler isn't invoked yet. If the scheduler is invoked already, the cancellation task will be executed eventually. This API allows ctx.cancel to attempt cancellation depending on which stage the request is at. For instance, at the decorators only req.abort needs to be called but at the request write stage, the cancellation task may need to send a reset signal.
Misc. an infinite timeout is internally represented as Long.MAX_VALUE instead of 0

AbstractHttpRequestHandler related changes
- CancellationTask in AbstractHttpRequestHandler, HttpResponseWrapper, AbstractHttpResponseHandler is modified to be scheduled inside an event loop. The reasoning is that ctx.cancel, and hence CancellationTask#run can be invoked from any thread.
- There is a higher chance of AbstractHttpRequestHandler calling fail or failAndReset multiple times. There is no point in doing so, so added a boolean flag failed for safety.
HttpResponseWrapper related changes
- The original intention of cancelTimeoutAndLog was to not log if the response is unexpected. Modified so that if the response is cancelled or the context is cancelled, no logging is done
- There is probably no reason to not call close when a timeout occurs. Unified the fragmented logic of closing the HttpResponseWrapper.
Found that DefaultRequestLog is completing futures with the context set on the event loop. We generally don't do this since we strictly validate against setting incorrect RequestContexts, so modified to use withoutContext.

Result:

Users can call ClientRequestContext#cancel to cancel the ongoing request easily.
ResponseTimeout can be scheduled at an earlier timing #5793 can be prepared for

github-actions · 2024-07-05T09:25:28Z

🔍 Build Scan® (commit: `cc8db83`)

Job name	Status	Build Scan®
build-self-hosted-unsafe-jdk-8	✅	https://ge.armeria.dev/s/xt4w6si3fxioe
build-self-hosted-unsafe-jdk-21-snapshot-blockhound	✅	https://ge.armeria.dev/s/j463lr36xrngm
build-self-hosted-unsafe-jdk-17-min-java-17-coverage	✅	https://ge.armeria.dev/s/z7akszkmjtlio
build-self-hosted-unsafe-jdk-17-min-java-11	✅	https://ge.armeria.dev/s/gk5ifs4ykwdgu
build-self-hosted-unsafe-jdk-17-leak	✅	https://ge.armeria.dev/s/jxkgdbss7mqgm
build-self-hosted-unsafe-jdk-11	✅	https://ge.armeria.dev/s/2junq5mszo5ga
build-macos-12-jdk-21	✅	https://ge.armeria.dev/s/ufve7p4hdlpuw

ikhoon · 2024-07-09T07:00:51Z

core/src/main/java/com/linecorp/armeria/common/logging/DefaultRequestLog.java

@@ -428,7 +428,8 @@ private void updateFlags(int flags) {
    private static void completeSatisfiedFutures(RequestLogFuture[] satisfiedFutures, RequestLog log,
                                                 RequestContext ctx) {
        if (!ctx.eventLoop().inEventLoop()) {
-            ctx.eventLoop().execute(() -> completeSatisfiedFutures(satisfiedFutures, log, ctx));
+            ctx.eventLoop().withoutContext().execute(
+                    () -> completeSatisfiedFutures(satisfiedFutures, log, ctx));


Q) Why was the context removed? Shouldn't the request context be propagated to the callbacks of satisfiedFutures?

It seemed like we were making an effort to clear the request context when completing logs.

e.g.

armeria/core/src/main/java/com/linecorp/armeria/client/HttpClientDelegate.java

Lines 242 to 247 in 72ebfe1

try (SafeCloseable ignored = RequestContextUtil.pop()) {

req.abort(cause);

final RequestLogBuilder logBuilder = ctx.logBuilder();

logBuilder.endRequest(cause);

logBuilder.endResponse(cause);

}

I saw ContextCancellationTest failing occasionally due to this, so I modified to use withoutContext.

It was an inevitable choice Netty can handle pending messages in batch mode that can cause a request context leak.

If an action does not perform sequentially in the callbacks of Netty operations, I think a context-aware executor is generally preferred.

I am not strong here. If you have any other workarounds, it is okay to keep it.

Netty can handle pending messages in batch mode that can cause a request context leak

I see, I think I misunderstood the intention then. My original guess was that if request logs for multiple requests are waited on a single condition, users would easily see a request context leak.

I'm not sure I understood what you mean by batch mode though. I thought channels are tied to a single thread, and within a single thread as long as the try...close syntax is used context leaks weren't possible.

try...close syntax is used context leaks weren't possible.

Context leaks could happen even with try...close.
if request logs for multiple requests are waited on a single condition seems to mean the same thing, but I'll explain it just in case.

Say we have a simple service that adds a callback to response.whenComplete().

try (ctx.push()) { val response = HttpResponse.streaming(); response.write(...); response.whenComplete().thenAccept(unused -> { // Do something else. }) }

The code pushes ctx temporary using try with resource block.

response.whenComplete() may be completed when WriteFutureListener.operationComplete() is invoked if it is a unary type.

armeria/core/src/main/java/com/linecorp/armeria/server/AggregatedHttpResponseHandler.java

Line 92 in cd1973b

future.addListener(new WriteFutureListener(response.content().isEmpty(), cause));

Netty Channel stores write requests to ChannelOutboundBuffer.

Channel is busy. response.write() for request_1 to request_9 are added to ChannelOutboundBuffer and operationComplete() isn't invoked for them.

When response.write() for request_10 is written, Channel can flush the outbound data so request_1 to request_10 are flushed in the same call (task).

Without try (SafeCloseable ignored = RequestContextUtil.pop()), the callbacks of request_1-9.whenComplete() will see the context of request_10.

However, it would be impossible to leak the context with ctx.eventLoop().execute(() -> completeSatisfiedFutures()) since it is executed asynchronously. So I thought it was safe to remove .withoutContext().

It seemed like we were making an effort to clear the request context when completing logs.

As you said, contexts are not propagated in some cases. I agree with keeping the current style in terms of consistency.

Thanks for the explanation 👍

When response.write() for request_10 is written, Channel can flush the outbound data so request_1 to request_10 are flushed in the same call (task).

I understood that even if request_1 to request_10 is flushed together in a single task, they will be executed sequentially.
e.g.

ClientRequestContext ctx1 = null; ClientRequestContext ctx2 = null; // represents a single task CommonPools.workerGroup().execute(() -> { // request 1 try (SafeCloseable sf = ctx1.push()) { ctx1.logBuilder().endResponse(); } // request 2 try (SafeCloseable sf = ctx2.push()) { ctx2.logBuilder().endResponse(); } });

I agree with keeping the current style in terms of consistency.

👍 for me as well. Ultimately, my thought was that it depends more on how users are using the API - for this reason it is probably more important that the context is pushed (or not) consistently.

e.g. I can imagine the following scenario

HttpResponseWriter res1 = null; HttpResponseWriter res2 = null; CompletableFuture.allOf(ctx1.log().whenComplete(), ctx2.log().whenComplete()) .handle((v, t) -> { res1.close(); res2.close(); // when the res is completed, the same thread may call ctx.push somewhere return null; });

Sorry for chiming in late.
One thing we need to keep in mind is that if this method is initially called by the ctx.eventLoop(), it might have the context in the thread-local.
Could you let me know which test fails in ContextCancellationTest?

I didn't expect this change to be so controversial 😅 Let me just revert this change since this isn't really critical to this PR

core/src/main/java/com/linecorp/armeria/internal/common/DefaultCancellationScheduler.java

ikhoon · 2024-07-09T07:07:18Z

core/src/main/java/com/linecorp/armeria/internal/common/DefaultCancellationScheduler.java

+                timeoutNanos = LongMath.saturatedSubtract(timeoutNanos, elapsedTimeNanos);
+            }
+            if (timeoutNanos != Long.MAX_VALUE) {
+                scheduledFuture = eventLoop().schedule(() -> invokeTask(null), timeoutNanos, NANOSECONDS);


Nice clean up!

ikhoon · 2024-07-09T07:11:32Z

core/src/main/java/com/linecorp/armeria/client/HttpClientDelegate.java

+        if (ctxExt == null) {
+            return;
+        }
+        ctxExt.responseCancellationScheduler().updateTask(cause -> {


I suggested removing updateTask() method, but that would result in too many changes to the existing code. As long as the implemetation is correct, I'm okay to maintain the current style and focus on implementing the response timeout mode.

For the record, updateTask() doesn't necessarily have to be removed - it can still be used internally since the old CancellationScheduler#start acted similarly to updateTask().

It is probably more important to unify and simplify the request/response abortion mechanism. This seems difficult to do at the moment for client-side. In particular, when one attempts this one should take note of 1) consistency between [req|res].cause and log.[req|res]Cause, 2) setting the correct exception type (e.g. UnprocessedRequestException).

ikhoon · 2024-07-30T11:08:25Z

core/src/main/java/com/linecorp/armeria/internal/common/CancellationScheduler.java

-        SCHEDULED,
-        FINISHING,
-        FINISHED
+        PENDING,


How about SCHEDULED? PENDING sounds a like a status between init() and start().

Fixed, note that technically calling start() doesn't guarantee a task has been scheduled if the timeout is Long.MAX_VALUE

core/src/main/java/com/linecorp/armeria/internal/common/CancellationScheduler.java

core/src/main/java/com/linecorp/armeria/internal/common/DefaultCancellationScheduler.java

ikhoon

Overall, looks nice. Left minor comments and a corner case.

ikhoon · 2024-07-30T12:46:55Z

core/src/main/java/com/linecorp/armeria/internal/common/DefaultCancellationScheduler.java

+            startTimeNanos = ticker.read();
+            if (timeoutMode == TimeoutMode.SET_FROM_NOW) {
+                final long elapsedTimeNanos = startTimeNanos - setFromNowStartNanos;
+                timeoutNanos = LongMath.saturatedSubtract(timeoutNanos, elapsedTimeNanos);


timeoutNanos could be a negative number with TimeoutMode.SET_FROM_NOW if a decorator takes a long time to perform.

Negative numbers are already handled by netty:

https://github.com/netty/netty/blob/59ccbd9fd692b11f858bfbfe0ea79de364792a17/common/src/main/java/io/netty/util/concurrent/AbstractScheduledEventExecutor.java#L201

Added a Long#max call anyways though

Thanks for fixing it. timeoutNanos is also exposed to the userland via ctx.responseTimeoutMillis(). 0 would be better than a negative number.

armeria/core/src/main/java/com/linecorp/armeria/internal/client/DefaultClientRequestContext.java

Lines 824 to 826 in 2367de2

public long responseTimeoutMillis() {

return TimeUnit.NANOSECONDS.toMillis(responseCancellationScheduler.timeoutNanos());

}

ikhoon · 2024-07-30T13:30:31Z

core/src/test/java/com/linecorp/armeria/common/ContextPushHookTest.java

@@ -135,7 +135,9 @@ void shouldRunHooksWhenContextIsPushed() {
            hookEvents.clear();
            response = client.get("http://foo.com:" + server.httpPort() + "/virtualhost");
            assertThat(response.status()).isEqualTo(HttpStatus.OK);
-            assertThat(hookEvents).containsExactly(
+            // we don't do containsExactly here because there is no easy way to guarantee that
+            // all context hooks from the previous request have been completed


Q) Did the previous request have been completed mean the request sent to /server? Has it been a flaky test or introduced by this PR? Would you mind elaborating on the case so that I can understand it?

Let me just revert this change and handle separately if needed since this is a change independent from this PR

ikhoon · 2024-08-02T01:58:53Z

#5156 has been merged. Would you mind resolving the conflicts?

…t-cancel

ikhoon

Nice work, @jrhee17! 👍💯

trustin · 2024-08-08T06:53:31Z

core/src/main/java/com/linecorp/armeria/internal/common/CancellationScheduler.java

    static CancellationScheduler ofClient(long timeoutNanos) {
+        if (timeoutNanos == 0) {
+            timeoutNanos = Long.MAX_VALUE;
+        }
        return new DefaultCancellationScheduler(timeoutNanos, false);
    }

    static CancellationScheduler ofServer(long timeoutNanos) {
+        if (timeoutNanos == 0) {
+            timeoutNanos = Long.MAX_VALUE;
+        }
        return new DefaultCancellationScheduler(timeoutNanos, true);
    }


What should we do when a user specified Long.MAX_VALUE? Should we use Long.MAX_VALUE - 1 if we handle Long.MAX_VALUE specially? 🤔

I didn't think about this case too seriously since this is equivalent to a timeout of 292 years.
Having said this, pushed a commit handling this case as well.

jshell> System.out.println(TimeUnit.NANOSECONDS.toDays(Long.MAX_VALUE) / 365); 292

Yeah, but correctness matters 🤣

trustin

Nice clean-up!

minwoox

Sorry for the late review. Left a couple of suggestions.

minwoox · 2024-08-09T00:59:01Z

core/src/main/java/com/linecorp/armeria/common/logging/DefaultRequestLog.java

@@ -428,7 +428,8 @@ private void updateFlags(int flags) {
    private static void completeSatisfiedFutures(RequestLogFuture[] satisfiedFutures, RequestLog log,
                                                 RequestContext ctx) {
        if (!ctx.eventLoop().inEventLoop()) {
-            ctx.eventLoop().execute(() -> completeSatisfiedFutures(satisfiedFutures, log, ctx));
+            ctx.eventLoop().withoutContext().execute(
+                    () -> completeSatisfiedFutures(satisfiedFutures, log, ctx));


Sorry for chiming in late.
One thing we need to keep in mind is that if this method is initially called by the ctx.eventLoop(), it might have the context in the thread-local.
Could you let me know which test fails in ContextCancellationTest?

minwoox · 2024-08-09T01:08:04Z

core/src/main/java/com/linecorp/armeria/internal/common/DefaultCancellationScheduler.java

+    private final Ticker ticker;
+    private final ReentrantShortLock lock = new ReentrantShortLock();
+
+    private final boolean server;
    @Nullable
    private volatile CancellationFuture whenCancelling;
    @Nullable
    private volatile CancellationFuture whenCancelled;


Because whenCancelled(), which creates this future, is called almost immediatly after this instance is created, how about creating this future directory at construction?
private final CancellationFuture whenCancelled = new CancellationFuture();

minwoox · 2024-08-09T01:10:11Z

core/src/main/java/com/linecorp/armeria/internal/common/DefaultCancellationScheduler.java

-                if (pendingTaskUpdater.compareAndSet(this, oldPendingTask, newPendingTask)) {
-                    break;
-                }
+        whenCancelled().thenAccept(cause -> {


This callback is called for all updating tasks.
How about adding this callback only once and use this.task instead of task in the callback?

I didn't really understand your comment here.

This callback is called for all updating tasks.

Technically, this code path will be called when updateTask is invoked after the CancellationScheduler is in the FINISHED state. The initial timed out task won't be invoked here

How about adding this callback only once and use this.task instead of task in the callback?

Are you suggesting that we modify this.task? My concern is this will create a race condition with invokeTask(). Do you mind adding a suggestion block so I can understand your intention?

Oops, my bad. I didn't notice the return statement in the above if condition so I thought thenAccept will be attached for all tasks. 😓

minwoox

Thanks! 👍 👍 👍

ikhoon · 2024-08-09T01:59:25Z

@jrhee17 Would you check the test failures?

ContextCancellationTest > cancel_beforeDelegate(TestInfo) FAILED
    java.lang.AssertionError: 
    Expecting empty but was: [[creqId=147071ed][http://127.0.0.1:34223/#GET]]
        at com.linecorp.armeria.client.ContextCancellationTest.validateCallbackChecks(ContextCancellationTest.java:289)
        at com.linecorp.armeria.client.ContextCancellationTest.cancel_beforeDelegate(ContextCancellationTest.java:115)

ContextCancellationTest > cancel_beforeConnection(TestInfo) FAILED
    java.lang.AssertionError: 
    Expecting empty but was: [[creqId=456bf5ae][http://foo.com:34223/#POST]]
        at com.linecorp.armeria.client.ContextCancellationTest.validateCallbackChecks(ContextCancellationTest.java:289)
        at com.linecorp.armeria.client.ContextCancellationTest.cancel_beforeConnection(ContextCancellationTest.java:153)

jrhee17 · 2024-08-09T03:46:29Z

Let me go ahead and merge this PR since it's reliable passing the CI again, thanks for the review all 👍

Motivation: The original motivation of this PR stems from #4591. There has been requests for a timeout which spans over an entire client's lifecycle. Following #5800, this can be achieved easily by adjusting where to call `CancellationScheduler#start`. I propose that we add an enum `ResponseTimeoutMode` which decides when a `CancellationScheduler` will start to be scheduled. By doing so, we can allow users to choose when a response timeout will start to be scheduled per client. Modifications: - Introduced a new `ResponseTimeoutMode` enum and added options in `AbstractClientOptionsBuilder` and `Flags` respectively - Depending on the set `ResponseTimeoutMode`, the timeout is started on 1) context init 2) request start 3) or response read Result: - Closes #4591

client contexts are cancellable

3b8958a

jrhee17 added the new feature label Jul 5, 2024

jrhee17 added this to the 1.30.0 milestone Jul 5, 2024

Merge branch 'main' into feature/client-context-cancel

59aa40a

jrhee17 marked this pull request as ready for review July 5, 2024 09:20

jrhee17 requested review from ikhoon, minwoox and trustin as code owners July 5, 2024 09:20

ikhoon reviewed Jul 9, 2024

View reviewed changes

minor nit

e6f95c1

ikhoon reviewed Jul 30, 2024

View reviewed changes

jrhee17 added 2 commits August 1, 2024 14:25

Merge branch 'main' into feature/client-context-cancel

7289baf

address comments by @ikhoon

06012a1

Merge remote-tracking branch 'origin/main' into feature/client-contex…

88c4c18

…t-cancel

ikhoon approved these changes Aug 2, 2024

View reviewed changes

jrhee17 mentioned this pull request Aug 2, 2024

ResponseTimeout can be scheduled at an earlier timing #5793

Merged

ikhoon added improvement and removed new feature labels Aug 7, 2024

trustin reviewed Aug 8, 2024

View reviewed changes

address comment by @trustin

5cb756b

trustin approved these changes Aug 8, 2024

View reviewed changes

Merge branch 'main' into feature/client-context-cancel

9068e4f

minwoox reviewed Aug 9, 2024

View reviewed changes

address comments by @minwoox

36c094c

minwoox approved these changes Aug 9, 2024

View reviewed changes

don't validate log completion thread local before a connection is opened

610f83d

handle flaky

cc8db83

jrhee17 merged commit 9a29b39 into line:main Aug 9, 2024
15 checks passed

	final Throwable t = new Throwable();
	final CountingConnectionPoolListener connListener = new CountingConnectionPoolListener();
	final AtomicReference<ClientRequestContext> ctxRef = new AtomicReference<>();
	try (ClientFactory cf = ClientFactory
	.builder()
	.connectionPoolListener(connListener)
	.workerGroup(eventLoopGroup.get(), false)
	.build()) {
	final HttpResponse res = server.webClient(cb -> {
	cb.decorator((delegate, ctx, req) -> {
	ctx.cancel(t);
	ctxRef.set(ctx);
	return delegate.execute(ctx, req);
	});
	cb.decorator(TestInfoHeaderDecorator.newDecorator(testInfo));
	cb.decorator(AttachCallbacksDecorator.newDecorator());
	cb.factory(cf);
	}).get("/");
	assertThatThrownBy(() -> res.aggregate().join())
	.isInstanceOf(CompletionException.class)
	.hasCauseInstanceOf(UnprocessedRequestException.class)
	.hasRootCause(t);
	assertThat(connListener.opened()).isEqualTo(0);
	assertThat(requests).doesNotContain(testInfo.getDisplayName());
	// don't validate the thread since we haven't started with event loop scheduling yet
	validateCallbackChecks(null);
	}

	try (SafeCloseable ignored = RequestContextUtil.pop()) {
	req.abort(cause);
	final RequestLogBuilder logBuilder = ctx.logBuilder();
	logBuilder.endRequest(cause);
	logBuilder.endResponse(cause);
	}

	public long responseTimeoutMillis() {
	return TimeUnit.NANOSECONDS.toMillis(responseCancellationScheduler.timeoutNanos());
	}

ClientRequestContext#cancel cancels the associated request immediately #5800

ClientRequestContext#cancel cancels the associated request immediately #5800

Conversation

jrhee17 commented Jul 5, 2024 • edited Loading

github-actions bot commented Jul 5, 2024 • edited Loading

🔍 Build Scan® (commit: cc8db83)

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jrhee17 Jul 24, 2024 • edited Loading

Choose a reason for hiding this comment

ikhoon Jul 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ikhoon left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ikhoon commented Aug 2, 2024

ikhoon left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

trustin left a comment

Choose a reason for hiding this comment

minwoox left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

minwoox left a comment

Choose a reason for hiding this comment

ikhoon commented Aug 9, 2024

jrhee17 commented Aug 9, 2024

`ClientRequestContext#cancel` cancels the associated request immediately #5800

`ClientRequestContext#cancel` cancels the associated request immediately #5800

jrhee17 commented Jul 5, 2024 •

edited

Loading

github-actions bot commented Jul 5, 2024 •

edited

Loading

🔍 Build Scan® (commit: `cc8db83`)

jrhee17 Jul 24, 2024 •

edited

Loading

ikhoon Jul 30, 2024 •

edited

Loading