-
Notifications
You must be signed in to change notification settings - Fork 923
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ClientRequestContext#cancel
cancels the associated request immediately
#5800
Conversation
@@ -428,7 +428,8 @@ private void updateFlags(int flags) { | |||
private static void completeSatisfiedFutures(RequestLogFuture[] satisfiedFutures, RequestLog log, | |||
RequestContext ctx) { | |||
if (!ctx.eventLoop().inEventLoop()) { | |||
ctx.eventLoop().execute(() -> completeSatisfiedFutures(satisfiedFutures, log, ctx)); | |||
ctx.eventLoop().withoutContext().execute( | |||
() -> completeSatisfiedFutures(satisfiedFutures, log, ctx)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q) Why was the context removed? Shouldn't the request context be propagated to the callbacks of satisfiedFutures
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seemed like we were making an effort to clear the request context when completing logs.
e.g.
armeria/core/src/main/java/com/linecorp/armeria/client/HttpClientDelegate.java
Lines 242 to 247 in 72ebfe1
try (SafeCloseable ignored = RequestContextUtil.pop()) { | |
req.abort(cause); | |
final RequestLogBuilder logBuilder = ctx.logBuilder(); | |
logBuilder.endRequest(cause); | |
logBuilder.endResponse(cause); | |
} |
I saw ContextCancellationTest
failing occasionally due to this, so I modified to use withoutContext
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was an inevitable choice Netty can handle pending messages in batch mode that can cause a request context leak.
If an action does not perform sequentially in the callbacks of Netty operations, I think a context-aware executor is generally preferred.
I am not strong here. If you have any other workarounds, it is okay to keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Netty can handle pending messages in batch mode that can cause a request context leak
I see, I think I misunderstood the intention then. My original guess was that if request logs for multiple requests are waited on a single condition, users would easily see a request context leak.
I'm not sure I understood what you mean by batch mode though. I thought channels are tied to a single thread, and within a single thread as long as the try...close
syntax is used context leaks weren't possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try...close syntax is used context leaks weren't possible.
Context leaks could happen even with try...close
.
if request logs for multiple requests are waited on a single condition
seems to mean the same thing, but I'll explain it just in case.
Say we have a simple service that adds a callback to response.whenComplete()
.
try (ctx.push()) {
val response = HttpResponse.streaming();
response.write(...);
response.whenComplete().thenAccept(unused -> {
// Do something else.
})
}
- The code pushes
ctx
temporary usingtry with resource
block. response.whenComplete()
may be completed whenWriteFutureListener.operationComplete()
is invoked if it is a unary type.
armeria/core/src/main/java/com/linecorp/armeria/server/AggregatedHttpResponseHandler.java
Line 92 in cd1973b
future.addListener(new WriteFutureListener(response.content().isEmpty(), cause)); - Netty
Channel
stores write requests toChannelOutboundBuffer
. Channel
is busy.response.write()
forrequest_1
torequest_9
are added toChannelOutboundBuffer
andoperationComplete()
isn't invoked for them.- When
response.write()
forrequest_10
is written,Channel
can flush the outbound data sorequest_1
torequest_10
are flushed in the same call (task). - Without
try (SafeCloseable ignored = RequestContextUtil.pop())
, the callbacks ofrequest_1-9.whenComplete()
will see the context ofrequest_10
.
However, it would be impossible to leak the context with ctx.eventLoop().execute(() -> completeSatisfiedFutures())
since it is executed asynchronously. So I thought it was safe to remove .withoutContext()
.
It seemed like we were making an effort to clear the request context when completing logs.
As you said, contexts are not propagated in some cases. I agree with keeping the current style in terms of consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation 👍
When response.write() for request_10 is written, Channel can flush the outbound data so request_1 to request_10 are flushed in the same call (task).
I understood that even if request_1
to request_10
is flushed together in a single task, they will be executed sequentially.
e.g.
ClientRequestContext ctx1 = null;
ClientRequestContext ctx2 = null;
// represents a single task
CommonPools.workerGroup().execute(() -> {
// request 1
try (SafeCloseable sf = ctx1.push()) {
ctx1.logBuilder().endResponse();
}
// request 2
try (SafeCloseable sf = ctx2.push()) {
ctx2.logBuilder().endResponse();
}
});
I agree with keeping the current style in terms of consistency.
👍 for me as well. Ultimately, my thought was that it depends more on how users are using the API - for this reason it is probably more important that the context is pushed (or not) consistently.
e.g. I can imagine the following scenario
HttpResponseWriter res1 = null;
HttpResponseWriter res2 = null;
CompletableFuture.allOf(ctx1.log().whenComplete(), ctx2.log().whenComplete())
.handle((v, t) -> {
res1.close();
res2.close();
// when the res is completed, the same thread may call ctx.push somewhere
return null;
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for chiming in late.
One thing we need to keep in mind is that if this method is initially called by the ctx.eventLoop()
, it might have the context in the thread-local.
Could you let me know which test fails in ContextCancellationTest
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't expect this change to be so controversial 😅 Let me just revert this change since this isn't really critical to this PR
core/src/main/java/com/linecorp/armeria/internal/common/DefaultCancellationScheduler.java
Show resolved
Hide resolved
timeoutNanos = LongMath.saturatedSubtract(timeoutNanos, elapsedTimeNanos); | ||
} | ||
if (timeoutNanos != Long.MAX_VALUE) { | ||
scheduledFuture = eventLoop().schedule(() -> invokeTask(null), timeoutNanos, NANOSECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice clean up!
if (ctxExt == null) { | ||
return; | ||
} | ||
ctxExt.responseCancellationScheduler().updateTask(cause -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested removing updateTask()
method, but that would result in too many changes to the existing code. As long as the implemetation is correct, I'm okay to maintain the current style and focus on implementing the response timeout mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, updateTask()
doesn't necessarily have to be removed - it can still be used internally since the old CancellationScheduler#start
acted similarly to updateTask()
.
It is probably more important to unify and simplify the request/response abortion mechanism. This seems difficult to do at the moment for client-side. In particular, when one attempts this one should take note of 1) consistency between [req|res].cause
and log.[req|res]Cause
, 2) setting the correct exception type (e.g. UnprocessedRequestException
).
SCHEDULED, | ||
FINISHING, | ||
FINISHED | ||
PENDING, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about SCHEDULED
? PENDING
sounds a like a status between init()
and start()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, note that technically calling start()
doesn't guarantee a task has been scheduled if the timeout is Long.MAX_VALUE
core/src/main/java/com/linecorp/armeria/internal/common/CancellationScheduler.java
Show resolved
Hide resolved
core/src/main/java/com/linecorp/armeria/internal/common/DefaultCancellationScheduler.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks nice. Left minor comments and a corner case.
startTimeNanos = ticker.read(); | ||
if (timeoutMode == TimeoutMode.SET_FROM_NOW) { | ||
final long elapsedTimeNanos = startTimeNanos - setFromNowStartNanos; | ||
timeoutNanos = LongMath.saturatedSubtract(timeoutNanos, elapsedTimeNanos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timeoutNanos
could be a negative number with TimeoutMode.SET_FROM_NOW
if a decorator takes a long time to perform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Negative numbers are already handled by netty:
Added a Long#max
call anyways though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing it. timeoutNanos
is also exposed to the userland via ctx.responseTimeoutMillis()
. 0 would be better than a negative number.
armeria/core/src/main/java/com/linecorp/armeria/internal/client/DefaultClientRequestContext.java
Lines 824 to 826 in 2367de2
public long responseTimeoutMillis() { | |
return TimeUnit.NANOSECONDS.toMillis(responseCancellationScheduler.timeoutNanos()); | |
} |
@@ -135,7 +135,9 @@ void shouldRunHooksWhenContextIsPushed() { | |||
hookEvents.clear(); | |||
response = client.get("http://foo.com:" + server.httpPort() + "/virtualhost"); | |||
assertThat(response.status()).isEqualTo(HttpStatus.OK); | |||
assertThat(hookEvents).containsExactly( | |||
// we don't do containsExactly here because there is no easy way to guarantee that | |||
// all context hooks from the previous request have been completed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q) Did the previous request have been completed
mean the request sent to /server
? Has it been a flaky test or introduced by this PR? Would you mind elaborating on the case so that I can understand it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me just revert this change and handle separately if needed since this is a change independent from this PR
#5156 has been merged. Would you mind resolving the conflicts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, @jrhee17! 👍💯
static CancellationScheduler ofClient(long timeoutNanos) { | ||
if (timeoutNanos == 0) { | ||
timeoutNanos = Long.MAX_VALUE; | ||
} | ||
return new DefaultCancellationScheduler(timeoutNanos, false); | ||
} | ||
|
||
static CancellationScheduler ofServer(long timeoutNanos) { | ||
if (timeoutNanos == 0) { | ||
timeoutNanos = Long.MAX_VALUE; | ||
} | ||
return new DefaultCancellationScheduler(timeoutNanos, true); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should we do when a user specified Long.MAX_VALUE
? Should we use Long.MAX_VALUE - 1
if we handle Long.MAX_VALUE
specially? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think about this case too seriously since this is equivalent to a timeout of 292 years.
Having said this, pushed a commit handling this case as well.
jshell> System.out.println(TimeUnit.NANOSECONDS.toDays(Long.MAX_VALUE) / 365);
292
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but correctness matters 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice clean-up!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late review. Left a couple of suggestions.
@@ -428,7 +428,8 @@ private void updateFlags(int flags) { | |||
private static void completeSatisfiedFutures(RequestLogFuture[] satisfiedFutures, RequestLog log, | |||
RequestContext ctx) { | |||
if (!ctx.eventLoop().inEventLoop()) { | |||
ctx.eventLoop().execute(() -> completeSatisfiedFutures(satisfiedFutures, log, ctx)); | |||
ctx.eventLoop().withoutContext().execute( | |||
() -> completeSatisfiedFutures(satisfiedFutures, log, ctx)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for chiming in late.
One thing we need to keep in mind is that if this method is initially called by the ctx.eventLoop()
, it might have the context in the thread-local.
Could you let me know which test fails in ContextCancellationTest
?
private final Ticker ticker; | ||
private final ReentrantShortLock lock = new ReentrantShortLock(); | ||
|
||
private final boolean server; | ||
@Nullable | ||
private volatile CancellationFuture whenCancelling; | ||
@Nullable | ||
private volatile CancellationFuture whenCancelled; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because whenCancelled()
, which creates this future, is called almost immediatly after this instance is created, how about creating this future directory at construction?
private final CancellationFuture whenCancelled = new CancellationFuture();
if (pendingTaskUpdater.compareAndSet(this, oldPendingTask, newPendingTask)) { | ||
break; | ||
} | ||
whenCancelled().thenAccept(cause -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This callback is called for all updating tasks.
How about adding this callback only once and use this.task
instead of task
in the callback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't really understand your comment here.
This callback is called for all updating tasks.
Technically, this code path will be called when updateTask
is invoked after the CancellationScheduler
is in the FINISHED
state. The initial timed out task won't be invoked here
How about adding this callback only once and use this.task instead of task in the callback?
Are you suggesting that we modify this.task
? My concern is this will create a race condition with invokeTask()
. Do you mind adding a suggestion block so I can understand your intention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, my bad. I didn't notice the return
statement in the above if
condition so I thought thenAccept
will be attached for all task
s. 😓
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! 👍 👍 👍
@jrhee17 Would you check the test failures?
|
Let me go ahead and merge this PR since it's reliable passing the CI again, thanks for the review all 👍 |
Motivation: The original motivation of this PR stems from #4591. There has been requests for a timeout which spans over an entire client's lifecycle. Following #5800, this can be achieved easily by adjusting where to call `CancellationScheduler#start`. I propose that we add an enum `ResponseTimeoutMode` which decides when a `CancellationScheduler` will start to be scheduled. By doing so, we can allow users to choose when a response timeout will start to be scheduled per client. Modifications: - Introduced a new `ResponseTimeoutMode` enum and added options in `AbstractClientOptionsBuilder` and `Flags` respectively - Depending on the set `ResponseTimeoutMode`, the timeout is started on 1) context init 2) request start 3) or response read Result: - Closes #4591 <!-- Visit this URL to learn more about how to write a pull request description: https://armeria.dev/community/developer-guide#how-to-write-pull-request-description -->
Motivation:
In order to handle #4591, I propose that we first define an API which allows users to cancel a request. Currently,
ClientRequestContext#cancel
is invoked onceCancellationScheduler#start
is invoked. I propose to change the behavior ofClientRequestContext#cancel
such that the associated request is aborted immediately. Once this API is available, it would be trivial to implementResponseTimeoutMode
by adjusting where to callCancellationScheduler#start
. Additionally, users may easily implement their own timeout mechanism if they would like.Modifications:
CancellationScheduler
related changesDefaultCancellationScheduler
is refactored to be lock based instead of event loop based. The reasoning for this change was for the scenario where the request execution didn't reach the event loop yet. i.e. If a user callsctx.cancel
in the decorator, a connection shouldn't be opened.armeria/core/src/test/java/com/linecorp/armeria/client/ContextCancellationTest.java
Lines 90 to 116 in 59aa40a
CancellationScheduler#updateTask
is introduced. This API updates the cancellation task if the scheduler isn't invoked yet. If the scheduler is invoked already, the cancellation task will be executed eventually. This API allowsctx.cancel
to attempt cancellation depending on which stage the request is at. For instance, at the decorators onlyreq.abort
needs to be called but at the request write stage, the cancellation task may need to send a reset signal.Long.MAX_VALUE
instead of0
AbstractHttpRequestHandler
related changesCancellationTask
inAbstractHttpRequestHandler
,HttpResponseWrapper
,AbstractHttpResponseHandler
is modified to be scheduled inside an event loop. The reasoning is thatctx.cancel
, and henceCancellationTask#run
can be invoked from any thread.AbstractHttpRequestHandler
callingfail
orfailAndReset
multiple times. There is no point in doing so, so added a boolean flagfailed
for safety.HttpResponseWrapper
related changescancelTimeoutAndLog
was to not log if the response is unexpected. Modified so that if the response is cancelled or the context is cancelled, no logging is doneclose
when a timeout occurs. Unified the fragmented logic of closing theHttpResponseWrapper
.DefaultRequestLog
is completing futures with the context set on the event loop. We generally don't do this since we strictly validate against setting incorrectRequestContext
s, so modified to usewithoutContext
.Result:
ClientRequestContext#cancel
to cancel the ongoing request easily.ResponseTimeout
can be scheduled at an earlier timing #5793 can be prepared for