Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle response correctly when request already cancelled #110249

Conversation

nicktindall
Copy link
Contributor

@nicktindall nicktindall commented Jun 27, 2024

Related to #109866

Since #110118 the test fails under the same circumstances, but in a different way.

Now, when processResponse executes after the cancellation has completed (localRefs.hasReferences() == false) - but before the channel is closed (RestActionListener.ensureOpen() passes), the call to mustIncRef() fails because the reference counts have already reached zero. This change uses tryIncRef() instead, double-checking on a failure that hasReferences() == false (although I'm not sure whether that bit is necessary)

@nicktindall nicktindall added >test Issues or PRs that are addressing/adding tests :Distributed Coordination/Network Http and internode communication implementations labels Jun 27, 2024
@elasticsearchmachine elasticsearchmachine added v8.15.0 Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. labels Jun 27, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@nicktindall nicktindall requested a review from ywangd June 27, 2024 23:48
if (localRefs.tryIncRef() == false) {
assert localRefs.hasReferences() == false : "tryIncRef failed but RefCounted not completed";
return;
}
Copy link
Contributor Author

@nicktindall nicktindall Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it could hide the test being broken (i.e. if we always cancel before the call to processResponse it'll happily pass without testing anything).

Another approach would be to put a synchronization point after the call to sendResponse, which we wait on before cancelling the request. That way we could be sure we'd begun sending the response before the cancellation? but perhaps handling cancellation prior to the response being sent is part of the test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could just call tryIncRef, and if it fails to increment the counter, it should also not decrement it later (by virtue of channel.sendResponse not sending the chunked response)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current change is fine. We also do something similar at prepareRequest a few lines above.

perhaps handling cancellation prior to the response being sent is part of the test?

I think so. The random sleep in the test method is intended to have cancellation happen in different places, including in prepareRequest where no response is ever generated.

it should also not decrement it later (by virtue of channel.sendResponse not sending the chunked response)?

I think this might not be the case. It seems that response is released if channel.sendResponse fails to send.

if (success == false) {
Releasables.close(toClose);

Copy link
Member

@ywangd ywangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nicktindall nicktindall merged commit 6f72d4c into elastic:main Jun 28, 2024
15 checks passed
@nicktindall nicktindall deleted the fix/109866_Netty4ChunkedContinuationsIT_again branch June 28, 2024 01:01
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jun 30, 2024
With the changes in elastic#109519 we now do one more async step while serving
the response, so we need to acquire another ref to track the new step.

Relates elastic#109866
Relates elastic#110118
Relates elastic#110175
Relates elastic#110249
DaveCTurner added a commit that referenced this pull request Jul 1, 2024
With the changes in #109519 we now do one more async step while serving
the response, so we need to acquire another ref to track the new step.

Relates #109866
Relates #110118
Relates #110175
Relates #110249
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Network Http and internode communication implementations Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test Issues or PRs that are addressing/adding tests v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants