Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve thread context when calling getNextPart() #109519

Merged

Conversation

DaveCTurner
Copy link
Contributor

The listener supplied to getNextPart() may be completed in a
non-default thread context, and this trips assertions about thread
context pollution in the transport layer, so this commit adds the
missing protection against that.

Also the listener supplied to getNextPart() may be completed
immediately on the transport thread, which could lead to a stack
overflow, so this commit adds another execute() call to
unconditionally fork a fresh task.

Relates #104851

The listener supplied to `getNextPart()` may be completed in a
non-default thread context, and this trips assertions about thread
context pollution in the transport layer, so this commit adds the
missing protection against that.

Also the listener supplied to `getNextPart()` may be completed
immediately on the transport thread, which could lead to a stack
overflow, so this commit adds another `execute()` call to
unconditionally fork a fresh task.

Relates elastic#104851
@DaveCTurner DaveCTurner added >non-issue :Distributed Coordination/Network Http and internode communication implementations v8.15.0 labels Jun 10, 2024
@DaveCTurner DaveCTurner requested a review from ywangd June 10, 2024 09:11
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jun 10, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner marked this pull request as draft June 10, 2024 10:42
@DaveCTurner DaveCTurner removed the request for review from ywangd June 10, 2024 10:43
@DaveCTurner DaveCTurner requested a review from ywangd June 17, 2024 06:21
@DaveCTurner DaveCTurner marked this pull request as ready for review June 17, 2024 06:21
@DaveCTurner DaveCTurner requested a review from pxsalehi June 17, 2024 06:22
@ywangd
Copy link
Member

ywangd commented Jun 17, 2024

The listener supplied to getNextPart() may be completed in a
non-default thread context, and this trips assertions about thread
context pollution in the transport layer, so this commit adds the
missing protection against that.

Which assertion is this? Have you seen such a test failure that can be linked to this PR?

@DaveCTurner
Copy link
Contributor Author

Any of the various existing calls to org.elasticsearch.transport.Transports#assertDefaultThreadContext will catch this. I don't have a failure of this in CI to which I could link but I've seen this in local testing of follow-ups to #104851.

@DaveCTurner
Copy link
Contributor Author

Also FWIW this is the same as what we do in org.elasticsearch.http.netty4.Netty4HttpHeaderValidator#requestStart

}), finishingWriteBodyPart::getNextPart);
ActionListener.run(
new ContextPreservingActionListener<>(
threadContext.newRestorableContext(false),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the response has multiple continuations, it seems to me that we are technically restoring the threadContext from the previous iteration instead of the threadContext when we send the first part? But maybe it is effectively the same since (1) the capture and restore are chained and (2) there is nothing in writeAndFlush could change the threadContext?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but we're in the Netty event loop here so we are indeed not changing the thread context (which is why we call assertDefaultThreadContext to check that)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertDefaultThreadContext does not check response header. But I see what you mean.

@DaveCTurner DaveCTurner requested a review from ywangd June 17, 2024 09:23
Copy link
Member

@ywangd ywangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DaveCTurner DaveCTurner merged commit 0a008ed into elastic:main Jun 17, 2024
15 checks passed
@DaveCTurner DaveCTurner deleted the 2024/06/10/getNextPart-threadContext branch June 17, 2024 09:49
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jun 30, 2024
With the changes in elastic#109519 we now do one more async step while serving
the response, so we need to acquire another ref to track the new step.

Relates elastic#109866
Relates elastic#110118
Relates elastic#110175
Relates elastic#110249
DaveCTurner added a commit that referenced this pull request Jul 1, 2024
With the changes in #109519 we now do one more async step while serving
the response, so we need to acquire another ref to track the new step.

Relates #109866
Relates #110118
Relates #110175
Relates #110249
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Network Http and internode communication implementations >non-issue Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants