Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce assertions that IO does not occur on network threads #54066

Open
DaveCTurner opened this issue Mar 24, 2020 · 5 comments
Open

Introduce assertions that IO does not occur on network threads #54066

DaveCTurner opened this issue Mar 24, 2020 · 5 comments
Labels
:Core/Infra/Core Core issues without another label >enhancement Team:Core/Infra Meta label for core/infra team

Comments

@DaveCTurner
Copy link
Contributor

As a rule we should not be touching the filesystem on a network thread (transport_worker or http_server_worker) but today we do not assert this to be the case. It is all too easy to inadvertently move some filesystem access onto a network thread (see e.g. #53985 for a recent example of this). @jpountz suggested that it should be possible to introduce assertions during all IO to ensure that it does not occur on a network thread. For instance, we could implement a FileSystemProvider to check the current thread's name (cf. Transports#assertNotTransportThread).

@DaveCTurner DaveCTurner added >enhancement :Core/Infra/Core Core issues without another label labels Mar 24, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Core)

@rjernst
Copy link
Member

rjernst commented Mar 24, 2020

This is an interesting idea! One note on implementation: we would need to think about how to compose or still allow extending the filesystem abstraction, since external systems sometimes use this with elasticsearch (eg quote aware filesystem).

@jaymode
Copy link
Member

jaymode commented Mar 24, 2020

This is an interesting idea!

++. Another area where we will need to watch for is logging (including audit logging, see #39658) where filesystem access will occur on a network thread in our current state.

@rjernst rjernst added the Team:Core/Infra Meta label for core/infra team label May 4, 2020
@rjernst rjernst added the needs:triage Requires assignment of a team area label label Dec 3, 2020
@pgomulka pgomulka added team-discuss and removed needs:triage Requires assignment of a team area label labels Dec 10, 2020
@williamrandolph
Copy link
Contributor

We discussed this issue in a Core/Infra team meeting today.

The team agreed that these assertions would be very good to have, and would serve as guardrails to help us catch tricky performance issues in tests rather than in production. In short, this is something we should plan to do, and we should aim to do it before the final minor release in the 7.x line so that we have these guardrails in the 7.x code as we maintain it.

What we don't know is how much of a drag these extra assertions would put on our test times, and we need to find that out in order to discuss the tradeoffs. So the first effort at this issue should be some experiments that show the effect of this change on our tests, followed by another round of team discussion.

We also noted that File I/O is not the only slow/blocking thing that can inadvertently clog up our network threads, and we should think about whether there are any other situations for which we could add similar guardrails.

@williamrandolph
Copy link
Contributor

We have already implemented guardrail assertions against calling Future#get() in certain threadpools. See:

protected boolean blockingAllowed() {
return Transports.assertNotTransportThread(BLOCKING_OP_REASON) &&
ThreadPool.assertNotScheduleThread(BLOCKING_OP_REASON) &&
ClusterApplierService.assertNotClusterStateUpdateThread(BLOCKING_OP_REASON) &&
MasterService.assertNotMasterUpdateThread(BLOCKING_OP_REASON);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label >enhancement Team:Core/Infra Meta label for core/infra team
Projects
None yet
Development

No branches or pull requests

6 participants