Apply cluster states in system context #53785
Conversation
Today cluster states are sometimes (rarely) applied in the default context rather than the system context, which means that any appliers which capture their contexts cannot do things like remote transport actions when security is enabled.

There are at least two ways that we end up applying the cluster state in the default context:

1. locally applying a cluster state that indicates that the master has failed
2. the elected master times out while waiting for a response from another node

This commit ensures that cluster states are always applied in the system context.

Mitigates elastic#53751
Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)
    try {
        final ThreadContext threadContext = threadPool.getThreadContext();
        try (ThreadContext.StoredContext ignored = threadContext.stashContext()) {
            threadContext.markAsSystemContext();
This is the key change. I would have preferred to avoid an explicit `markAsSystemContext` and instead to start everything in the system context and then preserve the context everywhere. That fell to pieces a bit since the system context is not propagated across transport messages, and I timed out while trying to come up with a reliable way to assert that things are happening in the right context.
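To illustrate the stash-then-mark pattern in the diff above, here is a minimal, self-contained sketch. The `ThreadContext` here is a toy stand-in for Elasticsearch's real class (which lives in `org.elasticsearch.common.util.concurrent`); the method names mirror the real API, but the implementation is a deliberately simplified assumption, not the actual one:

```java
// Toy stand-in for Elasticsearch's ThreadContext, showing how stashContext()
// snapshots the current context and restores it when closed, so that
// markAsSystemContext() only affects the code inside the try block.
public class StashSketch {
    static class ThreadContext {
        private final ThreadLocal<Boolean> systemContext = ThreadLocal.withInitial(() -> false);

        // Returns an AutoCloseable that restores the previous context on close.
        AutoCloseable stashContext() {
            final boolean previous = systemContext.get();
            systemContext.set(false); // fresh default context inside the try block
            return () -> systemContext.set(previous);
        }

        void markAsSystemContext() {
            systemContext.set(true);
        }

        boolean isSystemContext() {
            return systemContext.get();
        }
    }

    public static void main(String[] args) throws Exception {
        final ThreadContext threadContext = new ThreadContext();
        try (AutoCloseable ignored = threadContext.stashContext()) {
            threadContext.markAsSystemContext();
            System.out.println("inside: " + threadContext.isSystemContext()); // inside: true
        }
        // The stashed (default) context is restored once the block exits.
        System.out.println("after: " + threadContext.isSystemContext()); // after: false
    }
}
```

The try-with-resources shape is what makes this safe: whatever the applier does, the previous context comes back when the block exits.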
@@ -1163,15 +1163,15 @@ public TestClusterNode currentMaster(ClusterState state) {
        TestClusterNode(DiscoveryNode node) throws IOException {
            this.node = node;
            final Environment environment = createEnvironment(node.getName());
            masterService = new FakeThreadPoolMasterService(node.getName(), "test", deterministicTaskQueue::scheduleNow);
            threadPool = deterministicTaskQueue.getThreadPool(runnable -> CoordinatorTests.onNodeLog(node, runnable));
Most of the remaining changes, like this one, ensure that we use the same `ThreadPool` instance both for entering the system context in the master service and for asserting that publications are sent in the system context. Before this change we were creating multiple threadpool instances, which was fine since we were ignoring their stateful behaviour.
                NamedXContentRegistry xContentRegistry, Environment environment,
                NodeEnvironment nodeEnvironment, NamedWriteableRegistry namedWriteableRegistry,
                IndexNameExpressionResolver indexNameExpressionResolver) {
        clusterService.addStateApplier(event -> {
These are the key assertions. It seemed a bit vacuous to put them directly in the ClusterApplierService
since that's where we enter system context too, but that's another option...
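The shape of such an assertion can be sketched as follows. `ClusterService` and `applyState` here are simplified stand-ins for the real Elasticsearch classes, assumed only for illustration; the point is that a registered applier can itself verify the context it runs in, as the assertions added in this PR do:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model of an applier service that enters the system context before
// invoking its appliers, plus an applier that asserts it is in that context.
public class ApplierAssertionSketch {
    static final ThreadLocal<Boolean> SYSTEM_CONTEXT = ThreadLocal.withInitial(() -> false);

    static class ClusterService {
        private final List<Consumer<String>> appliers = new ArrayList<>();

        void addStateApplier(Consumer<String> applier) {
            appliers.add(applier);
        }

        // Always enter the system context before applying, as this PR ensures.
        void applyState(String event) {
            SYSTEM_CONTEXT.set(true);
            try {
                appliers.forEach(applier -> applier.accept(event));
            } finally {
                SYSTEM_CONTEXT.set(false);
            }
        }
    }

    public static void main(String[] args) {
        final ClusterService clusterService = new ClusterService();
        clusterService.addStateApplier(event -> {
            if (!SYSTEM_CONTEXT.get()) {
                throw new AssertionError("cluster state applied outside system context");
            }
            System.out.println("applied '" + event + "' in system context");
        });
        clusterService.applyState("cluster state update");
    }
}
```

Registering the check as an ordinary applier keeps it at arm's length from the service under test, which is the trade-off discussed above.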
This is good.
@elasticmachine update branch
LGTM.
This reverts commit 4178c57.
This reverts commit 7d3ac4f.
This reverts commit c1dc523.