Apply cluster states in system context #53785

DaveCTurner · 2020-03-19T10:46:06Z

Today cluster states are occasionally applied in the default context rather
than the system context, which means that any appliers which capture their
context cannot do things like execute remote transport actions when security
is enabled.

There are at least two ways that we end up applying the cluster state in the
default context:

1. locally applying a cluster state that indicates that the master has failed
2. the elected master times out while waiting for a response from another node

This commit ensures that cluster states are always applied in the system
context.

Mitigates #53751


elasticmachine · 2020-03-19T10:46:09Z

Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)

DaveCTurner · 2020-03-19T10:50:13Z

server/src/main/java/org/elasticsearch/cluster/service/ClusterApplierService.java

-        try {
+        final ThreadContext threadContext = threadPool.getThreadContext();
+        try (ThreadContext.StoredContext ignored = threadContext.stashContext()) {
+            threadContext.markAsSystemContext();


This is the key change. I would have preferred to avoid an explicit markAsSystemContext and instead to start everything in the system context and then preserve the context everywhere. It all fell to pieces a bit since system context is not propagated across transport messages and I timed out while trying to come up with a reliable way to assert that things are happening in the right context.
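The stash-then-mark pattern in the diff above can be illustrated in isolation. The following is only a minimal sketch: `SketchThreadContext` and `SketchClusterApplier` are hypothetical stand-ins written for this example, not the Elasticsearch classes, and the "context" is reduced to a single thread-local flag. It assumes that stashing resets the thread to a default context and that closing the stored context restores whatever was there before.

```java
// Hypothetical, heavily simplified stand-in for a thread context.
// This is NOT org.elasticsearch.common.util.concurrent.ThreadContext.
final class SketchThreadContext {
    // The only state tracked here: is the current thread in "system context"?
    private final ThreadLocal<Boolean> systemContext = ThreadLocal.withInitial(() -> false);

    // Closing a stored context restores the state captured when it was stashed.
    interface StoredContext extends AutoCloseable {
        @Override
        void close(); // narrowed so that no checked exception is declared
    }

    // Captures the current state, resets to the default context, and returns
    // a handle that puts the captured state back when closed.
    StoredContext stashContext() {
        final boolean previous = systemContext.get();
        systemContext.set(false);
        return () -> systemContext.set(previous);
    }

    void markAsSystemContext() {
        systemContext.set(true);
    }

    boolean isSystemContext() {
        return systemContext.get();
    }
}

// Hypothetical helper that mirrors the shape of the change in the diff:
// stash the caller's context, mark the fresh one as system, run the applier.
final class SketchClusterApplier {
    static void applyInSystemContext(SketchThreadContext threadContext, Runnable applier) {
        try (SketchThreadContext.StoredContext ignored = threadContext.stashContext()) {
            threadContext.markAsSystemContext();
            applier.run();
        } // the caller's original context is restored here, even if the applier throws
    }
}
```

In this sketch an applier passed to `applyInSystemContext` always observes `isSystemContext() == true`, regardless of the context the caller was in, and the caller's context is unchanged afterwards.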

DaveCTurner · 2020-03-19T10:52:11Z

server/src/test/java/org/elasticsearch/snapshots/SnapshotResiliencyTests.java

@@ -1163,15 +1163,15 @@ public TestClusterNode currentMaster(ClusterState state) {
            TestClusterNode(DiscoveryNode node) throws IOException {
                this.node = node;
                final Environment environment = createEnvironment(node.getName());
-                masterService = new FakeThreadPoolMasterService(node.getName(), "test", deterministicTaskQueue::scheduleNow);
+                threadPool = deterministicTaskQueue.getThreadPool(runnable -> CoordinatorTests.onNodeLog(node, runnable));


Most of the remaining changes, like this one, ensure that we use the same ThreadPool instance both for entering system context in the master service and for asserting that publications are sent in system context. Before this change we created multiple ThreadPool instances, which was OK because we were ignoring their stateful behaviour.

DaveCTurner · 2020-03-19T10:53:18Z

...framework/src/main/java/org/elasticsearch/cluster/service/ClusterApplierAssertionPlugin.java

+                                               NamedXContentRegistry xContentRegistry, Environment environment,
+                                               NodeEnvironment nodeEnvironment, NamedWriteableRegistry namedWriteableRegistry,
+                                               IndexNameExpressionResolver indexNameExpressionResolver) {
+        clusterService.addStateApplier(event -> {


These are the key assertions. It seemed a bit vacuous to put them directly in the ClusterApplierService since that's where we enter system context too, but that's another option...

This is good.
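To make the point about assertions concrete, here is a rough sketch of an assertion-only applier registered separately from the code that enters system context. Everything in it is hypothetical: `SketchApplierService`, its methods, and the use of a `String` as the "cluster state" are assumptions made for this example and do not reflect the actual plugin or service code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical miniature of an applier service; none of this is Elasticsearch code.
final class SketchApplierService {
    // Stand-in for the thread context's "system context" flag.
    private final ThreadLocal<Boolean> systemContext = ThreadLocal.withInitial(() -> false);
    private final List<Consumer<String>> appliers = new ArrayList<>();

    void addStateApplier(Consumer<String> applier) {
        appliers.add(applier);
    }

    boolean isSystemContext() {
        return systemContext.get();
    }

    // Registers an applier whose only job is to check the context it runs in,
    // in the spirit of an assertion-only test plugin.
    void addContextAssertion() {
        addStateApplier(state -> {
            if (isSystemContext() == false) {
                throw new AssertionError("applier for [" + state + "] ran outside system context");
            }
        });
    }

    // Applies a state to every registered applier, optionally entering system
    // context first; the previous flag value is always restored afterwards.
    void applyState(String state, boolean enterSystemContext) {
        final boolean previous = systemContext.get();
        try {
            if (enterSystemContext) {
                systemContext.set(true);
            }
            for (Consumer<String> applier : appliers) {
                applier.accept(state);
            }
        } finally {
            systemContext.set(previous);
        }
    }
}
```

Because the check lives in an ordinary applier rather than next to the code that sets the flag, it fails for any path that reaches the appliers without entering system context, which is the kind of path the description says existed before this change.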

DaveCTurner · 2020-03-19T11:05:14Z

@elasticmachine update branch

jasontedor

LGTM.


DaveCTurner · 2020-03-20T08:59:42Z

It turned out that this was not the right approach, but we only discovered this when trying to backport it. I immediately reverted the change in 7.x (7d3ac4f) and will revert the master change in #53842.


DaveCTurner added 7 commits March 19, 2020 09:17

Assertions during publication

8e51526

Cop out

848fae5

Assert that appliers/listeners are in system context

7f8c33e

Use the same threadpool everywhere

52163d6

More threadpool consistency

e98b2cf

Precommit

561a7f8

DaveCTurner added >bug :Distributed Coordination/Cluster Coordination v8.0.0 v7.7.0 v7.6.3 labels Mar 19, 2020

DaveCTurner requested a review from jasontedor March 19, 2020 10:46


Merge branch 'master' into 2020-03-19-system-context-in-applier

31c29ef

jasontedor approved these changes Mar 19, 2020


DaveCTurner merged commit c1dc523 into elastic:master Mar 19, 2020

DaveCTurner deleted the 2020-03-19-system-context-in-applier branch March 19, 2020 14:13

DaveCTurner added a commit that referenced this pull request Mar 19, 2020

Revert "Apply cluster states in system context (#53785)"

7d3ac4f

This reverts commit 4178c57.

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 19, 2020

Revert "Revert "Apply cluster states in system context (elastic#53785)""

8b8286b

This reverts commit 7d3ac4f.

DaveCTurner mentioned this pull request Mar 19, 2020

Apply cluster states in system context #53819

Closed

gwbrown mentioned this pull request Mar 19, 2020

Transition Transforms to using hidden indices for notifcations index #53773

Merged

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 20, 2020

Revert "Apply cluster states in system context (elastic#53785)"

6a42203

This reverts commit c1dc523.

DaveCTurner mentioned this pull request Mar 20, 2020

Revert "Apply cluster states in system context (#53785)" #53842

Merged

DaveCTurner removed v7.6.3 v7.7.0 labels Mar 20, 2020

DaveCTurner added a commit that referenced this pull request Mar 20, 2020

Revert "Apply cluster states in system context (#53785)" (#53842)

76cd638

This reverts commit c1dc523.

DaveCTurner mentioned this pull request Mar 20, 2020

Use consistent threadpools in CoordinatorTests #53868

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
