Improve master service batching queues #92021

DaveCTurner · 2022-11-30T13:40:30Z

Today the master's pending task queue is just the `PriorityBlockingQueue<Runnable>` belonging to the underlying `ThreadPoolExecutor`. The reasons for this date back a long way but it doesn't really reflect the structure of the queue as it exists today. In particular, we must keep track of batches independently of the queue itself, and must do various bits of unchecked casting to process multiple items of the same type at once.

This commit introduces a new queueing mechanism, independent of the executor's queue, which better represents the conceptual structure of the master's pending tasks:

* Today we use a priority queue to allow important tasks to preempt less-important ones. However there are only a small number of priority levels, so it is simpler to maintain a queue for each priority, effectively replacing the sorting within the priority queue with a radix sort.
* Today when a task is submitted we perform a map lookup to see if it can be added to an existing batch or not. With this change we allow client code to create its own dedicated queue of tasks. The entries in the per-priority-level queues are themselves queues, one for each executor, representing the batches to be run.
* Today each task in the queue holds a reference to its executor, but the executor used to run a task may belong to a different task in the same batch. In practice we know they're the same executor (that's how batches are defined) but we cannot express this knowledge in the type system so we have to do a bunch of unchecked casting to work around it. With this change we associate each per-executor queue directly with its executor, avoiding the need to do all this unchecked casting.
* Today the master service must block its thread while waiting for each task to complete, because otherwise the executor would start to process the next task in the queue. This makes testing using a `DeterministicTaskQueue` harder (see `FakeThreadPoolMasterService`). This change avoids enqueueing tasks on the `ThreadPoolExecutor` unless there is genuinely work to do, although it leaves the removal of the actual blocking to a followup.
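The queue-of-queues structure described in the list above can be sketched roughly as follows. This is a minimal, single-threaded illustration and not the actual `MasterService` code: the class names `BatchingQueuesSketch`, `Queues` and `TaskQueue`, the four-level `Priority` enum, and the use of `Consumer<List<T>>` as a stand-in for an executor are all assumptions made for the example, and the real implementation must also deal with concurrency, timeouts and failure handling.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical sketch: one outer queue per priority level, whose entries are
// per-executor task queues. Not thread-safe; for illustration only.
final class BatchingQueuesSketch {

    // Declared most-urgent first, so enum order doubles as processing order.
    public enum Priority { URGENT, HIGH, NORMAL, LOW }

    // A dedicated queue bound to a single executor. Every task has type T,
    // so running a batch needs no unchecked casts.
    public static final class TaskQueue<T> {
        private final Queues owner;
        private final Priority priority;
        private final Consumer<List<T>> executor; // stand-in for a batch executor
        private final Queue<T> tasks = new ArrayDeque<>();

        private TaskQueue(Queues owner, Priority priority, Consumer<List<T>> executor) {
            this.owner = owner;
            this.priority = priority;
            this.executor = executor;
        }

        public void submitTask(T task) {
            boolean wasEmpty = tasks.isEmpty();
            tasks.add(task);
            if (wasEmpty) {
                // No map lookup: the queue registers itself once per batch.
                owner.enqueue(priority, this);
            }
        }

        private void runBatch() {
            List<T> batch = new ArrayList<>(tasks);
            tasks.clear();
            executor.accept(batch);
        }
    }

    public static final class Queues {
        private final Map<Priority, Queue<TaskQueue<?>>> byPriority = new EnumMap<>(Priority.class);

        public Queues() {
            for (Priority p : Priority.values()) {
                byPriority.put(p, new ArrayDeque<>());
            }
        }

        public <T> TaskQueue<T> createTaskQueue(Priority priority, Consumer<List<T>> executor) {
            return new TaskQueue<>(this, priority, executor);
        }

        private void enqueue(Priority priority, TaskQueue<?> queue) {
            byPriority.get(priority).add(queue);
        }

        // Runs the first pending batch at the most urgent non-empty priority.
        // Returns false when there is nothing to do.
        public boolean runNextBatch() {
            for (Queue<TaskQueue<?>> queue : byPriority.values()) { // EnumMap iterates in enum order
                TaskQueue<?> next = queue.poll();
                if (next != null) {
                    next.runBatch();
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Queues queues = new Queues();
        List<String> log = new ArrayList<>();
        TaskQueue<String> normal = queues.createTaskQueue(Priority.NORMAL, batch -> log.add("normal" + batch));
        TaskQueue<Integer> urgent = queues.createTaskQueue(Priority.URGENT, batch -> log.add("urgent" + batch));
        normal.submitTask("a");
        normal.submitTask("b");
        urgent.submitTask(1);
        while (queues.runNextBatch()) {
            // keep draining until no batches remain
        }
        System.out.println(log); // prints [urgent[1], normal[a, b]]
    }
}
```

In this sketch the two `NORMAL` tasks run as one batch, and the `URGENT` batch runs first even though it was submitted last. Scanning a fixed, small set of per-priority queues in order is what stands in for the sorting a priority queue would otherwise do.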

Closes #81626


elasticsearchmachine · 2022-11-30T13:40:54Z

Hi @DaveCTurner, I've created a changelog YAML for you.

elasticsearchmachine · 2022-11-30T13:40:54Z

Pinging @elastic/es-distributed (Team:Distributed)

elasticsearchmachine · 2022-11-30T13:41:28Z

Hi @DaveCTurner, I've updated the changelog YAML for you.

DaveCTurner

The meaningful part of this change is in MasterService and its tests.

Almost all of the other changes here are mechanically replacing clusterService.submitStateUpdateTask with taskQueue.submitTask on a suitably-constructed taskQueue.

It's (slightly) nicer if you ignore whitespace changes.

I've left some other notes for reviewers inline.

server/src/internalClusterTest/java/org/elasticsearch/cluster/service/ClusterServiceIT.java

DaveCTurner · 2022-11-30T13:59:59Z

...rc/main/java/org/elasticsearch/action/admin/cluster/health/TransportClusterHealthAction.java

+                        || e instanceof FailedToCommitClusterStateException
+                            && e.getCause() instanceof EsRejectedExecutionException esre
+                            && esre.isExecutorShutdown();


Today's MasterService silently drops stuff when it's shut down. With this change we become stricter about rejecting work explicitly in this case.
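The difference between silently dropping work and rejecting it explicitly can be illustrated with a small standalone sketch. The class `StrictShutdownSketch` and its methods are hypothetical and use only `java.util.concurrent.RejectedExecutionException` from the JDK; the real change reports an `EsRejectedExecutionException` wrapped in a `FailedToCommitClusterStateException`, as the diff above suggests.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.Consumer;

// Hypothetical sketch: after stop(), every submission is failed explicitly
// through its failure callback instead of being discarded without notice.
final class StrictShutdownSketch {
    private final Queue<Runnable> pending = new ArrayDeque<>();
    private boolean stopped;

    public void submit(Runnable task, Consumer<Exception> onFailure) {
        boolean rejected;
        synchronized (this) {
            rejected = stopped;
            if (rejected == false) {
                pending.add(task);
            }
        }
        if (rejected) {
            // Notify outside the lock so the callback cannot block other submitters.
            onFailure.accept(new RejectedExecutionException("service is shut down"));
        }
    }

    public synchronized void stop() {
        stopped = true;
    }

    public synchronized int pendingCount() {
        return pending.size();
    }
}
```

Callers that used to see nothing at all after shutdown now see a failure, which is why call sites such as the one in this hunk need to recognise the shutdown rejection.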

DaveCTurner · 2022-11-30T14:02:57Z

.../src/main/java/org/elasticsearch/common/util/concurrent/PrioritizedEsThreadPoolExecutor.java

@@ -46,12 +45,10 @@ public PrioritizedEsThreadPoolExecutor(
        TimeUnit unit,
        ThreadFactory threadFactory,
        ThreadContext contextHolder,
-        ScheduledExecutorService timer,
-        StarvationWatcher starvationWatcher


The starvation-watching mechanism is now built into the MasterService itself, not the executor, so this parameter is unnecessary and removed everywhere.

DaveCTurner · 2022-11-30T15:04:38Z

@elasticmachine update branch

DaveCTurner · 2022-12-01T13:41:28Z

@elasticmachine update branch


henningandersen

I did an initial read of the MasterService and wanted to relay my comments now to ensure alignment before going into the details and reading the tests.

Overall, this change looks great.

server/src/main/java/org/elasticsearch/cluster/service/ClusterService.java

server/src/main/java/org/elasticsearch/cluster/service/MasterService.java

The changes introduced in elastic#92021 mean that there is no need for the master service to block its thread while waiting for each publication to complete. This commit removes the now-unnecessary blocking. This commit also removes the now-unnecessary fake blocking in the `FakeThreadPoolMasterService` used in tests, bringing the implementation covered by tests closer to the production implementation.

The changes introduced in elastic#92021 mean that the master service no longer needs to use a prioritized executor. Prioritized executors are weird, for instance they don't propagate rejections to `AbstractRunnable` tasks properly. This commit moves to using a regular scaling executor and removes some of the now-unnecessary workarounds for handling the prioritized executor's weirdness.


DaveCTurner added the >enhancement, :Distributed Coordination/Cluster Coordination, and v8.7.0 labels Nov 30, 2022

elasticsearchmachine added the Team:Distributed label Nov 30, 2022

Update docs/changelog/92021.yaml

0370148

Update docs/changelog/92021.yaml

4cb1140

DaveCTurner commented Nov 30, 2022


DaveCTurner requested a review from henningandersen November 30, 2022 14:22

DaveCTurner mentioned this pull request Nov 30, 2022

Radix-sorted, lookup-free, typesafe, nonblocking, priority-boosting pending tasks #85751

Closed

Merge branch 'main' into 2022-11-30-new-master-service-queues

35b334d

Merge branch 'main' into 2022-11-30-new-master-service-queues

83ea0e7

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Dec 5, 2022

Add per-task description to pending cluster tasks

1d062b0

Deferring this until after elastic#92021 when there will be more typesafety and better tests.

henningandersen reviewed Dec 20, 2022


DaveCTurner added 8 commits December 20, 2022 13:31

Merge branch 'main' into TMP

0c88f4f

getTaskQueue -> createTaskQueue

e856717

Note that forkQueueProcessor is single-threaded

fc89fe3

CountedQueue -> PerPriorityQueue

0797f30

Correct comment

5f85269

Avoid cast with wildcard

b199b42

EnumMap doesn't need separate array

781da78

Unused

765a9d2

idegtiarenko reviewed Dec 20, 2022


server/src/main/java/org/elasticsearch/cluster/service/MasterService.java

DaveCTurner added 2 commits December 31, 2022 10:51

Merge branch 'main' into 2022-11-30-new-master-service-queues

d6a18d3

Update MetadataIndexAliasesService

9cfa0e7

DaveCTurner added 6 commits February 22, 2023 09:29

Fixup merge

c7f79bf

Simplify submitUnbatchedStateUpdateTask

bfccca2

Extract common allBatchesStream()

a22db27

Consistency assertions on currentlyExecutingBatch

76d99f4

Rename item -> batch

d565881

Merge branch 'main' into 2022-11-30-new-master-service-queues

ff22d79

DaveCTurner added the auto-merge-without-approval label Feb 23, 2023

Merge branch 'main' into 2022-11-30-new-master-service-queues

ed56710

elasticsearchmachine merged commit c058728 into elastic:main Feb 23, 2023

DaveCTurner deleted the 2022-11-30-new-master-service-queues branch February 23, 2023 13:01

DaveCTurner mentioned this pull request Mar 6, 2023

Deprioritize master service #94318

Merged

DaveCTurner mentioned this pull request Mar 6, 2023

Nonblocking master service #94325

Merged

DaveCTurner mentioned this pull request Mar 31, 2023

MasterService does not complete all tasks on shutdown #94930

Open

DiannaHohensee mentioned this pull request Nov 29, 2023

[CI] ClusterDisruptionIT testAckedIndexing failing #91447

Closed
