Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve master service batching queues #92021

Conversation

DaveCTurner
Copy link
Contributor

@DaveCTurner DaveCTurner commented Nov 30, 2022

Today the master's pending task queue is just the PriorityBlockingQueue<Runnable> belonging to the underlying ThreadPoolExecutor. The reasons for this date back a long way but it doesn't really reflect the structure of the queue as it exists today. In particular, we must keep track of batches independently of the queue itself, and must do various bits of unchecked casting to process multiple items of the same type at once.

This commit introduces an new queueing mechanism, independent of the executor's queue, which better represents the conceptual structure of the master's pending tasks:

  • Today we use a priority queue to allow important tasks to preempt less-important ones. However there are only a small number of priority levels, so it is simpler to maintain a queue for each priority, effectively replacing the sorting within the priority queue with a radix sort.

  • Today when a task is submitted we perform a map lookup to see if it can be added to an existing batch or not. With this change we allow client code to create its own dedicated queue of tasks. The entries in the per-priority-level queues are themselves queues, one for each executor, representing the batches to be run.

  • Today each task in the queue holds a reference to its executor, but the executor used to run a task may belong to a different task in the same batch. In practice we know they're the same executor (that's how batches are defined) but we cannot express this knowledge in the type system so we have to do a bunch of unchecked casting to work around it. With this change we associate each per-executor queue directly with its executor, avoiding the need to do all this unchecked casting.

  • Today the master service must block its thread while waiting for each task to complete, because otherwise the executor would start to process the next task in the queue. This makes testing using a DeterministicTaskQueue harder (see FakeThreadPoolMasterService). This change avoids enqueueing tasks on the ThreadPoolExecutor unless there is genuinely work to do, although it leaves the removal of the actual blocking to a followup.

Closes #81626

Today the master's pending task queue is just the
`PriorityBlockingQueue<Runnable>` belonging to the underlying
`ThreadPoolExecutor`. The reasons for this date back a long way but it
doesn't really reflect the structure of the queue as it exists today. In
particular, we must keep track of batches independently of the queue
itself, and must do various bits of unchecked casting to process
multiple items of the same type at once.

This commit introduces an new queueing mechanism, independent of the
executor's queue, which better represents the conceptual structure of
the master's pending tasks:

* Today we use a priority queue to allow important tasks to preempt
less-important ones. However there are only a small number of priority
levels, so it is simpler to maintain a queue for each priority,
effectively replacing the sorting within the priority queue with a radix
sort.

* Today when a task is submitted we perform a map lookup to see if it
can be added to an existing batch or not. With this change we allow
client code to create its own dedicated queue of tasks. The entries in
the per-priority-level queues are themselves queues, one for each
executor, representing the batches to be run.

* Today each task in the queue holds a reference to its executor, but
the executor used to run a task may belong to a different task in the
same batch. In practice we know they're the same executor (that's how
batches are defined) but we cannot express this knowledge in the type
system so we have to do a bunch of unchecked casting to work around it.
With this change we associate each per-executor queue directly with its
executor, avoiding the need to do all this unchecked casting.

* Today the master service must block its thread while waiting for each
task to complete, because otherwise the executor would start to process
the next task in the queue. This makes testing using a
`DeterministicTaskQueue` harder (see `FakeThreadPoolMasterService`).
This change avoids enqueueing tasks on the `ThreadPoolExecutor` unless
there is genuinely work to do, although it leaves the removal of the
actual blocking to a followup.
@DaveCTurner DaveCTurner added >enhancement :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.7.0 labels Nov 30, 2022
@elasticsearchmachine
Copy link
Collaborator

Hi @DaveCTurner, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team (obsolete) label Nov 30, 2022
@elasticsearchmachine
Copy link
Collaborator

Hi @DaveCTurner, I've updated the changelog YAML for you.

Copy link
Contributor Author

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The meaningful part of this change is in MasterService and its tests.

Almost all of the other changes here are mechanically replacing clusterService.submitStateUpdateTask with taskQueue.submitTask on a suitably-constructed taskQueue.

It's (slightly) nicer if you ignore whitespace changes.

I've left some other notes for reviewers inline.

Comment on lines +178 to +180
|| e instanceof FailedToCommitClusterStateException
&& e.getCause()instanceof EsRejectedExecutionException esre
&& esre.isExecutorShutdown();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today's MasterService silently drops stuff when it's shut down. With this change we become stricter about rejecting work explicitly in this case.

@@ -46,12 +45,10 @@ public PrioritizedEsThreadPoolExecutor(
TimeUnit unit,
ThreadFactory threadFactory,
ThreadContext contextHolder,
ScheduledExecutorService timer,
StarvationWatcher starvationWatcher
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The starvation-watching mechanism is now built into the MasterService itself, not the executor, so this parameter is unnecessary and removed everywhere.

@DaveCTurner
Copy link
Contributor Author

@elasticmachine update branch

@DaveCTurner
Copy link
Contributor Author

@elasticmachine update branch

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Dec 5, 2022
Deferring this until after elastic#92021 when there will be more typesafety and
better tests.
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did an initial read of the MasterService and wanted to relay my comments now to ensure alignment before going in details and reading tests.

Overall, this change looks great.

@DaveCTurner DaveCTurner added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Feb 23, 2023
@elasticsearchmachine elasticsearchmachine merged commit c058728 into elastic:main Feb 23, 2023
@DaveCTurner DaveCTurner deleted the 2022-11-30-new-master-service-queues branch February 23, 2023 13:01
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 3, 2023
The changes introduced in elastic#92021 mean that there is no need for the
master service to block its thread while waiting for each publication to
complete. This commit removes the now-unnecessary blocking.

This commit also removes the now-unnecessary fake blocking in the
`FakeThreadPoolMasterService` used in tests, bringing the implementation
covered by tests closer to the production implementation.
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 4, 2023
The changes introduced in elastic#92021 mean that the master service no longer
needs to use a prioritized executor. Prioritized executors are weird,
for instance they don't propagate rejections to `AbstractRunnable` tasks
properly. This commit moves to using a regular scaling executor and
removes some of the now-unnecessary workarounds for handling the
prioritized executor's weirdness.
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 6, 2023
The changes introduced in elastic#92021 mean that the master service no longer
needs to use a prioritized executor. Prioritized executors are weird,
for instance they don't propagate rejections to `AbstractRunnable` tasks
properly. This commit moves to using a regular scaling executor and
removes some of the now-unnecessary workarounds for handling the
prioritized executor's weirdness.
DaveCTurner added a commit that referenced this pull request Mar 6, 2023
The changes introduced in #92021 mean that the master service no longer
needs to use a prioritized executor. Prioritized executors are weird,
for instance they don't propagate rejections to `AbstractRunnable` tasks
properly. This commit moves to using a regular scaling executor and
removes some of the now-unnecessary workarounds for handling the
prioritized executor's weirdness.
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 6, 2023
The changes introduced in elastic#92021 mean that there is no need for the
master service to block its thread while waiting for each publication to
complete. This commit removes the now-unnecessary blocking.

This commit also removes the now-unnecessary fake blocking in the
`FakeThreadPoolMasterService` used in tests, bringing the implementation
covered by tests closer to the production implementation.
elasticsearchmachine pushed a commit that referenced this pull request Mar 30, 2023
The changes introduced in #92021 mean that there is no need for the
master service to block its thread while waiting for each publication to
complete. This commit removes the now-unnecessary blocking.

This commit also removes the now-unnecessary fake blocking in the
`FakeThreadPoolMasterService` used in tests, bringing the implementation
covered by tests closer to the production implementation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement Team:Distributed Meta label for distributed team (obsolete) v8.8.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pending task batching can be a bottleneck
6 participants