
Parallelize stale index deletion #100316

Merged

Conversation

DaveCTurner
Contributor

After deleting a snapshot today we clean up all the now-dangling indices
sequentially, which can be rather slow. With this commit we parallelize
the work across the whole `SNAPSHOT` pool on the master node.

Closes #61513

Co-authored-by: Piyush Daftary <[email protected]>
@DaveCTurner DaveCTurner added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.12.0 labels Oct 5, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Oct 5, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Collaborator

Hi @DaveCTurner, I've created a changelog YAML for you.

@DaveCTurner
Contributor Author

@elasticmachine please run elasticsearch-ci/part-1

@DaveCTurner DaveCTurner requested a review from ywangd October 5, 2023 05:45
@DaveCTurner
Contributor Author

I'm slightly concerned about the lack of backpressure in this area: we could in theory end up with an ever-increasing pile of delete work in the queue. Previously that would eventually have blocked all the snapshot threads, but with this change even that doesn't happen any more. I've raised this point for discussion with the team.

@DaveCTurner
Contributor Author

DaveCTurner commented Oct 5, 2023

One possible solution would be to preserve today's behaviour of always blocking at least one SNAPSHOT thread for each cleanup operation, so that falling behind would still eventually end up with all the SNAPSHOT threads processing deletes, slowing down subsequent snapshots. Obviously we don't just want to pause that thread, but we could use it to eagerly steal work from the queue:

diff --git a/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java b/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
index fa699a76fadd..f7f9ed22efe4 100644
--- a/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
+++ b/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
@@ -147,6 +147,7 @@ import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.Executor;
 import java.util.concurrent.LinkedBlockingQueue;
 import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.concurrent.atomic.AtomicLong;
 import java.util.concurrent.atomic.AtomicReference;
 import java.util.function.Consumer;
@@ -1194,6 +1195,28 @@ public abstract class BlobStoreRepository extends AbstractLifecycleComponent imp
                 }));
             }
         }
+
+        // dedicate a single SNAPSHOT thread for this work, so that if we fall too far behind with deletes then eventually we stop taking
+        // snapshots too
+        threadPool.executor(ThreadPool.Names.SNAPSHOT).execute(new AbstractRunnable() {
+            @Override
+            protected void doRun() {
+                final AtomicBoolean isDone = new AtomicBoolean(true);
+                final Releasable ref = () -> isDone.set(true);
+                ActionListener<Releasable> nextTask;
+                while ((nextTask = staleBlobDeleteRunner.takeNextTask()) != null) {
+                    isDone.set(false);
+                    nextTask.onResponse(ref);
+                    assert isDone.get();
+                }
+            }
+
+            @Override
+            public void onFailure(Exception e) {
+                logger.error("unexpected failure while processing deletes on dedicated snapshot thread", e);
+                assert false : e;
+            }
+        });
     }

     /**

@DaveCTurner
Contributor Author

@elasticmachine please run elasticsearch-ci/part-2

Member

@ywangd ywangd left a comment

This PR has surprisingly rich context and interesting technical details. Thanks for the chance to review. My comments are mostly for my own education. Thanks!

Comment on lines +1176 to +1194
final var survivingIndexIds = newRepoData.getIndices().values().stream().map(IndexId::getId).collect(Collectors.toSet());
for (final var indexEntry : foundIndices.entrySet()) {
    final var indexSnId = indexEntry.getKey();
    if (survivingIndexIds.contains(indexSnId)) {
        continue;
    }
    staleBlobDeleteRunner.enqueueTask(listeners.acquire(ref -> {
        try (ref) {
            logger.debug("[{}] Found stale index [{}]. Cleaning it up", metadata.name(), indexSnId);
            final var deleteResult = indexEntry.getValue().delete(OperationPurpose.SNAPSHOT);
            blobsDeleted.addAndGet(deleteResult.blobsDeleted());
            bytesDeleted.addAndGet(deleteResult.bytesDeleted());
            logger.debug("[{}] Cleaned up stale index [{}]", metadata.name(), indexSnId);
        } catch (IOException e) {
            logger.warn(() -> format("""
                [%s] index %s is no longer part of any snapshot in the repository, \
                but failed to clean up its index folder""", metadata.name(), indexSnId), e);
        }
    }));
Member

Do we want to keep this and the code above in their own separate methods, as they are now? Fewer nesting levels could be helpful.

Contributor Author

I inlined these because we ended up having to pass really quite a lot of parameters in, and it wasn't really even a coherent set of parameters so much as just "the things needed to run this method". It's still less than one screenful (on my screen anyway) and nicely exposes the execution pattern, so tbh I prefer it as it is now.

Contributor Author

This is actually kind of a code smell throughout this class (and the snapshot codebase more generally). I have been working on a refactoring that should help simplify things in this area and will open a followup in the next few days.

@@ -295,4 +297,58 @@ public void testRepositoryConflict() throws Exception {
        logger.info("--> wait until snapshot deletion is finished");
        assertAcked(future.actionGet());
    }

    public void testLeakedStaleIndicesAreDeletedBySubsequentDelete() throws Exception {
Member

This is a cool test and I learnt a few tips and tricks from it. But it does not test the new parallelization change. Do we care?

Contributor Author

This test is largely from the original contributor, but I think it's a reasonable test to write. I've played around with a few ideas for testing the new threading more precisely but it seems pretty tricky, and kinda doesn't matter so much as long as we do actually do the work somehow. I think I had an idea for a test tho.

Contributor Author

Ok my idea seems to work, see testCleanupStaleBlobsConcurrency added in b56455c (sorry for the force-push)

Comment on lines 178 to 180
if (isDone.get() == false) {
    logger.error("runSyncTasksEagerly() was called on a queue [{}] containing an async task: [{}]", taskRunnerName, task);
    assert false;
Member

I think this is best-effort, right? A misbehaving task can still fork but release the reference?

Contributor Author

Yes indeed but that doesn't matter to us really. The ref is just so that the AbstractThrottledTaskRunner can track the relevant activities to completion. If we did fork an untracked task then presumably we're tracking it somewhere else.
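To illustrate that tracking model, here is a minimal sketch in plain Java (the class and method names are hypothetical, not the real AbstractThrottledTaskRunner): the runner only counts a task while the ref it handed out remains unreleased, so a task that releases its ref and then forks simply drops out of the runner's accounting and must be tracked elsewhere.

import java.util.concurrent.atomic.AtomicInteger;

class TrackedTaskRunnerSketch {
    private final AtomicInteger runningTasks = new AtomicInteger();

    // Hand a ref to each task; the task counts as "running" until the ref is closed.
    AutoCloseable acquireRef() {
        runningTasks.incrementAndGet();
        return runningTasks::decrementAndGet; // closing the ref ends the tracking
    }

    // Only work still holding an unreleased ref is the runner's responsibility.
    int runningTaskCount() {
        return runningTasks.get();
    }
}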

Comment on lines 1208 to 1227
threadPool.executor(ThreadPool.Names.SNAPSHOT).execute(new AbstractRunnable() {
    @Override
    protected void doRun() {
        staleBlobDeleteRunner.runSyncTasksEagerly();
    }

    @Override
    public void onFailure(Exception e) {
        logger.error("unexpected failure while processing deletes on dedicated snapshot thread", e);
        assert false : e;
    }

    @Override
    public void onRejection(Exception e) {
        if (e instanceof EsRejectedExecutionException esre && esre.isExecutorShutdown()) {
            return;
        }
        super.onRejection(e);
    }
});
Member

Can we have this managed inside the task runner? It would be helpful for reuse.

Contributor Author

Yes that's a good point. With only one caller it's hard to know where to put the abstraction boundary, but I can see value in allowing callers to pass in an Executor. They can always use EsExecutors.DIRECT_EXECUTOR_SERVICE if they really don't want to fork.
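As a usage sketch of that Executor-parameter idea (a fragment for illustration only, assuming the runSyncTasksEagerly(Executor) signature shown later in this review):

// Fork the eager draining onto a SNAPSHOT thread, as this PR does:
staleBlobDeleteRunner.runSyncTasksEagerly(threadPool.executor(ThreadPool.Names.SNAPSHOT));

// Or drain on the calling thread, for a caller that really doesn't want to fork:
staleBlobDeleteRunner.runSyncTasksEagerly(EsExecutors.DIRECT_EXECUTOR_SERVICE);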

Comment on lines +188 to +189
        assertTrue(queue.isEmpty());
        assertNoRunningTasks(taskRunner);
Member

For my learning:

  1. Why do we need to assert the queue is empty here? We already asserted it has size 0 four lines above. I don't see how the queue size can increase after all the tasks are enqueued.
  2. In the existing assertNoRunningTasks, why is it necessary to spawn a batch of Runnables before verifying that the runningTasks size is 0? We already verified that the queue is empty, so it seems to me that just verifying runningTasks would be sufficient. Also, running the extra Runnables does not guarantee that they actually exercise every thread in the pool, so I'm not sure of the purpose here.

Contributor Author

Really we're just doing the same as the other tests here, verifying that the task runner is completely finished at the end of the test.

In assertNoRunningTasks() we need to wait for all the tasks spawned by the AbstractThrottledTaskRunner to completely finish before we can be sure the running task count is zero. That's because we call runningTasks.decrementAndGet() after completing each task, so when e.g. executedCountDown completes the count will not have reached zero yet. In order to make sure that the thread pool is completely free from all its current work, we enqueue N barrier tasks, which must therefore all be waiting on the barrier at the same time.

Put differently, if you remove that and run the tests in a loop then you should see them fail occasionally.

I'll add a comment.
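A minimal standalone sketch of that flushing idea, using only java.util.concurrent (the helper name and shape are illustrative, not the test's actual code): submitting one barrier task per pool thread means the barrier can only trip once every thread has finished whatever it was running before.

import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;

// Blocks until every thread in a fixed-size pool has finished its current work:
// all poolSize barrier tasks must be waiting on the barrier at the same time.
static void flushThreadPool(ExecutorService pool, int poolSize) throws Exception {
    final CyclicBarrier barrier = new CyclicBarrier(poolSize + 1); // +1 for the caller
    for (int i = 0; i < poolSize; i++) {
        pool.execute(() -> {
            try {
                barrier.await();
            } catch (Exception e) {
                throw new AssertionError(e);
            }
        });
    }
    barrier.await(); // returns only once every pool thread is parked on the barrier
}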

Member

Sorry for dragging on this point, but I think this thread-flushing behaviour is not necessary for this test, because the releasable is used in a try-with-resources block and should be released before executedCountDown finishes. The behaviour is needed for testEnqueueSpawnsNewTasksUpToMax because that test explicitly closes the releasable after the countDown. If I flip the order, it also runs successfully 10K times without the thread-flushing logic.

Contributor Author

Ah hm I see what you mean now. Yes I think you're right we could simplify this if we made sure to release all the task refs before allowing that test to proceed.

@DaveCTurner DaveCTurner force-pushed the 2023/10/05/stale-index-deletion-speedup branch from 0aef608 to b56455c on October 5, 2023 15:47
Member

@ywangd ywangd left a comment

LGTM

The new concurrency test is pretty awesome 👍

* Run a single task on the given executor which eagerly pulls tasks from the queue and executes them. This must only be used if the
* tasks in the queue are all synchronous, i.e. they release their ref before returning from {@code onResponse()}.
*/
public void runSyncTasksEagerly(Executor executor) {
Member

Why not have an overloaded version of runSyncTasksEagerly() that just uses the TaskRunner's own executor for the eager run as well?

Contributor Author

@DaveCTurner DaveCTurner Oct 9, 2023

Yes, that could work too. I'm going to wait and see on that idea though; I'd rather do one or the other, and the choice depends on whether we have other users that want to do something else or whether everyone just forks another task on AbstractThrottledTaskRunner#executor.
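For comparison, the overload variant would be a trivial wrapper; a hypothetical sketch (not part of this PR as merged), assuming the runner keeps its own executor in a field named executor:

// Hypothetical convenience overload: always fork the eager run on the runner's own executor.
public void runSyncTasksEagerly() {
    runSyncTasksEagerly(executor);
}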

Contributor

@henningandersen henningandersen left a comment

LGTM.

@DaveCTurner DaveCTurner merged commit cadcb9b into elastic:main Oct 9, 2023
@DaveCTurner DaveCTurner deleted the 2023/10/05/stale-index-deletion-speedup branch October 9, 2023 13:07
@DaveCTurner
Contributor Author

Thanks both!

@DaveCTurner DaveCTurner restored the 2023/10/05/stale-index-deletion-speedup branch June 17, 2024 06:17