Split searchable snapshot into multiple repo operations #116918

DaveCTurner · 2024-11-18T09:08:23Z

Each operation on a snapshot repository uses the same Repository,
BlobStore, etc. instances throughout, in order to avoid the complexity
arising from handling metadata updates that occur while an operation is
running. Today we model the entire lifetime of a searchable snapshot
shard as a single repository operation since there should be no metadata
updates that matter in this context (other than those that are handled
dynamically via other mechanisms) and some metadata updates might be
positively harmful to a searchable snapshot shard.

It turns out that there are some undocumented legacy settings which do
matter to searchable snapshots, and which are still in use, so with this
commit we move to a finer-grained model of repository operations within
a searchable snapshot.

elasticsearchmachine · 2024-11-18T09:09:27Z

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

elasticsearchmachine · 2024-11-18T09:09:28Z

Hi @DaveCTurner, I've created a changelog YAML for you.

DaveCTurner · 2024-11-18T09:13:17Z

server/src/main/java/org/elasticsearch/repositories/RepositoriesService.java

+                if (repositoryMetadata.name().equals(request.name())) {
+                    final RepositoryMetadata newRepositoryMetadata = new RepositoryMetadata(
+                        request.name(),
+                        repositoryMetadata.uuid(),


Here we copy the UUID from the previous repository instance rather than using `_na_`. The next time we load the `RepositoryData` we update the metadata if needed:

elasticsearch/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java

Lines 2465 to 2475 in cef1b54

    if (loaded.getUuid().equals(metadata.uuid())) {
        listener.onResponse(loaded);
    } else {
        // someone switched the repo contents out from under us
        RepositoriesService.updateRepositoryUuidInMetadata(
            clusterService,
            metadata.name(),
            loaded,
            new ThreadedActionListener<>(threadPool.generic(), listener.map(v -> loaded))
        );
    }

We could conceivably be stricter here, see #109936, but it doesn't seem necessary today. Instead note that in RepositorySupplier we'll notice the change in UUID and look for a different repository with matching UUID before eventually throwing a RepositoryMissingException.

ywangd · 2024-11-18T11:19:30Z

I understand this is an optimisation for when the UUID does not change, which should be most cases.

When the UUID does change, are you saying that:

1. Copying the UUID only delays the need to load repositoryData.
2. The repositoryData will be loaded before any writing, e.g. createSnapshot, can happen, so that the UUID will become consistent.

It is not clear to me why a cached repositoryData would not be loaded in 2, further delaying the UUID consistency update?

The change somehow feels like it does not belong here. But I may be too paranoid.

DaveCTurner · 2024-11-18

If we're updating settings that fundamentally change the underlying repository then `org.elasticsearch.repositories.RepositoriesService#applyClusterState` will create a brand-new `Repository` instance to replace the existing one (i.e. `org.elasticsearch.repositories.RepositoriesService#canUpdateInPlace` will return `false`) and this new instance will have no cached `RepositoryData`.

It's kind of an optimization but also kind of vital for the behaviour here. If we don't do this then we can't see that the new `Repository` instance is the one we should use for searchable snapshot operations in future (at least not without blocking some thread somewhere while waiting for the new UUID to be loaded).

ywangd · 2024-11-19

> create a brand-new Repository

Thanks! This is important information that I originally missed.

> If we don't do this then we can't see that the new Repository instance is the one we should use for searchable snapshot operations in future (at least not without blocking some thread somewhere while waiting for the new UUID to be loaded).

I see the point now. I guess that means searchable snapshot actions do not always load repo data as the first step? If the repo UUID did change, does that mean it would take a while before the searchable-snapshot-related code realises it? Would that lead to any issues?

DaveCTurner · 2024-11-19

> I guess that means searchable snapshot actions do not always load repo data as the first step?

Searchable snapshot actions essentially never load the `RepositoryData`. They already know how to find the shard data within the blob store (from the `index.store.snapshot.index_uuid` and `index.store.snapshot.snapshot_uuid` settings in the index metadata, and the shard ID). If the repo switches out from underneath them then they'll get exceptions indicating that the blobs they need are no longer found.

henningandersen

LGTM.

Would be good with a second review too.

ywangd

LGTM

I guess the changes may introduce a minor performance overhead due to looking up repository and shardContainer. But that seems to be in the nature of the requirement.

ywangd · 2024-11-18T11:22:04Z

...snapshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/SearchableSnapshots.java

+    static {
+        // these thread names must be aligned with those in :server
+        assert CACHE_FETCH_ASYNC_THREAD_POOL_NAME.equals(BlobStoreRepository.SEARCHABLE_SNAPSHOTS_CACHE_FETCH_ASYNC_THREAD_NAME);
+        assert CACHE_PREWARMING_THREAD_POOL_NAME.equals(BlobStoreRepository.SEARCHABLE_SNAPSHOTS_CACHE_PREWARMING_THREAD_NAME);
+    }


Nit: could we also directly assign them to be equal? E.g.:

CACHE_FETCH_ASYNC_THREAD_POOL_NAME = BlobStoreRepository.SEARCHABLE_SNAPSHOTS_CACHE_FETCH_ASYNC_THREAD_NAME;
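To make the suggestion concrete, here is a minimal sketch of the difference between asserting that two constants agree and deriving one from the other. `ServerSideNames` and `PluginSideNames` are hypothetical stand-ins for `BlobStoreRepository` and `SearchableSnapshots`, and the string values are illustrative assumptions rather than values taken from this PR.

```java
// Hypothetical stand-in for the class in :server that owns the canonical names.
final class ServerSideNames {
    // Illustrative values only; assumed for this sketch.
    static final String SEARCHABLE_SNAPSHOTS_CACHE_FETCH_ASYNC_THREAD_NAME = "searchable_snapshots_cache_fetch_async";
    static final String SEARCHABLE_SNAPSHOTS_CACHE_PREWARMING_THREAD_NAME = "searchable_snapshots_cache_prewarming";

    private ServerSideNames() {}
}

// Hypothetical stand-in for the plugin class that previously declared its own copies.
final class PluginSideNames {
    // Assigning from the server-side constant means the two names cannot drift apart,
    // so no static assertion block is needed to keep them aligned.
    static final String CACHE_FETCH_ASYNC_THREAD_POOL_NAME = ServerSideNames.SEARCHABLE_SNAPSHOTS_CACHE_FETCH_ASYNC_THREAD_NAME;
    static final String CACHE_PREWARMING_THREAD_POOL_NAME = ServerSideNames.SEARCHABLE_SNAPSHOTS_CACHE_PREWARMING_THREAD_NAME;

    private PluginSideNames() {}
}
```

With direct assignment a mismatch is impossible by construction, whereas an `assert` only catches it when assertions are enabled.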

ywangd · 2024-11-18T11:37:48Z

...hots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/store/RepositorySupplier.java

+
+        for (final Repository repository : repositoriesByName.values()) {
+            if (repository.getMetadata().uuid().equals(repositoryUuid)) {
+                repositoryNameHint = repository.getMetadata().name();


We could use a debug message here to indicate the actual repository name does not match?
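For readers following along, the lookup this comment refers to might be sketched roughly as below: try the last known name first, fall back to scanning for a matching UUID, and fail if nothing matches. This is only an illustration under stated assumptions. `RepoMetadata`, `Repo`, `RepoMissingException` and `RepoLookup` are hypothetical stand-ins, not the real `RepositoryMetadata`, `Repository`, `RepositoryMissingException` or `RepositorySupplier`, and the real class may handle more cases (for example a missing UUID).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for the real Elasticsearch types, for illustration only.
record RepoMetadata(String name, String uuid) {}

record Repo(RepoMetadata metadata) {}

class RepoMissingException extends RuntimeException {
    RepoMissingException(String message) {
        super(message);
    }
}

// Sketch of a supplier that resolves a repository by name, falling back to a UUID search.
class RepoLookup {
    private final Map<String, Repo> repositoriesByName;
    private final String repositoryUuid;
    private String repositoryNameHint;

    RepoLookup(Map<String, Repo> repositoriesByName, String repositoryNameHint, String repositoryUuid) {
        this.repositoriesByName = repositoriesByName;
        this.repositoryNameHint = repositoryNameHint;
        this.repositoryUuid = repositoryUuid;
    }

    synchronized Repo get() {
        // Fast path: the repository registered under the last known name still has the expected UUID.
        final Repo byName = repositoriesByName.get(repositoryNameHint);
        if (byName != null && byName.metadata().uuid().equals(repositoryUuid)) {
            return byName;
        }
        // Slow path: another registered repository may now carry the UUID we need.
        for (final Repo repository : repositoriesByName.values()) {
            if (repository.metadata().uuid().equals(repositoryUuid)) {
                // Remember the new name so the next call takes the fast path again.
                repositoryNameHint = repository.metadata().name();
                return repository;
            }
        }
        throw new RepoMissingException("no repository with UUID [" + repositoryUuid + "]");
    }

    synchronized String nameHint() {
        return repositoryNameHint;
    }
}
```

The point where the name hint is updated is where a debug message about the name mismatch would naturally go.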

ywangd · 2024-11-18T11:44:54Z

...s/src/main/java/org/elasticsearch/xpack/searchablesnapshots/store/BlobContainerSupplier.java

+    }
+
+    private synchronized BlobContainer refreshAndGet() {
+        final LastKnownState lastKnownState = this.lastKnownState;


Is this local variable necessary? We are inside synchronized and this is the only place where this.lastKnownState can be updated?
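As background for this question, the caching pattern under discussion can be sketched as follows: an unsynchronized fast path that reads the cached state once, and a synchronized slow path that rebuilds the container when the repository instance has changed. This is a simplified sketch under assumptions, not the PR's `BlobContainerSupplier`. `CachedContainerSupplier` is a hypothetical name, `R` stands in for `Repository` and `C` for `BlobContainer`.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch: R stands in for Repository and C for BlobContainer.
class CachedContainerSupplier<R, C> implements Supplier<C> {

    // Pairs the repository instance with the container that was derived from it.
    private record LastKnownState<R, C>(R repository, C container) {}

    private final Supplier<R> repositorySupplier;
    private final Function<R, C> containerFactory;
    private volatile LastKnownState<R, C> lastKnownState;

    CachedContainerSupplier(Supplier<R> repositorySupplier, Function<R, C> containerFactory) {
        this.repositorySupplier = repositorySupplier;
        this.containerFactory = containerFactory;
    }

    @Override
    public C get() {
        // Outside the lock the field is read once into a local, because another thread may replace it.
        final LastKnownState<R, C> state = lastKnownState;
        if (state != null && state.repository() == repositorySupplier.get()) {
            return state.container();
        }
        return refreshAndGet();
    }

    private synchronized C refreshAndGet() {
        // Inside the lock the field cannot change underneath us, so it can be read directly.
        final R currentRepository = repositorySupplier.get();
        if (lastKnownState != null && lastKnownState.repository() == currentRepository) {
            return lastKnownState.container(); // another thread refreshed first
        }
        final C newContainer = containerFactory.apply(currentRepository);
        lastKnownState = new LastKnownState<>(currentRepository, newContainer);
        return newContainer;
    }
}
```

In this sketch the local copy matters only on the unsynchronized path, which is consistent with the observation that it is redundant inside the synchronized method.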

ywangd · 2024-11-18T11:47:08Z

...org/elasticsearch/xpack/searchablesnapshots/s3/S3SearchableSnapshotsCredentialsReloadIT.java

+        return cluster.getHttpAddresses();
+    }
+
+    public void testReloadCredentialsFromKeystore() throws IOException {


May want to skip it when running in a FIPS JVM.

ywangd · 2024-11-18T11:56:29Z

...s/src/main/java/org/elasticsearch/xpack/searchablesnapshots/store/BlobContainerSupplier.java

+                currentRepository.shardContainer(indexId, shardId)
+            );
+            this.lastKnownState = new LastKnownState(currentRepository, newContainer);
+            return newContainer;


Similarly, I wonder whether logging could be useful here.

DaveCTurner · 2024-11-18T13:30:54Z

> I guess the changes may introduce a minor performance overhead due to looking up repository and shardContainer.

Yeah, but in practice if we're asking for the `blobContainer()` it's because we're about to read something from the blob store, with tens of milliseconds of latency, so an extra map lookup and a few string comparisons should be lost in the noise.

elasticsearchmachine · 2024-11-18T17:34:38Z

💔 Backport failed

The backport operation could not be completed due to the following error:

An unexpected error occurred when attempting to backport this PR.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 116918

DaveCTurner · 2024-11-18T21:08:54Z

Backports:

Unfortunately I can't see a way to backport this to 7.17 safely, the test framework is vastly different there and lacks many of the features we need.

Backport of #116918 to 8.x

* Split searchable snapshot into multiple repo operations. Backport of #116918 to 8.16
* Add end-to-end test for reloading S3 credentials. We don't seem to have a test that completely verifies that an S3 repository can reload credentials from an updated keystore. This commit adds such a test. Backport of #116762 to 8.16.

DaveCTurner added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v9.0.0 v7.17.26 v8.16.1 v8.17.0 labels Nov 18, 2024

DaveCTurner requested a review from henningandersen November 18, 2024 09:08

elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Nov 18, 2024

Update docs/changelog/116918.yaml

a1b19cf

DaveCTurner requested a review from ywangd November 18, 2024 09:13

DaveCTurner added 2 commits November 18, 2024 10:03

Comment

333511a

before writing

8a31dcf

henningandersen approved these changes Nov 18, 2024

ywangd approved these changes Nov 18, 2024

DaveCTurner added 5 commits November 18, 2024 13:03

Merge branch 'main' into 2024/11/17/mutable-searchable-snapshot-repository

e326a48

Skip tests in FIPS JVMs

f12848a

Just assign thread names directly

3a8e7eb

Inline lastKnownState

df424fa

Debug logs

a547e2c

DaveCTurner added the auto-backport Automatically create backport pull requests when merged label Nov 18, 2024

DaveCTurner enabled auto-merge (squash) November 18, 2024 13:31

DaveCTurner disabled auto-merge November 18, 2024 13:33

Merge branch 'main' into 2024/11/17/mutable-searchable-snapshot-repository

4473e75

DaveCTurner added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Nov 18, 2024

elasticsearchmachine merged commit 29bdae1 into elastic:main Nov 18, 2024
16 checks passed

elasticsearchmachine added the backport pending label Nov 18, 2024

DaveCTurner deleted the 2024/11/17/mutable-searchable-snapshot-repository branch November 18, 2024 17:34

DaveCTurner mentioned this pull request Nov 18, 2024

Split searchable snapshot into multiple repo operations #116986

Merged

DaveCTurner mentioned this pull request Nov 18, 2024

Split searchable snapshot into multiple repo operations #116987

Merged

DaveCTurner removed v7.17.26 backport pending labels Nov 18, 2024

DaveCTurner mentioned this pull request Nov 19, 2024

Add searchable snapshot test for reloading S3 credentials #116795

Closed
