Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split searchable snapshot into multiple repo operations #116918

Conversation

DaveCTurner
Copy link
Contributor

Each operation on a snapshot repository uses the same Repository,
BlobStore, etc. instances throughout, in order to avoid the complexity
arising from handling metadata updates that occur while an operation is
running. Today we model the entire lifetime of a searchable snapshot
shard as a single repository operation since there should be no metadata
updates that matter in this context (other than those that are handled
dynamically via other mechanisms) and some metadata updates might be
positively harmful to a searchable snapshot shard.

It turns out that there are some undocumented legacy settings which do
matter to searchable snapshots, and which are still in use, so with this
commit we move to a finer-grained model of repository operations within
a searchable snapshot.

Each operation on a snapshot repository uses the same `Repository`,
`BlobStore`, etc. instances throughout, in order to avoid the complexity
arising from handling metadata updates that occur while an operation is
running. Today we model the entire lifetime of a searchable snapshot
shard as a single repository operation since there should be no metadata
updates that matter in this context (other than those that are handled
dynamically via other mechanisms) and some metadata updates might be
positively harmful to a searchable snapshot shard.

It turns out that there are some undocumented legacy settings which _do_
matter to searchable snapshots, and which are still in use, so with this
commit we move to a finer-grained model of repository operations within
a searchable snapshot.
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine
Copy link
Collaborator

Hi @DaveCTurner, I've created a changelog YAML for you.

if (repositoryMetadata.name().equals(request.name())) {
final RepositoryMetadata newRepositoryMetadata = new RepositoryMetadata(
request.name(),
repositoryMetadata.uuid(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we copy the UUID from the previous repository instance rather than using _na_. The next time we load the RepositoryData we update the metadata if needed:

if (loaded.getUuid().equals(metadata.uuid())) {
listener.onResponse(loaded);
} else {
// someone switched the repo contents out from under us
RepositoriesService.updateRepositoryUuidInMetadata(
clusterService,
metadata.name(),
loaded,
new ThreadedActionListener<>(threadPool.generic(), listener.map(v -> loaded))
);
}

We could conceivably be stricter here, see #109936, but it doesn't seem necessary today. Instead note that in RepositorySupplier we'll notice the change in UUID and look for a different repository with matching UUID before eventually throwing a RepositoryMissingException.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this is an optimisation when UUID does not change which should be most of the cases.

When the UUID does change, are you saying that

  1. Copying the UUID only delays the need to load repositoryData.
  2. The repositoryData will be loaded before any writing, e.g. createSnapshot, can happen so that UUID will become consistent.

It is not clear to me why a cached repositoryData would not be loaded in 2 and further delays the UUID consistency update?

The change somehow feels not belong here. But I may be too paranoid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're updating settings that fundamentally change the underlying repository then org.elasticsearch.repositories.RepositoriesService#applyClusterState will create a brand-new Repository instance to replace the existing one (i.e. org.elasticsearch.repositories.RepositoriesService#canUpdateInPlace will return false) and this new instance will have no cached RepositoryData.

It's kind of an optimization but also kind of vital for the behaviour here. If we don't do this then we can't see that the new Repository instance is the one we should use for searchable snapshot operations in future (at least not without blocking some thread somewhere while waiting for the new UUID to be loaded).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create a brand-new Repository

Thanks! This is an important information that I originally missed.

If we don't do this then we can't see that the new Repository instance is the one we should use for searchable snapshot operations in future (at least not without blocking some thread somewhere while waiting for the new UUID to be loaded).

I see the point now. I guess that means searchable snapshot actions do not always load repo data as the first step? If the repo UUID did change, does that mean it would take a while before searchable snapshot related code realise it? Would that lead to any issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that means searchable snapshot actions do not always load repo data as the first step?

Searchable snapshot actions essentially never load the RepositoryData. They already know how to find the shard data within the blob store (from the index.store.snapshot.index_uuid and index.store.snapshot.snapshot_uuid settings in the index metadata, and the shard ID). If the repo switches out from underneath them then they'll get exceptions indicating that the blobs they need are no longer found.

@DaveCTurner DaveCTurner requested a review from ywangd November 18, 2024 09:13
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Would be good with a second review too.

Copy link
Member

@ywangd ywangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I guess the changes may introduce a minor performance overhead due to looking up repository and shardContainer. But that seems to be in the nature of the requirement.

if (repositoryMetadata.name().equals(request.name())) {
final RepositoryMetadata newRepositoryMetadata = new RepositoryMetadata(
request.name(),
repositoryMetadata.uuid(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this is an optimisation when UUID does not change which should be most of the cases.

When the UUID does change, are you saying that

  1. Copying the UUID only delays the need to load repositoryData.
  2. The repositoryData will be loaded before any writing, e.g. createSnapshot, can happen so that UUID will become consistent.

It is not clear to me why a cached repositoryData would not be loaded in 2 and further delays the UUID consistency update?

The change somehow feels not belong here. But I may be too paranoid.

Comment on lines 551 to 555
static {
// these thread names must be aligned with those in :server
assert CACHE_FETCH_ASYNC_THREAD_POOL_NAME.equals(BlobStoreRepository.SEARCHABLE_SNAPSHOTS_CACHE_FETCH_ASYNC_THREAD_NAME);
assert CACHE_PREWARMING_THREAD_POOL_NAME.equals(BlobStoreRepository.SEARCHABLE_SNAPSHOTS_CACHE_PREWARMING_THREAD_NAME);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We can also actively assign them to be equal? eg:

CACHE_FETCH_ASYNC_THREAD_POOL_NAME = BlobStoreRepository.SEARCHABLE_SNAPSHOTS_CACHE_FETCH_ASYNC_THREAD_NAME;


for (final Repository repository : repositoriesByName.values()) {
if (repository.getMetadata().uuid().equals(repositoryUuid)) {
repositoryNameHint = repository.getMetadata().name();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use a debug message here to indicate the actual repository name does not match?

}

private synchronized BlobContainer refreshAndGet() {
final LastKnownState lastKnownState = this.lastKnownState;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this local variable necessary? We are inside synchronized and this is the only place where this.lastKnownState can be updated?

return cluster.getHttpAddresses();
}

public void testReloadCredentialsFromKeystore() throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May want to skip it when in fips.

currentRepository.shardContainer(indexId, shardId)
);
this.lastKnownState = new LastKnownState(currentRepository, newContainer);
return newContainer;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similary, I wonder whether a logging could be useful here.

@DaveCTurner DaveCTurner added the auto-backport Automatically create backport pull requests when merged label Nov 18, 2024
@DaveCTurner
Copy link
Contributor Author

I guess the changes may introduce a minor performance overhead due to looking up repository and shardContainer.

Yeah but in practice if we're asking for the blobContainer() it's because we're about to read something from the blob store, with 10s of milliseconds of latency, so an extra map lookup and a few string comparisons should be lost in the noise.

@DaveCTurner DaveCTurner enabled auto-merge (squash) November 18, 2024 13:31
@DaveCTurner DaveCTurner disabled auto-merge November 18, 2024 13:33
@DaveCTurner DaveCTurner added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Nov 18, 2024
@elasticsearchmachine elasticsearchmachine merged commit 29bdae1 into elastic:main Nov 18, 2024
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

The backport operation could not be completed due to the following error:

An unexpected error occurred when attempting to backport this PR.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 116918

@DaveCTurner DaveCTurner deleted the 2024/11/17/mutable-searchable-snapshot-repository branch November 18, 2024 17:34
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Nov 18, 2024
Each operation on a snapshot repository uses the same `Repository`,
`BlobStore`, etc. instances throughout, in order to avoid the complexity
arising from handling metadata updates that occur while an operation is
running. Today we model the entire lifetime of a searchable snapshot
shard as a single repository operation since there should be no metadata
updates that matter in this context (other than those that are handled
dynamically via other mechanisms) and some metadata updates might be
positively harmful to a searchable snapshot shard.

It turns out that there are some undocumented legacy settings which _do_
matter to searchable snapshots, and which are still in use, so with this
commit we move to a finer-grained model of repository operations within
a searchable snapshot.

Backport of elastic#116918 to 8.x
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Nov 18, 2024
Each operation on a snapshot repository uses the same `Repository`,
`BlobStore`, etc. instances throughout, in order to avoid the complexity
arising from handling metadata updates that occur while an operation is
running. Today we model the entire lifetime of a searchable snapshot
shard as a single repository operation since there should be no metadata
updates that matter in this context (other than those that are handled
dynamically via other mechanisms) and some metadata updates might be
positively harmful to a searchable snapshot shard.

It turns out that there are some undocumented legacy settings which _do_
matter to searchable snapshots, and which are still in use, so with this
commit we move to a finer-grained model of repository operations within
a searchable snapshot.

Backport of elastic#116918 to 8.16
@DaveCTurner
Copy link
Contributor Author

Backports:

Unfortunately I can't see a way to backport this to 7.17 safely, the test framework is vastly different there and lacks many of the features we need.

DaveCTurner added a commit that referenced this pull request Nov 19, 2024
Each operation on a snapshot repository uses the same `Repository`,
`BlobStore`, etc. instances throughout, in order to avoid the complexity
arising from handling metadata updates that occur while an operation is
running. Today we model the entire lifetime of a searchable snapshot
shard as a single repository operation since there should be no metadata
updates that matter in this context (other than those that are handled
dynamically via other mechanisms) and some metadata updates might be
positively harmful to a searchable snapshot shard.

It turns out that there are some undocumented legacy settings which _do_
matter to searchable snapshots, and which are still in use, so with this
commit we move to a finer-grained model of repository operations within
a searchable snapshot.

Backport of #116918 to 8.x
elasticsearchmachine pushed a commit that referenced this pull request Nov 19, 2024
* Split searchable snapshot into multiple repo operations

Each operation on a snapshot repository uses the same `Repository`,
`BlobStore`, etc. instances throughout, in order to avoid the complexity
arising from handling metadata updates that occur while an operation is
running. Today we model the entire lifetime of a searchable snapshot
shard as a single repository operation since there should be no metadata
updates that matter in this context (other than those that are handled
dynamically via other mechanisms) and some metadata updates might be
positively harmful to a searchable snapshot shard.

It turns out that there are some undocumented legacy settings which _do_
matter to searchable snapshots, and which are still in use, so with this
commit we move to a finer-grained model of repository operations within
a searchable snapshot.

Backport of #116918 to 8.16

* Add end-to-end test for reloading S3 credentials

We don't seem to have a test that completely verifies that a S3
repository can reload credentials from an updated keystore. This commit
adds such a test.

Backport of #116762 to 8.16.
rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Nov 20, 2024
Each operation on a snapshot repository uses the same `Repository`,
`BlobStore`, etc. instances throughout, in order to avoid the complexity
arising from handling metadata updates that occur while an operation is
running. Today we model the entire lifetime of a searchable snapshot
shard as a single repository operation since there should be no metadata
updates that matter in this context (other than those that are handled
dynamically via other mechanisms) and some metadata updates might be
positively harmful to a searchable snapshot shard.

It turns out that there are some undocumented legacy settings which _do_
matter to searchable snapshots, and which are still in use, so with this
commit we move to a finer-grained model of repository operations within
a searchable snapshot.
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
Each operation on a snapshot repository uses the same `Repository`,
`BlobStore`, etc. instances throughout, in order to avoid the complexity
arising from handling metadata updates that occur while an operation is
running. Today we model the entire lifetime of a searchable snapshot
shard as a single repository operation since there should be no metadata
updates that matter in this context (other than those that are handled
dynamically via other mechanisms) and some metadata updates might be
positively harmful to a searchable snapshot shard.

It turns out that there are some undocumented legacy settings which _do_
matter to searchable snapshots, and which are still in use, so with this
commit we move to a finer-grained model of repository operations within
a searchable snapshot.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed Coordination Meta label for Distributed Coordination team v8.16.1 v8.17.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants