-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce repository UUIDs #67829
Introduce repository UUIDs #67829
Conversation
Today a snapshot repository does not have a well-defined identity. It can be reregistered with a different cluster under a different name, and can even be registered with multiple clusters in readonly mode. This presents problems for cases where we need to refer to a specific snapshot in a globally-unique fashion. Today we rely on the repository being registered under the same name on every cluster, but this is not a safe assumption. This commit adds a UUID that can be used to uniquely identify a repository. The UUID is stored in the top-level index blob, represented by `RepositoryData`, and is also usually copied into the `RepositoryMetadata` that represents the repository in the cluster state. The repository UUID is exposed in the get-repositories API; other more meaningful consumers will be added in due course.
Pinging @elastic/es-distributed (Team:Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks just fine I think, just one stupid edge case left as far as I think?
verifyStep.whenComplete(ignored -> threadPool.generic().execute( | ||
ActionRunnable.wrap(getRepositoryDataStep, l -> repository(request.name()).getRepositoryData(l))), listener::onFailure); | ||
|
||
// When the repository metadata is ready, update the repository UUID stored in the cluster state, if available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one slightly strange spot here for the case of read-only repositories.
You could have technically have a repo at some UUID and mount it to a cluster read-only. Then some outside force could wipe the repo completely (as in delete all the files) and write new RepositoryData
with a different UUID, in which case the uuid in the metadata won't ever get fixed. It's kind of a weird edge-case but we it's the reason why we don't track the repo generation in the cluster state when it comes to read-only repositories as well.
I guess in this case we explicitly want this UUID visible in read-only repos also so maybe we need some check on the UUID when loading RepositoryData
on read-only repos?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh I see, ok, should be nbd since getRepositoryData
is already async.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See b680eb3.
Also there is some unfortunate interaction here with the S3 repo cooldown enforcement tests in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-1/16657/testReport/junit/org.elasticsearch.repositories.s3/S3BlobStoreRepositoryTests/testEnforcedCooldownPeriod/ (also needs some work around for the repo data loading dance in some form since we fake downgrade the repo there). |
NBD I think, it works to set |
Ugh the docs tests are kinda broken, they don't clear the repo and we use the same path everywhere so it ends up mostly having a UUID 😕 Edit: they don't even clear the repo between runs, yech, it's the same UUID every run! |
#67841 will address the docs tests confusion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks !:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Failure looks unrelated, I opened #67884. @elasticmachine please run elasticsearch-ci/1 |
Today a snapshot repository does not have a well-defined identity. It can be reregistered with a different cluster under a different name, and can even be registered with multiple clusters in readonly mode. This presents problems for cases where we need to refer to a specific snapshot in a globally-unique fashion. Today we rely on the repository being registered under the same name on every cluster, but this is not a safe assumption. This commit adds a UUID that can be used to uniquely identify a repository. The UUID is stored in the top-level index blob, represented by `RepositoryData`, and is also usually copied into the `RepositoryMetadata` that represents the repository in the cluster state. The repository UUID is exposed in the get-repositories API; other more meaningful consumers will be added in due course. Backport of elastic#67829
Today a snapshot repository does not have a well-defined identity. It can be reregistered with a different cluster under a different name, and can even be registered with multiple clusters in readonly mode. This presents problems for cases where we need to refer to a specific snapshot in a globally-unique fashion. Today we rely on the repository being registered under the same name on every cluster, but this is not a safe assumption. This commit adds a UUID that can be used to uniquely identify a repository. The UUID is stored in the top-level index blob, represented by `RepositoryData`, and is also usually copied into the `RepositoryMetadata` that represents the repository in the cluster state. The repository UUID is exposed in the get-repositories API; other more meaningful consumers will be added in due course. Backport of #67829
This commit mostly reverts elastic#67934, except for the change to the version constant `REPOSITORY_UUID_IN_REPO_DATA_VERSION`. Completes the backport of elastic#67829 via elastic#67899
In elastic#67829 we introduced a `@Before` method to set up the repository for the docs tests. In fact this only needs to run once for the whole suite, not once per test. Relates elastic#67853
In elastic#67829 we introduced a `@Before` method to set up the repository for the docs tests. In fact this only needs to run once for the whole suite, not once per test. Relates elastic#67853
In #67829 we introduced a `@Before` method to set up the repository for the docs tests. In fact this only needs to run once for the whole suite, not once per test. Relates #67853 Co-authored-by: David Turner <[email protected]>
Today a snapshot repository does not have a well-defined global identity. The
closest we have is the repository name, but this is under the control of the
user and users are free to choose a different name each time they register the
repository.
This presents problems for cases where we need to refer to a specific snapshot
in a specific repository in a globally-unique fashion. Although snapshots
themselves have a globally-unique identifier, it is not feasible to search all
the available repositories for a specific snapshot. Today we expect the
repository to be registered under the same name on every cluster, but this is
not something on which we can rely.
To solve this, this commit adds a persistent UUID to each repository. The
repository UUID is stored in the top-level index blob, represented by
RepositoryData
, and is also usually copied into theRepositoryMetadata
thatrepresents the repository in the cluster state. The repository UUID is exposed
by the get-repositories API; other more meaningful consumers will be added in
due course.
Relates #66431