Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce repository UUIDs #67829

Merged
merged 10 commits into from
Jan 25, 2021

Conversation

DaveCTurner
Copy link
Contributor

@DaveCTurner DaveCTurner commented Jan 21, 2021

Today a snapshot repository does not have a well-defined global identity. The
closest we have is the repository name, but this is under the control of the
user and users are free to choose a different name each time they register the
repository.

This presents problems for cases where we need to refer to a specific snapshot
in a specific repository in a globally-unique fashion. Although snapshots
themselves have a globally-unique identifier, it is not feasible to search all
the available repositories for a specific snapshot. Today we expect the
repository to be registered under the same name on every cluster, but this is
not something on which we can rely.

To solve this, this commit adds a persistent UUID to each repository. The
repository UUID is stored in the top-level index blob, represented by
RepositoryData, and is also usually copied into the RepositoryMetadata that
represents the repository in the cluster state. The repository UUID is exposed
by the get-repositories API; other more meaningful consumers will be added in
due course.

Relates #66431

Today a snapshot repository does not have a well-defined identity. It
can be reregistered with a different cluster under a different name, and
can even be registered with multiple clusters in readonly mode.

This presents problems for cases where we need to refer to a specific
snapshot in a globally-unique fashion. Today we rely on the repository
being registered under the same name on every cluster, but this is not a
safe assumption.

This commit adds a UUID that can be used to uniquely identify a
repository. The UUID is stored in the top-level index blob, represented
by `RepositoryData`, and is also usually copied into the
`RepositoryMetadata` that represents the repository in the cluster
state. The repository UUID is exposed in the get-repositories API; other
more meaningful consumers will be added in due course.
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jan 21, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks just fine I think, just one stupid edge case left as far as I think?

verifyStep.whenComplete(ignored -> threadPool.generic().execute(
ActionRunnable.wrap(getRepositoryDataStep, l -> repository(request.name()).getRepositoryData(l))), listener::onFailure);

// When the repository metadata is ready, update the repository UUID stored in the cluster state, if available
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one slightly strange spot here for the case of read-only repositories.
You could have technically have a repo at some UUID and mount it to a cluster read-only. Then some outside force could wipe the repo completely (as in delete all the files) and write new RepositoryData with a different UUID, in which case the uuid in the metadata won't ever get fixed. It's kind of a weird edge-case but we it's the reason why we don't track the repo generation in the cluster state when it comes to read-only repositories as well.

I guess in this case we explicitly want this UUID visible in read-only repos also so maybe we need some check on the UUID when loading RepositoryData on read-only repos?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh I see, ok, should be nbd since getRepositoryData is already async.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See b680eb3.

@original-brownbear
Copy link
Member

Also there is some unfortunate interaction here with the S3 repo cooldown enforcement tests in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-1/16657/testReport/junit/org.elasticsearch.repositories.s3/S3BlobStoreRepositoryTests/testEnforcedCooldownPeriod/ (also needs some work around for the repo data loading dance in some form since we fake downgrade the repo there).

@DaveCTurner
Copy link
Contributor Author

Also there is some unfortunate interaction here with the S3 repo cooldown enforcement tests

NBD I think, it works to set ?verify=false.

@DaveCTurner
Copy link
Contributor Author

DaveCTurner commented Jan 21, 2021

Ugh the docs tests are kinda broken, they don't clear the repo and we use the same path everywhere so it ends up mostly having a UUID 😕

Edit: they don't even clear the repo between runs, yech, it's the same UUID every run!

@DaveCTurner
Copy link
Contributor Author

#67841 will address the docs tests confusion.

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks !:)

Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DaveCTurner
Copy link
Contributor Author

Failure looks unrelated, I opened #67884.

@elasticmachine please run elasticsearch-ci/1

@DaveCTurner DaveCTurner merged commit e5a15d4 into elastic:master Jan 25, 2021
@DaveCTurner DaveCTurner deleted the 2021-01-21-repository-uuid branch January 25, 2021 12:17
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jan 25, 2021
Today a snapshot repository does not have a well-defined identity. It
can be reregistered with a different cluster under a different name, and
can even be registered with multiple clusters in readonly mode.

This presents problems for cases where we need to refer to a specific
snapshot in a globally-unique fashion. Today we rely on the repository
being registered under the same name on every cluster, but this is not a
safe assumption.

This commit adds a UUID that can be used to uniquely identify a
repository. The UUID is stored in the top-level index blob, represented
by `RepositoryData`, and is also usually copied into the
`RepositoryMetadata` that represents the repository in the cluster
state. The repository UUID is exposed in the get-repositories API; other
more meaningful consumers will be added in due course.

Backport of elastic#67829
DaveCTurner added a commit that referenced this pull request Jan 25, 2021
Today a snapshot repository does not have a well-defined identity. It
can be reregistered with a different cluster under a different name, and
can even be registered with multiple clusters in readonly mode.

This presents problems for cases where we need to refer to a specific
snapshot in a globally-unique fashion. Today we rely on the repository
being registered under the same name on every cluster, but this is not a
safe assumption.

This commit adds a UUID that can be used to uniquely identify a
repository. The UUID is stored in the top-level index blob, represented
by `RepositoryData`, and is also usually copied into the
`RepositoryMetadata` that represents the repository in the cluster
state. The repository UUID is exposed in the get-repositories API; other
more meaningful consumers will be added in due course.

Backport of #67829
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jan 25, 2021
This commit mostly reverts elastic#67934, except for the change to the version
constant `REPOSITORY_UUID_IN_REPO_DATA_VERSION`.

Completes the backport of elastic#67829 via elastic#67899
DaveCTurner added a commit that referenced this pull request Jan 25, 2021
This commit mostly reverts #67934, except for the change to the version
constant `REPOSITORY_UUID_IN_REPO_DATA_VERSION`.

Completes the backport of #67829 via #67899
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jan 26, 2021
In elastic#67829 we introduced a `@Before` method to set up the repository for
the docs tests. In fact this only needs to run once for the whole suite,
not once per test.

Relates elastic#67853
original-brownbear pushed a commit that referenced this pull request Jan 26, 2021
In #67829 we introduced a `@Before` method to set up the repository for
the docs tests. In fact this only needs to run once for the whole suite,
not once per test.

Relates #67853
original-brownbear pushed a commit to original-brownbear/elasticsearch that referenced this pull request Jan 26, 2021
In elastic#67829 we introduced a `@Before` method to set up the repository for
the docs tests. In fact this only needs to run once for the whole suite,
not once per test.

Relates elastic#67853
original-brownbear added a commit that referenced this pull request Jan 26, 2021
In #67829 we introduced a `@Before` method to set up the repository for
the docs tests. In fact this only needs to run once for the whole suite,
not once per test.

Relates #67853

Co-authored-by: David Turner <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.12.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants