-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhances get snapshots API to allow retrieving repository index only #24477
Conversation
final List<SnapshotInfo> currentSnapshots) { | ||
List<SnapshotInfo> snapshotInfos = new ArrayList<>(); | ||
for (SnapshotInfo snapshotInfo : currentSnapshots) { | ||
if (toResolve.contains(snapshotInfo.snapshotId())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea:
if (toResolve.remove(snapshotInfo.snapshotId())) {
snapshotInfos.add(snapshotInfo.basic());
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you @StefanSchmidtOz, i made the suggested change
8091d03
to
e2a6633
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good. One concern that I have is that the state is such an important part of the snapshot information that omitting state can lead to the situation when a user might thing that they have the latest snapshot for a particular index, while this snapshot actually failed. Could we add this information to the index file? What do you think?
Since we are modifying toXContent, I think it might be useful to add another get snapshot info test with verbose=false to rest tests. There is also some non-trivial bwc work going on here. So, I think having a bwc test might be helpful as well.
@imotov I looked at adding the state to the index blob and I believe this will be relatively straight forward, and I agree with you its only a little bit of added information to the index blob, but the benefits are great as state is the most important thing one would care about for a given snapshot. I will work on adding this and push commits, and will add the tests you suggested as well. |
e2a6633
to
baa2055
Compare
Looks good! Any chance we can add reading non-verbose listings to one of snapshot/restore bwc tests? |
@imotov the different repository index formats are implicitly executed in the backwards-5.0 and rolling upgrade tests - since we have different versioned nodes, with some of them on the old format, sometimes we get nodes that read/write the old format and sometimes we get nodes that read/write the new format. In fact, running those tests helped me find a bug in reading the old format from a new node. I can add a test which just writes the old x-content and reads it in the newer version, and vice versa (a unit test where I'd be copying/pasting the old code in the test to test old serialization/deserialization with the new). |
retest this please |
@imotov i added the old snapshot bwc tests in 9ca8c06e635d3e246c70a288e43a0525f37695cf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! LGTM
This is going only to 6.0, right? Should we remove 5.5 label? |
@imotov I thought we could backport it too 5.5 too, do you feel its too risky? |
retest this please |
@abeyad I don't think it's too risky. I just noticed this
but I guess you did it temporary to make sure BWC tests pass. |
Yes exactly, once I backport the change to 5.5, I will change the constant as well on both 5.5 and master. |
Currently, the get snapshots API (e.g. /_snapshot/{repositoryName}/_all) provides information about snapshots in the repository, including the snapshot state, number of shards snapshotted, failures, etc. In order to provide information about each snapshot in the repository, the call must read the snapshot metadata blob (`snap-{snapshot_uuid}.dat`) for every snapshot. In cloud-based repositories, this can be expensive, both from a cost and performance perspective. Sometimes, all the user wants is to retrieve all the names/uuids of each snapshot, and the indices that went into each snapshot, without any of the other status information about the snapshot. This minimal information can be retrieved from the repository index blob (`index-N`) without needing to read each snapshot metadata blob. This commit enhances the get snapshots API with an optional `verbose` parameter. If `verbose` is set to false on the request, then the get snapshots API will only retrieve the minimal information about each snapshot (the name, uuid, and indices in the snapshot), and only read this information from the repository index blob, thereby giving users the option to retrieve the snapshots in a repository in a more cost-effective and efficient manner. Closes elastic#24288
9ca8c06
to
c37d9b8
Compare
…24477) Currently, the get snapshots API (e.g. /_snapshot/{repositoryName}/_all) provides information about snapshots in the repository, including the snapshot state, number of shards snapshotted, failures, etc. In order to provide information about each snapshot in the repository, the call must read the snapshot metadata blob (`snap-{snapshot_uuid}.dat`) for every snapshot. In cloud-based repositories, this can be expensive, both from a cost and performance perspective. Sometimes, all the user wants is to retrieve all the names/uuids of each snapshot, and the indices that went into each snapshot, without any of the other status information about the snapshot. This minimal information can be retrieved from the repository index blob (`index-N`) without needing to read each snapshot metadata blob. This commit enhances the get snapshots API with an optional `verbose` parameter. If `verbose` is set to false on the request, then the get snapshots API will only retrieve the minimal information about each snapshot (the name, uuid, and indices in the snapshot), and only read this information from the repository index blob, thereby giving users the option to retrieve the snapshots in a repository in a more cost-effective and efficient manner. Closes #24288
5.x commit: 8fc3f1c |
thanks for the review @imotov |
In elastic#24477, a less verbose option was added to retrieve snapshot info via GET /_snapshot/{repo}/{snapshots}. The point of adding this less verbose option was so that if the repository is a cloud based one, and there are many snapshots for which the snapshot info needed to be retrieved, then each snapshot would require reading a separate snapshot metadata file to pull out the necessary information. This can be costly (performance and cost) on cloud based repositories, so a less verbose option was added that only retrieves very basic information about each snapshot that is all available in the index-N blob - requiring only one read! In order to display this less verbose snapshot info appropriately, logic was added to not display those fields which could not be populated. However, this broke integrators (e.g. ECE) that required these fields to be present, even if empty. This commit is to return these fields in the response, even if empty, if the verbose option is set.
In #24477, a less verbose option was added to retrieve snapshot info via GET /_snapshot/{repo}/{snapshots}. The point of adding this less verbose option was so that if the repository is a cloud based one, and there are many snapshots for which the snapshot info needed to be retrieved, then each snapshot would require reading a separate snapshot metadata file to pull out the necessary information. This can be costly (performance and cost) on cloud based repositories, so a less verbose option was added that only retrieves very basic information about each snapshot that is all available in the index-N blob - requiring only one read! In order to display this less verbose snapshot info appropriately, logic was added to not display those fields which could not be populated. However, this broke integrators (e.g. ECE) that required these fields to be present, even if empty. This commit is to return these fields in the response, even if empty, if the verbose option is set.
In #24477, a less verbose option was added to retrieve snapshot info via GET /_snapshot/{repo}/{snapshots}. The point of adding this less verbose option was so that if the repository is a cloud based one, and there are many snapshots for which the snapshot info needed to be retrieved, then each snapshot would require reading a separate snapshot metadata file to pull out the necessary information. This can be costly (performance and cost) on cloud based repositories, so a less verbose option was added that only retrieves very basic information about each snapshot that is all available in the index-N blob - requiring only one read! In order to display this less verbose snapshot info appropriately, logic was added to not display those fields which could not be populated. However, this broke integrators (e.g. ECE) that required these fields to be present, even if empty. This commit is to return these fields in the response, even if empty, if the verbose option is set.
In #24477, a less verbose option was added to retrieve snapshot info via GET /_snapshot/{repo}/{snapshots}. The point of adding this less verbose option was so that if the repository is a cloud based one, and there are many snapshots for which the snapshot info needed to be retrieved, then each snapshot would require reading a separate snapshot metadata file to pull out the necessary information. This can be costly (performance and cost) on cloud based repositories, so a less verbose option was added that only retrieves very basic information about each snapshot that is all available in the index-N blob - requiring only one read! In order to display this less verbose snapshot info appropriately, logic was added to not display those fields which could not be populated. However, this broke integrators (e.g. ECE) that required these fields to be present, even if empty. This commit is to return these fields in the response, even if empty, if the verbose option is set.
A NullPointerException is thrown when trying to create or delete a snapshot in a repository that has been used by a pre-5.5 cluster and a post-5.5 cluster. The way snapshots are formatted in the repository snapshots index file changed in elastic#24477 and it can cause a NPE if a very specific set of operations is written in the repository.
A NullPointerException is thrown when trying to create or delete a snapshot in a repository that has been written to by an older Elasticsearch after writing to it with a newer Elasticsearch version. This is because the way snapshots are formatted in the repository snapshots index file changed in #24477. This commit changes the parsing of the repository index file so that it now detects a corrupted index file and fails early the snapshot operation. closes #29052
A NullPointerException is thrown when trying to create or delete a snapshot in a repository that has been written to by an older Elasticsearch after writing to it with a newer Elasticsearch version. This is because the way snapshots are formatted in the repository snapshots index file changed in #24477. This commit changes the parsing of the repository index file so that it now detects a corrupted index file and fails early the snapshot operation. closes #29052
Currently, the get snapshots API (e.g. /_snapshot/{repositoryName}/_all)
provides information about snapshots in the repository, including the
snapshot state, number of shards snapshotted, failures, etc. In order
to provide information about each snapshot in the repository, the call
must read the snapshot metadata blob (
snap-{snapshot_uuid}.dat
) forevery snapshot. In cloud-based repositories, this can be expensive,
both from a cost and performance perspective. Sometimes, all the user
wants is to retrieve all the names/uuids of each snapshot, and the
indices that went into each snapshot, without any of the other status
information about the snapshot. This minimal information can be
retrieved from the repository index blob (
index-N
) without needing toread each snapshot metadata blob.
This commit enhances the get snapshots API with an optional
verbose
parameter. If
verbose
is set to false on the request, then the getsnapshots API will only retrieve the minimal information about each
snapshot (the name, uuid, and indices in the snapshot), and only read
this information from the repository index blob, thereby giving users
the option to retrieve the snapshots in a repository in a more
cost-effective and efficient manner.
Closes #24288