Include size of snapshot in snapshot metadata #29602
Conversation
Pinging @elastic/es-distributed
Adds the difference in the number of files (and file sizes) between the previous and current snapshot. The total number/size reflects the total number/size of files in the snapshot. Closes elastic#18543
b59c25b to 6bcfcb8
Thanks @vladimirdolzhenko. While looking at the details of this PR, I noticed that we don't need to serialize the total size / total number of files to disk; the information is already readily available (this will also simplify BWC). We can compute it in the constructor of `BlobStoreIndexShardSnapshot` based on the `indexFiles` that are provided as a parameter to it.
For BWC purposes, we will then only have to worry about `SnapshotStats`. Here we have IndexingIt.testUpdateSnapshotStatus, which runs a snapshot in a mixed-version cluster and so ensures that nothing is horribly wrong with the BWC of the writeTo/readFrom logic.
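A minimal, self-contained sketch of that idea (the class and interface here are stand-ins, not the actual Elasticsearch types):

```java
import java.util.List;

// Hedged sketch with stand-in types: the totals are derived from the list of files
// that the constructor already receives, so nothing extra needs to be serialized.
class ShardSnapshotSketch {

    /** Stand-in for BlobStoreIndexShardSnapshot.FileInfo#metadata().length(). */
    interface FileInfo {
        long length();
    }

    private final List<FileInfo> indexFiles;
    private final int totalFileCount;
    private final long totalSize;

    ShardSnapshotSketch(List<FileInfo> indexFiles) {
        this.indexFiles = indexFiles;
        this.totalFileCount = indexFiles.size();
        this.totalSize = indexFiles.stream().mapToLong(FileInfo::length).sum();
    }

    int totalFileCount() {
        return totalFileCount;
    }

    long totalSize() {
        return totalSize;
    }
}
```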
In terms of naming, I think we should make the following changes:
- Call the new field `incremental_size` instead of `difference_of_size`. We already use the "incremental" terminology in the S/R docs.
- Call the new field `incremental_file_count` instead of `difference_of_number_of_files`.

I would even be inclined to rename `number_of_files` / `total_number_of_files` to `total_file_count`.
Finally, once we agree on the naming, we will also have to document this in the "breaking changes" documentation and add a note to the new change log.
@@ -549,7 +587,14 @@ public static BlobStoreIndexShardSnapshot fromXContent(XContentParser parser) th
            }
        }
    }
    // for the sake of backward compatibility
    if (totalNumberOfFiles < 0){
There should be a space between `)` and `{`, here and a few lines below.
    }

    static final class Fields {
        static final String STATS = "stats";
        static final String NUMBER_OF_FILES = "number_of_files";
        static final String DIFFERENCE_OF_NUMBER_OF_FILES = "difference_of_number_of_files";
        static final String TOTAL_NUMBER_OF_FILES = "total_number_of_files";
Just renaming this field will break serialization, as `fromXContent` and `toXContent` now use different field names.
@@ -84,6 +100,90 @@ protected void setUpRepository() throws Exception {
        ensureSearchable();
    }

    public void testSnapshotTotalAndDifferenceSizes() throws IOException {
This test does not really fit into this class (it's about blocks). Can you move it to `DedicatedClusterSnapshotRestoreIT`?
I've added more comments.
The CI failure on the PR is also real; see the output:
[2018-04-20T01:06:08,507][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test {p0=snapshot.status/10_basic/Get snapshot status}]: after test
15:08:00 FAILURE 0.19s | MixedClusterClientYamlTestSuiteIT.test {p0=snapshot.status/10_basic/Get snapshot status} <<< FAILURES!
15:08:00 > Throwable #1: java.lang.AssertionError: Failure at [snapshot.status/10_basic:29]: expected [2xx] status code but api [snapshot.status] returned [400 Bad Request] [{"error":{"root_cause":[{"type":"parse_exception","reason":"unknown parameter [incremental_file_count]","stack_trace":"ElasticsearchParseException[unknown parameter [incremental_file_count]]\n\tat org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardSnapshot.fromXContent(BlobStoreIndexShardSnapshot.java:534)\n\tat org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:115)\n\tat org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:114)\n\tat org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:90)\n\tat org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.loadSnapshot(BlobStoreRepository.java:943)\n\tat org.elasticsearch.repositories.blobstore.BlobStoreRepository.getShardSnapshotStatus(BlobStoreRepository.java:836)\n\tat org.elasticsearch.snapshots.SnapshotsService.snapshotShards(SnapshotsService.java:622)\n\tat org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.buildResponse(TransportSnapshotsStatusAction.java:234)\n\tat org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.masterOperation(TransportSnapshotsStatusAction.java:99)\n\tat org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.masterOperation(TransportSnapshotsStatusAction.java:61)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:88)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:174)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n"}],"type":"parse_exception","reason":"unknown parameter [incremental_file_count]","stack_trace":"ElasticsearchParseException[unknown parameter [incremental_file_count]]\n\tat org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardSnapshot.fromXContent(BlobStoreIndexShardSnapshot.java:534)\n\tat org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:115)\n\tat org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:114)\n\tat org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:90)\n\tat org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.loadSnapshot(BlobStoreRepository.java:943)\n\tat org.elasticsearch.repositories.blobstore.BlobStoreRepository.getShardSnapshotStatus(BlobStoreRepository.java:836)\n\tat org.elasticsearch.snapshots.SnapshotsService.snapshotShards(SnapshotsService.java:622)\n\tat org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.buildResponse(TransportSnapshotsStatusAction.java:234)\n\tat org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.masterOperation(TransportSnapshotsStatusAction.java:99)\n\tat 
org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.masterOperation(TransportSnapshotsStatusAction.java:61)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:88)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:174)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n"},"status":400}]
15:08:00 > at __randomizedtesting.SeedInfo.seed([44DD9D016CD36F30:CC89A2DBC22F02C8]:0)
15:08:00 > at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:343)
15:08:00 > at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:320)
15:08:00 > at java.lang.Thread.run(Thread.java:748)
15:08:00 > Caused by: java.lang.AssertionError: expected [2xx] status code but api [snapshot.status] returned [400 Bad Request] [{"error":{"root_cause":[{"type":"parse_exception","reason":"unknown parameter [incremental_file_count]","stack_trace":"ElasticsearchParseException[unknown parameter [incremental_file_count]]\n\tat org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardSnapshot.fromXContent(BlobStoreIndexShardSnapshot.java:534)\n\tat org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:115)\n\tat org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:114)\n\tat org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:90)\n\tat org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.loadSnapshot(BlobStoreRepository.java:943)\n\tat org.elasticsearch.repositories.blobstore.BlobStoreRepository.getShardSnapshotStatus(BlobStoreRepository.java:836)\n\tat org.elasticsearch.snapshots.SnapshotsService.snapshotShards(SnapshotsService.java:622)\n\tat org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.buildResponse(TransportSnapshotsStatusAction.java:234)\n\tat org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.masterOperation(TransportSnapshotsStatusAction.java:99)\n\tat org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.masterOperation(TransportSnapshotsStatusAction.java:61)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:88)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:174)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n"}],"type":"parse_exception","reason":"unknown parameter [incremental_file_count]","stack_trace":"ElasticsearchParseException[unknown parameter [incremental_file_count]]\n\tat org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardSnapshot.fromXContent(BlobStoreIndexShardSnapshot.java:534)\n\tat org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:115)\n\tat org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:114)\n\tat org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:90)\n\tat org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.loadSnapshot(BlobStoreRepository.java:943)\n\tat org.elasticsearch.repositories.blobstore.BlobStoreRepository.getShardSnapshotStatus(BlobStoreRepository.java:836)\n\tat org.elasticsearch.snapshots.SnapshotsService.snapshotShards(SnapshotsService.java:622)\n\tat org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.buildResponse(TransportSnapshotsStatusAction.java:234)\n\tat org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.masterOperation(TransportSnapshotsStatusAction.java:99)\n\tat 
org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.masterOperation(TransportSnapshotsStatusAction.java:61)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:88)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:174)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n"},"status":400}]
15:08:00 > at org.elasticsearch.test.rest.yaml.section.DoSection.execute(DoSection.java:240)
15:08:00 > at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:336)
15:08:00 > ... 37 more
You might be able to reproduce this test failure by running ./gradlew :qa:mixed-cluster:v6.3.0-SNAPSHOT#mixedClusterTestRunner
It means that once a snapshot has been written by a 7.0.0 node, a 6.3 node will no longer be able to load information about that snapshot (due to the `BlobStoreIndexShardSnapshot` parse fields being renamed). I wonder if we should leave the internal serialization field names as they are, since there is no strict need to break them.
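To make that suggestion concrete, a hedged sketch (illustrative constants only, not the actual classes): the names written to the repository by `BlobStoreIndexShardSnapshot` stay untouched, and only the REST-facing `SnapshotStats` output adopts the new terminology.

```java
// Hedged sketch: two separate sets of field names. Old nodes keep parsing the
// unchanged internal (on-disk) names, so a 6.x node can still read shard snapshot
// metadata written by a 7.0 node; only the user-facing stats use the new names.
final class InternalShardSnapshotFields {
    // written to and read from the repository - must stay stable across versions
    static final String NUMBER_OF_FILES = "number_of_files";
    static final String TOTAL_SIZE = "total_size";
}

final class SnapshotStatsRestFields {
    // only appears in REST API responses, so it can be renamed freely
    static final String INCREMENTAL_FILE_COUNT = "incremental_file_count";
    static final String TOTAL_FILE_COUNT = "total_file_count";
}
```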
@@ -105,29 +126,45 @@ public void writeTo(StreamOutput out) throws IOException {
        out.writeVLong(startTime);
        out.writeVLong(time);

        out.writeVInt(numberOfFiles);
        out.writeVInt(processedFiles);
        out.writeVInt(totalFileCount);
Shouldn't this be the `incrementalFileCount`?
        out.writeVInt(numberOfFiles);
        out.writeVInt(processedFiles);
        out.writeVInt(totalFileCount);
        out.writeVInt(processedFileCount);

        out.writeVLong(totalSize);
Shouldn't this be `incrementalSize` now?
        if (out.getVersion().onOrAfter(Version.V_7_0_0_alpha1)) {
            out.writeVInt(incrementalFileCount);
            out.writeVLong(incrementalSize);
Shouldn't this be `totalFileCount` and `totalSize` here? (Those are the new stats that we've added.)
            incrementalSize = in.readVLong();
        } else {
            incrementalFileCount = totalFileCount;
            incrementalSize = totalSize;
Same comments here as above.
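Putting these comments together, a hedged sketch of the intended symmetry between `writeTo` and the read path (field names follow the snippets quoted in this review, the surrounding fields are elided, and the exact layout is an assumption, not the final code):

```java
// Hedged sketch: pre-7.0 nodes only know the old fields, which effectively carried
// the incremental values, so only the new totals need a version guard.
@Override
public void writeTo(StreamOutput out) throws IOException {
    out.writeVLong(startTime);
    out.writeVLong(time);

    out.writeVInt(incrementalFileCount);   // goes where the old "numberOfFiles" used to be
    out.writeVInt(processedFileCount);
    // ... other existing fields ...
    out.writeVLong(incrementalSize);       // goes where the old "totalSize" used to be

    if (out.getVersion().onOrAfter(Version.V_7_0_0_alpha1)) {
        out.writeVInt(totalFileCount);     // the genuinely new stats
        out.writeVLong(totalSize);
    }
}

@Override
public void readFrom(StreamInput in) throws IOException {
    startTime = in.readVLong();
    time = in.readVLong();

    incrementalFileCount = in.readVInt();
    processedFileCount = in.readVInt();
    // ... other existing fields ...
    incrementalSize = in.readVLong();

    if (in.getVersion().onOrAfter(Version.V_7_0_0_alpha1)) {
        totalFileCount = in.readVInt();
        totalSize = in.readVLong();
    } else {
        // an older sender does not know about totals; fall back to the incremental values
        totalFileCount = incrementalFileCount;
        totalSize = incrementalSize;
    }
}
```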
        int totalFileCount = 0;
        long totalSize = 0;
        for (FileInfo indexFile : indexFiles) {
            totalFileCount ++
Just `this.totalFileCount = indexFiles.size()`? Do we even need to store this in an extra field, or can we just return `indexFiles.size()` in the getter?
Similarly, you can just write `this.totalSize = indexFiles.stream().mapToLong(idf -> idf.metadata().length()).sum();`
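For reference, a minimal sketch of the second option (no extra fields stored at all; the getter names are hypothetical):

```java
// Hedged sketch: compute the totals on demand from the already-stored file list.
public int totalFileCount() {
    return indexFiles.size();
}

public long totalSize() {
    return indexFiles.stream().mapToLong(fileInfo -> fileInfo.metadata().length()).sum();
}
```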
        try {
            return Files.size(f);
        } catch (IOException e) {
            throw new RuntimeException(e.getMessage(), e);
Maybe just `throw new UncheckedIOException(e)`. No need to pass the message as an extra parameter; it's just tests.
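A self-contained sketch of what the suggested helper could look like (the helper and class names are assumed; the test's actual helper may differ):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class TestFileSizes {

    /** Returns the size of the given file, wrapping any IOException unchanged. */
    static long fileSize(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            // UncheckedIOException already carries the original message and cause
            throw new UncheckedIOException(e);
        }
    }
}
```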
            .execute()
            .actionGet();

        final List<Path> files1 = scanSnapshotFolder(repoPath);
Can you use a more descriptive variable name here? For example `filesForSnapshot2`.
        }

        // create another snapshot and drop 1st one
        // total size has to grow, and has to be equal to files on fs
The first part of this comment is not true.
        SnapshotStats anotherStats = snapshots.get(0).getStats();

        assertThat(anotherStats.getIncrementalFileCount(), equalTo(anotherStats.getProcessedFileCount()));
Can you also verify that the incremental file count / size makes sense? This could be established by looking at which files have been added on the filesystem after creating the second snapshot.
This comment was not addressed; can you add more assertions here?
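A hedged sketch of the kind of additional assertions being asked for, reusing the test's `scanSnapshotFolder(repoPath)` helper and the `fileSize` helper sketched above; the usual test imports (Hamcrest matchers, `Collectors`) and the exact bookkeeping (for example, whether snapshot metadata blobs are counted) are assumptions:

```java
// Hedged sketch, not the actual test code: compare the incremental stats of the
// second snapshot with the files that actually appeared in the repository after it.
final List<Path> filesBeforeSecondSnapshot = scanSnapshotFolder(repoPath);

// ... create the second snapshot here ...

final List<Path> filesAfterSecondSnapshot = scanSnapshotFolder(repoPath);
final List<Path> newFiles = filesAfterSecondSnapshot.stream()
    .filter(path -> filesBeforeSecondSnapshot.contains(path) == false)
    .collect(Collectors.toList());

// under the assumption that only data blobs were added, the incremental
// counters should match what was newly written to the repository
assertThat(anotherStats.getIncrementalFileCount(), equalTo(newFiles.size()));
assertThat(anotherStats.getIncrementalSize(),
    equalTo(newFiles.stream().mapToLong(TestFileSizes::fileSize).sum()));
```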
    }

    /**
     * Returns total number of files that where snapshotted
Maybe more exact: "Returns total number of files that are referenced by this snapshot".
LGTM. Let's hold off on merging until we have figured out the BWC implications.
We discussed this issue and decided to come up with a BWC solution. Here is a suggestion for how we could do it while producing semi-sane JSON in 6.x (the bwc section can be removed in 7.0):
I also thought about a scheme where we distinguish between the files that have to be snapshotted vs the files that have to be restored. I couldn't come up with good names for that scheme, as "snapshot" means both a thing in the repository and the verb of copying files to the repository.
@bleskes sounds good - but the BWC section is not backward compatible with previous versions - I assume it would be better to keep it like this.

Yeah, sorry. I copied the wrong section.
added nested objects "incremental_files" and "all_files" + keep bwc part
added consistent names for incremental snapshot + keep bwc part of BlobStoreIndexShardSnapshot
@ywelsch I have a couple of questions to clarify with you.
@vladimirdolzhenko The internal storage format (BlobStoreIndexShardSnapshot) should not require any changes. I would just revert 7ab7262
@bleskes we have another proposal: processed stats go into their own nested object - it's optional and goes away once the snapshot is fully processed; timings go out of incremental (like we had before), and all_files is renamed to total:

@ywelsch note: it is not possible to call the nested object as
Any objections?
Snapshot stats details are provided in a new structured way:

* `total` section for all the files that are referenced by the snapshot
Missing dot at the end of the sentence.
* `total` section for all the files that are referenced by the snapshot
* `incremental` section for those files that actually needed to be copied over as part of the incremental snapshotting.
* In case of a snapshot that's still in progress, there's also a`processed` section for files that are in the process of being copied.
There should be a space between "a" and `processed`.
        builder.endObject();
        return builder;
        builder.startObject(Fields.STATS)
        // incremental starts
Right, ok.
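For orientation, a hedged sketch of the stats layout being agreed on here (the field names, the `processedSize` field, and the condition for emitting `processed` are assumptions based on this discussion, not the merged code):

```java
// Hedged sketch of a SnapshotStats#toXContent-style method: an "incremental" section,
// an optional "processed" section that only shows up while files are still being
// copied, a "total" section, and the timings kept at the top level of "stats".
XContentBuilder statsToXContent(XContentBuilder builder) throws IOException {
    builder.startObject("stats");

    builder.startObject("incremental");
    builder.field("file_count", incrementalFileCount);
    builder.field("size_in_bytes", incrementalSize);
    builder.endObject();

    if (processedFileCount != incrementalFileCount || processedSize != incrementalSize) {
        // only emitted while the snapshot is still in progress
        builder.startObject("processed");
        builder.field("file_count", processedFileCount);
        builder.field("size_in_bytes", processedSize);
        builder.endObject();
    }

    builder.startObject("total");
    builder.field("file_count", totalFileCount);
    builder.field("size_in_bytes", totalSize);
    builder.endObject();

    // timings stay outside the incremental section, as before
    builder.field("start_time_in_millis", startTime);
    builder.field("time_in_millis", time);

    builder.endObject();
    return builder;
}
```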
LGTM
test this please

@ywelsch could you please have another quick look at the BWC REST API part

test this please
  - do:
      snapshot.create:
        repository: test_repo_status_1
        snapshot: test_snapshot
I'm not sure the tests will pass when you create a snapshot with the name `test_snapshot` in this test and then another snapshot with the same name in the other test.
Reasonable point - but for some reason it passes locally. I think it would be better to use another name explicitly.
I know why - `ESClientYamlSuiteTestCase` runs each test scenario with the same isolation as scenarios in different yml files:

Line 167 in a17d6ca:
for (Path yamlFile : yamlFiles) {
TIL
* master:
  - silence InstallPluginCommandTests, see #30900
  - Remove left-over comment
  - Fix double semicolon in import statement
  - [TEST] Fix minor random bug from #30794
  - Include size of snapshot in snapshot metadata #18543, bwc clean up (#30890)
  - Enabling testing against an external cluster (#30885)
  - Add public key header/footer (#30877)
  - SQL: Remove the last remaining server dependencies from jdbc (#30771)
  - Include size of snapshot in snapshot metadata (#29602)
  - Do not serialize basic license exp in x-pack info (#30848)
  - Change BWC version for VerifyRepositoryResponse (#30796)
  - [DOCS] Document index name limitations (#30826)
  - Harmonize include_defaults tests (#30700)
* 6.x:
  - Fix double semicolon in import statement
  - [TEST] Fix minor random bug from #30794
  - Enabling testing against an external cluster (#30885)
  - SQL: Remove the last remaining server dependencies from jdbc (#30771)
  - Add public key header/footer (#30877)
  - Include size of snapshot in snapshot metadata (#29602)
  - QA: Test template creation during rolling restart (#30850)
  - REST high-level client: add put ingest pipeline API (#30793)
  - Do not serialize basic license exp in x-pack info (#30848)
  - [docs] explainer for java packaging tests (#30825)
  - Verify signatures on official plugins (#30800)
  - [DOCS] Document index name limitations (#30826)
  - [Docs] Add reindex.remote.whitelist example (#30828)
@vladimirdolzhenko FYI

thank you @jpountz, good catch, fixed
Adds the difference in the number of files (and file sizes) between the previous and current snapshot. The total number/size reflects the total number/size of files in the snapshot.
There are inconsistent properties - "number_of_files" but "total_size" - and in fact they reflect the incremental change. Align all stats property names and add effective total-files properties.
To summarize the change: the stats (for an op like `GET /_snapshot/backup1/snapshot_2/_status?human`) look as follows.
Before:
After:
Note: `processed` is visible while the snapshot is processing - it goes away when the snapshot is done.