
Don't Upload Redundant Shard Files #51729

Conversation

original-brownbear (Member)

Segment(s) info blobs are already stored with their full content
in the "hash" field in the shard snapshot metadata as long as they are
smaller than 1MB. We can make use of this fact and never upload them
physically to the repo.
This saves a non-trivial number of uploads and downloads when restoring,
and might also lower the latency of searchable snapshots, since they can
skip physically loading this information as well.
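
The heart of the change is a one-line check (visible in a diff further down in this conversation): a file only needs a physical upload when its metadata hash does not already contain the complete file contents. A minimal sketch, assuming the `StoreFileMetaData` API referenced in that diff; the helper method itself is illustrative, not from the PR:

```java
// Sketch: decide whether a file must be physically written to the repository.
static boolean needsPhysicalUpload(StoreFileMetaData md) {
    // The metadata hash (a Lucene BytesRef) holds the complete file exactly
    // when its length matches the file length — only true for segments_N and
    // .si files smaller than 1MB. In that case the blob never needs to be
    // uploaded: restores can serve it straight from the shard-level metadata.
    return md.hash().length != md.length();
}
```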

@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@@ -1035,7 +1035,8 @@ public void testUnrestorableFilesDuringRestore() throws Exception {
     final String indexName = "unrestorable-files";
     final int maxRetries = randomIntBetween(1, 10);

-    Settings createIndexSettings = Settings.builder().put(SETTING_ALLOCATION_MAX_RETRY.getKey(), maxRetries).build();
+    Settings createIndexSettings = Settings.builder().put(SETTING_ALLOCATION_MAX_RETRY.getKey(), maxRetries)
original-brownbear (Member Author)

It's a little annoying, but if we run this test with more than a single shard, we might get shards that aren't corrupted: a shard that contains no documents also has no physical data files now, because for empty shards only the .si and segments_N files exist, and those are no longer physically uploaded.
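
The new builder line in the hunk above is truncated in this page. A hypothetical completion — the single-shard setting is my assumption, not visible in the extracted diff — that would guarantee the one shard holds documents and thus physical files the test can corrupt:

```java
// Hypothetical completion of the truncated diff line: pin the index to a
// single shard so the test never creates empty shards with nothing to corrupt.
Settings createIndexSettings = Settings.builder()
    .put(SETTING_ALLOCATION_MAX_RETRY.getKey(), maxRetries)
    .put(IndexMetaData.SETTING_NUMBER_OF_SHARDS, 1) // assumed addition
    .build();
```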

@@ -1132,11 +1132,11 @@ public void testSnapshotTotalAndIncrementalSizes() throws IOException {

     SnapshotStats stats = snapshots.get(0).getStats();

-    assertThat(stats.getTotalFileCount(), is(snapshot0FileCount));
-    assertThat(stats.getTotalSize(), is(snapshot0FileSize));
+    assertThat(stats.getTotalFileCount(), greaterThanOrEqualTo(snapshot0FileCount));
original-brownbear (Member Author)

This simply doesn't match up now, and I figured what really matters in these tests is the consistency of the numbers. I didn't want to adjust the results for the file counts to filter out the files that weren't physically uploaded, since they are technically still uploaded as part of the metadata.
We have a bunch of other tests that verify all the files are there and that incrementality works as expected, so I figured this adjustment is good enough.
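
For reference, the relaxed form of the checks — the file-count assertion is the one visible in the hunk above; the size analogue is my assumption, using Hamcrest's greaterThanOrEqualTo:

```java
// Virtual blobs still count toward the snapshot's logical totals but are never
// physically uploaded, so only a lower bound can be asserted on stats.
assertThat(stats.getTotalFileCount(), greaterThanOrEqualTo(snapshot0FileCount));
assertThat(stats.getTotalSize(), greaterThanOrEqualTo(snapshot0FileSize)); // assumed analogue
```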

-private static final String DATA_BLOB_PREFIX = "__";
+private static final String UPLOADED_DATA_BLOB_PREFIX = "__";
+
+private static final String VIRTUAL_DATA_BLOB_PREFIX = "v__";
original-brownbear (Member Author)

We still need to have the virtual blob here so that the logic/assertions in BlobStoreIndexShardSnapshot(s) work without a format change and things stay BwC. Technically we could do without this kind of "ID", but then we'd have to go through the whole dance of only updating the format and not uploading the blob once all the snapshots are newer than version X. If we do it this way, we get the benefit of faster restores and snapshots right away: the meta hash works the same way in 6.x already, hence even in 7.7 all possible restores should work out.
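
A sketch of the resulting naming scheme, based on the two prefixes in the diff above; the helper method itself is illustrative, not from the PR:

```java
// "__<id>" is a real blob in the repository; "v__<id>" exists only as an
// identifier inside the BlobStoreIndexShardSnapshot(s) metadata, so older
// versions that know the "__" prefix keep working unchanged.
static String dataBlobName(String id, boolean virtual) {
    return (virtual ? VIRTUAL_DATA_BLOB_PREFIX : UPLOADED_DATA_BLOB_PREFIX) + id;
}
```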

@ywelsch (Contributor) left a comment

I'm wondering why we have that hash at all (which blows up the size of the snap-*.dat)? Why is the checksum not sufficient?

@original-brownbear (Member Author)

> I'm wondering why we have that hash at all (which blows up the size of the snap-*.dat)? Why is the checksum not sufficient?

My thoughts exactly. But now that we have it, I'd actually rather keep it than remove the hash field. This potentially gives us a nice in to an efficient fix for #50231, because we get the SegmentInfos for cheap from a single blob now.
Other than that, the hash is super useless, because we only have it for the segments_N and .si blobs ... for everything else it's just an empty BytesRef.
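
A sketch of the two cases just described, assuming StoreFileMetaData#hash returns a Lucene BytesRef:

```java
BytesRef hash = md.hash();
if (hash.length == md.length()) {
    // segments_N or .si file smaller than 1MB: the hash IS the full file, so
    // e.g. the SegmentInfos could be parsed from these bytes (relevant to #50231).
} else {
    // every other file: the hash is just an empty BytesRef.
}
```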

@ywelsch (Contributor) commented Jan 31, 2020

I'm mostly concerned that it blows up the shard index file quite a bit (up to 1MB per snapshot), and that it puts a stricter limitation on the maximum number of snapshots that we can have. Can we leave the hash out of that file?

@original-brownbear (Member Author) commented Jan 31, 2020

> I'm mostly concerned that it blows up the shard index file quite a bit (up to 1MB per snapshot), and that it puts a stricter limitation on the maximum number of snapshots that we can have.

True, but so far this has not been an issue for anyone to my knowledge (EDIT: I've also confirmed this with Paul just now; he's never seen any error that would suggest this is a problem either). In the end, these blobs are in almost all cases <0.5 KB and thus somewhat close to the size of other data we keep per snapshot anyway.

> Can we leave the hash out of that file?

I'm wondering if that's a good move strategically. We could nicely use this data to speed up restores and searchable snapshots, and have an efficient way of tackling #50231 at no extra cost and without a BwC-annoying change to the format of the shard-level metadata. Why not take the free win instead of fixing an issue no one seems to experience?

Generally, I'd rather make a BwC-breaking change to the meta only if it has some real win to it, given how involved that tends to be, and we have #45736 on the roadmap anyway, which will require an adjustment to the shard-level metadata.

@ywelsch (Contributor) commented Feb 3, 2020

Note that I was wondering whether we could leave the hash out of the index file (which is used to enumerate all files, used for incrementality), and only have it in the snap file (which is the one used when restoring a snapshot).
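
For orientation, the shard-level repository layout under discussion (a rough sketch; names abbreviated):

```
indices/<index>/<shard>/
  index-<UUID>     # BlobStoreIndexShardSnapshots: all files across all snapshots (drives incrementality)
  snap-<UUID>.dat  # BlobStoreIndexShardSnapshot: one snapshot's file list (read when restoring)
  __<id>           # physically uploaded data blobs
```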

@original-brownbear (Member Author)

> Note that I was wondering whether we could leave the hash out of the index file (which is used to enumerate all files, used for incrementality), and only have it in the snap file (which is the one used when restoring a snapshot).

Yeah, that's a fair point. That's something we could do, though it would mess with my plan for #50231, because it would take away the single spot we currently have for all the segment infos :D
Maybe we could do that in a separate PR, and first discuss the possibility of using this field to fix #50231 tomorrow? :)

@original-brownbear (Member Author)

@ywelsch given the discussion just now on incrementality I take it we keep the segments data in the index-${UUID} blobs as well and can keep things the way they are here? :)

@ywelsch (Contributor) left a comment

LGTM

-private static final String DATA_BLOB_PREFIX = "__";
+private static final String UPLOADED_DATA_BLOB_PREFIX = "__";
+
+private static final String VIRTUAL_DATA_BLOB_PREFIX = "v__";
@ywelsch (Contributor)

can you add some Javadocs here explaining what virtual data blobs are?

@@ -1516,6 +1519,9 @@ public void snapshotShard(Store store, MapperService mapperService, SnapshotId s
     }
 }

+// We can skip writing blobs where the metadata hash is equal to the blob's contents because we store the hash/contents
+// directly in the shard level metadata in this case
+final boolean needsWrite = md.hash().length != md.length();
@ywelsch (Contributor)

can you assert here that the content is indeed the same? EDIT: happens below. This might be too subtle otherwise. Perhaps also add a method to StoreFileMetaData that marks these files specially, so that it's more obvious to identify the files.

@original-brownbear (Member Author) commented Feb 7, 2020

Done in 8794541 ... actually found a way to make this even safer via the checksum test, just in case someone uses this method elsewhere or something changes in the future. Maybe take another look before I merge?
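
A sketch of the kind of safety assertion being referenced — illustrative, not the actual code in commit 8794541; the method name is mine. It checks that a file whose upload we skip really does have its exact contents in the metadata hash:

```java
// Illustrative: compare the bytes that would be served from the metadata hash
// against the actual file on disk. Uses org.apache.lucene.store.IndexInput,
// org.apache.lucene.util.BytesRef, and org.elasticsearch.index.store.Store.
static boolean assertHashHoldsFileContents(Store store, StoreFileMetaData md) throws IOException {
    try (IndexInput in = store.directory().openInput(md.name(), IOContext.READONCE)) {
        final byte[] bytes = new byte[Math.toIntExact(md.length())];
        in.readBytes(bytes, 0, bytes.length);
        assert md.hash().bytesEquals(new BytesRef(bytes))
            : "metadata hash does not hold the contents of " + md.name();
    }
    return true;
}
```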

@ywelsch (Contributor) left a comment

LGTM

@original-brownbear (Member Author)

Thanks Yannick!

@original-brownbear original-brownbear merged commit eb56c27 into elastic:master Feb 10, 2020
@original-brownbear original-brownbear deleted the optimize-away-needless-uploads branch February 10, 2020 14:01
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Feb 10, 2020
original-brownbear added a commit that referenced this pull request Feb 10, 2020
tlrx added a commit that referenced this pull request Feb 12, 2020
This commit adapts searchable snapshots to the latest changes from master.

The REST handlers declare their routes differently since #51950, and
SearchableSnapshotIndexInput needs to account for segment infos files that
are stored in the shard snapshot metadata hash and not uploaded to the blob
store anymore since #51729.