
Don't Upload Redundant Shard Files #51729

Conversation

original-brownbear (Member)

Segment(s) info blobs are already stored with their full content
in the "hash" field in the shard snapshot metadata as long as they are
smaller than 1MB. We can make use of this fact and never upload them
physically to the repo.
This saves a non-trivial number of uploads and downloads when restoring,
and might also lower the latency of searchable snapshots, since they can
skip physically loading this information as well.
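
The heart of the change is a one-line check (visible in a diff further down in this conversation): a file only needs a physical upload when its metadata hash does not already contain the complete file contents. A minimal sketch, assuming the `StoreFileMetaData` API referenced in that diff; the helper method itself is illustrative, not from the PR:

```java
// Sketch: decide whether a file must be physically written to the repository.
static boolean needsPhysicalUpload(StoreFileMetaData md) {
    // The metadata hash (a Lucene BytesRef) holds the complete file exactly
    // when its length matches the file length — only true for segments_N and
    // .si files smaller than 1MB. In that case the blob never needs to be
    // uploaded: restores can serve it straight from the shard-level metadata.
    return md.hash().length != md.length();
}
```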

@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@@ -1035,7 +1035,8 @@ public void testUnrestorableFilesDuringRestore() throws Exception {
     final String indexName = "unrestorable-files";
     final int maxRetries = randomIntBetween(1, 10);

-    Settings createIndexSettings = Settings.builder().put(SETTING_ALLOCATION_MAX_RETRY.getKey(), maxRetries).build();
+    Settings createIndexSettings = Settings.builder().put(SETTING_ALLOCATION_MAX_RETRY.getKey(), maxRetries)
original-brownbear (Member Author)

It's a little annoying, but if we run this test with more than a single shard, we might get shards that aren't corrupted: a shard that contains no documents also has no physical data files now, because for empty shards only the .si and segments_N files exist, and those are no longer physically uploaded.
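
The new builder line in the hunk above is truncated in this page. A hypothetical completion — the single-shard setting is my assumption, not visible in the extracted diff — that would guarantee the one shard holds documents and thus physical files the test can corrupt:

```java
// Hypothetical completion of the truncated diff line: pin the index to a
// single shard so the test never creates empty shards with nothing to corrupt.
Settings createIndexSettings = Settings.builder()
    .put(SETTING_ALLOCATION_MAX_RETRY.getKey(), maxRetries)
    .put(IndexMetaData.SETTING_NUMBER_OF_SHARDS, 1) // assumed addition
    .build();
```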

@@ -1132,11 +1132,11 @@ public void testSnapshotTotalAndIncrementalSizes() throws IOException {

     SnapshotStats stats = snapshots.get(0).getStats();

-    assertThat(stats.getTotalFileCount(), is(snapshot0FileCount));
-    assertThat(stats.getTotalSize(), is(snapshot0FileSize));
+    assertThat(stats.getTotalFileCount(), greaterThanOrEqualTo(snapshot0FileCount));
original-brownbear (Member Author)

This simply doesn't match up now, and I figured what really matters in these tests is the consistency of the numbers. I didn't want to adjust the results for the file counts to filter out the files that weren't physically uploaded, since they are technically still uploaded as part of the metadata.
We have a bunch of other tests that verify all the files are there and that incrementality works as expected, so I figured this adjustment is good enough.
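
For reference, the relaxed form of the checks — the file-count assertion is the one visible in the hunk above; the size analogue is my assumption, using Hamcrest's greaterThanOrEqualTo:

```java
// Virtual blobs still count toward the snapshot's logical totals but are never
// physically uploaded, so only a lower bound can be asserted on stats.
assertThat(stats.getTotalFileCount(), greaterThanOrEqualTo(snapshot0FileCount));
assertThat(stats.getTotalSize(), greaterThanOrEqualTo(snapshot0FileSize)); // assumed analogue
```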

-private static final String DATA_BLOB_PREFIX = "__";
+private static final String UPLOADED_DATA_BLOB_PREFIX = "__";
+
+private static final String VIRTUAL_DATA_BLOB_PREFIX = "v__";
original-brownbear (Member Author)

We still need to have the virtual blob here so that the logic/assertions in BlobStoreIndexShardSnapshot(s) work without a format change and things stay BwC. Technically we could do without this kind of "ID", but then we'd have to go through the whole dance of only updating the format and not uploading the blob once all the snapshots are newer than version X. If we do it this way, we get the benefit of faster restores and snapshots right away: the meta hash works the same way in 6.x already, hence even in 7.7 all possible restores should work out.
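
A sketch of the resulting naming scheme, based on the two prefixes in the diff above; the helper method itself is illustrative, not from the PR:

```java
// "__<id>" is a real blob in the repository; "v__<id>" exists only as an
// identifier inside the BlobStoreIndexShardSnapshot(s) metadata, so older
// versions that know the "__" prefix keep working unchanged.
static String dataBlobName(String id, boolean virtual) {
    return (virtual ? VIRTUAL_DATA_BLOB_PREFIX : UPLOADED_DATA_BLOB_PREFIX) + id;
}
```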

@ywelsch (Contributor) left a comment

I'm wondering why we have that hash at all (which blows up the size of the snap-*.dat)? Why is the checksum not sufficient?

@original-brownbear (Member Author)

> I'm wondering why we have that hash at all (which blows up the size of the snap-*.dat)? Why is the checksum not sufficient?

My thoughts exactly. But now that we have it, I'd actually rather keep it than remove the hash field. This potentially gives us a nice in to an efficient fix for #50231, because we get the SegmentInfos for cheap from a single blob now.
Other than that, the hash is super useless, because we only have it for the segments_N and .si blobs ... for everything else it's just an empty BytesRef.
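
A sketch of the two cases just described, assuming StoreFileMetaData#hash returns a Lucene BytesRef:

```java
BytesRef hash = md.hash();
if (hash.length == md.length()) {
    // segments_N or .si file smaller than 1MB: the hash IS the full file, so
    // e.g. the SegmentInfos could be parsed from these bytes (relevant to #50231).
} else {
    // every other file: the hash is just an empty BytesRef.
}
```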

@ywelsch (Contributor) commented Jan 31, 2020

I'm mostly concerned that it blows up the shard index file quite a bit (up to 1MB per snapshot), and that it puts a stricter limitation on the maximum number of snapshots that we can have. Can we leave the hash out of that file?

@original-brownbear (Member Author) commented Jan 31, 2020

> I'm mostly concerned that it blows up the shard index file quite a bit (up to 1MB per snapshot), and that it puts a stricter limitation on the maximum number of snapshots that we can have.

True, but so far this has not been an issue for anyone to my knowledge (EDIT: I've also confirmed this with Paul just now; he's never seen any error that would suggest this is a problem either). In the end, these blobs are in almost all cases <0.5 KB and thus somewhat close to the size of other data we keep per snapshot anyway.

> Can we leave the hash out of that file?

I'm wondering if that's a good move strategically. We could nicely use this data to speed up restores and searchable snapshots, and have an efficient way of tackling #50231 at no extra cost and without a BwC-annoying change to the format of the shard-level metadata. Why not take the free win instead of fixing an issue no one seems to experience?

Generally, I'd rather make a BwC-breaking change to the meta only if it has some real win to it, given how involved that tends to be, and we have #45736 on the roadmap anyway, which will require an adjustment to the shard-level metadata.

@ywelsch (Contributor) commented Feb 3, 2020

Note that I was wondering whether we could leave the hash out of the index file (which is used to enumerate all files, used for incrementality), and only have it in the snap file (which is the one used when restoring a snapshot).
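
For orientation, the shard-level repository layout under discussion (a rough sketch; names abbreviated):

```
indices/<index>/<shard>/
  index-<UUID>     # BlobStoreIndexShardSnapshots: all files across all snapshots (drives incrementality)
  snap-<UUID>.dat  # BlobStoreIndexShardSnapshot: one snapshot's file list (read when restoring)
  __<id>           # physically uploaded data blobs
```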

@original-brownbear (Member Author)

> Note that I was wondering whether we could leave the hash out of the index file (which is used to enumerate all files, used for incrementality), and only have it in the snap file (which is the one used when restoring a snapshot).

Yeah, that's a fair point. That's something we could do, though it would mess with my plan for #50231, because it would take away the single spot we currently have for all the segment infos :D
Maybe we could do that in a separate PR, and first discuss the possibility of using this field to fix #50231 tomorrow? :)

@original-brownbear (Member Author)

@ywelsch given the discussion just now on incrementality I take it we keep the segments data in the index-${UUID} blobs as well and can keep things the way they are here? :)

@ywelsch (Contributor) left a comment

LGTM

-private static final String DATA_BLOB_PREFIX = "__";
+private static final String UPLOADED_DATA_BLOB_PREFIX = "__";
+
+private static final String VIRTUAL_DATA_BLOB_PREFIX = "v__";
@ywelsch (Contributor)

can you add some Javadocs here explaining what virtual data blobs are?

@@ -1516,6 +1519,9 @@ public void snapshotShard(Store store, MapperService mapperService, SnapshotId s
     }
 }

+// We can skip writing blobs where the metadata hash is equal to the blob's contents because we store the hash/contents
+// directly in the shard level metadata in this case
+final boolean needsWrite = md.hash().length != md.length();
@ywelsch (Contributor)

can you assert here that the content is indeed the same? EDIT: happens below. This might be too subtle otherwise. Perhaps also add a method to StoreFileMetaData that marks these files specially, so that it's more obvious to identify the files.

@original-brownbear (Member Author) commented Feb 7, 2020

Done in 8794541 ... actually found a way to make this even safer via the checksum test, just in case someone uses this method elsewhere or something changes in the future. Maybe take another look before I merge?
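
A sketch of the kind of safety assertion being referenced — illustrative, not the actual code in commit 8794541; the method name is mine. It checks that a file whose upload we skip really does have its exact contents in the metadata hash:

```java
// Illustrative: compare the bytes that would be served from the metadata hash
// against the actual file on disk. Uses org.apache.lucene.store.IndexInput,
// org.apache.lucene.util.BytesRef, and org.elasticsearch.index.store.Store.
static boolean assertHashHoldsFileContents(Store store, StoreFileMetaData md) throws IOException {
    try (IndexInput in = store.directory().openInput(md.name(), IOContext.READONCE)) {
        final byte[] bytes = new byte[Math.toIntExact(md.length())];
        in.readBytes(bytes, 0, bytes.length);
        assert md.hash().bytesEquals(new BytesRef(bytes))
            : "metadata hash does not hold the contents of " + md.name();
    }
    return true;
}
```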

@ywelsch (Contributor) left a comment

LGTM

@original-brownbear (Member Author)

Thanks Yannick!

@original-brownbear original-brownbear merged commit eb56c27 into elastic:master Feb 10, 2020
@original-brownbear original-brownbear deleted the optimize-away-needless-uploads branch February 10, 2020 14:01
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Feb 10, 2020
original-brownbear added a commit that referenced this pull request Feb 10, 2020
tlrx added a commit that referenced this pull request Feb 12, 2020
This commit adapts searchable snapshots to the latest changes from master.

The REST handlers declare their routes differently since #51950, and
SearchableSnapshotIndexInput needs to account for segment infos files that
are stored in the shard snapshot metadata hash and not uploaded to the blob
store anymore since #51729.