Make remote snapshot (local) block size configurable #14753
Closed
Description
To perform a search on a remote snapshot, we download only the specific blocks of the snapshot needed to complete the search. These blocks have a fixed size of 8 MiB and are stored in a local, reference-counted file cache. The fixed 8 MiB block size has significant disk-usage implications, and likely performance implications, because there is no mechanism to vary the block size according to the data we expect to read at runtime.
For example, when Lucene opens an index input into a compound file only to read its header, which can be quite small, we download the entire 8 MiB block containing that header from the remote snapshot repository.
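The block arithmetic behind this example can be sketched as follows. This is a minimal illustration, not OpenSearch's on-demand index input code: the class and method names are hypothetical, and expressing the block size as a power-of-two shift (23 for 8 MiB) is an assumption. Any read is rounded out to whole blocks, so a 64-byte header read costs a full block unless the file itself is smaller than one block.

```java
final class BlockRangeSketch {
    /**
     * Hypothetical helper: bytes pulled into the local cache to serve a read of
     * [offset, offset + length) when a file is fetched in whole blocks of 2^shift
     * bytes. The last block of a file may be shorter than the block size.
     */
    static long bytesDownloaded(long fileLength, long offset, long length, int shift) {
        if (length <= 0) {
            return 0;
        }
        if (offset < 0 || offset + length > fileLength) {
            throw new IllegalArgumentException("read is outside the file");
        }
        long blockSize = 1L << shift;
        long firstBlock = offset >>> shift;                // block holding the first byte
        long lastBlock = (offset + length - 1) >>> shift;  // block holding the last byte
        long total = 0;
        for (long b = firstBlock; b <= lastBlock; b++) {
            long start = b * blockSize;
            // A block never extends past the end of the file.
            total += Math.min(start + blockSize, fileLength) - start;
        }
        return total;
    }

    public static void main(String[] args) {
        long fileLength = 100L << 20; // hypothetical 100 MiB compound file
        long headerLength = 64;       // hypothetical small header read at offset 0
        for (int shift : new int[] {23, 20, 16}) { // 8 MiB, 1 MiB, 64 KiB blocks
            long fetched = bytesDownloaded(fileLength, 0, headerLength, shift);
            System.out.printf("block size %,d B: %,d B downloaded for a %d B read%n",
                1L << shift, fetched, headerLength);
        }
    }
}
```

A read that straddles a block boundary pays for both blocks, so a smaller block size trades more remote requests for less wasted transfer and disk space.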
This is particularly noticeable during snapshot restore, as Lucene downloads various blocks containing metadata for each segment. Lucene keeps this metadata in memory, so these blocks remain in the cache for its lifetime and are never evicted. By selecting a smaller block size, users could drastically reduce the baseline size of their searchable snapshot file cache.
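To see how the block size scales this baseline, the pinned footprint can be estimated as (blocks pinned per segment) × (block size) × (segment count). The sketch below is a rough model under stated assumptions, not a measurement: the per-segment read profile (file sizes, offsets, lengths), the segment count, and all names are hypothetical, and it assumes no two metadata reads share a block.

```java
import java.util.List;

final class BaselineCacheSketch {
    /** Hypothetical metadata read that stays pinned in the cache. */
    static final class MetadataRead {
        final long fileLength;
        final long offset;
        final long length;

        MetadataRead(long fileLength, long offset, long length) {
            this.fileLength = fileLength;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Whole-block bytes one read pins, capped at the end of its file. */
    static long cachedBytes(MetadataRead r, int shift) {
        long size = 1L << shift;
        long first = r.offset >>> shift;
        long last = (r.offset + r.length - 1) >>> shift;
        long total = 0;
        for (long b = first; b <= last; b++) {
            long start = b * size;
            total += Math.min(start + size, r.fileLength) - start;
        }
        return total;
    }

    /** Baseline footprint, assuming every segment pins the same metadata blocks. */
    static long baselineBytes(List<MetadataRead> perSegment, long segments, int shift) {
        long perSegmentBytes = 0;
        for (MetadataRead r : perSegment) {
            perSegmentBytes += cachedBytes(r, shift);
        }
        return perSegmentBytes * segments;
    }

    public static void main(String[] args) {
        long mib = 1L << 20;
        // Hypothetical per-segment profile: header and footer reads in a 50 MiB
        // compound file, plus a small read at the start of a 20 MiB file.
        List<MetadataRead> profile = List.of(
            new MetadataRead(50 * mib, 0, 64),
            new MetadataRead(50 * mib, 50 * mib - 16, 16),
            new MetadataRead(20 * mib, 0, 4096));
        long segments = 1000; // hypothetical segment count
        for (int shift : new int[] {23, 20, 16}) { // 8 MiB, 1 MiB, 64 KiB blocks
            System.out.printf("block size %,d B: baseline %.1f MiB%n",
                1L << shift, baselineBytes(profile, segments, shift) / (double) mib);
        }
    }
}
```

Under this model the baseline shrinks roughly in proportion to the block size, since each pinned metadata read holds at least one block regardless of how few bytes Lucene actually needs.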
Sample benchmarks.
Related Issues
Feature request issue #14990
Potentially alleviates #11676
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.