-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Quicker shared cache file preallocation #79447
Conversation
Pinging @elastic/es-distributed (Team:Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks Yannick!
Once tests passed we should use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, one question about the error handling on Linux now only.
} | ||
} | ||
} catch (final Exception e) { | ||
logger.warn(new ParameterizedMessage("failed to pre-allocate cache file [{}] using native methods", cacheFile), e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't we now running into a broken situation on Linux if e.g. we don't have enough disk space available?
The native allocation will fail, but the setLength
will create a sparse file in this case and thus won't fail because of something like a full disk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Only Windows can guarantee us that it won't create a sparse file (as it requires passing an extra flag to do so), so the setLength
variant needs to be restricted to Windows.
On the flip side, the current behavior is trappy as well on Linux, as it falls back to writing 0s through the whole file (possibly taking hours doing so).
In that case I prefer a sparse file (and the machine running out of disk later) instead of the machine spending hours to fill the file with 0s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also worth noting that disk full is covered already by the check in SharedBytes.findCacheSnapshotCacheFilePath
which checks available disk space before even preallocating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this offline, the fact that we may in corner-cases not preallocate the file in a non-sparse fashion should be an acceptable price to pay for the simplicity and safety of this solution.
=> LGTM
@elasticmachine retest this please |
Reworks preallocation of the shared_cache file, which was very slow on Windows. Instead of manually filling the file with 0's, this new approach uses the RandomAccessFile.setLength() method, which can quickly allocate a file of the given size (tested that this took only seconds to preallocate TB-size file on Windows).
Reworks preallocation of the shared_cache file, which was very slow on Windows. Instead of manually filling the file with 0's, this new approach uses the RandomAccessFile.setLength() method, which can quickly allocate a file of the given size (tested that this took only seconds to preallocate TB-size file on Windows).
Reworks preallocation of the shared_cache file, which was very slow on Windows. Instead of manually filling the file with 0's, this new approach uses the RandomAccessFile.setLength() method, which can quickly allocate a file of the given size (tested that this took only seconds to preallocate TB-size file on Windows).
Reworks preallocation of the shared_cache file, which was very slow on Windows. Instead of manually filling the file with 0's, this new approach uses the RandomAccessFile.setLength() method, which can quickly allocate a file of the given size (tested that this took only seconds to preallocate TB-size file on Windows).
* upstream/master: (24 commits) Implement framework for migrating system indices (elastic#78951) Improve transient settings deprecation message (elastic#79504) Remove getValue and getValues from Field (elastic#79516) Store Template's mappings as bytes for disk serialization (elastic#78746) [ML] Add queue_capacity setting to start deployment API (elastic#79433) [ML] muting rest compat test issue elastic#79518 (elastic#79519) Avoid redundant available indices check (elastic#76540) Re-enable BWC tests TEST Ensure password 14 chars length on Kerberos FIPS tests (elastic#79496) [DOCS] Temporarily remove APM links (elastic#79411) Fix CCSDuelIT for skipped shards (elastic#79490) Add other time accounting in HotThreads (elastic#79392) Add deprecation info API entries for deprecated monitoring settings (elastic#78799) Add note in breaking changes for nameid_format (elastic#77785) Use 'migration' instead of 'upgrade' in GET system feature migration status responses (elastic#79302) Upgrade lucene version 8b68bf60c98 (elastic#79461) Use Strings#EMPTY_ARRAY (elastic#79452) Quicker shared cache file preallocation (elastic#79447) [ML] Removing some code that's obsolete for 8.0 (elastic#79444) Ensure indexing_data CCR requests are compressed (elastic#79413) ...
Reworks preallocation of the shared_cache file, which was very slow on Windows.
Instead of manually filling the file with 0's, this new approach uses the
RandomAccessFile.setLength()
method, which can quickly allocate a file of the given size (tested that this took only seconds to preallocate TB-size file on Windows).