Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create cache files with CREATE_NEW & SPARSE options #79371

Merged
merged 4 commits into from
Oct 19, 2021

Conversation

tlrx
Copy link
Member

@tlrx tlrx commented Oct 18, 2021

Cold and Frozen tiers use a local cache for some Lucene files. This cache is based on files stored on disk, and to limit the required disk space it relies on the operating system's support for sparse files.

When Elasticsearch creates the cache files it uses the following standard options:

StandardOpenOption.READ,
StandardOpenOption.WRITE,
StandardOpenOption.CREATE,
StandardOpenOption.SPARSE

Sadly those options are not enough to enable sparse file for the cache files. Instead it should use the CREATE_NEW option, as it is documented in the Java NIO API:

/**
     * Sparse file. When used with the {@link #CREATE_NEW} option then this
     * option provides a <em>hint</em> that the new file will be sparse. The
     * option is ignored when the file system does not support the creation of
     * sparse files.
     */
    SPARSE,

The consequence today is that sparse file support if not enable for operating systems that do not enable it by default, like NTFS.

Since partially mounted indices ignores disk watermarks and report a size on disk of 0 bytes it means that the cache files can grow and use much more disk space than needed, potentially filling up the disk.

Operating systems that enables sparse file by default, like most Linux distributions, should not be affected.

This pull request adds a created flag to the CacheFile class that indicates if the file is expected to already exist on disk or not. This flag is used to create or open the file with the correct options.

This bug will be documented in a follow up. I also plan to spend some time looking if we can use JNA to add assertions that the shared cache file is effectively sparse on Windows/Linux etc.

@tlrx tlrx added >bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.16.0 labels Oct 18, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Oct 18, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though maybe adjust the naming a little :)

* Indicates if the file should be created when it is open for the first time.
* This is required to pass the right options for sparse file support.
*/
private volatile boolean created;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe name this fileExists and just describe it as "true if the physical cache file exists on disk" or so? I find the description and name very confusing tbh.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I pushed 3bbef1e

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

* Indicates if the file should be created when it is open for the first time.
* This is required to pass the right options for sparse file support.
*/
private volatile boolean created;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@tlrx tlrx added auto-backport-and-merge auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) labels Oct 19, 2021
@elasticsearchmachine elasticsearchmachine merged commit a3f057c into elastic:master Oct 19, 2021
@tlrx
Copy link
Member Author

tlrx commented Oct 19, 2021

Thanks Yannick and Armin!

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Oct 19, 2021
* Create cache files with CREATE_NEW & SPARSE options

* assert + doc

* rename
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
7.x
7.15 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 79371

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Oct 19, 2021
elasticsearchmachine pushed a commit that referenced this pull request Oct 19, 2021
* Create cache files with CREATE_NEW & SPARSE options

* assert + doc

* rename
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Oct 19, 2021
This commit adds the bug elastic#79371 as a known issue in documentation
from 7.12.0 to 7.15.1.
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Oct 19, 2021
* upstream/master:
  Validate tsdb's routing_path (elastic#79384)
  Adjust the BWC version for the return200ForClusterHealthTimeout field (elastic#79436)
  API for adding and removing indices from a data stream (elastic#79279)
  Exposing the ability to log deprecated settings at non-critical level (elastic#79107)
  Convert operator privilege license object to LicensedFeature (elastic#79407)
  Mute SnapshotBasedIndexRecoveryIT testSeqNoBasedRecoveryIsUsedAfterPrimaryFailOver (elastic#79456)
  Create cache files with CREATE_NEW & SPARSE options (elastic#79371)
  Revert "[ML] Use a new annotations index for future annotations (elastic#79151)"
  [ML] Use a new annotations index for future annotations (elastic#79151)
  [ML] Removing legacy code from ML/transform auditor (elastic#79434)
  Fix rate agg with custom `_doc_count` (elastic#79346)
  Optimize SLM Policy Queries (elastic#79341)
  Fix execution of exists query within nested queries on field with doc_values disabled (elastic#78841)
  Stricter UpdateSettingsRequest parsing on the REST layer (elastic#79227)
  Do not release snapshot file download permit during recovery retries (elastic#79409)
  Preserve request headers in a mixed version cluster (elastic#79412)
  Adjust versions after elastic#79044 backport to 7.x (elastic#79424)
  Mute BulkByScrollUsesAllScrollDocumentsAfterConflictsIntegTests (elastic#79429)
  Fail on SSPL licensed x-pack sources (elastic#79348)

# Conflicts:
#	server/src/test/java/org/elasticsearch/index/TimeSeriesModeTests.java
tlrx added a commit that referenced this pull request Oct 19, 2021
This commit adds the bug #79371 as a known issue in documentation
from 7.12.0 to 7.15.1.
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Oct 19, 2021
This commit adds the bug elastic#79371 as a known issue in documentation
from 7.12.0 to 7.15.1.

Backport of elastic#79473
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Oct 19, 2021
This commit adds the bug elastic#79371 as a known issue in documentation
from 7.12.0 to 7.15.1.

Backport of elastic#79473
elasticsearchmachine pushed a commit that referenced this pull request Oct 19, 2021
This commit adds the bug #79371 as a known issue in documentation
from 7.12.0 to 7.15.1.

Backport of #79473
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Oct 19, 2021
This commit adds the bug elastic#79371 as a known issue in documentation
from 7.12.0 to 7.15.1.

Backport of elastic#79473
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Oct 19, 2021
This commit adds the bug elastic#79371 as a known issue in documentation
from 7.12.0 to 7.15.1.

Backport of elastic#79473
elasticsearchmachine pushed a commit that referenced this pull request Oct 19, 2021
This commit adds the bug #79371 as a known issue in documentation
from 7.12.0 to 7.15.1.

Backport of #79473
elasticsearchmachine pushed a commit that referenced this pull request Oct 19, 2021
This commit adds the bug #79371 as a known issue in documentation
from 7.12.0 to 7.15.1.

Backport of #79473
elasticsearchmachine pushed a commit that referenced this pull request Oct 19, 2021
This commit adds the bug #79371 as a known issue in documentation
from 7.12.0 to 7.15.1.

Backport of #79473
tlrx added a commit that referenced this pull request Nov 5, 2021
…n disk for file (#79698)

In #79371 we fixed a bug where cache files were not created 
as sparse files on Windows platforms because the wrong 
options were used when creating the files for the first time. 
This bug got unnoticed as we were lacking a way to retrieve 
the exact number of bytes allocated for a given file on disk.

This commit adds a FileSystemNatives.allocatedSizeInBytes(Path) 
method for that exact purpose (only implemented for Windows 
for now) and a test in CacheFileTests that would fail on 
Windows if the cache file is not sparse.

Relates #79371
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Nov 5, 2021
…n disk for file (elastic#79698)

In elastic#79371 we fixed a bug where cache files were not created 
as sparse files on Windows platforms because the wrong 
options were used when creating the files for the first time. 
This bug got unnoticed as we were lacking a way to retrieve 
the exact number of bytes allocated for a given file on disk.

This commit adds a FileSystemNatives.allocatedSizeInBytes(Path) 
method for that exact purpose (only implemented for Windows 
for now) and a test in CacheFileTests that would fail on 
Windows if the cache file is not sparse.

Relates elastic#79371
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Nov 5, 2021
…n disk for file (elastic#79698)

In elastic#79371 we fixed a bug where cache files were not created 
as sparse files on Windows platforms because the wrong 
options were used when creating the files for the first time. 
This bug got unnoticed as we were lacking a way to retrieve 
the exact number of bytes allocated for a given file on disk.

This commit adds a FileSystemNatives.allocatedSizeInBytes(Path) 
method for that exact purpose (only implemented for Windows 
for now) and a test in CacheFileTests that would fail on 
Windows if the cache file is not sparse.

Relates elastic#79371
elasticsearchmachine pushed a commit that referenced this pull request Nov 5, 2021
…n disk for file (#79698) (#80426)

In #79371 we fixed a bug where cache files were not created 
as sparse files on Windows platforms because the wrong 
options were used when creating the files for the first time. 
This bug got unnoticed as we were lacking a way to retrieve 
the exact number of bytes allocated for a given file on disk.

This commit adds a FileSystemNatives.allocatedSizeInBytes(Path) 
method for that exact purpose (only implemented for Windows 
for now) and a test in CacheFileTests that would fail on 
Windows if the cache file is not sparse.

Relates #79371

Co-authored-by: Elastic Machine <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Nov 5, 2021
…n disk for file (#79698) (#80427)

In #79371 we fixed a bug where cache files were not created 
as sparse files on Windows platforms because the wrong 
options were used when creating the files for the first time. 
This bug got unnoticed as we were lacking a way to retrieve 
the exact number of bytes allocated for a given file on disk.

This commit adds a FileSystemNatives.allocatedSizeInBytes(Path) 
method for that exact purpose (only implemented for Windows 
for now) and a test in CacheFileTests that would fail on 
Windows if the cache file is not sparse.

Relates #79371

Co-authored-by: Elastic Machine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.15.2 v7.16.0 v8.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants