Also mmap cfs files for hybridfs #38940

Merged

danielmitterdorfer
Member

With this commit we add the `.cfs` file extension to the list of file
types that are memory-mapped by hybridfs. `.cfs` files combine all files
of a Lucene segment into a single file in order to save file handles. As
this strategy is only used for "small" segments (less than 10% of the
shard size), it is beneficial to memory-map them instead of accessing
them via NIO.

Relates #36668
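The per-extension routing that this change extends can be sketched as follows. This is a minimal, hypothetical sketch, not the Elasticsearch implementation: the class and method names are made up, and the set of previously memory-mapped extensions (`nvd`, `dvd`, `tim`, `tip`) is an assumption. In Lucene, this kind of switch is what `FileSwitchDirectory` provides: it sends files whose extension is in a given set to a primary `Directory` and everything else to a secondary one.

```java
import java.util.Set;

/**
 * Hypothetical sketch of hybridfs-style routing: files whose extension is in
 * MMAP_EXTENSIONS are read via memory mapping, all other files via NIO.
 */
public class HybridFsSketch {

    enum Access { MMAP, NIO }

    // ASSUMPTION: norms (nvd), doc values (dvd) and term dictionary (tim, tip)
    // were already mmapped; "cfs" is the extension this change adds.
    static final Set<String> MMAP_EXTENSIONS = Set.of("nvd", "dvd", "tim", "tip", "cfs");

    /** Returns the text after the last '.', or "" if the name has no dot. */
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot == -1 ? "" : fileName.substring(dot + 1);
    }

    /** Chooses the access strategy purely from the file extension. */
    static Access accessFor(String fileName) {
        return MMAP_EXTENSIONS.contains(extension(fileName)) ? Access.MMAP : Access.NIO;
    }

    public static void main(String[] args) {
        for (String f : new String[] {"_0.cfs", "_1.tim", "_1.fdt", "segments_2"}) {
            System.out.println(f + " -> " + accessFor(f));
        }
    }
}
```

Running `main` prints one line per file name, e.g. `_0.cfs -> MMAP` and `_1.fdt -> NIO`. Keying the decision on the extension alone keeps the choice cheap and static, which fits the reasoning above: compound files only exist for small segments, so mapping them costs little address space.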

@danielmitterdorfer danielmitterdorfer added >enhancement v7.0.0 :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. v8.0.0 v7.2.0 labels Feb 15, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@danielmitterdorfer
Member Author

danielmitterdorfer commented Feb 15, 2019

I have run several experiments to judge the impact of this change. In a baseline benchmark with mmapfs, using a workload with external ids on a "small" index of roughly 3GB (on a system with 32GB RAM), we see a median throughput of 68700 docs/s. Without this change, hybridfs achieves 56900 docs/s (a reduction of 17%). With this change, hybridfs is on par with mmapfs again (as expected).

I also ran an update-heavy workload with 40% id conflicts on a larger index of 75GB, on a system with only 8GB of page cache available. The baseline (mmapfs) achieves a median indexing throughput of 12000 docs/s, whereas hybridfs with this change achieves 21800 docs/s.
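The relative figures follow directly from the reported medians (a worked check using only the numbers above):

```math
\frac{68700 - 56900}{68700} = \frac{11800}{68700} \approx 0.172 \quad (\approx 17\%\ \text{lower throughput for hybridfs without this change})
```

```math
\frac{21800}{12000} \approx 1.82 \quad (\text{hybridfs with this change vs. the mmapfs baseline on the 75GB index})
```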

@danielmitterdorfer danielmitterdorfer merged commit 2ab88e2 into elastic:master Feb 15, 2019
@danielmitterdorfer danielmitterdorfer deleted the hybrid-fs-with-cfs branch February 15, 2019 12:08
danielmitterdorfer added a commit to danielmitterdorfer/elasticsearch that referenced this pull request Feb 15, 2019
danielmitterdorfer added a commit to danielmitterdorfer/elasticsearch that referenced this pull request Feb 15, 2019
danielmitterdorfer added a commit that referenced this pull request Feb 15, 2019
danielmitterdorfer added a commit that referenced this pull request Feb 15, 2019
@danielmitterdorfer
Member Author

danielmitterdorfer commented Feb 15, 2019

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 15, 2019
* elastic/master:
  Avoid double term construction in DfsPhase (elastic#38716)
  Fix typo in DateRange docs (yyy → yyyy) (elastic#38883)
  Introduced class reuses follow parameter code between ShardFollowTasks (elastic#38910)
  Ensure random timestamps are within search boundary (elastic#38753)
  [CI] Muting  method testFollowIndex in IndexFollowingIT
  Update Lucene snapshot repo for 7.0.0-beta1 (elastic#38946)
  SQL: Doc on syntax (identifiers in particular) (elastic#38662)
  Upgrade to Gradle 5.2.1 (elastic#38880)
  Tie break search shard iterator comparisons on cluster alias (elastic#38853)
  Also mmap cfs files for hybridfs (elastic#38940)
  Build: Fix issue with test status logging (elastic#38799)
  Adapt FullClusterRestartIT on master (elastic#38856)
  Fix testAutoFollowing test to use createLeaderIndex() helper method.
  Migrate muted auto follow rolling upgrade test and unmute this test (elastic#38900)
  ShardBulkAction ignore primary response on primary (elastic#38901)
  Recover peers from translog, ignoring soft deletes (elastic#38904)
  Fix NPE on Stale Index in IndicesService (elastic#38891)
  Smarter CCR concurrent file chunk fetching (elastic#38841)
  Fix intermittent failure in ApiKeyIntegTests (elastic#38627)
  re-enable SmokeTestWatcherWithSecurityIT (elastic#38814)
danielmitterdorfer added a commit that referenced this pull request Mar 19, 2019
kovrus added a commit to crate/crate that referenced this pull request Sep 9, 2019
kovrus added a commit to crate/crate that referenced this pull request Sep 9, 2019
mergify bot pushed a commit to crate/crate that referenced this pull request Sep 10, 2019
@jakelandis jakelandis removed the v8.0.0 label Jul 26, 2021
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v6.7.0 v7.0.0-rc2 v7.2.0 v8.0.0-alpha1

4 participants