Use mmapfs as default store type #38157

Conversation

danielmitterdorfer (Member)

With this commit we switch the default store type from `hybridfs` to `mmapfs`.
While `hybridfs` is beneficial for random access workloads (think: updates and
queries) when the index size is much larger than the available page cache, it
incurs a performance penalty on smaller indices that fit into the page cache (or
are not much larger than that).

This performance penalty shows up not only for bulk updates or queries but also for
bulk indexing (without *any* conflicts) when an external document id is provided
by the client. For example, in the `geonames` benchmark this results in a
throughput reduction of roughly 17% compared to `mmapfs`. This reduction is
caused by document id lookups, which show up as the top contributor in the profile
when `hybridfs` is enabled. Below is an example stack trace captured by
async-profiler during a benchmarking trial; it shows that the overhead comes from
additional `read` system calls for document id lookups:

```
__GI_pread64
sun.nio.ch.FileDispatcherImpl.pread0
sun.nio.ch.FileDispatcherImpl.pread
sun.nio.ch.IOUtil.readIntoNativeBuffer
sun.nio.ch.IOUtil.read
sun.nio.ch.FileChannelImpl.readInternal
sun.nio.ch.FileChannelImpl.read
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal
org.apache.lucene.store.BufferedIndexInput.refill
org.apache.lucene.store.BufferedIndexInput.readByte
org.apache.lucene.store.DataInput.readVInt
org.apache.lucene.store.BufferedIndexInput.readVInt
org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock
org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekExact
org.elasticsearch.common.lucene.uid.PerThreadIDVersionAndSeqNoLookup.getDocID
org.elasticsearch.common.lucene.uid.PerThreadIDVersionAndSeqNoLookup.lookupVersion
org.elasticsearch.common.lucene.uid.VersionsAndSeqNoResolver.loadDocIdAndVersion
org.elasticsearch.index.engine.InternalEngine.resolveDocVersion
org.elasticsearch.index.engine.InternalEngine.planIndexingAsPrimary
org.elasticsearch.index.engine.InternalEngine.indexingStrategyForOperation
org.elasticsearch.index.engine.InternalEngine.index
org.elasticsearch.index.shard.IndexShard.index
org.elasticsearch.index.shard.IndexShard.applyIndexOperation
org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary
[...]
```

For these reasons we are restoring `mmapfs` as the default store type.

Relates #36668
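
For context (not part of this change): the store type can also be pinned per index via the real `index.store.type` setting rather than relying on the default. Below is a minimal, hedged sketch using the Java API; the index name is illustrative and executing the request still requires a client.

```java
// Sketch only: building an index-creation request that pins the store type,
// instead of relying on whichever default the node uses. The setting key
// "index.store.type" is real; the index name "geonames" is just an example.
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.common.settings.Settings;

public class ExplicitStoreType {
    public static CreateIndexRequest mmapfsIndex(String indexName) {
        Settings storeSettings = Settings.builder()
            .put("index.store.type", "mmapfs") // alternatives: "hybridfs", "niofs", "fs"
            .build();
        return new CreateIndexRequest(indexName).settings(storeSettings);
    }

    public static void main(String[] args) {
        System.out.println(mmapfsIndex("geonames").settings());
    }
}
```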
danielmitterdorfer added the >enhancement, v7.0.0, and :Distributed Indexing/Engine labels on Feb 1, 2019
elasticmachine (Collaborator)

Pinging @elastic/es-distributed

jpountz (Contributor) commented Feb 1, 2019

I am confused why `NIOFSDirectory` appears in the stack, since the terms dictionary is supposed to be opened with mmap?

danielmitterdorfer (Member, Author) commented Feb 1, 2019

After further investigation it turns out that this is due to Lucene's compound file format (`.cfs` files). Lucene writes these files to save file handles by combining multiple files into one, and it uses this approach for segments that are less than 10% of the index size. As `hybridfs` does not have special handling for this file type, such files get read via NIO instead of being memory-mapped.
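
To make the mechanism concrete, here is a minimal conceptual sketch (not the actual `hybridfs` implementation) using Lucene's `FileSwitchDirectory`, which routes files between two directories by extension. Any extension missing from the memory-mapped set, such as `cfs` here, is served by `NIOFSDirectory` and therefore by `pread` system calls, which matches the profile above. The extension set below is illustrative.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FileSwitchDirectory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NIOFSDirectory;

public class HybridStoreSketch {
    public static Directory open(Path indexPath) throws IOException {
        // Illustrative set of extensions routed to the memory-mapped directory.
        // Note that "cfs" is absent, so compound files fall through to NIO reads.
        Set<String> mmapExtensions = Set.of("tim", "tip", "dvd", "nvd", "kdd", "kdi");

        Directory mmap = new MMapDirectory(indexPath);
        Directory nio = new NIOFSDirectory(indexPath);

        // Files whose extension is in the set are opened via mmap; everything else via NIO.
        // The final flag closes both wrapped directories when this one is closed.
        return new FileSwitchDirectory(mmapExtensions, mmap, nio, true);
    }

    public static void main(String[] args) throws IOException {
        try (Directory dir = open(Paths.get("/tmp/example-index"))) {
            System.out.println(dir);
        }
    }
}
```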

We could add `.cfs` to the list of files that `hybridfs` memory-maps instead of reading them via NIO. While this would resolve the performance impact for small indices, it would neuter the positive effect of `hybridfs` on larger indices, because then we'd see page-cache thrashing again, and avoiding exactly that is the whole point of `hybridfs`. As an additional measure we could disallow the compound format on larger segments (for some definition of "large"). This would mean that:

  • Smaller segments keep using the compound format (`.cfs`). These files get memory-mapped and thus do not incur the performance penalty that we see in the profile above.
  • Larger segments no longer use the compound format (at the cost of Elasticsearch using more file handles). This means that we see individual files (e.g. `.tim`, `.tip`, ...) on the file system and can read each of them according to its expected data access pattern, either via NIO or memory-mapping.

We expect that this approach would provide good performance for both small and large indices, but we do not yet have experimental evidence to back up this hypothesis. Also, there might be other side effects (apart from the increased number of file handles) that we need to consider first.
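
As a hedged sketch, the second measure could be expressed with Lucene's stock merge-policy knobs for the compound format, `setNoCFSRatio` and `setMaxCFSSegmentSizeMB` (this is not what the PR implements, and the 1 GB threshold is an illustrative value, not a recommendation):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;

public class CompoundFormatLimit {
    public static IndexWriterConfig configure() {
        TieredMergePolicy mergePolicy = new TieredMergePolicy();
        // Keep the compound format only for segments up to 10% of the index size
        // (this matches the 10% behaviour mentioned above).
        mergePolicy.setNoCFSRatio(0.1);
        // Additionally cap the absolute size of compound segments; larger merged
        // segments are written as individual .tim/.tip/... files instead.
        mergePolicy.setMaxCFSSegmentSizeMB(1024.0); // illustrative threshold

        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        config.setUseCompoundFile(true); // newly flushed (small) segments still use .cfs
        config.setMergePolicy(mergePolicy);
        return config;
    }
}
```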

As we first need to decide on the way forward, I have marked this PR as WIP, effectively putting it on hold for now.

jasontedor added the v8.0.0 label and removed the v7.0.0 label on Feb 6, 2019
danielmitterdorfer (Member, Author)

I have now run further experiments. Adding `.cfs` to the list of files to memory-map improves performance for both smaller and larger indices, so I am going to abandon this PR and instead open a follow-up where we add `.cfs`.

danielmitterdorfer (Member, Author)

I have opened #38940 instead, where I also present benchmark results.
