-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable BloomFilter for _id of non-datastream indices #88409
Conversation
Hi @dnhatn, I've created a changelog YAML for you. |
server/src/main/java/org/elasticsearch/index/codec/bloomfilter/BloomFilterPostingsFormat.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/codec/bloomfilter/BloomFilterPostingsFormat.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/codec/bloomfilter/BloomFilterPostingsFormat.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/codec/bloomfilter/BloomFilterPostingsFormat.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/codec/bloomfilter/BloomFilterPostingsFormat.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/codec/bloomfilter/BloomFilterPostingsFormat.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/codec/bloomfilter/BloomFilterPostingsFormat.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/codec/bloomfilter/BloomFilterPostingsFormat.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/codec/bloomfilter/BloomFilterPostingsFormat.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/codec/bloomfilter/BloomFilterPostingsFormat.java
Outdated
Show resolved
Hide resolved
…/BloomFilterPostingsFormat.java Co-authored-by: Adrien Grand <[email protected]>
…/BloomFilterPostingsFormat.java Co-authored-by: Adrien Grand <[email protected]>
…/BloomFilterPostingsFormat.java Co-authored-by: Adrien Grand <[email protected]>
Pinging @elastic/es-search (Team:Search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some suggestions but the change looks good to me.
if (IdFieldMapper.NAME.equals(field) | ||
&& mapperService.mappingLookup().isDataStreamTimestampFieldEnabled() == false | ||
&& IndexSettings.BLOOM_FILTER_ID_FIELD_ENABLED_SETTING.get(mapperService.getIndexSettings().getSettings())) { | ||
return new ES84BloomFilterPostingsFormat(super.getPostingsFormatForField(field), bigArrays); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Every instance that is not the same ends up in its own file. It doesn't matter in practice here since we're enabling bloom filters on a single field, but it's a better practice to have the bloom filter format as a member on this class that is initialized upon initialization and to return this shared instance here for the id field?
...r/src/main/java/org/elasticsearch/index/codec/bloomfilter/ES84BloomFilterPostingsFormat.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/elasticsearch/index/codec/bloomfilter/ES84BloomFilterPostingsFormat.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/elasticsearch/index/codec/bloomfilter/ES84BloomFilterPostingsFormat.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/elasticsearch/index/codec/bloomfilter/ES84BloomFilterPostingsFormat.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/elasticsearch/index/codec/bloomfilter/ES84BloomFilterPostingsFormat.java
Outdated
Show resolved
Hide resolved
.../test/java/org/elasticsearch/index/codec/bloomfilter/ES84BloomFilterPostingsFormatTests.java
Outdated
Show resolved
Hide resolved
…/ES84BloomFilterPostingsFormat.java Co-authored-by: Adrien Grand <[email protected]>
@jpountz Thanks for reviews. |
This PR adds BloomFilter to Elasticsearch and enables it for the
_id
field of non-data stream indices. BloomFilter should speed up the performance of mget and update requests at a small expense of refresh, merge, and storage.Fork Lucene's BloomFilter to support an on-disk format and integrate with CircuitBreaker (i.e., ensure no OOM when building BloomFilter). We will also have a special treatment to skip reading BloomFilter for the archive tier. See: cbb0800.
Enable BloomFilter for
_id
of non-data-stream indices: bc8c0d3#diff-5779b7efbee3f9220189f8bdf03991e6f23cdac6506c5924e31231d8d0b0314aR52If we are okay with this approach, I will split this PR into two smaller PRs.
Benchmark results:
http_logs/update: 30% faster
geonames/append-fast-with-conflicts: 11% faster
http_logs/append-no-conflicts-index-only: 4% faster
geonames/append-no-conflicts-index-only: 4% slower