Tell us about your request. Provide a summary of the request and all versions that are affected.
We've added a new bloom filter implementation in OpenSearch that optimizes doc ID lookup for indexing (upserts) and search/get use cases.
For the OS 2.12 release, this is currently enabled through a feature flag: opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled
Once the feature flag is set, customers have two index-level settings for enabling and tuning this for a given index:
index.optimize_doc_id_lookup.fuzzy_set.enabled: Enables the fuzzy set for the doc ID lookup optimization. Enabling this improves performance for upsert and search operations that use the doc ID by creating a new data structure (a bloom filter) that handles negative cases (i.e., IDs absent from the existing index) faster through off-heap lookups. We've seen performance improvements of up to 30% for nyc_taxis update benchmark workloads with this.
index.optimize_doc_id_lookup.fuzzy_set.false_positive_probability: Sets the false positive probability for the underlying fuzzy set (i.e., the bloom filter). The higher the false positive probability, the lower the throughput gains and the lower the storage/memory overhead. Allowed values are 0.01 <= x <= 0.50.
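To make the two settings and the trade-off above concrete, here is a minimal Python sketch. It builds a settings body from the two setting names listed above and rejects values outside the allowed range. The helper names are hypothetical, and whether the body is sent at index creation or as a settings update is not stated in this issue. The bits-per-ID estimate uses the textbook sizing formula for a classic bloom filter and is only meant to illustrate why a higher false positive probability costs less memory; it is an assumption that OpenSearch's fuzzy set follows a similar curve, not a description of its actual sizing.

```python
import json
import math

# Setting names taken from this issue.
ENABLED_KEY = "index.optimize_doc_id_lookup.fuzzy_set.enabled"
FPP_KEY = "index.optimize_doc_id_lookup.fuzzy_set.false_positive_probability"


def fuzzy_set_settings(false_positive_probability: float) -> dict:
    """Build an index settings body (hypothetical helper).

    Rejects values outside the documented range 0.01 <= x <= 0.50.
    """
    if not 0.01 <= false_positive_probability <= 0.50:
        raise ValueError("false_positive_probability must be between 0.01 and 0.50")
    return {"settings": {ENABLED_KEY: True, FPP_KEY: false_positive_probability}}


def textbook_bits_per_id(false_positive_probability: float) -> float:
    """Bits per element for an optimally sized classic bloom filter.

    Standard formula: -ln(p) / (ln 2)^2. Illustrative only; the OpenSearch
    implementation may size its structure differently.
    """
    return -math.log(false_positive_probability) / (math.log(2) ** 2)


if __name__ == "__main__":
    print(json.dumps(fuzzy_set_settings(0.05), indent=2))
    for p in (0.01, 0.10, 0.50):
        print(f"p={p:.2f}: ~{textbook_bits_per_id(p):.1f} bits per doc ID")
```

Under the textbook formula, the lowest allowed probability (0.01) needs roughly 9.6 bits per ID, while the highest (0.50) needs roughly 1.4, which matches the direction of the trade-off described above: less memory, but more false positives that fall through to a regular lookup.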
What other resources are available? Provide links to related issues, POCs, steps for testing, etc.
@mgodwan Thanks for opening this issue. Is there a reason for opening this issue so late in the 2.12.0 release cycle? Documentation PRs are part of the entry criteria for the release process, and we are almost at the point of validating exit criteria now.
@bbarani This was a miss on my side. I was under the assumption that I had already created this issue, but while going through my artifacts, I realized it had been missed. This is an experimental feature in the 2.12 release.
@hdhalter Yes, I've raised the PR #6449
Could you please review?
Related resources: opensearch-project/OpenSearch#4489 (comment)