[DOC] Documentation for new bloom filter settings #6434

mgodwan · 2024-02-19T05:17:19Z

What do you want to do?

Request a change to existing documentation
Add new documentation
Report a technical problem with the documentation
Other

Tell us about your request. Provide a summary of the request and all versions that are affected.

We've added a new bloom filter implementation in OpenSearch that optimizes doc ID lookup for indexing (upserts) and search/get use cases.
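For readers unfamiliar with the idea, here is a minimal sketch of how a bloom filter can short-circuit doc ID lookups. It is illustrative only and is not the OpenSearch fuzzy set implementation: the class name ToyBloomFilter, the lookup helper, the SHA-256 double hashing, and the textbook sizing formula are all assumptions made for this example.

```python
import hashlib
import math


class ToyBloomFilter:
    """Illustrative bloom filter; NOT OpenSearch's fuzzy set implementation."""

    def __init__(self, expected_items: int, false_positive_probability: float):
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_probability) / math.log(2) ** 2
        ))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, doc_id: str):
        # Derive k bit positions from two 64-bit halves of one digest (double hashing).
        digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, doc_id: str) -> None:
        for pos in self._positions(doc_id):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, doc_id: str) -> bool:
        # False means the ID is definitely absent; True only means "maybe present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(doc_id)
        )


def lookup(doc_id: str, bloom: ToyBloomFilter, index: dict):
    """Skip the costly index lookup when the filter rules the ID out."""
    if not bloom.might_contain(doc_id):
        return None  # negative case answered by the filter alone
    return index.get(doc_id)  # the filter may be wrong here, so confirm in the index
```

The point of the sketch is the asymmetry: a negative answer from the filter is always correct, so an upsert for a brand-new ID can skip the more expensive lookup, while a positive answer still has to be confirmed.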

For the OpenSearch 2.12 release, this is currently enabled through the feature flag opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled.

Once the feature flag is set, customers have two settings for enabling and tuning this for a given index:

index.optimize_doc_id_lookup.fuzzy_set.enabled: Enables the fuzzy set for the doc ID lookup optimization. Enabling this improves performance for upsert and search operations that use the doc ID by creating a new data structure (a bloom filter) that handles negative cases (that is, IDs absent from the existing index) faster through off-heap lookups. We've seen performance improvements of up to 30% for nyc_taxis update benchmark workloads with this.
index.optimize_doc_id_lookup.fuzzy_set.false_positive_probability: Sets the false positive probability for the underlying fuzzy set (that is, the bloom filter). The higher the false positive probability, the lower the throughput gains and the lower the storage/memory overhead. Allowed values are 0.01 <= x <= 0.50.
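As a rough sketch of how the two settings fit together, the snippet below builds an index settings body and rejects probabilities outside the allowed range. The setting names and the 0.01 to 0.50 range come from the description above; the helper names (build_fuzzy_set_settings, textbook_bits_per_id) are hypothetical, and the size estimate uses the textbook bloom filter formula, which may not match what OpenSearch actually allocates. Whether the settings can be changed on an existing index is not stated here, so the body is only constructed, not sent.

```python
import math

# Setting names as given in the issue description above.
ENABLED_KEY = "index.optimize_doc_id_lookup.fuzzy_set.enabled"
FPP_KEY = "index.optimize_doc_id_lookup.fuzzy_set.false_positive_probability"


def build_fuzzy_set_settings(enabled: bool, false_positive_probability: float) -> dict:
    """Return an index settings body, rejecting out-of-range probabilities."""
    if not 0.01 <= false_positive_probability <= 0.50:
        raise ValueError("false_positive_probability must satisfy 0.01 <= x <= 0.50")
    return {"settings": {ENABLED_KEY: enabled, FPP_KEY: false_positive_probability}}


def textbook_bits_per_id(false_positive_probability: float) -> float:
    """Bits per stored ID for an ideal bloom filter: -ln(p) / (ln 2)^2.

    Assumption: this is the classic formula, not a measurement of OpenSearch.
    """
    return -math.log(false_positive_probability) / math.log(2) ** 2
```

Under the textbook formula, the two ends of the allowed range differ by a factor of about six to seven in filter size: roughly 9.6 bits per ID at 0.01 versus roughly 1.4 bits per ID at 0.50. That is the storage/memory side of the trade-off described above; the throughput side depends on how often a false positive forces a real lookup.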

What other resources are available? Provide links to related issues, POCs, steps for testing, etc.

opensearch-project/OpenSearch#4489 (comment)


hdhalter · 2024-02-19T22:07:25Z

Thanks, @mgodwan, will you be submitting the documentation PR for this update?

bbarani · 2024-02-20T17:20:00Z

@mgodwan Thanks for opening this issue. Is there a reason for opening this issue so late in the 2.12.0 release cycle? Documentation PRs are part of the entry criteria for the release process, and we are almost at the point of validating exit criteria now.

mgodwan · 2024-02-20T18:39:17Z

@bbarani This was a miss on my side. I was under the assumption that I had already created this issue, but as I was going through my artifacts, I realized it was missed. This is an experimental feature as part of the 2.12 release.

@hdhalter Yes, I've raised PR #6449.
Could you please review it?

mgodwan added the untriaged label Feb 19, 2024

hdhalter assigned mgodwan Feb 19, 2024

hdhalter added 1 - Backlog - DEV Developer assigned to issue is responsible for creating PR. v2.12.0 and removed untriaged labels Feb 19, 2024

hdhalter added this to the v2.12 milestone Feb 20, 2024

mgodwan mentioned this issue Feb 20, 2024

Add documentation for new bloom filter settings #6449

Merged

1 task

hdhalter added 2 - In progress Issue/PR: The issue or PR is in progress. and removed 1 - Backlog - DEV Developer assigned to issue is responsible for creating PR. labels Feb 20, 2024

vagimeli closed this as completed in #6449 Feb 20, 2024

hdhalter added 3 - Done Issue is done/complete and removed 2 - In progress Issue/PR: The issue or PR is in progress. labels Feb 21, 2024
