[DOC] Documentation for new bloom filter settings #6434

Closed
1 of 4 tasks
mgodwan opened this issue Feb 19, 2024 · 3 comments · Fixed by #6449
Labels: 3 - Done (Issue is done/complete), v2.12.0

Comments

@mgodwan
Member

mgodwan commented Feb 19, 2024

What do you want to do?

  • Request a change to existing documentation
  • Add new documentation
  • Report a technical problem with the documentation
  • Other

Tell us about your request. Provide a summary of the request and all versions that are affected.

We've added a new bloom filter implementation in OpenSearch that optimizes document ID lookups for indexing (upserts) and search/get use cases.
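To illustrate the general idea, here is a minimal, generic bloom filter in Python. It is a sketch under stated assumptions, not OpenSearch's implementation: the SHA-256 double hashing, the BloomFilter and FakeIndex classes, and the lookup_existing helper are all hypothetical. What it shows is that a negative answer from the filter lets an upsert skip the lookup into the index entirely, while a positive answer still needs the real lookup because it may be a false positive.

```python
import hashlib
import math


class BloomFilter:
    """Minimal generic bloom filter (illustrative; not OpenSearch's implementation)."""

    def __init__(self, expected_items: int, false_positive_probability: float) -> None:
        if expected_items <= 0 or not 0 < false_positive_probability < 1:
            raise ValueError("expected_items must be > 0 and 0 < probability < 1")
        # Classic sizing: m = -n * ln(p) / (ln 2)^2 bits and k = (m / n) * ln 2 hash functions.
        self.num_bits = math.ceil(
            -expected_items * math.log(false_positive_probability) / (math.log(2) ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, doc_id: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, doc_id: str) -> None:
        for pos in self._positions(doc_id):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, doc_id: str) -> bool:
        # False means "definitely absent"; True means "possibly present" (may be a false positive).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(doc_id))


class FakeIndex:
    """Hypothetical stand-in for the expensive ID lookup into an existing index."""

    def __init__(self) -> None:
        self.docs = {}
        self.lookups = 0

    def get(self, doc_id: str):
        self.lookups += 1
        return self.docs.get(doc_id)


def lookup_existing(doc_id: str, bloom: BloomFilter, index: FakeIndex):
    """Return the existing document for doc_id, or None, skipping the index when possible."""
    if not bloom.might_contain(doc_id):
        return None  # negative case answered by the filter alone; no index lookup
    return index.get(doc_id)  # possibly present: confirm against the index


if __name__ == "__main__":
    index = FakeIndex()
    bloom = BloomFilter(expected_items=1000, false_positive_probability=0.01)
    for i in range(1000):
        index.docs[f"doc-{i}"] = {"n": i}
        bloom.add(f"doc-{i}")

    print(lookup_existing("doc-42", bloom, index))  # {'n': 42}
    before = index.lookups
    for i in range(10_000):
        lookup_existing(f"new-{i}", bloom, index)
    print(index.lookups - before, "of 10000 new IDs needed an index lookup")
```

A bloom filter never produces false negatives, so an existing ID is always found; only the share of absent IDs that still reach the index depends on the configured false positive probability.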

In the OpenSearch 2.12 release, this is enabled through the experimental feature flag opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled.

Once the feature flag is set, customers have two index settings for enabling and tuning this feature for a given index:

  1. index.optimize_doc_id_lookup.fuzzy_set.enabled: Enables the fuzzy set for the document ID lookup optimization. Enabling it creates an additional data structure (a bloom filter) that handles negative cases (that is, IDs absent from the existing index) faster through off-heap lookups, which improves performance for upserts and for search operations that use document IDs. We've seen performance improvements of up to 30% on the nyc_taxis update benchmark workload with this setting.

  2. index.optimize_doc_id_lookup.fuzzy_set.false_positive_probability: Sets the false positive probability for the underlying fuzzy set (that is, the bloom filter). The higher the false positive probability, the lower the throughput gains, but also the lower the storage and memory overhead. Allowed values are 0.01 <= x <= 0.50.
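As a rough illustration of how the two settings fit together, here is a Python sketch that builds a create-index request body and estimates memory cost per ID. The setting names and the 0.01 to 0.50 range come from this issue; the helper functions are hypothetical, 0.1 is an arbitrary example value, and bits_per_id uses the textbook optimal bloom filter sizing formula, which OpenSearch's own fuzzy set may not follow exactly.

```python
import json
import math

# Setting names come from this issue; the range check mirrors the documented 0.01 <= x <= 0.50.
FUZZY_SET_ENABLED = "index.optimize_doc_id_lookup.fuzzy_set.enabled"
FUZZY_SET_FPP = "index.optimize_doc_id_lookup.fuzzy_set.false_positive_probability"


def fuzzy_set_index_settings(false_positive_probability: float) -> dict:
    """Build a create-index request body that enables and tunes the fuzzy set (hypothetical helper)."""
    if not 0.01 <= false_positive_probability <= 0.50:
        raise ValueError("false_positive_probability must be between 0.01 and 0.50")
    return {
        "settings": {
            FUZZY_SET_ENABLED: True,
            FUZZY_SET_FPP: false_positive_probability,
        }
    }


def bits_per_id(false_positive_probability: float) -> float:
    """Textbook bloom filter sizing: optimal bits per stored ID, -ln(p) / (ln 2)^2."""
    return -math.log(false_positive_probability) / (math.log(2) ** 2)


if __name__ == "__main__":
    # Body for PUT /<index-name>; the feature flag must already be set on the cluster's nodes.
    print(json.dumps(fuzzy_set_index_settings(0.1), indent=2))
    for p in (0.01, 0.1, 0.5):
        print(p, round(bits_per_id(p), 2))  # 9.59, 4.79, 1.44 bits per ID
```

Under this textbook model, moving from 0.50 to 0.01 costs roughly 6.6 times more bits per ID in exchange for 50 times fewer false positives, which matches the throughput-versus-overhead tradeoff described for the false_positive_probability setting.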

What other resources are available? Provide links to related issues, POCs, steps for testing, etc.

opensearch-project/OpenSearch#4489 (comment)

@hdhalter
Contributor

Thanks, @mgodwan. Will you be submitting the documentation PR for this update?

@hdhalter hdhalter added 1 - Backlog - DEV Developer assigned to issue is responsible for creating PR. v2.12.0 and removed untriaged labels Feb 19, 2024
@hdhalter hdhalter added this to the v2.12 milestone Feb 20, 2024
@bbarani
Member

bbarani commented Feb 20, 2024

@mgodwan Thanks for opening this issue. Is there a reason for opening it so late in the 2.12.0 release cycle? Documentation PRs are part of the entry criteria for the release process, and we are almost at the point of validating the exit criteria now.

@mgodwan
Member Author

mgodwan commented Feb 20, 2024

@bbarani This was a miss on my side. I assumed I had already created this issue, but while going through my artifacts, I realized I had missed it. This is an experimental feature in the 2.12 release.

@hdhalter Yes, I've raised PR #6449. Could you please review it?

@hdhalter hdhalter added 2 - In progress Issue/PR: The issue or PR is in progress. and removed 1 - Backlog - DEV Developer assigned to issue is responsible for creating PR. labels Feb 20, 2024
@hdhalter hdhalter added 3 - Done Issue is done/complete and removed 2 - In progress Issue/PR: The issue or PR is in progress. labels Feb 21, 2024