From 6fe178fe7828a12fc951a73cf9014566c798ddcb Mon Sep 17 00:00:00 2001
From: mgodwan
Date: Tue, 20 Feb 2024 23:59:44 +0530
Subject: [PATCH 1/5] Add documentation for new bloom filter settings

Signed-off-by: mgodwan
---
 .../configuring-opensearch/index-settings.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/_install-and-configure/configuring-opensearch/index-settings.md b/_install-and-configure/configuring-opensearch/index-settings.md
index f88d060228..6738c782af 100644
--- a/_install-and-configure/configuring-opensearch/index-settings.md
+++ b/_install-and-configure/configuring-opensearch/index-settings.md
@@ -182,6 +182,10 @@ OpenSearch supports the following dynamic index-level index settings:
 
 - `index.final_pipeline` (String): The final ingest node pipeline for the index. If the final pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline.
 
+- `index.optimize_doc_id_lookup.fuzzy_set.enabled` (Boolean): This setting controls whether fuzzy set should be enabled for optimizing document id lookups in indexing/search calls by using an additional data structure (Bloom Filter). Enabling this improves performance for upsert and search operations which rely on document id by creating a new data structure (bloom filter) which allows to handle negative cases (i.e. ids being absent in the existing index) through faster off-heap look-ups. Default is `false`. This setting can only be used if the feature flag `opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled` is set to `true`.
+
+- `index.optimize_doc_id_lookup.fuzzy_set.false_positive_probability` (Double): Set the false positive probability for the underlying fuzzy set (i.e. bloom filter). A lower false positive probability ensures higher throughput improvement for upsert/get operations. Allowed values are in the range `0.01` and `0.50`. Default is `0.20`. This setting can only be used if the feature flag `opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled` is set to `true`.
+
 ### Updating a dynamic index setting
 
 You can update a dynamic index setting at any time through the API. For example, to update the refresh interval, use the following request:

From 4c4f0b4cd3de51682d1d7f622cc124b16231905e Mon Sep 17 00:00:00 2001
From: mgodwan
Date: Wed, 21 Feb 2024 00:50:00 +0530
Subject: [PATCH 2/5] Address PR comments

Signed-off-by: mgodwan
---
 .../configuring-opensearch/index-settings.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_install-and-configure/configuring-opensearch/index-settings.md b/_install-and-configure/configuring-opensearch/index-settings.md
index 6738c782af..cf2a500bbf 100644
--- a/_install-and-configure/configuring-opensearch/index-settings.md
+++ b/_install-and-configure/configuring-opensearch/index-settings.md
@@ -182,9 +182,9 @@ OpenSearch supports the following dynamic index-level index settings:
 
 - `index.final_pipeline` (String): The final ingest node pipeline for the index. If the final pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline.
 
-- `index.optimize_doc_id_lookup.fuzzy_set.enabled` (Boolean): This setting controls whether fuzzy set should be enabled for optimizing document id lookups in indexing/search calls by using an additional data structure (Bloom Filter). Enabling this improves performance for upsert and search operations which rely on document id by creating a new data structure (bloom filter) which allows to handle negative cases (i.e. ids being absent in the existing index) through faster off-heap look-ups. Default is `false`. This setting can only be used if the feature flag `opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled` is set to `true`.
+- `index.optimize_doc_id_lookup.fuzzy_set.enabled` (Boolean): This setting controls whether `fuzzy_set` should be enabled for optimizing document ID lookups in indexing or searching calls by using an additional data structure, in this case, the Bloom filter data structure. Enabling this setting improves performance for upsert and search operations that rely on document ID by creating a new data structure (Bloom filter). The Bloom filter allows for the handling of negative cases (that is, IDs being absent in the existing index) through faster off-heap look-ups. Default is `false`. This setting can only be used if the feature flag `opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled` is set to `true`.
 
-- `index.optimize_doc_id_lookup.fuzzy_set.false_positive_probability` (Double): Set the false positive probability for the underlying fuzzy set (i.e. bloom filter). A lower false positive probability ensures higher throughput improvement for upsert/get operations. Allowed values are in the range `0.01` and `0.50`. Default is `0.20`. This setting can only be used if the feature flag `opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled` is set to `true`.
+- `index.optimize_doc_id_lookup.fuzzy_set.false_positive_probability` (Double): Set the false-positive probability for the underlying `fuzzy_set` (that is, the Bloom filter). A lower false-positive probability ensures higher throughput improvement for `UPSERT` or `GET` operations. Allowed values range between `0.01` and `0.50`. Default is `0.20`. This setting can only be used if the feature flag `opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled` is set to `true`.
 
 ### Updating a dynamic index setting
 

From b9294237c2c338f5bd1ce46fb04d7802c76265fc Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Tue, 20 Feb 2024 13:51:56 -0700
Subject: [PATCH 3/5] Update _install-and-configure/configuring-opensearch/index-settings.md

Co-authored-by: Nathan Bower
Signed-off-by: Melissa Vagi
---
 _install-and-configure/configuring-opensearch/index-settings.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_install-and-configure/configuring-opensearch/index-settings.md b/_install-and-configure/configuring-opensearch/index-settings.md
index cf2a500bbf..228d7a0764 100644
--- a/_install-and-configure/configuring-opensearch/index-settings.md
+++ b/_install-and-configure/configuring-opensearch/index-settings.md
@@ -182,7 +182,7 @@ OpenSearch supports the following dynamic index-level index settings:
 
 - `index.final_pipeline` (String): The final ingest node pipeline for the index. If the final pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline.
 
-- `index.optimize_doc_id_lookup.fuzzy_set.enabled` (Boolean): This setting controls whether `fuzzy_set` should be enabled for optimizing document ID lookups in indexing or searching calls by using an additional data structure, in this case, the Bloom filter data structure. Enabling this setting improves performance for upsert and search operations that rely on document ID by creating a new data structure (Bloom filter). The Bloom filter allows for the handling of negative cases (that is, IDs being absent in the existing index) through faster off-heap look-ups. Default is `false`. This setting can only be used if the feature flag `opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled` is set to `true`.
+- `index.optimize_doc_id_lookup.fuzzy_set.enabled` (Boolean): This setting controls whether `fuzzy_set` should be enabled for optimizing document ID lookups in indexing or searching calls by using an additional data structure, in this case, the Bloom filter data structure. Enabling this setting improves performance for upsert and search operations that rely on document ID by creating a new data structure (Bloom filter). The Bloom filter allows for the handling of negative cases (that is, IDs being absent in the existing index) through faster off-heap lookups. Default is `false`. This setting can only be used if the feature flag `opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled` is set to `true`.
 
 - `index.optimize_doc_id_lookup.fuzzy_set.false_positive_probability` (Double): Set the false-positive probability for the underlying `fuzzy_set` (that is, the Bloom filter). A lower false-positive probability ensures higher throughput improvement for `UPSERT` or `GET` operations. Allowed values range between `0.01` and `0.50`. Default is `0.20`. This setting can only be used if the feature flag `opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled` is set to `true`.
 
From 1d545d8842e81f477e1fb1841260b06847899d1d Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Tue, 20 Feb 2024 13:54:52 -0700
Subject: [PATCH 4/5] Update _install-and-configure/configuring-opensearch/index-settings.md

Co-authored-by: Nathan Bower
Signed-off-by: Melissa Vagi
---
 _install-and-configure/configuring-opensearch/index-settings.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_install-and-configure/configuring-opensearch/index-settings.md b/_install-and-configure/configuring-opensearch/index-settings.md
index 228d7a0764..c36e1fd491 100644
--- a/_install-and-configure/configuring-opensearch/index-settings.md
+++ b/_install-and-configure/configuring-opensearch/index-settings.md
@@ -184,7 +184,7 @@ OpenSearch supports the following dynamic index-level index settings:
 
 - `index.optimize_doc_id_lookup.fuzzy_set.enabled` (Boolean): This setting controls whether `fuzzy_set` should be enabled for optimizing document ID lookups in indexing or searching calls by using an additional data structure, in this case, the Bloom filter data structure. Enabling this setting improves performance for upsert and search operations that rely on document ID by creating a new data structure (Bloom filter). The Bloom filter allows for the handling of negative cases (that is, IDs being absent in the existing index) through faster off-heap lookups. Default is `false`. This setting can only be used if the feature flag `opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled` is set to `true`.
 
-- `index.optimize_doc_id_lookup.fuzzy_set.false_positive_probability` (Double): Set the false-positive probability for the underlying `fuzzy_set` (that is, the Bloom filter). A lower false-positive probability ensures higher throughput improvement for `UPSERT` or `GET` operations. Allowed values range between `0.01` and `0.50`. Default is `0.20`. This setting can only be used if the feature flag `opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled` is set to `true`.
+- `index.optimize_doc_id_lookup.fuzzy_set.false_positive_probability` (Double): Set the false-positive probability for the underlying `fuzzy_set` (that is, the Bloom filter). A lower false-positive probability ensures higher throughput for `UPSERT` and `GET` operations. Allowed values range between `0.01` and `0.50`. Default is `0.20`. This setting can only be used if the feature flag `opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled` is set to `true`.
 
 ### Updating a dynamic index setting
 

From 11051890ab009465dfed68a91a901edfe20d9db4 Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Tue, 20 Feb 2024 13:57:09 -0700
Subject: [PATCH 5/5] Update index-settings.md

Signed-off-by: Melissa Vagi
Signed-off-by: Melissa Vagi
---
 .../configuring-opensearch/index-settings.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_install-and-configure/configuring-opensearch/index-settings.md b/_install-and-configure/configuring-opensearch/index-settings.md
index c36e1fd491..8f37be48ac 100644
--- a/_install-and-configure/configuring-opensearch/index-settings.md
+++ b/_install-and-configure/configuring-opensearch/index-settings.md
@@ -182,9 +182,9 @@ OpenSearch supports the following dynamic index-level index settings:
 
 - `index.final_pipeline` (String): The final ingest node pipeline for the index. If the final pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline.
 
-- `index.optimize_doc_id_lookup.fuzzy_set.enabled` (Boolean): This setting controls whether `fuzzy_set` should be enabled for optimizing document ID lookups in indexing or searching calls by using an additional data structure, in this case, the Bloom filter data structure. Enabling this setting improves performance for upsert and search operations that rely on document ID by creating a new data structure (Bloom filter). The Bloom filter allows for the handling of negative cases (that is, IDs being absent in the existing index) through faster off-heap lookups. Default is `false`. This setting can only be used if the feature flag `opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled` is set to `true`.
+- `index.optimize_doc_id_lookup.fuzzy_set.enabled` (Boolean): This setting controls whether `fuzzy_set` should be enabled in order to optimize document ID lookups in index or search calls by using an additional data structure, in this case, the Bloom filter data structure. Enabling this setting improves performance for upsert and search operations that rely on document ID by creating a new data structure (Bloom filter). The Bloom filter allows for the handling of negative cases (that is, IDs being absent in the existing index) through faster off-heap lookups. Default is `false`. This setting can only be used if the feature flag `opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled` is set to `true`.
 
-- `index.optimize_doc_id_lookup.fuzzy_set.false_positive_probability` (Double): Set the false-positive probability for the underlying `fuzzy_set` (that is, the Bloom filter). A lower false-positive probability ensures higher throughput for `UPSERT` and `GET` operations. Allowed values range between `0.01` and `0.50`. Default is `0.20`. This setting can only be used if the feature flag `opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled` is set to `true`.
+- `index.optimize_doc_id_lookup.fuzzy_set.false_positive_probability` (Double): Sets the false-positive probability for the underlying `fuzzy_set` (that is, the Bloom filter). A lower false-positive probability ensures higher throughput for `UPSERT` and `GET` operations. Allowed values range between `0.01` and `0.50`. Default is `0.20`. This setting can only be used if the feature flag `opensearch.experimental.optimize_doc_id_lookup.fuzzy_set.enabled` is set to `true`.
 
 ### Updating a dynamic index setting
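
As a rough illustration of the two settings documented in this series, the sketch below builds a settings body containing both keys and rejects a false-positive probability outside the documented `0.01` to `0.50` range. The helper name `build_fuzzy_set_settings` is hypothetical, and treating the range as inclusive is an assumption. The body is presumably what would be sent through the update settings API mentioned in the surrounding section (`PUT /<index>/_settings`), with the experimental feature flag already set to `true`. The script only prints the JSON and makes no network call.

```python
import json

# Setting names, the default, and the allowed range come from the patch text.
ENABLED_KEY = "index.optimize_doc_id_lookup.fuzzy_set.enabled"
FPP_KEY = "index.optimize_doc_id_lookup.fuzzy_set.false_positive_probability"
FPP_MIN, FPP_MAX = 0.01, 0.50  # assumption: both bounds are inclusive
FPP_DEFAULT = 0.20


def build_fuzzy_set_settings(enabled, false_positive_probability=FPP_DEFAULT):
    """Hypothetical helper: return a settings body for the two fuzzy_set settings.

    Raises ValueError if the probability is outside the documented range.
    """
    if not FPP_MIN <= false_positive_probability <= FPP_MAX:
        raise ValueError(
            f"false_positive_probability must be between {FPP_MIN} and {FPP_MAX}, "
            f"got {false_positive_probability}"
        )
    return {
        ENABLED_KEY: bool(enabled),
        FPP_KEY: false_positive_probability,
    }


if __name__ == "__main__":
    # Assumption: this body would be sent as PUT /<index>/_settings.
    body = build_fuzzy_set_settings(True, 0.10)
    print(json.dumps(body, indent=2))
```

Validating on the client side is a design choice for the sketch only; the cluster remains the authority on which values it accepts.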