diff --git a/_automating-configurations/workflow-templates.md b/_automating-configurations/workflow-templates.md
index 1133148c8f..62406ae069 100644
--- a/_automating-configurations/workflow-templates.md
+++ b/_automating-configurations/workflow-templates.md
@@ -138,5 +138,8 @@ The following table lists the supported workflow templates. To use a workflow te
| `multimodal_search_with_bedrock_titan` | Deploys an Amazon Bedrock multimodal model and configures an ingest pipeline with a `text_image_embedding` processor and a k-NN index for [multimodal search]({{site.url}}{{site.baseurl}}/search-plugins/multimodal-search/). You must provide your AWS credentials. | `create_connector.credential.access_key`, `create_connector.credential.secret_key`, `create_connector.credential.session_token` |[Defaults](https://github.com/opensearch-project/flow-framework/blob/2.13/src/main/resources/defaults/multimodal-search-bedrock-titan-defaults.json) |
| `hybrid_search` | Configures [hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/): Creates an ingest pipeline, a k-NN index, and a search pipeline with a `normalization_processor`. You must provide the model ID of the text embedding model to be used. | `create_ingest_pipeline.model_id` |[Defaults](https://github.com/opensearch-project/flow-framework/blob/2.13/src/main/resources/defaults/hybrid-search-defaults.json) |
| `conversational_search_with_llm_deploy` | Deploys a large language model (LLM) (by default, Cohere Chat) and configures a search pipeline with a `retrieval_augmented_generation` processor for [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/). | `create_connector.credential.key` |[Defaults](https://github.com/opensearch-project/flow-framework/blob/2.13/src/main/resources/defaults/conversational-search-defaults.json) |
+| `semantic_search_with_reindex` | Configures [semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/) with a newly deployed Cohere embedding model. The model is configured to reindex a source index into a newly configured k-NN index. You must provide the API key for the Cohere model along with the source index to be reindexed. | `create_connector.credential.key`, `reindex.source_index`|[Defaults](https://github.com/opensearch-project/flow-framework/blob/main/src/main/resources/defaults/semantic-search-with-reindex-defaults.json) |
+| `semantic_search_with_local_model` | Configures [semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/) and deploys a pretrained model (`msmarco-distilbert-base-tas-b`). Adds a [`query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) search processor that sets a default model ID for neural queries and creates a linked k-NN index called `my-nlp-index`. | None | [Defaults](https://github.com/opensearch-project/flow-framework/blob/main/src/main/resources/defaults/semantic-search-with-local-model-defaults.json) |
+| `hybrid_search_with_local_model` | Configures [hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/) and deploys a pretrained model (`msmarco-distilbert-base-tas-b`). Creates an ingest pipeline, a k-NN index, and a search pipeline with a `normalization_processor`. | None | [Defaults](https://github.com/opensearch-project/flow-framework/blob/main/src/main/resources/defaults/hybrid-search-with-local-model-defaults.json) |
diff --git a/_dashboards/management/multi-data-sources.md b/_dashboards/management/multi-data-sources.md
index cffc9e02f5..dc3096c251 100644
--- a/_dashboards/management/multi-data-sources.md
+++ b/_dashboards/management/multi-data-sources.md
@@ -7,15 +7,18 @@ redirect_from:
- /dashboards/discover/multi-data-sources/
---
-# Configuring and using multiple data sources
+# Configuring and using multiple data sources in OpenSearch Dashboards
-You can ingest, process, and analyze data from multiple data sources in OpenSearch Dashboards. You configure the data sources in the **Dashboards Management** > **Data sources** app, as shown in the following image.
+You can ingest, process, and analyze data from multiple data sources in OpenSearch Dashboards. You configure the data sources under **Dashboards Management** > **Data sources**. This interface is shown in the following image.
-
+
## Getting started
-The following tutorial guides you through configuring and using multiple data sources.
+The following tutorial guides you through configuring and using multiple data sources in OpenSearch Dashboards.
+
+The following features are not supported when using multiple data sources: timeline visualization types and the `gantt-chart` plugin.
+{: .note}
### Step 1: Modify the YAML file settings
@@ -35,7 +38,7 @@ A data source connection specifies the parameters needed to connect to a data so
To create a new data source connection:
1. From the OpenSearch Dashboards main menu, select **Dashboards Management** > **Data sources** > **Create data source connection**.
-
+
2. Add the required information to each field to configure the **Connection Details** and **Authentication Method**.
- Under **Connection Details**, enter a title and endpoint URL. For this tutorial, use the URL `https://localhost:9200/`. Entering a description is optional.
@@ -51,22 +54,22 @@ To create a new data source connection:
- After you have entered the appropriate details in all of the required fields, the **Test connection** and **Create data source** buttons become active. You can select **Test connection** to confirm that the connection is valid.
-3. Select **Create data source** to save your settings. The connection is created, and the new data source appears in the list on the **Data Sources** main page. The first data source you create is marked as your default.
+3. Select **Create data source** to save your settings. The connection is created, and the new data source appears in the list on the **Data Sources** main page. The first data source you create is marked as your default.
4. Edit or update a data source connection.
- On the **Data Sources** main page, select the connection you want to modify. The **Connection Details** window opens.
- - To mark the selected data source as the default, select the **Set as default** option.
+ - To mark the selected data source as the default, select the **Set as default** option.
- To make changes to **Connection Details**, edit one or both of the **Title** and **Description** fields and select **Save changes** in the lower-right corner of the screen. You can also cancel changes here. To change the **Authentication Method**, choose a different authentication method, enter your credentials (if applicable), and then select **Save changes** in the lower-right corner of the screen. The changes are saved.
-
+
- When **Username & Password** is the selected authentication method, you can update the password by choosing **Update stored password** next to the **Password** field. In the pop-up window, enter a new password in the first field and then enter it again in the second field to confirm. Select **Update stored password** in the pop-up window. The new password is saved. Select **Test connection** to confirm that the connection is valid.
- When **AWS SigV4** is the selected authentication method, you can update the credentials by selecting **Update stored AWS credential**. In the pop-up window, enter a new access key in the first field and a new secret key in the second field. Select **Update stored AWS credential** in the pop-up window. The new credentials are saved. Select **Test connection** in the upper-right corner of the screen to confirm that the connection is valid.
5. Delete the data source connection by selecting the check box to the left of the title and then choosing **Delete 1 connection**. Selecting multiple check boxes for multiple connections is supported. Alternatively, select the {::nomarkdown}{:/} icon.
-An example data source connection screen is shown in the following image.
+A data source connection interface is shown in the following image.
@@ -93,13 +96,15 @@ To select a data source through the Dev Tools console, follow these steps:
5. From the **Data source** dropdown menu, select a data source and then query the source.
6. Repeat the preceding steps for each data source you want to select.
-### Upload saved objects to a dashboard from connected data sources
+---
+
+## Uploading saved objects to a dashboard from connected data sources
To upload saved objects from connected data sources to a dashboard with multiple data sources, export them as an NDJSON file from the data source's **Saved object management** page. Then upload the file to the dashboard's **Saved object management** page. This method can simplify the transfer of saved objects between dashboards. The following 20-second video shows this feature in action.
{: .img-fluid}
-#### Import saved objects from a connected data source
+### Importing saved objects from a connected data source
Follow these steps to import saved objects from a connected data source:
@@ -109,11 +114,13 @@ Follow these steps to import saved objects from a connected data source:
4. Select **Import** > **Select file** and upload the file acquired from the connected data source.
5. Choose the appropriate **Data source** from the dropdown menu, set your **Conflict management** option, and then select the **Import** button.
-### Show or hide authentication methods for multiple data sources
+---
+
+## Showing or hiding authentication methods
Introduced 2.13
{: .label .label-purple }
-A feature flag in your `opensearch_dashboards.yml` file allows you to show or hide authentication methods within the `data_source` plugin. The following example setting, shown in a 10-second demo, hides the authentication method for `AWSSigV4`.
+A feature flag in your `opensearch_dashboards.yml` file allows you to show or hide authentication methods within the `data_source` plugin. The following setting hides the authentication method for `AWSSigV4`.
````
# Set enabled to false to hide the authentication method from multiple data source in OpenSearch Dashboards.
@@ -128,89 +135,212 @@ data_source.authTypes:
enabled: false
````
+The following demo shows this process.
+
{: .img-fluid}
-### Hide the local cluster option for multiple data sources
+## Showing or hiding the local cluster
Introduced 2.13
{: .label .label-purple }
-A feature flag in your `opensearch_dashboards.yml` file allows you to hide the local cluster option within the `data_source` plugin. This option hides the local cluster from the data source dropdown menu and index creation page, which is ideal for environments with or without a local OpenSearch cluster. The following example setting, shown in a 20-second demo, hides the local cluster.
+A feature flag in your `opensearch_dashboards.yml` file allows you to hide the local cluster option within the `data_source` plugin. This option hides the local cluster from the data source dropdown menu and index creation page, which is ideal for environments with or without a local OpenSearch cluster. The following example setting, shown in a 20-second demo, hides the local cluster:
````
-# hide local cluster in the data source dropdown and index pattern creation page.
+# hide local cluster in the data source dropdown and index pattern creation page.
data_source.hideLocalCluster: true
````
+The following demo shows this process.
+
{: .img-fluid}
+---
+
## Using multiple data sources with external dashboards plugins
Introduced 2.14
-{: .label .label-purple }
+{: .label .label-purple}
-The following plugins now support multiple data sources
+The following plugins now support multiple data sources.
### Index management
-When the data source feature is enabled, you can navigate to **Index Management** under the **Management** menu. Using indexes as an example, you can view all connected data sources and select a specific one from the navigation bar on the upper right. By default, the indexes from the designated default data source are displayed. However, you can select any connected data source to view its corresponding indexes. The following GIF illustrates these steps.
+When you set `data_source.enabled:true`, you can view and select data sources and their associated indexes directly from the interface:
+
+1. Navigate to **Management** > **Index Management** under the main menu.
+2. Select **Indexes** from the sidebar menu and then select the {::nomarkdown}{:/} icon on the upper-right menu bar.
+3. Choose the appropriate data source from the dropdown menu and then choose the appropriate index from the list. By default, the indexes from your default data source are displayed. You can choose any connected data source to view its corresponding indexes.
+
+The following GIF illustrates these steps.
-To perform operations on a specific index within a data source, select the individual index from the list. To create a new index, select the **Create Index** button, which opens a form. Fill in the required information and select the **Create** button. The index is created within the selected data source. The following GIF illustrates these steps.
+To perform operations on a specific index within a data source, select the individual index from the list. To create a new index, select the **Create Index** button, which opens a form. Enter the required information and select the **Create** button. The index is created within the selected data source. The following GIF illustrates these steps.
### Anomaly detection
-When the data source feature is enabled, you can navigate to **Anomaly Detection** under the **OpenSearch Plugins** menu. On the navigation bar on the upper right, you can view all connected data sources and select a specific data source to view the dashboard from that source if it has detectors. If the selected data source does not have any detectors, the page prompts you to **Create detector**. The following GIF illustrates these steps.
+When you set `data_source.enabled:true`, you can create or view detectors associated with a data source:
+
+1. Navigate to **OpenSearch Plugins** > **Anomaly Detection** under the main menu.
+2. Select the database icon on the upper-right menu bar to view a list of connected data sources.
+3. Select a data source to view a list of associated detectors. If the selected data source does not have detectors, then the **Create detector** button appears under the upper-right menu bar. See [Creating anomaly detectors]({{site.url}}{{site.baseurl}}/observing-your-data/ad/dashboards-anomaly-detection/#creating-anomaly-detectors) for instructions on creating detectors through the interface.
+
+The following GIF illustrates these steps.
-When you select **Detectors** from the side bar, the page displays the detectors currently configured for the selected data source. You can view and configure individual detectors by selecting them from the list. The following GIF illustrates these steps.
+You can edit a data source's associated detectors from the **Detectors** tab in the left sidebar:
+
+1. Select **Detectors** and then select the {::nomarkdown}{:/} icon on the upper-right menu bar.
+2. From the dropdown menu, select the appropriate data source. A list of associated detectors appears.
+3. Choose a detector from the list, select **Actions**, and then choose the appropriate edit option from the dropdown menu.
+4. Enter the applicable settings and configuration details.
+
+The following GIF illustrates these steps.
### Security
-When the data source feature is enabled, you can navigate to **Security** under the **Management** menu. Using role management as an example, you can view all connected data sources in the navigation bar on the upper right and select a specific data source to view its existing roles. To create a new role, select the **Create role** button, which takes you to a new page. Enter the required information and select **Create** to add the new role to the selected data source. The following GIF illustrates these steps.
+When you set `data_source.enabled:true`, you can view and manage roles for each connected data source:
+
+1. Navigate to **Management** > **Security** under the main menu.
+2. Select **Roles** from the left sidebar menu and then select the {::nomarkdown}{:/} icon on the upper-right menu bar.
+3. From the dropdown menu, select the appropriate data source and then select the **Create role** button to add a new role.
+4. Enter the required configuration information and select the **Create** button to save.
+
+The following GIF illustrates these steps.
### Maps
-When the data source feature is enabled, you can navigate to **Maps** under the **OpenSearch Plugins** menu. To edit an existing map, select it from the maps list page, which opens the edit page. On the edit page, you can view all available data sources and the ones currently used in the map. To add a new layer, select **Add layer**, and then select **Documents** from the prompt, which opens a flyout. In the flyout, select the index pattern and geospatial field. Note that the data source name is prefixed to the index pattern name. After selecting **Update**, the new layer is added. Select the {::nomarkdown}{:/} icon to verify that a new data source is now being used in the map. The following GIF illustrates these steps.
+When you set `data_source.enabled:true`, you can view all available data sources, including the ones currently used as layers, in a map:
+
+1. Navigate to **OpenSearch Plugins** > **Maps** under the main menu.
+2. From the dropdown menu, select the appropriate data source to edit or create an associated map layer:
+ - Edit a map layer by selecting one from the **Layers** dropdown menu. In the pop-up window, view the settings and edit them as needed.
+ - Add a new layer by selecting the **Add layer** button from the dropdown menu and then selecting **Documents** in the pop-up window. Another pop-up window appears on the right. Enter the required information on the **Data** tab. Note that the data source name is prefixed to the index pattern name. The **Style** and **Settings** tabs include optional information.
+ - Select **Update** to save the settings.
+3. Select the **Save** button on the menu bar to save the edited or new layer.
+4. Select the {::nomarkdown}{:/} icon on the upper-right menu bar to verify that the new data source is listed in the dropdown menu.
+
+The following GIF illustrates these steps.
### Machine learning
-When the data source feature is enabled, you can navigate to **Machine Learning** under the **OpenSearch Plugins** menu. Initially, the models within the default data source are displayed. To view models from a different data source, switch to that data source from the navigation bar. To inspect the details of a specific model, select the {::nomarkdown}{:/} icon to the right of the model entry. The following GIF illustrates these steps.
+When you set `data_source.enabled:true`, you can view and manage machine learning models from different connected data sources:
+
+1. Navigate to **OpenSearch Plugins** > **Machine Learning** under the main menu.
+2. Select the {::nomarkdown}{:/} icon and choose a data source from the dropdown menu. A list of models associated with the selected data source is displayed.
+3. Select the {::nomarkdown}{:/} icon to the right of a listed model to view the model's configuration details for the selected data source.
+
+The following GIF illustrates these steps.
### Notifications
-When the data source feature is enabled, you can navigate to **Notifications** under the **Management** menu. The page displays the notification channels configured for the currently selected data source. To view channels from a different data source, select the desired data source from the menu. To view or edit the details of an existing channel, select it from the list, which opens the channel details page. The following GIF illustrates these steps.
+When you set `data_source.enabled:true`, you can view and manage notification channels for different data sources:
+
+1. Navigate to **Management** > **Notifications** under the main menu.
+2. Select the {::nomarkdown}{:/} icon and choose a data source from the dropdown menu. A list of channels associated with the selected data source is displayed.
+3. Choose a channel from the list to view or manage its settings.
+ - Edit the channel's settings by selecting the **Actions** button and choosing the **Edit** option. Enter the required information in the **Edit channel** panel and then choose **Save**.
+ - Send a test message to the channel by selecting the **Send test message** button in the **Edit channel** window. Alternatively, you can select the **Actions** button in the channel details window and then choose the **Send test message** option from the dropdown menu.
+
+The following GIF illustrates these steps.
### Search relevance
-When the data source feature is enabled, you can navigate to **Search Relevance** under the **OpenSearch Plugins** menu. On the navigation bar on the upper right, you can view all available data sources. To compare search results between indexes from different data sources, first select a data source and an index for **Query 1**, and then select a data source and an index for **Query 2**. Select **Search** to run the queries. The following GIF illustrates these steps.
+When you set `data_source.enabled:true`, you can compare search results across indexes from different data sources:
+
+1. Navigate to **OpenSearch Plugins** > **Search Relevance** under the main menu.
+2. Select the {::nomarkdown}{:/} icon and choose a data source from the dropdown menu. A list of available data sources is displayed.
+3. Under both **Query 1** and **Query 2**, select a data source and an index.
+4. Select the **Search** button to run the queries. The query results are displayed in their respective results panels.
+
+The following GIF illustrates these steps.
-## Next steps
+### Security analytics
+Introduced 2.15
+{: .label .label-purple}
+
+When you set `data_source.enabled:true`, you can view and manage security analytics resources, such as detection rules, across multiple connected data sources:
+
+1. Navigate to **OpenSearch Plugins** > **Security analytics** under the main menu.
+2. Select the {::nomarkdown}{:/} icon and choose a data source from the dropdown menu.
+3. Select **Detectors** > **Detection rules** from the navigation menu on the left. A list of detection rules is displayed.
+4. Select a rule to open a pop-up window containing more information about that rule.
+
+The following GIF illustrates these steps.
+
+
+
+1. Navigate to **OpenSearch Plugins** > **Security analytics** under the main menu.
+2. Select the {::nomarkdown}{:/} icon and choose a data source from the dropdown menu.
+3. Select **Detectors** > **Detection rules** from the navigation menu on the left.
+4. Select the **Create detection rule** button on the upper right and then enter the required configuration details in the **Create detection rule** window.
+5. Select the **Create detection rule** button on the lower right to save the rule. The rule is now associated with the data source.
+
+The following GIF illustrates these steps.
+
+
+
+### Alerting
+Introduced 2.15
+{: .label .label-purple }
+
+When you set `data_source.enabled:true`, you can view and manage alerting monitors across multiple connected data sources:
+
+1. Navigate to **OpenSearch Plugins** > **Alerting** under the main menu.
+2. Select the {::nomarkdown}{:/} icon and choose a data source from the dropdown menu. A list of associated monitors is displayed.
+3. Select a monitor to view its details.
-After configuring multiple data sources, you can analyze the data from each source. Refer to the following resources for more information:
+The following GIF illustrates these steps.
-- Learn about [managing index patterns]({{site.url}}{{site.baseurl}}/dashboards/management/index-patterns/) through OpenSearch Dashboards.
-- Learn about [indexing data using Index Management]({{site.url}}{{site.baseurl}}/dashboards/im-dashboards/index/) through OpenSearch Dashboards.
-- Learn about how to [connect OpenSearch and Amazon S3 through OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/management/S3-data-source/).
-- Learn about the [Integrations tool]({{site.url}}{{site.baseurl}}/integrations/index/), which gives you the flexibility to use various data ingestion methods and connect data from the Dashboards UI.
+
+
+To create a new monitor, select **Create monitor**. Fill out the form and select **Create**. The monitor is created within the selected data source.
+
+#### Managing alerting monitors from within the Dashboards application
+
+To manage data source monitors from within **Dashboards**:
+
+1. Navigate to the **Dashboards** application under the main menu and then select a dashboard from the list.
+2. From the dashboard, select the {::nomarkdown}{:/} icon to open the **Options** dropdown menu and then choose **Alerting**.
+3. From the **Alerting** dropdown menu, choose **Associated monitors** to open the configuration window.
+4. Select a monitor from the list to view or edit its details.
+
+The following GIF illustrates these steps.
+
+
+
+To associate a monitor with a data source:
+
+1. Navigate to the **Dashboards** application under the main menu and then select a dashboard from the list.
+2. From the dashboard, select the {::nomarkdown}{:/} icon to open the **Options** dropdown menu and then choose **Alerting**.
+3. From the **Alerting** dropdown menu, choose **Add alerting monitor** to open the configuration window.
+4. Enter the configuration information and then select the **Create monitor** button. The monitor is now associated with the data source.
+
+The following GIF illustrates these steps.
+
+
+
+---
-## Limitations
+## Next steps
-The following features are not supported when using multiple data sources:
+After configuring multiple data sources, you can analyze the data from each source. See the following resources for more information:
-* Timeline visualization types
-* Some external plugins, such as the `gantt-chart` plugin
+- [Index patterns]({{site.url}}{{site.baseurl}}/dashboards/management/index-patterns/)
+- [Index Management]({{site.url}}{{site.baseurl}}/dashboards/im-dashboards/index/)
+- [Connecting OpenSearch and Amazon S3 through OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/management/S3-data-source/)
+- [OpenSearch Integrations]({{site.url}}{{site.baseurl}}/integrations/index/)
diff --git a/_data-prepper/pipelines/configuration/processors/obfuscate.md b/_data-prepper/pipelines/configuration/processors/obfuscate.md
index 13d906acb3..8d6bf901da 100644
--- a/_data-prepper/pipelines/configuration/processors/obfuscate.md
+++ b/_data-prepper/pipelines/configuration/processors/obfuscate.md
@@ -67,6 +67,7 @@ Use the following configuration options with the `obfuscate` processor.
| `source` | Yes | The source field to obfuscate. |
| `target` | No | The new field in which to store the obfuscated value. This leaves the original source field unchanged. When no `target` is provided, the source field updates with the obfuscated value. |
| `patterns` | No | A list of regex patterns that allow you to obfuscate specific parts of a field. Only parts that match the regex pattern will obfuscate. When not provided, the processor obfuscates the whole field. |
+| `single_word_only` | No | When set to `true`, a word boundary `\b` is added to the pattern so that obfuscation is applied only to standalone words in the input text. Default is `false`, meaning that obfuscation patterns are applied to all occurrences. Available in Data Prepper 2.8 and later. |
| `obfuscate_when` | No | Specifies under what condition the Obfuscate processor should perform matching. Default is no condition. |
| `tags_on_match_failure` | No | The tag to add to an event if the obfuscate processor fails to match the pattern. |
| `action` | No | The obfuscation action. As of Data Prepper 2.3, only the `mask` action is supported. |
diff --git a/_data-prepper/pipelines/configuration/sinks/opensearch.md b/_data-prepper/pipelines/configuration/sinks/opensearch.md
index c93f4708d1..b1c32f0005 100644
--- a/_data-prepper/pipelines/configuration/sinks/opensearch.md
+++ b/_data-prepper/pipelines/configuration/sinks/opensearch.md
@@ -65,7 +65,7 @@ Option | Required | Type | Description
`connect_timeout` | No | Integer| The timeout value, in milliseconds, when requesting a connection from the connection manager. A timeout value of `0` is interpreted as an infinite timeout. If this timeout value is negative or not set, the underlying Apache HttpClient will rely on operating system settings to manage connection timeouts.
`insecure` | No | Boolean | Whether or not to verify SSL certificates. If set to `true`, then certificate authority (CA) certificate verification is disabled and insecure HTTP requests are sent instead. Default is `false`.
`proxy` | No | String | The address of the [forward HTTP proxy server](https://en.wikipedia.org/wiki/Proxy_server). The format is `"<hostname or IP>:<port>"` (for example, `"example.com:8100"`, `"http://example.com:8100"`, `"112.112.112.112:8100"`). The port number cannot be omitted.
-`index` | Conditionally | String | The name of the export index. Only required when the `index_type` is `custom`. The index can be a plain string, such as `my-index-name`, contain [Java date-time patterns](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html), such as `my-index-${yyyy.MM.dd}` or `my-${yyyy-MM-dd-HH}-index`, be formatted using field values, such as `my-index-${/my_field}`, or use [Data Prepper expressions](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/), such as `my-index-${getMetadata(\"my_metadata_field\"}`. All formatting options can be combined to provide flexibility when creating static, dynamic, and rolling indexes.
+`index` | Conditionally | String | The name of the export index. Only required when the `index_type` is `custom`. The index can be a plain string, such as `my-index-name`, contain [Java date-time patterns](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html), such as `my-index-%{yyyy.MM.dd}` or `my-%{yyyy-MM-dd-HH}-index`, be formatted using field values, such as `my-index-${/my_field}`, or use [Data Prepper expressions](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/), such as `my-index-${getMetadata(\"my_metadata_field\"}`. All formatting options can be combined to provide flexibility when creating static, dynamic, and rolling indexes.
`index_type` | No | String | Tells the sink plugin what type of data it is handling. Valid values are `custom`, `trace-analytics-raw`, `trace-analytics-service-map`, or `management-disabled`. Default is `custom`.
`template_type` | No | String | Defines what type of OpenSearch template to use. Available options are `v1` and `index-template`. The default value is `v1`, which uses the original OpenSearch templates available at the `_template` API endpoints. The `index-template` option uses composable [index templates]({{site.url}}{{site.baseurl}}/opensearch/index-templates/), which are available through the OpenSearch `_index_template` API. Composable index types offer more flexibility than the default and are necessary when an OpenSearch cluster contains existing index templates. Composable templates are available for all versions of OpenSearch and some later versions of Elasticsearch. When `distribution_version` is set to `es6`, Data Prepper enforces the `template_type` as `v1`.
`template_file` | No | String | The path to a JSON [index template]({{site.url}}{{site.baseurl}}/opensearch/index-templates/) file, such as `/your/local/template-file.json`, when `index_type` is set to `custom`. For an example template file, see [otel-v1-apm-span-index-template.json](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/src/main/resources/otel-v1-apm-span-index-template.json). If you supply a template file, then it must match the template format specified by the `template_type` parameter.
diff --git a/_field-types/supported-field-types/derived.md b/_field-types/supported-field-types/derived.md
new file mode 100644
index 0000000000..b937ccac74
--- /dev/null
+++ b/_field-types/supported-field-types/derived.md
@@ -0,0 +1,903 @@
+---
+layout: default
+title: Derived
+nav_order: 62
+has_children: false
+parent: Supported field types
+---
+
+# Derived field type
+**Introduced 2.14**
+{: .label .label-purple }
+
+Derived fields allow you to create new fields dynamically by executing scripts on existing fields. The existing fields can be either retrieved from the `_source` field, which contains the original document, or from a field's doc values. Once you define a derived field either in an index mapping or within a search request, you can use the field in a query in the same way you would use a regular field.
+
+## When to use derived fields
+
+Derived fields offer flexibility in field manipulation and prioritize storage efficiency. However,
+because they are computed at query time, they can reduce query performance. Derived fields are particularly useful in scenarios requiring real-time data transformation, such as:
+
+- **Log analysis**: Extracting timestamps and log levels from log messages.
+- **Performance metrics**: Calculating response times from start and end timestamps.
+- **Security analytics**: Real-time IP geolocation and user-agent parsing for threat detection.
+- **Experimental use cases**: Testing new data transformations, creating temporary fields for A/B testing, or generating one-time reports without altering mappings or reindexing data.
+
+Despite the potential performance impact of query-time computations, the flexibility and storage efficiency of derived fields make them a valuable tool for these applications.
+
+## Current limitations
+
+Currently, derived fields have the following limitations:
+
+- **Aggregation, scoring, and sorting**: Not yet supported.
+- **Dashboard support**: These fields are not displayed in the list of available fields in OpenSearch Dashboards. However, you can still use them for filtering if you know the derived field name.
+- **Chained derived fields**: One derived field cannot be used to define another derived field.
+- **Join field type**: Derived fields are not supported for the [join field type]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/join/).
+- **Concurrent segment search**: Derived fields are not supported for [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).
+
+We are planning to address these limitations in future versions.
+
+## Prerequisites
+
+Before using a derived field, be sure to satisfy the following prerequisites:
+
+- **Enable `_source` or `doc_values`**: Ensure that either the `_source` field or doc values is enabled for the fields used in your script.
+- **Enable expensive queries**: Ensure that [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `true`.
+- **Feature control**: Derived fields are enabled by default. You can enable or disable derived fields by using the following settings:
+ - **Index level**: Update the `index.query.derived_field.enabled` setting.
+ - **Cluster level**: Update the `search.derived_field.enabled` setting.
+ Both settings are dynamic, so they can be changed without reindexing or node restarts. A sample settings request follows this list.
+- **Performance considerations**: Before using derived fields, evaluate the [performance implications](#performance) to ensure that derived fields meet your scale requirements.
+
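+For example, the following request is a sketch showing one way to set both cluster-level settings named in this list by using the cluster settings API. Adjust the values to enable or disable derived fields as needed:
+
+```json
+PUT /_cluster/settings
+{
+  "persistent": {
+    "search.allow_expensive_queries": true,
+    "search.derived_field.enabled": true
+  }
+}
+```
+{% include copy-curl.html %}
+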
+## Defining derived fields
+
+You can define derived fields [in index mappings](#defining-derived-fields-in-index-mappings) or [directly within a search request](#defining-and-searching-derived-fields-in-a-search-request).
+
+## Example setup
+
+To try the examples on this page, first create the following `logs` index:
+
+```json
+PUT logs
+{
+ "mappings": {
+ "properties": {
+ "request": {
+ "type": "text",
+ "fields": {
+ "keyword": {
+ "type": "keyword"
+ }
+ }
+ },
+ "client_ip": {
+ "type": "keyword"
+ }
+ }
+ }
+}
+```
+{% include copy-curl.html %}
+
+Add sample documents to the index:
+
+```json
+POST _bulk
+{ "index" : { "_index" : "logs", "_id" : "1" } }
+{ "request": "894030400 GET /english/images/france98_venues.gif HTTP/1.0 200 778", "clientip": "61.177.2.0" }
+{ "index" : { "_index" : "logs", "_id" : "2" } }
+{ "request": "894140400 GET /french/playing/mascot/mascot.html HTTP/1.1 200 5474", "clientip": "185.92.2.0" }
+{ "index" : { "_index" : "logs", "_id" : "3" } }
+{ "request": "894250400 POST /english/venues/images/venue_header.gif HTTP/1.0 200 711", "clientip": "61.177.2.0" }
+{ "index" : { "_index" : "logs", "_id" : "4" } }
+{ "request": "894360400 POST /images/home_fr_button.gif HTTP/1.1 200 2140", "clientip": "129.178.2.0" }
+{ "index" : { "_index" : "logs", "_id" : "5" } }
+{ "request": "894470400 DELETE /images/102384s.gif HTTP/1.0 200 785", "clientip": "227.177.2.0" }
+```
+{% include copy-curl.html %}
+
+## Defining derived fields in index mappings
+
+To derive the `timestamp`, `method`, and `size` fields from the `request` field indexed in the `logs` index, configure the following mappings:
+
+```json
+PUT /logs/_mapping
+{
+ "derived": {
+ "timestamp": {
+ "type": "date",
+ "format": "MM/dd/yyyy",
+ "script": {
+ "source": """
+ emit(Long.parseLong(doc["request.keyword"].value.splitOnToken(" ")[0]))
+ """
+ }
+ },
+ "method": {
+ "type": "keyword",
+ "script": {
+ "source": """
+ emit(doc["request.keyword"].value.splitOnToken(" ")[1])
+ """
+ }
+ },
+ "size": {
+ "type": "long",
+ "script": {
+ "source": """
+ emit(Long.parseLong(doc["request.keyword"].value.splitOnToken(" ")[5]))
+ """
+ }
+ }
+ }
+}
+```
+{% include copy-curl.html %}
+
+Note that the `timestamp` field has an additional `format` parameter that specifies the format in which to display `date` fields. If you don't include a `format` parameter, then the format defaults to `strict_date_time_no_millis`. For more information about supported date formats, see [Parameters](#parameters).
+
+## Parameters
+
+The following table lists the parameters accepted by `derived` field types. All parameters are dynamic and can be modified without reindexing documents.
+
+| Parameter | Required/Optional | Description |
+| :--- | :--- | :--- |
+| `type` | Required | The type of the derived field. Supported types are `boolean`, `date`, `geo_point`, `ip`, `keyword`, `text`, `long`, `double`, `float`, and `object`. |
+| `script` | Required | The script associated with the derived field. Any value emitted from the script must be emitted using `emit()`. The type of the emitted value must match the `type` of the derived field. Scripts have access to both the `doc_values` and `_source` fields if those are enabled. The doc value of a field can be accessed using `doc['field_name'].value`, and the source can be accessed using `params._source["field_name"]`. |
+| `format` | Optional | The format used for parsing dates. Only applicable to `date` fields. Valid values are `strict_date_time_no_millis`, `strict_date_optional_time`, and `epoch_millis`. For more information, see [Formats]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats).|
+| `ignore_malformed`| Optional | A Boolean value that specifies whether to ignore malformed values when running a query on a derived field. Default value is `false` (throw an exception when encountering malformed values). |
+| `prefilter_field` | Optional | An indexed text field provided to boost the performance of derived fields. Specifies an existing indexed field on which to filter prior to filtering on the derived field. For more information, see [Prefilter field](#prefilter-field). |
+
+## Emitting values in scripts
+
+The `emit()` function is available only within the derived field script context. It emits one value (or, for a multi-valued field, multiple values) for the document on which the script runs.
+
+The following table lists the `emit()` function formats for the supported field types.
+
+| Type | Emit format | Multi-valued fields supported|
+|-----------|----------------------------------|--------------|
+| `boolean` | `emit(boolean)` | No |
+| `double` | `emit(double)` | Yes |
+| `date`    | `emit(long timeInMillis)`        | Yes |
+| `float` | `emit(float)` | Yes |
+| `geo_point`| `emit(double lat, double lon)` | Yes |
+| `ip` | `emit(String ip)` | Yes |
+| `keyword` | `emit(String)` | Yes |
+| `long` | `emit(long)` | Yes |
+| `object` | `emit(String json)` (valid JSON) | Yes |
+| `text` | `emit(String)` | Yes |
+
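+As the preceding table indicates, most field types support multi-valued fields, which you produce by calling `emit()` once per value. The following mapping update is a sketch (using the `logs` index from the [example setup](#example-setup)) that defines an illustrative multi-valued `keyword` derived field named `request_tokens` by emitting each token of the `request` field:
+
+```json
+PUT /logs/_mapping
+{
+  "derived": {
+    "request_tokens": {
+      "type": "keyword",
+      "script": {
+        "source": """
+          for (String token : doc["request.keyword"].value.splitOnToken(" ")) {
+            emit(token);
+          }
+        """
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+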
+By default, a type mismatch between a derived field and its emitted value will result in the search request failing with an error. If `ignore_malformed` is set to `true`, then the failing document is skipped and the search request succeeds.
+{: .note}
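+
+For example, the following mapping update is a sketch that sets `ignore_malformed` on the `size` derived field from the earlier mapping so that documents whose sixth request token cannot be parsed as a number are skipped rather than failing the search:
+
+```json
+PUT /logs/_mapping
+{
+  "derived": {
+    "size": {
+      "type": "long",
+      "script": {
+        "source": """
+          emit(Long.parseLong(doc["request.keyword"].value.splitOnToken(" ")[5]))
+        """
+      },
+      "ignore_malformed": true
+    }
+  }
+}
+```
+{% include copy-curl.html %}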
+
+The size limit of the emitted values is 1 MB per document.
+{: .important}
+
+## Searching derived fields defined in index mappings
+
+To search derived fields, use the same syntax as when searching regular fields. For example, the following request searches for documents whose derived `timestamp` field falls within the specified range:
+
+```json
+POST /logs/_search
+{
+ "query": {
+ "range": {
+ "timestamp": {
+ "gte": "1970-01-11T08:20:30.400Z",
+ "lte": "1970-01-11T08:26:00.400Z"
+ }
+ }
+ },
+ "fields": ["timestamp"]
+}
+```
+{% include copy-curl.html %}
+
+The response contains the matching documents:
+
+
+
+ Response
+
+ {: .text-delta}
+
+```json
+{
+ "took": 315,
+ "timed_out": false,
+ "_shards": {
+ "total": 1,
+ "successful": 1,
+ "skipped": 0,
+ "failed": 0
+ },
+ "hits": {
+ "total": {
+ "value": 4,
+ "relation": "eq"
+ },
+ "max_score": 1,
+ "hits": [
+ {
+ "_index": "logs",
+ "_id": "1",
+ "_score": 1,
+ "_source": {
+ "request": "894030400 GET /english/images/france98_venues.gif HTTP/1.0 200 778",
+ "clientip": "61.177.2.0"
+ },
+ "fields": {
+ "timestamp": [
+ "1970-01-11T08:20:30.400Z"
+ ]
+ }
+ },
+ {
+ "_index": "logs",
+ "_id": "2",
+ "_score": 1,
+ "_source": {
+ "request": "894140400 GET /french/playing/mascot/mascot.html HTTP/1.1 200 5474",
+ "clientip": "185.92.2.0"
+ },
+ "fields": {
+ "timestamp": [
+ "1970-01-11T08:22:20.400Z"
+ ]
+ }
+ },
+ {
+ "_index": "logs",
+ "_id": "3",
+ "_score": 1,
+ "_source": {
+ "request": "894250400 POST /english/venues/images/venue_header.gif HTTP/1.0 200 711",
+ "clientip": "61.177.2.0"
+ },
+ "fields": {
+ "timestamp": [
+ "1970-01-11T08:24:10.400Z"
+ ]
+ }
+ },
+ {
+ "_index": "logs",
+ "_id": "4",
+ "_score": 1,
+ "_source": {
+ "request": "894360400 POST /images/home_fr_button.gif HTTP/1.1 200 2140",
+ "clientip": "129.178.2.0"
+ },
+ "fields": {
+ "timestamp": [
+ "1970-01-11T08:26:00.400Z"
+ ]
+ }
+ }
+ ]
+ }
+}
+```
+
+
+## Defining and searching derived fields in a search request
+
+You can also define derived fields directly in a search request and query them along with regular indexed fields. For example, the following request creates the `url` and `status` derived fields and searches those fields along with the regular `request` and `clientip` fields:
+
+```json
+POST /logs/_search
+{
+ "derived": {
+ "url": {
+ "type": "text",
+ "script": {
+ "source": """
+ emit(doc["request"].value.splitOnToken(" ")[2])
+ """
+ }
+ },
+ "status": {
+ "type": "keyword",
+ "script": {
+ "source": """
+ emit(doc["request"].value.splitOnToken(" ")[4])
+ """
+ }
+ }
+ },
+ "query": {
+ "bool": {
+ "must": [
+ {
+ "term": {
+ "clientip": "61.177.2.0"
+ }
+ },
+ {
+ "match": {
+ "url": "images"
+ }
+ },
+ {
+ "term": {
+ "status": "200"
+ }
+ }
+ ]
+ }
+ },
+ "fields": ["request", "clientip", "url", "status"]
+}
+```
+{% include copy-curl.html %}
+
+The response contains the matching documents:
+
+
+
+ Response
+
+ {: .text-delta}
+
+```json
+{
+ "took": 6,
+ "timed_out": false,
+ "_shards": {
+ "total": 1,
+ "successful": 1,
+ "skipped": 0,
+ "failed": 0
+ },
+ "hits": {
+ "total": {
+ "value": 2,
+ "relation": "eq"
+ },
+ "max_score": 2.8754687,
+ "hits": [
+ {
+ "_index": "logs",
+ "_id": "1",
+ "_score": 2.8754687,
+ "_source": {
+ "request": "894030400 GET /english/images/france98_venues.gif HTTP/1.0 200 778",
+ "clientip": "61.177.2.0"
+ },
+ "fields": {
+ "request": [
+ "894030400 GET /english/images/france98_venues.gif HTTP/1.0 200 778"
+ ],
+ "clientip": [
+ "61.177.2.0"
+ ],
+ "url": [
+ "/english/images/france98_venues.gif"
+ ],
+ "status": [
+ "200"
+ ]
+ }
+ },
+ {
+ "_index": "logs",
+ "_id": "3",
+ "_score": 2.8754687,
+ "_source": {
+ "request": "894250400 POST /english/venues/images/venue_header.gif HTTP/1.0 200 711",
+ "clientip": "61.177.2.0"
+ },
+ "fields": {
+ "request": [
+ "894250400 POST /english/venues/images/venue_header.gif HTTP/1.0 200 711"
+ ],
+ "clientip": [
+ "61.177.2.0"
+ ],
+ "url": [
+ "/english/venues/images/venue_header.gif"
+ ],
+ "status": [
+ "200"
+ ]
+ }
+ }
+ ]
+ }
+}
+```
+
+
+Derived fields use the default analyzer specified in the index analysis settings during search. You can override the default analyzer or specify a search analyzer within a search request in the same way as with regular fields. For more information, see [Analyzers]({{site.url}}{{site.baseurl}}/analyzers/).
+{: .note}
+
+When both an index mapping and a search definition are present for a field, the search definition takes precedence.
+{: .note}
+
+### Retrieving fields
+
+You can retrieve derived fields using the `fields` parameter in the search request in the same way as with regular fields, as shown in the preceding examples. You can also use wildcards to retrieve all derived fields that match a given pattern.
+
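+For example, the following request is a sketch that retrieves all available fields for the matching documents, including the derived `timestamp`, `method`, and `size` fields defined earlier, by passing a wildcard pattern in the `fields` parameter:
+
+```json
+POST /logs/_search
+{
+  "query": {
+    "term": {
+      "clientip": "61.177.2.0"
+    }
+  },
+  "fields": ["*"]
+}
+```
+{% include copy-curl.html %}
+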
+### Highlighting
+
+Derived fields of type `text` support highlighting using the [unified highlighter]({{site.url}}{{site.baseurl}}/opensearch/search/highlight#the-unified-highlighter). For example, the following request specifies to highlight the derived `url` field:
+
+```json
+POST /logs/_search
+{
+ "derived": {
+ "url": {
+ "type": "text",
+ "script": {
+ "source": """
+ emit(doc["request"].value.splitOnToken(" " )[2])
+ """
+ }
+ }
+ },
+ "query": {
+ "bool": {
+ "must": [
+ {
+ "term": {
+ "clientip": "61.177.2.0"
+ }
+ },
+ {
+ "match": {
+ "url": "images"
+ }
+ }
+ ]
+ }
+ },
+ "fields": ["request", "clientip", "url"],
+ "highlight": {
+ "fields": {
+ "url": {}
+ }
+ }
+}
+```
+{% include copy-curl.html %}
+
+The response specifies highlighting in the `url` field:
+
+
+
+ Response
+
+ {: .text-delta}
+
+```json
+{
+ "took": 45,
+ "timed_out": false,
+ "_shards": {
+ "total": 1,
+ "successful": 1,
+ "skipped": 0,
+ "failed": 0
+ },
+ "hits": {
+ "total": {
+ "value": 2,
+ "relation": "eq"
+ },
+ "max_score": 1.8754687,
+ "hits": [
+ {
+ "_index": "logs",
+ "_id": "1",
+ "_score": 1.8754687,
+ "_source": {
+ "request": "894030400 GET /english/images/france98_venues.gif HTTP/1.0 200 778",
+ "clientip": "61.177.2.0"
+ },
+ "fields": {
+ "request": [
+ "894030400 GET /english/images/france98_venues.gif HTTP/1.0 200 778"
+ ],
+ "clientip": [
+ "61.177.2.0"
+ ],
+ "url": [
+ "/english/images/france98_venues.gif"
+ ]
+ },
+ "highlight": {
+ "url": [
+ "/english/images/france98_venues.gif"
+ ]
+ }
+ },
+ {
+ "_index": "logs",
+ "_id": "3",
+ "_score": 1.8754687,
+ "_source": {
+ "request": "894250400 POST /english/venues/images/venue_header.gif HTTP/1.0 200 711",
+ "clientip": "61.177.2.0"
+ },
+ "fields": {
+ "request": [
+ "894250400 POST /english/venues/images/venue_header.gif HTTP/1.0 200 711"
+ ],
+ "clientip": [
+ "61.177.2.0"
+ ],
+ "url": [
+ "/english/venues/images/venue_header.gif"
+ ]
+ },
+ "highlight": {
+ "url": [
+ "/english/venues/images/venue_header.gif"
+ ]
+ }
+ }
+ ]
+ }
+}
+```
+
+
+## Performance
+
+Derived fields are not indexed but are instead computed dynamically at query time by retrieving values from the `_source` field or doc values. Thus, queries on derived fields run more slowly than queries on indexed fields. To improve performance, try the following:
+
+- Prune the search space by adding query filters on indexed fields in conjunction with derived fields (an example follows this list).
+- Use doc values instead of `_source` in the script for faster access, whenever applicable.
+- Consider using a [`prefilter_field`](#prefilter-field) to automatically prune the search space without explicit filters in the search request.
+
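+For example, the following request is a sketch of the first recommendation: it filters on the indexed `clientip` field so that the derived `method` query is evaluated against a smaller candidate set:
+
+```json
+POST /logs/_search
+{
+  "query": {
+    "bool": {
+      "filter": [
+        { "term": { "clientip": "61.177.2.0" } }
+      ],
+      "must": [
+        { "term": { "method": "GET" } }
+      ]
+    }
+  },
+  "fields": ["method"]
+}
+```
+{% include copy-curl.html %}
+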
+### Prefilter field
+
+Specifying a prefilter field helps to prune the search space without adding explicit filters to the search request. The `prefilter_field` parameter specifies an existing indexed field on which OpenSearch automatically filters when constructing the query. The `prefilter_field` must be a text field (either [`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) or [`match_only_text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/match-only-text/)).
+
+For example, you can add a `prefilter_field` to the `method` derived field. Update the index mapping, specifying to prefilter on the `request` field:
+
+```json
+PUT /logs/_mapping
+{
+ "derived": {
+ "method": {
+ "type": "keyword",
+ "script": {
+ "source": """
+ emit(doc["request.keyword"].value.splitOnToken(" ")[1])
+ """
+ },
+ "prefilter_field": "request"
+ }
+ }
+}
+```
+{% include copy-curl.html %}
+
+Now search using a query on the `method` derived field:
+
+```json
+POST /logs/_search
+{
+ "profile": true,
+ "query": {
+ "term": {
+ "method": {
+ "value": "GET"
+ }
+ }
+ },
+ "fields": ["method"]
+}
+```
+{% include copy-curl.html %}
+
+OpenSearch automatically adds a filter on the `request` field to your query:
+
+```json
+"#request:GET #DerivedFieldQuery (Query: [ method:GET])"
+```
+
+You can use the `profile` option to analyze derived field performance, as shown in the preceding example.
+{: .tip}
+
+## Derived object fields
+
+A script can emit a valid JSON object so that you can query its subfields without indexing them, in the same way as with regular fields. This is useful for large JSON objects that require only occasional searches on some subfields: indexing those subfields is expensive, and defining a separate derived field for each subfield adds resource overhead. If you don't [explicitly provide the subfield type](#explicit-subfield-type), then the subfield type is [inferred](#inferred-subfield-type).
+
+For example, the following request defines a `derived_request_object` derived field as an `object` type:
+
+```json
+PUT logs_object
+{
+ "mappings": {
+ "properties": {
+ "request_object": { "type": "text" }
+ },
+ "derived": {
+ "derived_request_object": {
+ "type": "object",
+ "script": {
+ "source": "emit(params._source[\"request_object\"])"
+ }
+ }
+ }
+ }
+}
+```
+{% include copy-curl.html %}
+
+Consider the following documents, in which the `request_object` is a string representation of a JSON object:
+
+```json
+POST _bulk
+{ "index" : { "_index" : "logs_object", "_id" : "1" } }
+{ "request_object": "{\"@timestamp\": 894030400, \"clientip\":\"61.177.2.0\", \"request\": \"GET /english/venues/images/venue_header.gif HTTP/1.0\", \"status\": 200, \"size\": 711}" }
+{ "index" : { "_index" : "logs_object", "_id" : "2" } }
+{ "request_object": "{\"@timestamp\": 894140400, \"clientip\":\"129.178.2.0\", \"request\": \"GET /images/home_fr_button.gif HTTP/1.1\", \"status\": 200, \"size\": 2140}" }
+{ "index" : { "_index" : "logs_object", "_id" : "3" } }
+{ "request_object": "{\"@timestamp\": 894240400, \"clientip\":\"227.177.2.0\", \"request\": \"GET /images/102384s.gif HTTP/1.0\", \"status\": 400, \"size\": 785}" }
+{ "index" : { "_index" : "logs_object", "_id" : "4" } }
+{ "request_object": "{\"@timestamp\": 894340400, \"clientip\":\"61.177.2.0\", \"request\": \"GET /english/images/venue_bu_city_on.gif HTTP/1.0\", \"status\": 400, \"size\": 1397}\n" }
+{ "index" : { "_index" : "logs_object", "_id" : "5" } }
+{ "request_object": "{\"@timestamp\": 894440400, \"clientip\":\"132.176.2.0\", \"request\": \"GET /french/news/11354.htm HTTP/1.0\", \"status\": 200, \"size\": 3460, \"is_active\": true}" }
+```
+{% include copy-curl.html %}
+
+The following query searches the `@timestamp` subfield of the `derived_request_object`:
+
+```json
+POST /logs_object/_search
+{
+ "query": {
+ "range": {
+ "derived_request_object.@timestamp": {
+ "gte": "894030400",
+ "lte": "894140400"
+ }
+ }
+ },
+ "fields": ["derived_request_object.@timestamp"]
+}
+```
+{% include copy-curl.html %}
+
+The response contains the matching documents:
+
+
+
+ Response
+
+ {: .text-delta}
+
+```json
+{
+ "took": 26,
+ "timed_out": false,
+ "_shards": {
+ "total": 1,
+ "successful": 1,
+ "skipped": 0,
+ "failed": 0
+ },
+ "hits": {
+ "total": {
+ "value": 2,
+ "relation": "eq"
+ },
+ "max_score": 1,
+ "hits": [
+ {
+ "_index": "logs_object",
+ "_id": "1",
+ "_score": 1,
+ "_source": {
+ "request_object": """{"@timestamp": 894030400, "clientip":"61.177.2.0", "request": "GET /english/venues/images/venue_header.gif HTTP/1.0", "status": 200, "size": 711}"""
+ },
+ "fields": {
+ "derived_request_object.@timestamp": [
+ 894030400
+ ]
+ }
+ },
+ {
+ "_index": "logs_object",
+ "_id": "2",
+ "_score": 1,
+ "_source": {
+ "request_object": """{"@timestamp": 894140400, "clientip":"129.178.2.0", "request": "GET /images/home_fr_button.gif HTTP/1.1", "status": 200, "size": 2140}"""
+ },
+ "fields": {
+ "derived_request_object.@timestamp": [
+ 894140400
+ ]
+ }
+ }
+ ]
+ }
+}
+```
+
+
+
+You can also specify to highlight a derived object field:
+
+```json
+POST /logs_object/_search
+{
+ "query": {
+ "bool": {
+ "must": [
+ {
+ "term": {
+ "derived_request_object.clientip": "61.177.2.0"
+ }
+ },
+ {
+ "match": {
+ "derived_request_object.request": "images"
+ }
+ }
+ ]
+ }
+ },
+ "fields": ["derived_request_object.*"],
+ "highlight": {
+ "fields": {
+ "derived_request_object.request": {}
+ }
+ }
+}
+```
+{% include copy-curl.html %}
+
+The response adds highlighting to the `derived_request_object.request` field:
+
+
+
+ Response
+
+ {: .text-delta}
+
+```json
+{
+ "took": 5,
+ "timed_out": false,
+ "_shards": {
+ "total": 1,
+ "successful": 1,
+ "skipped": 0,
+ "failed": 0
+ },
+ "hits": {
+ "total": {
+ "value": 2,
+ "relation": "eq"
+ },
+ "max_score": 2,
+ "hits": [
+ {
+ "_index": "logs_object",
+ "_id": "1",
+ "_score": 2,
+ "_source": {
+ "request_object": """{"@timestamp": 894030400, "clientip":"61.177.2.0", "request": "GET /english/venues/images/venue_header.gif HTTP/1.0", "status": 200, "size": 711}"""
+ },
+ "fields": {
+ "derived_request_object.request": [
+ "GET /english/venues/images/venue_header.gif HTTP/1.0"
+ ],
+ "derived_request_object.clientip": [
+ "61.177.2.0"
+ ]
+ },
+ "highlight": {
+ "derived_request_object.request": [
+ "GET /english/venues/images/venue_header.gif HTTP/1.0"
+ ]
+ }
+ },
+ {
+ "_index": "logs_object",
+ "_id": "4",
+ "_score": 2,
+ "_source": {
+ "request_object": """{"@timestamp": 894340400, "clientip":"61.177.2.0", "request": "GET /english/images/venue_bu_city_on.gif HTTP/1.0", "status": 400, "size": 1397}
+"""
+ },
+ "fields": {
+ "derived_request_object.request": [
+ "GET /english/images/venue_bu_city_on.gif HTTP/1.0"
+ ],
+ "derived_request_object.clientip": [
+ "61.177.2.0"
+ ]
+ },
+ "highlight": {
+ "derived_request_object.request": [
+ "GET /english/images/venue_bu_city_on.gif HTTP/1.0"
+ ]
+ }
+ }
+ ]
+ }
+}
+```
+
+
+
+### Inferred subfield type
+
+Type inference is based on the same logic as [dynamic mapping]({{site.url}}{{site.baseurl}}/opensearch/mappings#dynamic-mapping). Instead of inferring the subfield type from the first document, type inference uses a random sample of documents. If the subfield isn't found in any documents in the random sample, type inference fails and logs a warning. For subfields that seldom occur in documents, consider defining an explicit field type. Using dynamic type inference for such subfields may cause a query to return no results, just as it would for a missing field.
+
+### Explicit subfield type
+
+To define the explicit subfield type, provide the `type` parameter in the `properties` object. In the following example, the `derived_logs_object.is_active` field is defined as `boolean`. Because this field is only present in one of the documents, its type inference might fail, so it's important to define the explicit type:
+
+```json
+POST /logs_object/_search
+{
+ "derived": {
+ "derived_request_object": {
+ "type": "object",
+ "script": {
+ "source": "emit(params._source[\"request_object\"])"
+ },
+ "properties": {
+ "is_active": "boolean"
+ }
+ }
+ },
+ "query": {
+ "term": {
+ "derived_request_object.is_active": true
+ }
+ },
+ "fields": ["derived_request_object.is_active"]
+}
+```
+{% include copy-curl.html %}
+
+The response contains the matching documents:
+
+<details markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+ "took": 13,
+ "timed_out": false,
+ "_shards": {
+ "total": 1,
+ "successful": 1,
+ "skipped": 0,
+ "failed": 0
+ },
+ "hits": {
+ "total": {
+ "value": 1,
+ "relation": "eq"
+ },
+ "max_score": 1,
+ "hits": [
+ {
+ "_index": "logs_object",
+ "_id": "5",
+ "_score": 1,
+ "_source": {
+ "request_object": """{"@timestamp": 894440400, "clientip":"132.176.2.0", "request": "GET /french/news/11354.htm HTTP/1.0", "status": 200, "size": 3460, "is_active": true}"""
+ },
+ "fields": {
+ "derived_request_object.is_active": [
+ true
+ ]
+ }
+ }
+ ]
+ }
+}
+```
+</details>
+
diff --git a/_field-types/supported-field-types/index.md b/_field-types/supported-field-types/index.md
index 69ca0032be..be0963e976 100644
--- a/_field-types/supported-field-types/index.md
+++ b/_field-types/supported-field-types/index.md
@@ -29,6 +29,7 @@ IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/):
[Rank]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/rank/) | Boosts or decreases the relevance score of documents (`rank_feature`, `rank_features`).
k-NN vector | [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/): Allows indexing a k-NN vector into OpenSearch and performing different kinds of k-NN search.
Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query.
+Derived | [`derived`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/derived/): Creates new fields dynamically by executing scripts on existing fields.
## Arrays
diff --git a/_ml-commons-plugin/cluster-settings.md b/_ml-commons-plugin/cluster-settings.md
index c473af81a1..ebc9b92531 100644
--- a/_ml-commons-plugin/cluster-settings.md
+++ b/_ml-commons-plugin/cluster-settings.md
@@ -303,12 +303,12 @@ This setting automatically redeploys deployed or partially deployed models upon
### Setting
```
-plugins.ml_commons.model_auto_redeploy.enable: false
+plugins.ml_commons.model_auto_redeploy.enable: true
```
### Values
-- Default value: false
+- Default value: true
- Valid values: `false`, `true`
## Set retires for auto redeploy
diff --git a/_search-plugins/knn/performance-tuning.md b/_search-plugins/knn/performance-tuning.md
index d2cf3c7759..24d92bd67d 100644
--- a/_search-plugins/knn/performance-tuning.md
+++ b/_search-plugins/knn/performance-tuning.md
@@ -18,11 +18,11 @@ This topic also provides recommendations for comparing approximate k-NN to exact
## Indexing performance tuning
-Take the following steps to improve indexing performance, especially when you plan to index a large number of vectors at once:
+Take any of the following steps to improve indexing performance, especially when you plan to index a large number of vectors at once.
-* **Disable the refresh interval**
+### Disable the refresh interval
- Either disable the refresh interval (default = 1 sec), or set a long duration for the refresh interval to avoid creating multiple small segments:
+Either disable the refresh interval (default = 1 sec) or set a long duration for the refresh interval to avoid creating multiple small segments:
```json
PUT //_settings
@@ -32,23 +32,78 @@ Take the following steps to improve indexing performance, especially when you pl
}
}
```
- **Note**: Make sure to reenable `refresh_interval` after indexing finishes.
-* **Disable replicas (no OpenSearch replica shard)**
+Make sure to reenable `refresh_interval` after indexing is complete.
- Set replicas to `0` to prevent duplicate construction of native library indexes in both primary and replica shards. When you enable replicas after indexing finishes, the serialized native library indexes are directly copied. If you have no replicas, losing nodes might cause data loss, so it's important that the data lives elsewhere so this initial load can be retried in case of an issue.
+### Disable replicas (no OpenSearch replica shard)
-* **Increase the number of indexing threads**
+ Set replicas to `0` to prevent duplicate construction of native library indexes in both primary and replica shards. When you enable replicas after indexing completes, the serialized native library indexes are copied directly. If you have no replicas, losing nodes might cause data loss, so it's important that the data be stored elsewhere so that this initial load can be retried in the event of an issue.
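+
+For example, the following request (using a placeholder index name) sets the replica count to `0` before bulk indexing; reenable replicas after indexing completes:
+
+```json
+PUT /<index_name>/_settings
+{
+  "index": {
+    "number_of_replicas": 0
+  }
+}
+```
+{% include copy-curl.html %}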
- If the hardware you choose has multiple cores, you can allow multiple threads in native library index construction by speeding up the indexing process. Determine the number of threads to allot with the [knn.algo_param.index_thread_qty]({{site.url}}{{site.baseurl}}/search-plugins/knn/settings#cluster-settings) setting.
+### Increase the number of indexing threads
- Keep an eye on CPU utilization and choose the correct number of threads. Because native library index construction is costly, having multiple threads can cause additional CPU load.
+If your hardware has multiple cores, you can use multiple threads for native library index construction, speeding up the indexing process. Determine the number of threads to allot with the [knn.algo_param.index_thread_qty]({{site.url}}{{site.baseurl}}/search-plugins/knn/settings#cluster-settings) setting.
+
+Monitor CPU utilization and choose the correct number of threads. Because native library index construction is costly, choosing more threads than you need can cause additional CPU load.
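+
+For example, the following request is an illustrative sketch that sets the thread count to `2`:
+
+```json
+PUT /_cluster/settings
+{
+  "persistent": {
+    "knn.algo_param.index_thread_qty": 2
+  }
+}
+```
+{% include copy-curl.html %}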
+
+
+### (Expert-level) Disable vector field storage in the source field
+
+The `_source` field contains the original JSON document body that was passed at index time. This field is not indexed and is not searchable but is stored so that it can be returned when executing fetch requests such as `get` and `search`. When using vector fields within the source, you can remove the vector field to save disk space, as shown in the following example where the `location` vector is excluded:
+
+ ```json
+ PUT /<index_name>/_mappings
+ {
+ "_source": {
+ "excludes": ["location"]
+ },
+ "properties": {
+ "location": {
+ "type": "knn_vector",
+ "dimension": 2,
+ "method": {
+ "name": "hnsw",
+ "space_type": "l2",
+ "engine": "faiss"
+ }
+ }
+ }
+ }
+ ```
+
+
+Disabling the `_source` field can cause certain features to become unavailable, such as the `update`, `update_by_query`, and `reindex` APIs and the ability to debug queries or aggregations by using the original document at index time.
+
+In OpenSearch 2.15 or later, you can further improve indexing speed and reduce disk space by removing the vector field from the `_recovery_source`, as shown in the following example:
+
+ ```json
+ PUT /<index_name>/_mappings
+ {
+ "_source": {
+ "excludes": ["location"],
+ "recovery_source_excludes": ["location"]
+ },
+ "properties": {
+ "location": {
+ "type": "knn_vector",
+ "dimension": 2,
+ "method": {
+ "name": "hnsw",
+ "space_type": "l2",
+ "engine": "faiss"
+ }
+ }
+ }
+ }
+ ```
+
+This is an expert-level setting. Disabling the `_recovery_source` may lead to failures during peer-to-peer recovery. Before disabling the `_recovery_source`, check with your OpenSearch cluster admin to determine whether your cluster performs regular flushes before starting the peer-to-peer recovery of shards.
+{: .warning}
## Search performance tuning
Take the following steps to improve search performance:
-* **Reduce segment count**
+### Reduce segment count
To improve search performance, you must keep the number of segments under control. Lucene's IndexSearcher searches over all of the segments in a shard to find the 'size' best results.
@@ -56,7 +111,7 @@ Take the following steps to improve search performance:
You can control the number of segments by choosing a larger refresh interval, or during indexing by asking OpenSearch to slow down segment creation by disabling the refresh interval.
-* **Warm up the index**
+### Warm up the index
Native library indexes are constructed during indexing, but they're loaded into memory during the first search. In Lucene, each segment is searched sequentially (so, for k-NN, each segment returns up to k nearest neighbors of the query point), and the top 'size' number of results based on the score are returned from all the results returned by segments at a shard level (higher score = better result).
@@ -77,13 +132,15 @@ Take the following steps to improve search performance:
The warmup API operation loads all native library indexes for all shards (primary and replica) for the specified indexes into the cache, so there's no penalty to load native library indexes during initial searches.
- **Note**: This API operation only loads the segments of the indexes it ***sees*** into the cache. If a merge or refresh operation finishes after the API runs, or if you add new documents, you need to rerun the API to load those native library indexes into memory.
+This API operation only loads the segments of active indexes into the cache. If a merge or refresh operation finishes after the API runs, or if you add new documents, you need to rerun the API to load those native library indexes into memory.
+{: .warning}
+
-* **Avoid reading stored fields**
+### Avoid reading stored fields
If your use case is simply to read the IDs and scores of the nearest neighbors, you can disable reading stored fields, which saves time retrieving the vectors from stored fields.
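+
+As an illustrative sketch (assuming an index containing the `location` vector field from the preceding mapping example), you can set `_source` to `false` so that only document IDs and scores are returned:
+
+```json
+GET /<index_name>/_search
+{
+  "_source": false,
+  "query": {
+    "knn": {
+      "location": {
+        "vector": [2.0, 3.0],
+        "k": 10
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}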
-* **Use `mmap` file I/O**
+### Use `mmap` file I/O
For the Lucene-based approximate k-NN search, there is no dedicated cache layer that speeds up read/write operations. Instead, the plugin relies on the existing caching mechanism in OpenSearch core. In versions 2.4 and earlier of the Lucene-based approximate k-NN search, read/write operations were based on Java NIO by default, which can be slow, depending on the Lucene version and number of segments per shard. Starting with version 2.5, k-NN enables [`mmap`](https://en.wikipedia.org/wiki/Mmap) file I/O by default when the store type is `hybridfs` (the default store type in OpenSearch). This leads to fast file I/O operations and improves the overall performance of both data ingestion and search. The two file extensions specific to vector values that use `mmap` are `.vec` and `.vem`. For more information about these file extensions, see [the Lucene documentation](https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsFormat.html).
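+
+As a minimal sketch (the index name is hypothetical, and `hybridfs` is already the default), the store type can be set when creating an index:
+
+```json
+PUT /my-vector-index
+{
+  "settings": {
+    "index.store.type": "hybridfs"
+  }
+}
+```
+{% include copy-curl.html %}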
diff --git a/_search-plugins/neural-sparse-search.md b/_search-plugins/neural-sparse-search.md
index e22c74596f..b2b4fc33d6 100644
--- a/_search-plugins/neural-sparse-search.md
+++ b/_search-plugins/neural-sparse-search.md
@@ -31,6 +31,7 @@ To use neural sparse search, follow these steps:
1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion).
1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index).
1. [Search the index using neural search](#step-4-search-the-index-using-neural-sparse-search).
+1. _Optional_: [Create and enable the two-phase processor](#step-5-create-and-enable-the-two-phase-processor-optional).
## Step 1: Create an ingest pipeline
@@ -262,6 +263,38 @@ GET my-nlp-index/_search
}
}
```
+## Step 5: Create and enable the two-phase processor (Optional)
+
+
+The `neural_sparse_two_phase_processor` is a new feature introduced in OpenSearch 2.15. Using the two-phase processor can significantly improve the performance of neural sparse queries.
+
+To quickly launch a search pipeline with neural sparse search, use the following example pipeline:
+
+```json
+PUT /_search/pipeline/two_phase_search_pipeline
+{
+ "request_processors": [
+ {
+ "neural_sparse_two_phase_processor": {
+ "tag": "neural-sparse",
+ "description": "This processor is making two-phase processor."
+ }
+ }
+ ]
+}
+```
+{% include copy-curl.html %}
+
+Then set the `index.search.default_pipeline` setting to the pipeline name for the index on which you want to use the pipeline, as shown in the following example:
+
+```json
+PUT /index-name/_settings
+{
+ "index.search.default_pipeline" : "two_phase_search_pipeline"
+}
+```
+{% include copy-curl.html %}
+
+
## Setting a default model on an index or field
diff --git a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md
new file mode 100644
index 0000000000..53d69c1cc2
--- /dev/null
+++ b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md
@@ -0,0 +1,150 @@
+---
+layout: default
+title: Neural sparse query two-phase processor
+nav_order: 13
+parent: Search processors
+grand_parent: Search pipelines
+---
+
+# Neural sparse query two-phase processor
+Introduced 2.15
+{: .label .label-purple }
+
+The `neural_sparse_two_phase_processor` search processor is designed to speed up [neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). It accelerates neural sparse queries by splitting the original method, in which all tokens score all documents, into two steps:
+
+1. High-weight tokens score the documents and select the top documents.
+2. Low-weight tokens rescore the top documents.
+
+## Request fields
+
+The following table lists all available request fields.
+
+Field | Data type | Description
+:--- | :--- | :---
+`enabled` | Boolean | Controls whether the two-phase processor is enabled. Default is `true`.
+`two_phase_parameter` | Object | A map of key-value pairs representing the two-phase parameters and their associated values. You can specify the value of `prune_ratio`, `expansion_rate`, `max_window_size`, or any combination of these three parameters. Optional.
+`two_phase_parameter.prune_ratio` | Float | A ratio that specifies how to split tokens into high-weight and low-weight groups. The threshold is the token's maximum score multiplied by its `prune_ratio`. Valid range is [0,1]. Default is `0.4`.
+`two_phase_parameter.expansion_rate` | Float | The rate that determines how many documents are rescored during the second phase. The number of second-phase documents equals the query size (default is 10) multiplied by the expansion rate. Valid values are greater than 1.0. Default is `5.0`.
+`two_phase_parameter.max_window_size` | Integer | The maximum number of documents that can be processed using the two-phase processor. Valid values are greater than 50. Default is `10000`.
+`tag` | String | The processor's identifier. Optional.
+`description` | String | A description of the processor. Optional.
+
+## Example
+
+The following example creates a search pipeline with a `neural_sparse_two_phase_processor` search request processor.
+
+### Create search pipeline
+
+The following example request creates a search pipeline with a `neural_sparse_two_phase_processor` search request processor and sets custom values for the two-phase parameters:
+
+```json
+PUT /_search/pipeline/two_phase_search_pipeline
+{
+ "request_processors": [
+ {
+ "neural_sparse_two_phase_processor": {
+ "tag": "neural-sparse",
+ "description": "This processor is making two-phase processor.",
+ "enabled": true,
+ "two_phase_parameter": {
+ "prune_ratio": custom_prune_ratio,
+ "expansion_rate": custom_expansion_rate,
+ "max_window_size": custom_max_window_size
+ }
+ }
+ }
+ ]
+}
+```
+{% include copy-curl.html %}
+
+### Set search pipeline
+
+After the two-phase pipeline is created, set the `index.search.default_pipeline` setting to the name of the pipeline for the index on which you want to use the two-phase pipeline:
+
+```json
+PUT /index-name/_settings
+{
+ "index.search.default_pipeline" : "two_phase_search_pipeline"
+}
+```
+{% include copy-curl.html %}
+
+## Limitations
+
+The `neural_sparse_two_phase_processor` has the following limitations.
+
+### Version support
+
+The `neural_sparse_two_phase_processor` can only be used with OpenSearch 2.15 or later.
+
+### Compound query support
+
+As of OpenSearch 2.15, only the Boolean [compound query]({{site.url}}{{site.baseurl}}/query-dsl/compound/index/) is supported.
+
+Neural sparse queries and Boolean queries with a `boost` parameter (but not boosting queries) are also supported.
+
+## Examples
+
+The following examples show neural sparse queries with the supported query types.
+
+### Single neural sparse query
+
+```json
+GET /my-nlp-index/_search
+{
+ "query": {
+ "neural_sparse": {
+ "passage_embedding": {
+ "query_text": "Hi world"
+ "model_id":
+ }
+ }
+ }
+}
+```
+{% include copy-curl.html %}
+
+### Neural sparse query nested in a Boolean query
+
+```json
+GET /my-nlp-index/_search
+{
+ "query": {
+ "bool": {
+ "should": [
+ {
+ "neural_sparse": {
+ "passage_embedding": {
+ "query_text": "Hi world",
+ "model_id":
+ },
+ "boost": 2.0
+ }
+ }
+ ]
+ }
+ }
+}
+```
+{% include copy-curl.html %}
+
+## P99 latency metrics
+Using an OpenSearch cluster set up on three m5.4xlarge Amazon Elastic Compute Cloud (Amazon EC2) instances, OpenSearch conducts neural sparse query P99 latency tests on indexes corresponding to more than 10 datasets.
+
+### Doc-only mode latency metric
+
+In doc-only mode, the two-phase processor can significantly decrease query latency, as shown by the following latency metrics:
+
+- Average latency without the two-phase processor: 53.56 ms
+- Average latency with the two-phase processor: 38.61 ms
+
+This results in an overall latency reduction of approximately 27.92%. Most indexes show a significant latency reduction when using the two-phase processor, with reductions ranging from 5.14% to 84.6%. The specific latency optimization values depend on the data distribution within the indexes.
+
+### Bi-encoder mode latency metric
+
+In bi-encoder mode, the two-phase processor can significantly decrease query latency, as shown by the following latency metrics:
+- Average latency without the two-phase processor: 300.79 ms
+- Average latency with the two-phase processor: 121.64 ms
+
+This results in an overall latency reduction of approximately 59.56%. Most indexes show a significant latency reduction when using the two-phase processor, with reductions ranging from 1.56% to 82.84%. The specific latency optimization values depend on the data distribution within the indexes.
diff --git a/_tools/logstash/common-filters.md b/_tools/logstash/common-filters.md
index 4cfc2b6703..909461261d 100644
--- a/_tools/logstash/common-filters.md
+++ b/_tools/logstash/common-filters.md
@@ -73,7 +73,7 @@ Logstash supports a few common options for all filter plugins:
Option | Description
:--- | :---
`add_field` | Adds one or more fields to the event.
-`remove_field` | Removes one or more events from the field.
+`remove_field` | Removes one or more fields from the event.
`add_tag` | Adds one or more tags to the event. You can use tags to perform conditional processing on events depending on which tags they contain.
`remove_tag` | Removes one or more tags from the event.
diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/remote-cluster-state.md b/_tuning-your-cluster/availability-and-recovery/remote-store/remote-cluster-state.md
index 7cc533fe76..d967aca914 100644
--- a/_tuning-your-cluster/availability-and-recovery/remote-store/remote-cluster-state.md
+++ b/_tuning-your-cluster/availability-and-recovery/remote-store/remote-cluster-state.md
@@ -54,12 +54,43 @@ In addition to the mandatory static settings, you can configure the following dy
Setting | Default | Description
:--- | :--- | :---
-`cluster.remote_store.state.index_metadata.upload_timeout` | 20s | The amount of time to wait for index metadata upload to complete. Note that index metadata for separate indexes is uploaded in parallel.
-`cluster.remote_store.state.global_metadata.upload_timeout` | 20s | The amount of time to wait for global metadata upload to complete. Global metadata contains globally applicable metadata, such as templates, cluster settings, data stream metadata, and repository metadata.
-`cluster.remote_store.state.metadata_manifest.upload_timeout` | 20s | The amount of time to wait for the manifest file upload to complete. The manifest file contains the details of each of the files uploaded for a single cluster state, both index metadata files and global metadata files.
+`cluster.remote_store.state.index_metadata.upload_timeout` | 20s | Deprecated. Use `cluster.remote_store.state.global_metadata.upload_timeout` instead.
+`cluster.remote_store.state.global_metadata.upload_timeout` | 20s | The amount of time to wait for the cluster state upload to complete.
+`cluster.remote_store.state.metadata_manifest.upload_timeout` | 20s | The amount of time to wait for the manifest file upload to complete. The manifest file contains the details of each of the files uploaded for a single cluster state, both index metadata files and global metadata files.
+`cluster.remote_store.state.cleanup_interval` | 300s | The interval at which the asynchronous remote state clean-up task runs. This task deletes any old remote state files.
## Limitations
The remote cluster state functionality has the following limitations:
- Unsafe bootstrap scripts cannot be run when the remote cluster state is enabled. When a majority of cluster-manager nodes are lost and the cluster goes down, the user needs to replace any remaining cluster manager nodes and reseed the nodes in order to bootstrap a new cluster.
+
+## Remote cluster state publication
+
+
+The cluster manager node processes updates to the cluster state. It then publishes the updated cluster state through the local transport layer to all of the follower nodes. With the `remote_store.publication` feature enabled, the cluster state is backed up to the remote store during every state update. The follower nodes can then fetch the state from the remote store directly, which reduces the overhead on the cluster manager node for publication.
+
+To enable the feature flag for the `remote_store.publication` feature, follow the steps in the [experimental feature flag documentation]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
+
+Enabling the setting does not change the publication flow, and follower nodes will not send acknowledgements back to the cluster manager node
+until they download the updated cluster state from the remote store.
+
+You must enable the remote cluster state feature in order for remote publication to work. To modify remote publication behavior, use the following routing table repository settings. The routing table repository contains the shard allocation details for each index in the remote cluster state:
+
+```yml
+# Remote routing table repository settings
+node.attr.remote_store.routing_table.repository: my-remote-routing-table-repo
+node.attr.remote_store.repository.my-remote-routing-table-repo.type: s3
+node.attr.remote_store.repository.my-remote-routing-table-repo.settings.bucket: <bucket-name>
+node.attr.remote_store.repository.my-remote-routing-table-repo.settings.region: <region>
+```
+
+You do not have to use different remote store repositories for state and routing because both state and routing can use the same repository settings.
+
+To configure remote publication, use the following cluster settings.
+
+Setting | Default | Description
+:--- | :--- | :---
+`cluster.remote_store.state.read_timeout` | 20s | The amount of time to wait for remote state download to complete on the follower node.
+`cluster.remote_store.routing_table.path_type` | HASHED_PREFIX | The path type to be used for creating an index routing path in the blob store. Valid values are `FIXED`, `HASHED_PREFIX`, and `HASHED_INFIX`.
+`cluster.remote_store.routing_table.path_hash_algo` | FNV_1A_BASE64 | The algorithm to be used for constructing the prefix or infix of the blob store path. This setting is applied if `cluster.remote_store.routing_table.path_type` is `hashed_prefix` or `hashed_infix`. Valid algorithm values are `FNV_1A_BASE64` and `FNV_1A_COMPOSITE_1`.
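+
+If these settings are dynamic in your cluster version, you can update them using the Cluster Settings API, as in the following illustrative sketch:
+
+```json
+PUT /_cluster/settings
+{
+  "persistent": {
+    "cluster.remote_store.state.read_timeout": "30s"
+  }
+}
+```
+{% include copy-curl.html %}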
diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md
index 257db00db1..f35115c95f 100644
--- a/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md
+++ b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md
@@ -207,7 +207,7 @@ You will most likely not need to specify any parameters except for `location`. F
You will most likely not need to specify any parameters except for `bucket` and `base_path`. For allowed request parameters, see [Register or update snapshot repository API](https://opensearch.org/docs/latest/api-reference/snapshots/create-repository/).
-### Registering an Azure storage account
+### Registering a Microsoft Azure storage account using Helm
Use the following steps to register a snapshot repository backed by an Azure storage account for an OpenSearch cluster deployed using Helm.
@@ -296,6 +296,56 @@ Use the following steps to register a snapshot repository backed by an Azure sto
}
```
+### Set up Microsoft Azure Blob Storage
+
+To use Azure Blob Storage as a snapshot repository, follow these steps:
+
+1. Install the `repository-azure` plugin on all nodes with the following command:
+
+ ```bash
+ ./bin/opensearch-plugin install repository-azure
+ ```
+
+1. After the `repository-azure` plugin is installed, define your Azure Blob Storage settings before initializing the node. Start by defining your Azure Storage account name using the following secure setting:
+
+ ```bash
+ ./bin/opensearch-keystore add azure.client.default.account
+ ```
+
+Choose one of the following options for setting up your Azure Blob Storage authentication credentials.
+
+#### Using an Azure Storage account key
+
+Use the following setting to specify your Azure Storage account key:
+
+```bash
+./bin/opensearch-keystore add azure.client.default.key
+```
+
+#### Shared access signature
+
+Use the following setting when accessing Azure with a shared access signature (SAS):
+
+```bash
+./bin/opensearch-keystore add azure.client.default.sas_token
+```
+
+#### Azure token credential
+
+Starting in OpenSearch 2.15, you have the option to configure a token credential authentication flow in `opensearch.yml`. This method is distinct from connection string authentication, which requires a SAS or an account key.
+
+If you choose to use token credential authentication, you will need to choose a token credential type. Although Azure offers multiple token credential types, as of OpenSearch version 2.15, only [managed identity](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview) is supported.
+
+To use managed identity, add your token credential type to `opensearch.yml` using either the `managed` or `managed_identity` value. This indicates that managed identity is being used to perform token credential authentication:
+
+```yml
+azure.client.default.token_credential_type: "managed_identity"
+```
+
+Note the following when using Azure token credentials:
+
+- Token credential support is disabled in `opensearch.yml` by default.
+- A token credential takes precedence over an Azure Storage account key or a SAS when multiple options are configured.
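+
+After the credentials are configured, you can register an Azure-backed snapshot repository. The following request is a minimal sketch; the repository name, container, and base path are hypothetical:
+
+```json
+PUT /_snapshot/my-azure-repository
+{
+  "type": "azure",
+  "settings": {
+    "container": "my-snapshot-container",
+    "base_path": "snapshots"
+  }
+}
+```
+{% include copy-curl.html %}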
+
## Take snapshots
You specify two pieces of information when you create a snapshot:
diff --git a/images/dashboards/mds_feature_anywhere_create_alerting.gif b/images/dashboards/mds_feature_anywhere_create_alerting.gif
new file mode 100644
index 0000000000..712cace8bf
Binary files /dev/null and b/images/dashboards/mds_feature_anywhere_create_alerting.gif differ
diff --git a/images/dashboards/mds_feature_anywhere_view_alerting.gif b/images/dashboards/mds_feature_anywhere_view_alerting.gif
new file mode 100644
index 0000000000..ff840cfad4
Binary files /dev/null and b/images/dashboards/mds_feature_anywhere_view_alerting.gif differ
diff --git a/images/dashboards/mds_monitor_view.gif b/images/dashboards/mds_monitor_view.gif
new file mode 100644
index 0000000000..9ada1147f5
Binary files /dev/null and b/images/dashboards/mds_monitor_view.gif differ
diff --git a/images/dashboards/mds_sa_detection_rules_create.gif b/images/dashboards/mds_sa_detection_rules_create.gif
new file mode 100644
index 0000000000..50fc77b8d6
Binary files /dev/null and b/images/dashboards/mds_sa_detection_rules_create.gif differ
diff --git a/images/dashboards/mds_sa_detection_rules_view.gif b/images/dashboards/mds_sa_detection_rules_view.gif
new file mode 100644
index 0000000000..31508f10de
Binary files /dev/null and b/images/dashboards/mds_sa_detection_rules_view.gif differ