diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 0ec6c5e009..815687fa17 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -1 +1 @@ -* @hdhalter @kolchfa-aws @Naarcha-AWS @vagimeli @AMoo-Miki @natebower @dlvenable @scrawfor99 @epugh +* @kolchfa-aws @Naarcha-AWS @vagimeli @AMoo-Miki @natebower @dlvenable @stephen-crawford @epugh diff --git a/.github/ISSUE_TEMPLATE/issue_template.md b/.github/ISSUE_TEMPLATE/issue_template.md index 0985529940..b3e03c0ffe 100644 --- a/.github/ISSUE_TEMPLATE/issue_template.md +++ b/.github/ISSUE_TEMPLATE/issue_template.md @@ -15,7 +15,6 @@ assignees: '' **Tell us about your request.** Provide a summary of the request. -***Version:** List the OpenSearch version to which this issue applies, e.g. 2.14, 2.12--2.14, or all. +**Version:** List the OpenSearch version to which this issue applies, e.g. 2.14, 2.12--2.14, or all. **What other resources are available?** Provide links to related issues, POCs, steps for testing, etc. - diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 21b6fbfea6..fd4213b7e5 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -2,7 +2,7 @@ _Describe what this change achieves._ ### Issues Resolved -_List any issues this PR will resolve, e.g. Closes [...]._ +Closes #[_insert issue number_] ### Version _List the OpenSearch version to which this PR applies, e.g. 2.14, 2.12--2.14, or all._ diff --git a/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt index e33ac09744..bf4c8a0a29 100644 --- a/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt +++ b/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt @@ -85,6 +85,7 @@ Python PyTorch Querqy Query Workbench +RankLib RCF Summarize RPM Package Manager Ruby @@ -97,4 +98,5 @@ TorchScript Tribuo VisBuilder Winlogbeat -Zstandard \ No newline at end of file +XGBoost +Zstandard diff --git a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt index 9e09f21c3a..0a1d2c557c 100644 --- a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt +++ b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt @@ -77,12 +77,16 @@ Levenshtein [Mm]ultivalued [Mm]ultiword [Nn]amespace -[Oo]versamples? +[Oo]ffline [Oo]nboarding +[Oo]versamples? pebibyte +p\d{2} [Pp]erformant [Pp]laintext [Pp]luggable +[Pp]reaggregate(s|d)? +[Pp]recompute(s|d)? [Pp]reconfigure [Pp]refetch [Pp]refilter @@ -101,8 +105,10 @@ pebibyte [Rr]eenable [Rr]eindex [Rr]eingest +[Rr]eprovision(ed|ing)? [Rr]erank(er|ed|ing)? [Rr]epo +[Rr]escor(e|ed|ing)? [Rr]ewriter [Rr]ollout [Rr]ollup @@ -126,6 +132,7 @@ stdout [Ss]ubvector [Ss]ubwords? [Ss]uperset +[Ss]uperadmins? [Ss]yslog tebibyte [Tt]emplated diff --git a/.gitignore b/.gitignore index da3cf9d144..92f01c5fca 100644 --- a/.gitignore +++ b/.gitignore @@ -7,3 +7,4 @@ Gemfile.lock *.iml .jekyll-cache .project +vendor/bundle diff --git a/API_STYLE_GUIDE.md b/API_STYLE_GUIDE.md index a6e0551f17..a058bbe7c2 100644 --- a/API_STYLE_GUIDE.md +++ b/API_STYLE_GUIDE.md @@ -103,7 +103,7 @@ For GET and DELETE APIs: Introduce what you can do with the optional parameters. Parameter | Data type | Description :--- | :--- | :--- -### Request fields +### Request body fields For PUT and POST APIs: Introduce what the request fields are allowed to provide in the body of the request. 
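For example, a "Request body fields" section for a hypothetical `PUT /_plugins/_example/config` API could pair the field table with a short request so readers see the fields in context. The endpoint and fields below are invented for illustration only; they are not part of the style guide:

Field | Data type | Description
:--- | :--- | :---
`enabled` | Boolean | Whether the feature is enabled. Required.
`max_results` | Integer | The maximum number of results to return. Optional. Default is `100`.

```json
PUT /_plugins/_example/config
{
  "enabled": true,
  "max_results": 100
}
```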
@@ -189,7 +189,7 @@ The `POST _reindex` request returns the following response fields: } ``` -### Response fields +### Response body fields For PUT and POST APIs: Define all allowable response fields that can be returned in the body of the response. diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 7afa9d7596..a3628b3b6e 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -78,6 +78,8 @@ Follow these steps to set up your local copy of the repository: 1. Navigate to your cloned repository. +##### Building by using locally installed packages + 1. Install [Ruby](https://www.ruby-lang.org/en/) if you don't already have it. We recommend [RVM](https://rvm.io/), but you can use any method you prefer: ``` @@ -98,6 +100,14 @@ Follow these steps to set up your local copy of the repository: bundle install ``` +##### Building by using containerization + +Assuming you have `docker-compose` installed, run the following command: + + ``` + docker compose -f docker-compose.dev.yml up + ``` + #### Troubleshooting Try the following troubleshooting steps if you encounter an error when trying to build the documentation website: diff --git a/MAINTAINERS.md b/MAINTAINERS.md index 1bf2a1d219..55b908e027 100644 --- a/MAINTAINERS.md +++ b/MAINTAINERS.md @@ -6,12 +6,17 @@ This document lists the maintainers in this repo. See [opensearch-project/.githu | Maintainer | GitHub ID | Affiliation | | ---------------- | ----------------------------------------------- | ----------- | -| Heather Halter | [hdhalter](https://github.com/hdhalter) | Amazon | | Fanit Kolchina | [kolchfa-aws](https://github.com/kolchfa-aws) | Amazon | | Nate Archer | [Naarcha-AWS](https://github.com/Naarcha-AWS) | Amazon | | Nathan Bower | [natebower](https://github.com/natebower) | Amazon | | Melissa Vagi | [vagimeli](https://github.com/vagimeli) | Amazon | | Miki Barahmand | [AMoo-Miki](https://github.com/AMoo-Miki) | Amazon | | David Venable | [dlvenable](https://github.com/dlvenable) | Amazon | -| Stephen Crawford | [scraw99](https://github.com/scrawfor99) | Amazon | +| Stephen Crawford | [stephen-crawford](https://github.com/stephen-crawford) | Amazon | | Eric Pugh | [epugh](https://github.com/epugh) | OpenSource Connections | + +## Emeritus + +| Maintainer | GitHub ID | Affiliation | +| ---------------- | ----------------------------------------------- | ----------- | +| Heather Halter | [hdhalter](https://github.com/hdhalter) | Amazon | diff --git a/README.md b/README.md index 7d8de14151..66beb1948c 100644 --- a/README.md +++ b/README.md @@ -21,7 +21,6 @@ The following resources provide important guidance regarding contributions to th If you encounter problems or have questions when contributing to the documentation, these people can help: -- [hdhalter](https://github.com/hdhalter) - [kolchfa-aws](https://github.com/kolchfa-aws) - [Naarcha-AWS](https://github.com/Naarcha-AWS) - [vagimeli](https://github.com/vagimeli) diff --git a/TERMS.md b/TERMS.md index 7de56f9275..146f3c1049 100644 --- a/TERMS.md +++ b/TERMS.md @@ -588,6 +588,8 @@ Use % in headlines, quotations, and tables or in technical copy. An agent and REST API that allows you to query numerous performance metrics for your cluster, including aggregations of those metrics, independent of the Java Virtual Machine (JVM). +**performant** + **plaintext, plain text** Use *plaintext* only to refer to nonencrypted or decrypted text in content about encryption. Use *plain text* to refer to ASCII files. 
@@ -602,6 +604,10 @@ Tools inside of OpenSearch that can be customized to enhance OpenSearch's functi **pop-up** +**preaggregate** + +**precompute** + **premise, premises** With reference to property and buildings, always form as plural. diff --git a/_about/index.md b/_about/index.md index d2cc011b55..041197eeba 100644 --- a/_about/index.md +++ b/_about/index.md @@ -22,16 +22,21 @@ This section contains documentation for OpenSearch and OpenSearch Dashboards. ## Getting started -- [Intro to OpenSearch]({{site.url}}{{site.baseurl}}/intro/) -- [Quickstart]({{site.url}}{{site.baseurl}}/quickstart/) +To get started, explore the following documentation: + +- [Getting started guide]({{site.url}}{{site.baseurl}}/getting-started/): + - [Intro to OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/intro/) + - [Installation quickstart]({{site.url}}{{site.baseurl}}/getting-started/quickstart/) + - [Communicate with OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/communicate/) + - [Ingest data]({{site.url}}{{site.baseurl}}/getting-started/ingest-data/) + - [Search data]({{site.url}}{{site.baseurl}}/getting-started/search-data/) + - [Getting started with OpenSearch security]({{site.url}}{{site.baseurl}}/getting-started/security/) - [Install OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/index/) - [Install OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/install-and-configure/install-dashboards/index/) -- [See the FAQ](https://opensearch.org/faq) +- [FAQ](https://opensearch.org/faq) ## Why use OpenSearch? -With OpenSearch, you can perform the following use cases: - @@ -41,35 +46,38 @@ With OpenSearch, you can perform the following use cases: - - - - + + + + - + - +
-      <td>Fast, Scalable Full-text Search</td>
-      <td>Application and Infrastructure Monitoring</td>
-      <td>Security and Event Information Management</td>
-      <td>Operational Health Tracking</td>
+      <td>Fast, scalable full-text search</td>
+      <td>Application and infrastructure monitoring</td>
+      <td>Security and event information management</td>
+      <td>Operational health tracking</td>
       <td>Help users find the right information within your application, website, or data lake catalog.</td>
-      <td>Easily store and analyze log data, and set automated alerts for underperformance.</td>
+      <td>Easily store and analyze log data, and set automated alerts for performance issues.</td>
       <td>Centralize logs to enable real-time security monitoring and forensic analysis.</td>
-      <td>Use observability logs, metrics, and traces to monitor your applications and business in real time.</td>
+      <td>Use observability logs, metrics, and traces to monitor your applications in real time.</td>
-**Additional features and plugins:** +## Key features + +OpenSearch provides several features to help index, secure, monitor, and analyze your data: -OpenSearch has several features and plugins to help index, secure, monitor, and analyze your data. Most OpenSearch plugins have corresponding OpenSearch Dashboards plugins that provide a convenient, unified user interface. -- [Anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/) - Identify atypical data and receive automatic notifications -- [KNN]({{site.url}}{{site.baseurl}}/search-plugins/knn/) - Find “nearest neighbors” in your vector data -- [Performance Analyzer]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/) - Monitor and optimize your cluster -- [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) - Use SQL or a piped processing language to query your data -- [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/) - Automate index operations -- [ML Commons plugin]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) - Train and execute machine-learning models -- [Asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/) - Run search requests in the background -- [Cross-cluster replication]({{site.url}}{{site.baseurl}}/replication-plugin/index/) - Replicate your data across multiple OpenSearch clusters +- [Anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/) -- Identify atypical data and receive automatic notifications. +- [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) -- Use SQL or a Piped Processing Language (PPL) to query your data. +- [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/) -- Automate index operations. +- [Search methods]({{site.url}}{{site.baseurl}}/search-plugins/knn/) -- From traditional lexical search to advanced vector and hybrid search, discover the optimal search method for your use case. +- [Machine learning]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) -- Integrate machine learning models into your workloads. +- [Workflow automation]({{site.url}}{{site.baseurl}}/automating-configurations/index/) -- Automate complex OpenSearch setup and preprocessing tasks. +- [Performance evaluation]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/) -- Monitor and optimize your cluster. +- [Asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/) -- Run search requests in the background. +- [Cross-cluster replication]({{site.url}}{{site.baseurl}}/replication-plugin/index/) -- Replicate your data across multiple OpenSearch clusters. ## The secure path forward -OpenSearch includes a demo configuration so that you can get up and running quickly, but before using OpenSearch in a production environment, you must [configure the Security plugin manually]({{site.url}}{{site.baseurl}}/security/configuration/index/) with your own certificates, authentication method, users, and passwords. + +OpenSearch includes a demo configuration so that you can get up and running quickly, but before using OpenSearch in a production environment, you must [configure the Security plugin manually]({{site.url}}{{site.baseurl}}/security/configuration/index/) with your own certificates, authentication method, users, and passwords. To get started, see [Getting started with OpenSearch security]({{site.url}}{{site.baseurl}}/getting-started/security/). ## Looking for the Javadoc? 
diff --git a/_about/version-history.md b/_about/version-history.md index d7273ffedb..b8b9e99309 100644 --- a/_about/version-history.md +++ b/_about/version-history.md @@ -9,6 +9,8 @@ permalink: /version-history/ OpenSearch version | Release highlights | Release date :--- | :--- | :--- +[2.17.1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.17.1.md) | Includes bug fixes for ML Commons, anomaly detection, k-NN, and security analytics. Adds various infrastructure and maintenance updates. For a full list of release highlights, see the Release Notes. | 1 October 2024 +[2.17.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.17.0.md) | Includes disk-optimized vector search, binary quantization, and byte vector encoding in k-NN. Adds asynchronous batch ingestion for ML tasks. Provides search and query performance enhancements and a new custom trace source in trace analytics. Includes application-based configuration templates. For a full list of release highlights, see the Release Notes. | 17 September 2024 [2.16.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.16.0.md) | Includes built-in byte vector quantization and binary vector support in k-NN. Adds new sort, split, and ML inference search processors for search pipelines. Provides application-based configuration templates and additional plugins to integrate multiple data sources in OpenSearch Dashboards. Includes an experimental Batch Predict ML Commons API. For a full list of release highlights, see the Release Notes. | 06 August 2024 [2.15.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.15.0.md) | Includes parallel ingestion processing, SIMD support for exact search, and the ability to disable doc values for the k-NN field. Adds wildcard and derived field types. Improves performance for single-cardinality aggregations, rolling upgrades to remote-backed clusters, and more metrics for top N queries. For a full list of release highlights, see the Release Notes. | 25 June 2024 [2.14.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.14.0.md) | Includes performance improvements to hybrid search and date histogram queries with multi-range traversal, ML model integration within the Ingest API, semantic cache for LangChain applications, low-level vector query interface for neural sparse queries, and improved k-NN search filtering. Provides an experimental tiered cache feature. For a full list of release highlights, see the Release Notes. | 14 May 2024 @@ -31,6 +33,7 @@ OpenSearch version | Release highlights | Release date [2.0.1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.1.md) | Includes bug fixes and maintenance updates for Alerting and Anomaly Detection. | 16 June 2022 [2.0.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.0.md) | Includes document-level monitors for alerting, OpenSearch Notifications plugins, and Geo Map Tiles in OpenSearch Dashboards. Also adds support for Lucene 9 and bug fixes for all OpenSearch plugins. For a full list of release highlights, see the Release Notes. 
| 26 May 2022 [2.0.0-rc1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.0-rc1.md) | The Release Candidate for 2.0.0. This version allows you to preview the upcoming 2.0.0 release before the GA release. The preview release adds document-level alerting, support for Lucene 9, and the ability to use term lookup queries in document level security. | 03 May 2022 +[1.3.19](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.19.md) | Includes bug fixes and maintenance updates for OpenSearch security, OpenSearch security Dashboards, and anomaly detection. | 27 August 2024 [1.3.18](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.18.md) | Includes maintenance updates for OpenSearch security. | 16 July 2024 [1.3.17](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.17.md) | Includes maintenance updates for OpenSearch security and OpenSearch Dashboards security. | 06 June 2024 [1.3.16](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.16.md) | Includes bug fixes and maintenance updates for OpenSearch security, index management, performance analyzer, and reporting. | 23 April 2024 diff --git a/_aggregations/bucket/auto-interval-date-histogram.md b/_aggregations/bucket/auto-interval-date-histogram.md new file mode 100644 index 0000000000..b7a95a3b89 --- /dev/null +++ b/_aggregations/bucket/auto-interval-date-histogram.md @@ -0,0 +1,377 @@ +--- +layout: default +title: Auto-interval date histogram +parent: Bucket aggregations +grand_parent: Aggregations +nav_order: 12 +--- + +# Auto-interval date histogram + +Similar to the [date histogram aggregation]({{site.url}}{{site.baseurl}}/aggregations/bucket/date-histogram/), in which you must specify an interval, the `auto_date_histogram` is a multi-bucket aggregation that automatically creates date histogram buckets based on the number of buckets you provide and the time range of your data. The actual number of buckets returned is always less than or equal to the number of buckets you specify. This aggregation is particularly useful when you are working with time-series data and want to visualize or analyze data over different time intervals without manually specifying the interval size. + +## Intervals + +The bucket interval is chosen based on the collected data to ensure that the number of returned buckets is less than or equal to the requested number. + +The following table lists the possible returned intervals for each time unit. + +| Unit | Intervals | +| :--- | :---| +| Seconds| Multiples of 1, 5, 10, and 30 | +| Minutes| Multiples of 1, 5, 10, and 30 | +| Hours | Multiples of 1, 3, and 12 | +| Days | Multiples of 1 and 7 | +| Months | Multiples of 1 and 3 | +| Years | Multiples of 1, 5, 10, 20, 50, and 100 | + +If an aggregation returns too many buckets (for example, daily buckets), OpenSearch will automatically reduce the number of buckets to ensure a manageable result. Instead of returning the exact number of requested daily buckets, it will reduce them by a factor of about 1/7. For example, if you ask for 70 buckets but the data contains too many daily intervals, OpenSearch might return only 10 buckets, grouping the data into larger intervals (such as weeks) to avoid an overwhelming number of results. 
This helps optimize the aggregation and prevent excessive detail when too much data is available. + +## Example + +In the following example, you'll search an index containing blog posts. + +First, create a mapping for this index and specify the `date_posted` field as the `date` type: + +```json +PUT blogs +{ + "mappings" : { + "properties" : { + "date_posted" : { + "type" : "date", + "format" : "yyyy-MM-dd" + } + } + } +} +``` +{% include copy-curl.html %} + +Next, index the following documents into the `blogs` index: + +```json +PUT blogs/_doc/1 +{ + "name": "Semantic search in OpenSearch", + "date_posted": "2022-04-17" +} +``` +{% include copy-curl.html %} + +```json +PUT blogs/_doc/2 +{ + "name": "Sparse search in OpenSearch", + "date_posted": "2022-05-02" +} +``` +{% include copy-curl.html %} + +```json +PUT blogs/_doc/3 +{ + "name": "Distributed tracing with Data Prepper", + "date_posted": "2022-04-25" +} +``` +{% include copy-curl.html %} + +```json +PUT blogs/_doc/4 +{ + "name": "Observability in OpenSearch", + "date_posted": "2023-03-23" +} + +``` +{% include copy-curl.html %} + +To use the `auto_date_histogram` aggregation, specify the field containing the date or timestamp values. For example, to aggregate blog posts by `date_posted` into two buckets, send the following request: + +```json +GET /blogs/_search +{ + "size": 0, + "aggs": { + "histogram": { + "auto_date_histogram": { + "field": "date_posted", + "buckets": 2 + } + } + } +} +``` +{% include copy-curl.html %} + +The response shows that the blog posts were aggregated into two buckets. The interval was automatically set to 1 year, with all three 2022 blog posts collected in one bucket and the 2023 blog post in another: + +```json +{ + "took": 20, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 4, + "relation": "eq" + }, + "max_score": null, + "hits": [] + }, + "aggregations": { + "histogram": { + "buckets": [ + { + "key_as_string": "2022-01-01", + "key": 1640995200000, + "doc_count": 3 + }, + { + "key_as_string": "2023-01-01", + "key": 1672531200000, + "doc_count": 1 + } + ], + "interval": "1y" + } + } +} +``` + +## Returned buckets + +Each bucket contains the following information: + +```json +{ + "key_as_string": "2023-01-01", + "key": 1672531200000, + "doc_count": 1 +} +``` + +In OpenSearch, dates are internally stored as 64-bit integers representing timestamps in milliseconds since the epoch. In the aggregation response, each bucket `key` is returned as such a timestamp. The `key_as_string` value shows the same timestamp but formatted as a date string based on the [`format`](#date-format) parameter. The `doc_count` field contains the number of documents in the bucket. + +## Parameters + +Auto-interval date histogram aggregations accept the following parameters. + +Parameter | Data type | Description +:--- | :--- | :--- +`field` | String | The field on which to aggregate. The field must contain the date or timestamp values. Either `field` or `script` is required. +`buckets` | Integer | The desired number of buckets. The returned number of buckets is less than or equal to the desired number. Optional. Default is `10`. +`minimum_interval` | String | The minimum interval to be used. Specifying a minimum interval can make the aggregation process more efficient. Valid values are `year`, `month`, `day`, `hour`, `minute`, and `second`. Optional. 
+`time_zone` | String | Specifies to use a time zone other than the default (UTC) for bucketing and rounding. You can specify the `time_zone` parameter as a [UTC offset](https://en.wikipedia.org/wiki/UTC_offset), such as `-04:00`, or an [IANA time zone ID](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones), such as `America/New_York`. Optional. Default is `UTC`. For more information, see [Time zone](#time-zone). +`format` | String | The format for returning dates representing bucket keys. Optional. Default is the format specified in the field mapping. For more information, see [Date format](#date-format). +`script` | String | A document-level or value-level script for aggregating values into buckets. Either `field` or `script` is required. +`missing` | String | Specifies how to handle documents in which the field value is missing. By default, such documents are ignored. If you specify a date value in the `missing` parameter, all documents in which the field value is missing are collected into the bucket with the specified date. + +## Date format + +If you don't specify the `format` parameter, the format defined in the field mapping is used (as seen in the preceding response). To modify the format, specify the `format` parameter: + +```json +GET /blogs/_search +{ + "size": 0, + "aggs": { + "histogram": { + "auto_date_histogram": { + "field": "date_posted", + "format": "yyyy-MM-dd HH:mm:ss" + } + } + } +} +``` +{% include copy-curl.html %} + +The `key_as_string` field is now returned in the specified format: + +```json +{ + "key_as_string": "2023-01-01 00:00:00", + "key": 1672531200000, + "doc_count": 1 +} +``` + +Alternatively, you can specify one of the built-in date [formats]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats): + +```json +GET /blogs/_search +{ + "size": 0, + "aggs": { + "histogram": { + "auto_date_histogram": { + "field": "date_posted", + "format": "basic_date_time_no_millis" + } + } + } +} +``` +{% include copy-curl.html %} + +The `key_as_string` field is now returned in the specified format: + +```json +{ + "key_as_string": "20230101T000000Z", + "key": 1672531200000, + "doc_count": 1 +} +``` + +## Time zone + +By default, dates are stored and processed in UTC. The `time_zone` parameter allows you to specify a different time zone for bucketing. You can specify the `time_zone` parameter as a [UTC offset](https://en.wikipedia.org/wiki/UTC_offset), such as `-04:00`, or an [IANA time zone ID](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones), such as `America/New_York`. 
+ +As an example, index the following documents into an index: + +```json +PUT blogs1/_doc/1 +{ + "name": "Semantic search in OpenSearch", + "date_posted": "2022-04-17T01:00:00.000Z" +} +``` +{% include copy-curl.html %} + +```json +PUT blogs1/_doc/2 +{ + "name": "Sparse search in OpenSearch", + "date_posted": "2022-04-17T04:00:00.000Z" +} +``` +{% include copy-curl.html %} + +First, run an aggregation without specifying a time zone: + +```json +GET /blogs1/_search +{ + "size": 0, + "aggs": { + "histogram": { + "auto_date_histogram": { + "field": "date_posted", + "buckets": 2, + "format": "yyyy-MM-dd HH:mm:ss" + } + } + } +} +``` +{% include copy-curl.html %} + +The response contains two 3-hour buckets, starting at midnight UTC on April 17, 2022: + +```json +{ + "took": 6, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 2, + "relation": "eq" + }, + "max_score": null, + "hits": [] + }, + "aggregations": { + "histogram": { + "buckets": [ + { + "key_as_string": "2022-04-17 01:00:00", + "key": 1650157200000, + "doc_count": 1 + }, + { + "key_as_string": "2022-04-17 04:00:00", + "key": 1650168000000, + "doc_count": 1 + } + ], + "interval": "3h" + } + } +} +``` + +Now, specify a `time_zone` of `-02:00`: + +```json +GET /blogs1/_search +{ + "size": 0, + "aggs": { + "histogram": { + "auto_date_histogram": { + "field": "date_posted", + "buckets": 2, + "format": "yyyy-MM-dd HH:mm:ss", + "time_zone": "-02:00" + } + } + } +} +``` + +The response contains two buckets in which the start time is shifted by 2 hours and starts at 23:00 on April 16, 2022: + +```json +{ + "took": 17, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 2, + "relation": "eq" + }, + "max_score": null, + "hits": [] + }, + "aggregations": { + "histogram": { + "buckets": [ + { + "key_as_string": "2022-04-16 23:00:00", + "key": 1650157200000, + "doc_count": 1 + }, + { + "key_as_string": "2022-04-17 02:00:00", + "key": 1650168000000, + "doc_count": 1 + } + ], + "interval": "3h" + } + } +} +``` + +When using time zones with daylight saving time (DST) changes, the sizes of buckets that are near the transition may differ slightly from the sizes of neighboring buckets. +{: .note} diff --git a/_aggregations/bucket/children.md b/_aggregations/bucket/children.md new file mode 100644 index 0000000000..1f493c4620 --- /dev/null +++ b/_aggregations/bucket/children.md @@ -0,0 +1,172 @@ +--- +layout: default +title: Children +parent: Bucket aggregations +grand_parent: Aggregations +nav_order: 15 +--- + +# Children + +The `children` aggregation connects parent documents with their related child documents. This allows you to analyze relationships between different types of data in a single query, rather than needing to run multiple queries and combine the results manually. + +--- + +## Example index, sample data, and children aggregation query + +For example, if you have a parent-child relationship between authors, posts, and comments, you can analyze the relationships between the different data types (`authors`, `posts`, and `comments`) in a single query. + +The `authors` aggregation groups the documents by the `author.keyword` field. This allows you to see the number of documents associates with each author. + +In each author group, the `children` aggregation retrieves the associated posts. This gives you a breakdown of the posts written by each author. 
+ +In the `posts` aggregation, another `children` aggregation fetches the comments associated with each post. This provides you a way to see the comments for each individual post. + +In the `comments` aggregation, the `value_count` aggregation counts the number of comments on each post. This allows you to gauge the engagement level for each post by seeing the number of comments it has received. + +#### Example index + +```json +PUT /blog-sample +{ + "mappings": { + "properties": { + "type": { "type": "keyword" }, + "name": { "type": "keyword" }, + "title": { "type": "text" }, + "content": { "type": "text" }, + "author": { "type": "keyword" }, + "post_id": { "type": "keyword" }, + "join_field": { + "type": "join", + "relations": { + "author": "post", + "post": "comment" + } + } + } + } +} +``` +{% include copy-curl.html %} + +#### Sample documents + +```json +POST /blog-sample/_doc/1?routing=1 +{ + "type": "author", + "name": "John Doe", + "join_field": "author" +} + +POST /blog-sample/_doc/2?routing=1 +{ + "type": "post", + "title": "Introduction to OpenSearch", + "content": "OpenSearch is a powerful search and analytics engine...", + "author": "John Doe", + "join_field": { + "name": "post", + "parent": "1" + } +} + +POST /blog-sample/_doc/3?routing=1 +{ + "type": "comment", + "content": "Great article! Very informative.", + "join_field": { + "name": "comment", + "parent": "2" + } +} + +POST /blog-sample/_doc/4?routing=1 +{ + "type": "comment", + "content": "Thanks for the clear explanation.", + "join_field": { + "name": "comment", + "parent": "2" + } +} +``` +{% include copy-curl.html %} + +#### Example children aggregation query + +```json +GET /blog-sample/_search +{ + "size": 0, + "aggs": { + "authors": { + "terms": { + "field": "name.keyword" + }, + "aggs": { + "posts": { + "children": { + "type": "post" + }, + "aggs": { + "post_titles": { + "terms": { + "field": "title.keyword" + }, + "aggs": { + "comments": { + "children": { + "type": "comment" + }, + "aggs": { + "comment_count": { + "value_count": { + "field": "_id" + } + } + } + } + } + } + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +#### Example response + +The response should appear similar to the following example: + +```json +{ + "took": 30, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 4, + "relation": "eq" + }, + "max_score": null, + "hits": [] + }, + "aggregations": { + "authors": { + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0, + "buckets": [] + } + } +} +``` diff --git a/_aggregations/bucket/index.md b/_aggregations/bucket/index.md index 1658c06ea5..e1a02b890d 100644 --- a/_aggregations/bucket/index.md +++ b/_aggregations/bucket/index.md @@ -22,6 +22,7 @@ You can use bucket aggregations to implement faceted navigation (usually placed OpenSearch supports the following bucket aggregations: - [Adjacency matrix]({{site.url}}{{site.baseurl}}/aggregations/bucket/adjacency-matrix/) +- [Children]({{site.url}}{{site.baseurl}}/aggregations/bucket/children) - [Date histogram]({{site.url}}{{site.baseurl}}/aggregations/bucket/date-histogram/) - [Date range]({{site.url}}{{site.baseurl}}/aggregations/bucket/date-range/) - [Diversified sampler]({{site.url}}{{site.baseurl}}/aggregations/bucket/diversified-sampler/) diff --git a/_analyzers/character-filters/html-character-filter.md b/_analyzers/character-filters/html-character-filter.md new file mode 100644 index 0000000000..9fb98d9744 --- /dev/null +++ 
b/_analyzers/character-filters/html-character-filter.md
@@ -0,0 +1,124 @@
+---
+layout: default
+title: html_strip character filter
+parent: Character filters
+nav_order: 100
+---
+
+# `html_strip` character filter
+
+The `html_strip` character filter removes HTML tags, such as `<div>`, `<p>`, and `<a>`, from the input text and renders plain text. The filter can be configured to preserve certain tags or decode specific HTML entities, such as `&nbsp;`, into spaces.
+
+## Example: HTML analyzer
+
+```json
+GET /_analyze
+{
+  "tokenizer": "keyword",
+  "char_filter": [
+    "html_strip"
+  ],
+  "text": "<p>Commonly used calculus symbols include &alpha;, &beta; and &theta;</p>"
+}
+```
+{% include copy-curl.html %}
+
+Using the HTML analyzer, you can convert the HTML character entity references into their corresponding symbols. The processed text would read as follows:
+
+```
+Commonly used calculus symbols include α, β and θ
+```
+
+## Example: Custom analyzer with lowercase filter
+
+The following example query creates a custom analyzer that strips HTML tags and converts the plain text to lowercase by using the `html_strip` analyzer and `lowercase` filter:
+
+```json
+PUT /html_strip_and_lowercase_analyzer
+{
+  "settings": {
+    "analysis": {
+      "char_filter": {
+        "html_filter": {
+          "type": "html_strip"
+        }
+      },
+      "analyzer": {
+        "html_strip_analyzer": {
+          "type": "custom",
+          "char_filter": ["html_filter"],
+          "tokenizer": "standard",
+          "filter": ["lowercase"]
+        }
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+### Testing `html_strip_and_lowercase_analyzer`
+
+You can run the following request to test the analyzer:
+
+```json
+GET /html_strip_and_lowercase_analyzer/_analyze
+{
+  "analyzer": "html_strip_analyzer",
+  "text": "<h1>Welcome to <strong>OpenSearch</strong>!</h1>"
+}
+```
+{% include copy-curl.html %}
+
+In the response, the HTML tags have been removed and the plain text has been converted to lowercase:
+
+```
+welcome to opensearch!
+```
+
+## Example: Custom analyzer that preserves HTML tags
+
+The following example request creates a custom analyzer that preserves HTML tags:
+
+```json
+PUT /html_strip_preserve_analyzer
+{
+  "settings": {
+    "analysis": {
+      "char_filter": {
+        "html_filter": {
+          "type": "html_strip",
+          "escaped_tags": ["b", "i"]
+        }
+      },
+      "analyzer": {
+        "html_strip_analyzer": {
+          "type": "custom",
+          "char_filter": ["html_filter"],
+          "tokenizer": "keyword"
+        }
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+### Testing `html_strip_preserve_analyzer`
+
+You can run the following request to test the analyzer:
+
+```json
+GET /html_strip_preserve_analyzer/_analyze
+{
+  "analyzer": "html_strip_analyzer",
+  "text": "<p>This is a <b>bold</b> and <i>italic</i> text.</p>"
+}
+```
+{% include copy-curl.html %}
+
+In the response, the `<i>` (italic) and `<b>` (bold) tags have been retained, as specified in the custom analyzer request:
+
+```
+This is a <b>bold</b> and <i>italic</i> text.
+```
diff --git a/_analyzers/character-filters/index.md b/_analyzers/character-filters/index.md
new file mode 100644
index 0000000000..0e2ce01b8c
--- /dev/null
+++ b/_analyzers/character-filters/index.md
@@ -0,0 +1,19 @@
+---
+layout: default
+title: Character filters
+nav_order: 90
+has_children: true
+has_toc: false
+---
+
+# Character filters
+
+Character filters process text before tokenization to prepare it for further analysis.
+
+Unlike token filters, which operate on tokens (words or terms), character filters process the raw input text before tokenization. They are especially useful for cleaning or transforming structured text containing unwanted characters, such as HTML tags or special symbols. Character filters help to strip or replace these elements so that text is properly formatted for analysis.
+
+Use cases for character filters include:
+
+- **HTML stripping:** Removes HTML tags from content so that only the plain text is indexed.
+- **Pattern replacement:** Replaces or removes unwanted characters or patterns in text, for example, converting hyphens to spaces.
+- **Custom mappings:** Substitutes specific characters or sequences with other values, for example, to convert currency symbols into their textual equivalents.
diff --git a/_analyzers/index-analyzers.md b/_analyzers/index-analyzers.md
index 72332758d0..3c40755502 100644
--- a/_analyzers/index-analyzers.md
+++ b/_analyzers/index-analyzers.md
@@ -2,6 +2,7 @@
 layout: default
 title: Index analyzers
 nav_order: 20
+parent: Analyzers
 ---
 
 # Index analyzers
diff --git a/_analyzers/index.md b/_analyzers/index.md
index 95f97ec8ce..def6563f3e 100644
--- a/_analyzers/index.md
+++ b/_analyzers/index.md
@@ -45,20 +45,9 @@ An analyzer must contain exactly one tokenizer and may contain zero or more char
 
 There is also a special type of analyzer called a ***normalizer***. A normalizer is similar to an analyzer except that it does not contain a tokenizer and can only include specific types of character filters and token filters. These filters can perform only character-level operations, such as character or pattern replacement, and cannot perform operations on the token as a whole. This means that replacing a token with a synonym or stemming is not supported. See [Normalizers]({{site.url}}{{site.baseurl}}/analyzers/normalizers/) for further details.
 
-## Built-in analyzers
-
-The following table lists the built-in analyzers that OpenSearch provides. The last column of the table contains the result of applying the analyzer to the string `It’s fun to contribute a brand-new PR or 2 to OpenSearch!`.
-
-Analyzer | Analysis performed | Analyzer output
-:--- | :--- | :---
-**Standard** (default) | - Parses strings into tokens at word boundaries<br> - Removes most punctuation<br> - Converts tokens to lowercase | [`it’s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
-**Simple** | - Parses strings into tokens on any non-letter character<br> - Removes non-letter characters<br> - Converts tokens to lowercase | [`it`, `s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `to`, `opensearch`]
-**Whitespace** | - Parses strings into tokens on white space | [`It’s`, `fun`, `to`, `contribute`, `a`,`brand-new`, `PR`, `or`, `2`, `to`, `OpenSearch!`]
-**Stop** | - Parses strings into tokens on any non-letter character<br> - Removes non-letter characters<br> - Removes stop words<br> - Converts tokens to lowercase | [`s`, `fun`, `contribute`, `brand`, `new`, `pr`, `opensearch`]
-**Keyword** (no-op) | - Outputs the entire string unchanged | [`It’s fun to contribute a brand-new PR or 2 to OpenSearch!`]
-**Pattern** | - Parses strings into tokens using regular expressions<br> - Supports converting strings to lowercase<br> - Supports removing stop words | [`it`, `s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
-[**Language**]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/) | Performs analysis specific to a certain language (for example, `english`). | [`fun`, `contribut`, `brand`, `new`, `pr`, `2`, `opensearch`]
-**Fingerprint** | - Parses strings on any non-letter character<br> - Normalizes characters by converting them to ASCII<br> - Converts tokens to lowercase<br> - Sorts, deduplicates, and concatenates tokens into a single token<br> - Supports removing stop words | [`2 a brand contribute fun it's new opensearch or pr to`]<br>
Note that the apostrophe was converted to its ASCII counterpart. +For a list of supported analyzers, see [Analyzers]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/index/). ## Custom analyzers @@ -170,6 +159,31 @@ The response provides information about the analyzers for each field: } ``` +## Normalizers + +Tokenization divides text into individual terms, but it does not address variations in token forms. Normalization resolves these issues by converting tokens into a standard format. This ensures that similar terms are matched appropriately, even if they are not identical. + +### Normalization techniques + +The following normalization techniques can help address variations in token forms: + +1. **Case normalization**: Converts all tokens to lowercase to ensure case-insensitive matching. For example, "Hello" is normalized to "hello". + +2. **Stemming**: Reduces words to their root form. For instance, "cars" is stemmed to "car" and "running" is normalized to "run". + +3. **Synonym handling:** Treats synonyms as equivalent. For example, "jogging" and "running" can be indexed under a common term, such as "run". + +### Normalization + +A search for `Hello` will match documents containing `hello` because of case normalization. + +A search for `cars` will also match documents containing `car` because of stemming. + +A query for `running` can retrieve documents containing `jogging` using synonym handling. + +Normalization ensures that searches are not limited to exact term matches, allowing for more relevant results. For instance, a search for `Cars running` can be normalized to match `car run`. + ## Next steps -- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/). \ No newline at end of file +- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/). +- See the list of [supported analyzers]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/index/). \ No newline at end of file diff --git a/_analyzers/language-analyzers.md b/_analyzers/language-analyzers.md index f5a2f18cb3..ca4ba320dd 100644 --- a/_analyzers/language-analyzers.md +++ b/_analyzers/language-analyzers.md @@ -1,14 +1,15 @@ --- layout: default title: Language analyzers -nav_order: 10 +nav_order: 100 +parent: Analyzers redirect_from: - /query-dsl/analyzers/language-analyzers/ --- -# Language analyzer +# Language analyzers -OpenSearch supports the following language values with the `analyzer` option: +OpenSearch supports the following language analyzers: `arabic`, `armenian`, `basque`, `bengali`, `brazilian`, `bulgarian`, `catalan`, `czech`, `danish`, `dutch`, `english`, `estonian`, `finnish`, `french`, `galician`, `german`, `greek`, `hindi`, `hungarian`, `indonesian`, `irish`, `italian`, `latvian`, `lithuanian`, `norwegian`, `persian`, `portuguese`, `romanian`, `russian`, `sorani`, `spanish`, `swedish`, `turkish`, and `thai`. To use the analyzer when you map an index, specify the value within your query. 
For example, to map your index with the French language analyzer, specify the `french` value for the analyzer field: @@ -40,4 +41,4 @@ PUT my-index } ``` - \ No newline at end of file + diff --git a/_analyzers/search-analyzers.md b/_analyzers/search-analyzers.md index b47e739d28..52159edb70 100644 --- a/_analyzers/search-analyzers.md +++ b/_analyzers/search-analyzers.md @@ -2,6 +2,7 @@ layout: default title: Search analyzers nav_order: 30 +parent: Analyzers --- # Search analyzers @@ -42,7 +43,7 @@ GET shakespeare/_search ``` {% include copy-curl.html %} -Valid values for [built-in analyzers]({{site.url}}{{site.baseurl}}/analyzers/index#built-in-analyzers) are `standard`, `simple`, `whitespace`, `stop`, `keyword`, `pattern`, `fingerprint`, or any supported [language analyzer]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/). +For more information about supported analyzers, see [Analyzers]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/index/). ## Specifying a search analyzer for a field diff --git a/_analyzers/supported-analyzers/index.md b/_analyzers/supported-analyzers/index.md new file mode 100644 index 0000000000..5616936179 --- /dev/null +++ b/_analyzers/supported-analyzers/index.md @@ -0,0 +1,41 @@ +--- +layout: default +title: Analyzers +nav_order: 40 +has_children: true +has_toc: false +redirect_from: + - /analyzers/supported-analyzers/index/ +--- + +# Analyzers + +The following sections list all analyzers that OpenSearch supports. + +## Built-in analyzers + +The following table lists the built-in analyzers that OpenSearch provides. The last column of the table contains the result of applying the analyzer to the string `It’s fun to contribute a brand-new PR or 2 to OpenSearch!`. + +Analyzer | Analysis performed | Analyzer output +:--- | :--- | :--- +**Standard** (default) | - Parses strings into tokens at word boundaries
<br> - Removes most punctuation<br> - Converts tokens to lowercase | [`it’s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
+**Simple** | - Parses strings into tokens on any non-letter character<br> - Removes non-letter characters<br> - Converts tokens to lowercase | [`it`, `s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `to`, `opensearch`]
+**Whitespace** | - Parses strings into tokens on white space | [`It’s`, `fun`, `to`, `contribute`, `a`,`brand-new`, `PR`, `or`, `2`, `to`, `OpenSearch!`]
+**Stop** | - Parses strings into tokens on any non-letter character<br> - Removes non-letter characters<br> - Removes stop words<br> - Converts tokens to lowercase | [`s`, `fun`, `contribute`, `brand`, `new`, `pr`, `opensearch`]
+**Keyword** (no-op) | - Outputs the entire string unchanged | [`It’s fun to contribute a brand-new PR or 2 to OpenSearch!`]
+**Pattern** | - Parses strings into tokens using regular expressions<br> - Supports converting strings to lowercase<br> - Supports removing stop words | [`it`, `s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
+[**Language**]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/) | Performs analysis specific to a certain language (for example, `english`). | [`fun`, `contribut`, `brand`, `new`, `pr`, `2`, `opensearch`]
+**Fingerprint** | - Parses strings on any non-letter character<br> - Normalizes characters by converting them to ASCII<br> - Converts tokens to lowercase<br> - Sorts, deduplicates, and concatenates tokens into a single token<br> - Supports removing stop words | [`2 a brand contribute fun it's new opensearch or pr to`]<br>
Note that the apostrophe was converted to its ASCII counterpart. + +## Language analyzers + +OpenSearch supports multiple language analyzers. For more information, see [Language analyzers]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/). + +## Additional analyzers + +The following table lists the additional analyzers that OpenSearch supports. + +| Analyzer | Analysis performed | +|:---------------|:---------------------------------------------------------------------------------------------------------| +| `phone` | An [index analyzer]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) for parsing phone numbers. | +| `phone-search` | A [search analyzer]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/) for parsing phone numbers. | diff --git a/_analyzers/supported-analyzers/phone-analyzers.md b/_analyzers/supported-analyzers/phone-analyzers.md new file mode 100644 index 0000000000..f24b7cf328 --- /dev/null +++ b/_analyzers/supported-analyzers/phone-analyzers.md @@ -0,0 +1,128 @@ +--- +layout: default +title: Phone number +parent: Analyzers +nav_order: 140 +--- + +# Phone number analyzers + +The `analysis-phonenumber` plugin provides analyzers and tokenizers for parsing phone numbers. A dedicated analyzer is required because parsing phone numbers is a non-trivial task (even though it might seem trivial at first glance). For common misconceptions regarding phone number parsing, see [Falsehoods programmers believe about phone numbers](https://github.com/google/libphonenumber/blob/master/FALSEHOODS.md). + + +OpenSearch supports the following phone number analyzers: + +* [`phone`](#the-phone-analyzer): An [index analyzer]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) to use at indexing time. +* [`phone-search`](#the-phone-search-analyzer): A [search analyzer]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/) to use at search time. + +Internally, the plugin uses the [`libphonenumber`](https://github.com/google/libphonenumber) library and follows its parsing rules. + +The phone number analyzers are not meant to find phone numbers in larger texts. Instead, you should use them on fields that only contain phone numbers. +{: .note} + +## Installing the plugin + +Before you can use the phone number analyzers, you must install the `analysis-phonenumber` plugin by running the following command: + +```sh +./bin/opensearch-plugin install analysis-phonenumber +``` + +## Specifying a default region + +You can optionally specify a default region for parsing phone numbers by providing the `phone-region` parameter within the analyzer. Valid phone regions are represented by ISO 3166 country codes. For more information, see [List of ISO 3166 country codes](https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes). + +When tokenizing phone numbers containing the international calling prefix `+`, the default region is irrelevant. However, for phone numbers that use a national prefix for international numbers (for example, `001` instead of `+1` to dial Northern America from most European countries), the region needs to be provided. You can also properly index local phone numbers with no international prefix by specifying the region. 
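+As a sketch of the national-prefix case described above, the following hypothetical configuration (the index name, analyzer name, and `DE` region are illustrative assumptions, not taken from the plugin documentation) defines a German-region analyzer and analyzes a Swiss number dialed with the European `00` international prefix:
+
+```json
+PUT /example-phone-de
+{
+  "settings": {
+    "analysis": {
+      "analyzer": {
+        "phone-de": {
+          "type": "phone",
+          "phone-region": "DE"
+        }
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+```json
+GET /example-phone-de/_analyze
+{
+  "analyzer": "phone-de",
+  "text": "0041 60 555 12 34"
+}
+```
+{% include copy-curl.html %}
+
+Because `00` is the international prefix in the `DE` region, the analyzer can interpret this input as the Swiss number `+41 60 555 12 34`.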
+ +## Example + +The following request creates an index containing one field that ingests phone numbers for Switzerland (region code `CH`): + +```json +PUT /example-phone +{ + "settings": { + "analysis": { + "analyzer": { + "phone-ch": { + "type": "phone", + "phone-region": "CH" + }, + "phone-search-ch": { + "type": "phone-search", + "phone-region": "CH" + } + } + } + }, + "mappings": { + "properties": { + "phone_number": { + "type": "text", + "analyzer": "phone-ch", + "search_analyzer": "phone-search-ch" + } + } + } +} +``` +{% include copy-curl.html %} + +## The phone analyzer + +The `phone` analyzer generates n-grams based on the given phone number. A (fictional) Swiss phone number containing an international calling prefix can be parsed with or without the Swiss-specific phone region. Thus, the following two requests will produce the same result: + +```json +GET /example-phone/_analyze +{ + "analyzer" : "phone-ch", + "text" : "+41 60 555 12 34" +} +``` +{% include copy-curl.html %} + +```json +GET /example-phone/_analyze +{ + "analyzer" : "phone", + "text" : "+41 60 555 12 34" +} +``` +{% include copy-curl.html %} + +The response contains the generated n-grams: + +```json +["+41 60 555 12 34", "6055512", "41605551", "416055512", "6055", "41605551234", ...] +``` + +However, if you specify the phone number without the international calling prefix `+` (either by using `0041` or omitting +the international calling prefix altogether), then only the analyzer configured with the correct phone region can parse the number: + +```json +GET /example-phone/_analyze +{ + "analyzer" : "phone-ch", + "text" : "060 555 12 34" +} +``` +{% include copy-curl.html %} + +## The phone-search analyzer + +In contrast, the `phone-search` analyzer does not create n-grams and only issues some basic tokens. For example, send the following request and specify the `phone-search` analyzer: + +```json +GET /example-phone/_analyze +{ + "analyzer" : "phone-search", + "text" : "+41 60 555 12 34" +} +``` +{% include copy-curl.html %} + +The response contains the following tokens: + +```json +["+41 60 555 12 34", "41 60 555 12 34", "41605551234", "605551234", "41"] +``` diff --git a/_analyzers/token-filters/asciifolding.md b/_analyzers/token-filters/asciifolding.md new file mode 100644 index 0000000000..d572251988 --- /dev/null +++ b/_analyzers/token-filters/asciifolding.md @@ -0,0 +1,135 @@ +--- +layout: default +title: ASCII folding +parent: Token filters +nav_order: 20 +--- + +# ASCII folding token filter + +The `asciifolding` token filter converts non-ASCII characters to their closest ASCII equivalents. For example, *é* becomes *e*, *ü* becomes *u*, and *ñ* becomes *n*. This process is known as *transliteration*. + + +The `asciifolding` token filter offers a number of benefits: + + - **Enhanced search flexibility**: Users often omit accents or special characters when entering queries. The `asciifolding` token filter ensures that such queries still return relevant results. + - **Normalization**: Standardizes the indexing process by ensuring that accented characters are consistently converted to their ASCII equivalents. + - **Internationalization**: Particularly useful for applications including multiple languages and character sets. + +While the `asciifolding` token filter can simplify searches, it may also lead to the loss of specific information, particularly if the distinction between accented and non-accented characters in the dataset is significant. 
+{: .warning} + +## Parameters + +You can configure the `asciifolding` token filter using the `preserve_original` parameter. Setting this parameter to `true` keeps both the original token and its ASCII-folded version in the token stream. This can be particularly useful when you want to match both the original (with accents) and the normalized (without accents) versions of a term in a search query. Default is `false`. + +## Example + +The following example request creates a new index named `example_index` and defines an analyzer with the `asciifolding` filter and `preserve_original` parameter set to `true`: + +```json +PUT /example_index +{ + "settings": { + "analysis": { + "filter": { + "custom_ascii_folding": { + "type": "asciifolding", + "preserve_original": true + } + }, + "analyzer": { + "custom_ascii_analyzer": { + "type": "custom", + "tokenizer": "standard", + "filter": [ + "lowercase", + "custom_ascii_folding" + ] + } + } + } + } +} +``` +{% include copy-curl.html %} + +## Generated tokens + +Use the following request to examine the tokens generated using the analyzer: + +```json +POST /example_index/_analyze +{ + "analyzer": "custom_ascii_analyzer", + "text": "Résumé café naïve coördinate" +} +``` +{% include copy-curl.html %} + +The response contains the generated tokens: + +```json +{ + "tokens": [ + { + "token": "resume", + "start_offset": 0, + "end_offset": 6, + "type": "", + "position": 0 + }, + { + "token": "résumé", + "start_offset": 0, + "end_offset": 6, + "type": "", + "position": 0 + }, + { + "token": "cafe", + "start_offset": 7, + "end_offset": 11, + "type": "", + "position": 1 + }, + { + "token": "café", + "start_offset": 7, + "end_offset": 11, + "type": "", + "position": 1 + }, + { + "token": "naive", + "start_offset": 12, + "end_offset": 17, + "type": "", + "position": 2 + }, + { + "token": "naïve", + "start_offset": 12, + "end_offset": 17, + "type": "", + "position": 2 + }, + { + "token": "coordinate", + "start_offset": 18, + "end_offset": 28, + "type": "", + "position": 3 + }, + { + "token": "coördinate", + "start_offset": 18, + "end_offset": 28, + "type": "", + "position": 3 + } + ] +} +``` + + diff --git a/_analyzers/token-filters/cjk-bigram.md b/_analyzers/token-filters/cjk-bigram.md new file mode 100644 index 0000000000..ab21549c47 --- /dev/null +++ b/_analyzers/token-filters/cjk-bigram.md @@ -0,0 +1,160 @@ +--- +layout: default +title: CJK bigram +parent: Token filters +nav_order: 30 +--- + +# CJK bigram token filter + +The `cjk_bigram` token filter is designed specifically for processing East Asian languages, such as Chinese, Japanese, and Korean (CJK), which typically don't use spaces to separate words. A bigram is a sequence of two adjacent elements in a string of tokens, which can be characters or words. For CJK languages, bigrams help approximate word boundaries and capture significant character pairs that can convey meaning. + + +## Parameters + +The `cjk_bigram` token filter can be configured with two parameters: `ignore_scripts`and `output_unigrams`. + +### `ignore_scripts` + +The `cjk-bigram` token filter ignores all non-CJK scripts (writing systems like Latin or Cyrillic) and tokenizes only CJK text into bigrams. Use this option to specify CJK scripts to be ignored. This option takes the following valid values: + +- `han`: The `han` script processes Han characters. [Han characters](https://simple.wikipedia.org/wiki/Chinese_characters) are logograms used in the written languages of China, Japan, and Korea. 
The filter can help with text processing tasks like tokenizing, normalizing, or stemming text written in Chinese, Japanese kanji, or Korean Hanja.
+
+- `hangul`: The `hangul` script processes Hangul characters, which are unique to the Korean language and do not exist in other East Asian scripts.
+
+- `hiragana`: The `hiragana` script processes hiragana, one of the two syllabaries used in the Japanese writing system.
+  Hiragana is typically used for native Japanese words, grammatical elements, and certain forms of punctuation.
+
+- `katakana`: The `katakana` script processes katakana, the other Japanese syllabary.
+  Katakana is mainly used for foreign loanwords, onomatopoeia, scientific names, and certain Japanese words.
+
+
+### `output_unigrams`
+
+This option, when set to `true`, outputs both unigrams (single characters) and bigrams. Default is `false`.
+
+## Example
+
+The following example request creates a new index named `cjk_bigram_example` and defines an analyzer with the `cjk_bigrams_no_katakana_filter` filter and the `ignored_scripts` parameter set to `katakana`:
+
+```json
+PUT /cjk_bigram_example
+{
+  "settings": {
+    "analysis": {
+      "analyzer": {
+        "cjk_bigrams_no_katakana": {
+          "tokenizer": "standard",
+          "filter": [ "cjk_bigrams_no_katakana_filter" ]
+        }
+      },
+      "filter": {
+        "cjk_bigrams_no_katakana_filter": {
+          "type": "cjk_bigram",
+          "ignored_scripts": [
+            "katakana"
+          ],
+          "output_unigrams": true
+        }
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Generated tokens
+
+Use the following request to examine the tokens generated using the analyzer:
+
+```json
+POST /cjk_bigram_example/_analyze
+{
+  "analyzer": "cjk_bigrams_no_katakana",
+  "text": "東京タワーに行く"
+}
+```
+{% include copy-curl.html %}
+
+Sample text: "東京タワーに行く"
+
+    東京 (Kanji for "Tokyo")
+    タワー (Katakana for "Tower")
+    に行く (Hiragana and Kanji for "go to")
+
+The response contains the generated tokens:
+
+```json
+{
+  "tokens": [
+    {
+      "token": "東",
+      "start_offset": 0,
+      "end_offset": 1,
+      "type": "<SINGLE>",
+      "position": 0
+    },
+    {
+      "token": "東京",
+      "start_offset": 0,
+      "end_offset": 2,
+      "type": "<DOUBLE>",
+      "position": 0,
+      "positionLength": 2
+    },
+    {
+      "token": "京",
+      "start_offset": 1,
+      "end_offset": 2,
+      "type": "<SINGLE>",
+      "position": 1
+    },
+    {
+      "token": "タワー",
+      "start_offset": 2,
+      "end_offset": 5,
+      "type": "<KATAKANA>",
+      "position": 2
+    },
+    {
+      "token": "に",
+      "start_offset": 5,
+      "end_offset": 6,
+      "type": "<SINGLE>",
+      "position": 3
+    },
+    {
+      "token": "に行",
+      "start_offset": 5,
+      "end_offset": 7,
+      "type": "<DOUBLE>",
+      "position": 3,
+      "positionLength": 2
+    },
+    {
+      "token": "行",
+      "start_offset": 6,
+      "end_offset": 7,
+      "type": "<SINGLE>",
+      "position": 4
+    },
+    {
+      "token": "行く",
+      "start_offset": 6,
+      "end_offset": 8,
+      "type": "<DOUBLE>",
+      "position": 4,
+      "positionLength": 2
+    },
+    {
+      "token": "く",
+      "start_offset": 7,
+      "end_offset": 8,
+      "type": "<SINGLE>",
+      "position": 5
+    }
+  ]
+}
+```
+
+
diff --git a/_analyzers/token-filters/cjk-width.md b/_analyzers/token-filters/cjk-width.md
new file mode 100644
index 0000000000..4960729cd1
--- /dev/null
+++ b/_analyzers/token-filters/cjk-width.md
@@ -0,0 +1,96 @@
+---
+layout: default
+title: CJK width
+parent: Token filters
+nav_order: 40
+---
+
+# CJK width token filter
+
+The `cjk_width` token filter normalizes Chinese, Japanese, and Korean (CJK) tokens by converting full-width ASCII characters to their standard (half-width) ASCII equivalents and half-width katakana characters to their full-width equivalents.
+
+### Converting full-width ASCII characters
+
+In CJK texts, ASCII characters (such as letters and numbers) can appear in full-width form, occupying the space of two half-width characters. Full-width ASCII characters are typically used in East Asian typography for alignment with the width of CJK characters. However, for the purposes of indexing and searching, these full-width characters need to be normalized to their standard (half-width) ASCII equivalents.
+
+The following example illustrates ASCII character normalization:
+
+```
+ Full-width: ＡＢＣＤＥ １２３４５
+ Normalized (half-width): ABCDE 12345
+```
+
+### Converting half-width katakana characters
+
+The `cjk_width` token filter converts half-width katakana characters to their full-width counterparts, which are the standard form used in Japanese text. This normalization, illustrated in the following example, is important for consistency in text processing and searching:
+
+
+```
+ Half-width katakana: ｶﾀｶﾅ
+ Normalized (full-width) katakana: カタカナ
+```
+
+## Example
+
+The following example request creates a new index named `cjk_width_example_index` and defines an analyzer with the `cjk_width` filter:
+
+```json
+PUT /cjk_width_example_index
+{
+  "settings": {
+    "analysis": {
+      "analyzer": {
+        "cjk_width_analyzer": {
+          "type": "custom",
+          "tokenizer": "standard",
+          "filter": ["cjk_width"]
+        }
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Generated tokens
+
+Use the following request to examine the tokens generated using the analyzer:
+
+```json
+POST /cjk_width_example_index/_analyze
+{
+  "analyzer": "cjk_width_analyzer",
+  "text": "Ｔｏｋｙｏ ２０２４ ｶﾀｶﾅ"
+}
+```
+{% include copy-curl.html %}
+
+The response contains the generated tokens:
+
+```json
+{
+  "tokens": [
+    {
+      "token": "Tokyo",
+      "start_offset": 0,
+      "end_offset": 5,
+      "type": "<ALPHANUM>",
+      "position": 0
+    },
+    {
+      "token": "2024",
+      "start_offset": 6,
+      "end_offset": 10,
+      "type": "<NUM>",
+      "position": 1
+    },
+    {
+      "token": "カタカナ",
+      "start_offset": 11,
+      "end_offset": 15,
+      "type": "<KATAKANA>",
+      "position": 2
+    }
+  ]
+}
+```
diff --git a/_analyzers/token-filters/classic.md b/_analyzers/token-filters/classic.md
new file mode 100644
index 0000000000..34db74a824
--- /dev/null
+++ b/_analyzers/token-filters/classic.md
@@ -0,0 +1,93 @@
+---
+layout: default
+title: Classic
+parent: Token filters
+nav_order: 50
+---
+
+# Classic token filter
+
+The primary function of the `classic` token filter is to work alongside the `classic` tokenizer. It processes tokens by applying the following common transformations, which aid in text analysis and search:
+ - Removal of possessive endings such as *'s*. For example, *John's* becomes *John*.
+ - Removal of periods from acronyms. For example, *D.A.R.P.A.* becomes *DARPA*.
+
+
+## Example
+
+The following example request creates a new index named `custom_classic_filter` and configures an analyzer with the `classic` filter:
+
+```json
+PUT /custom_classic_filter
+{
+  "settings": {
+    "analysis": {
+      "analyzer": {
+        "custom_classic": {
+          "type": "custom",
+          "tokenizer": "classic",
+          "filter": ["classic"]
+        }
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Generated tokens
+
+Use the following request to examine the tokens generated using the analyzer:
+
+```json
+POST /custom_classic_filter/_analyze
+{
+  "analyzer": "custom_classic",
+  "text": "John's co-operate was excellent."
+}
+```
+{% include copy-curl.html %}
+
+The response contains the generated tokens:
+
+```json
+{
+  "tokens": [
+    {
+      "token": "John",
+      "start_offset": 0,
+      "end_offset": 6,
+      "type": "<APOSTROPHE>",
+      "position": 0
+    },
+    {
+      "token": "co",
+      "start_offset": 7,
+      "end_offset": 9,
+      "type": "<ALPHANUM>",
+      "position": 1
+    },
+    {
+      "token": "operate",
+      "start_offset": 10,
+      "end_offset": 17,
+      "type": "<ALPHANUM>",
+      "position": 2
+    },
+    {
+      "token": "was",
+      "start_offset": 18,
+      "end_offset": 21,
+      "type": "<ALPHANUM>",
+      "position": 3
+    },
+    {
+      "token": "excellent",
+      "start_offset": 22,
+      "end_offset": 31,
+      "type": "<ALPHANUM>",
+      "position": 4
+    }
+  ]
+}
+```
+
diff --git a/_analyzers/token-filters/common_gram.md b/_analyzers/token-filters/common_gram.md
new file mode 100644
index 0000000000..58f5bbe149
--- /dev/null
+++ b/_analyzers/token-filters/common_gram.md
@@ -0,0 +1,94 @@
+---
+layout: default
+title: Common grams
+parent: Token filters
+nav_order: 60
+---
+
+# Common grams token filter
+
+The `common_grams` token filter improves search relevance by keeping commonly occurring phrases (common grams) in the text. This is useful for languages or datasets in which certain word combinations frequently occur as a unit and can impact search relevance if treated as separate tokens. If any common words are present in the input string, this token filter generates both their unigrams and bigrams.
+
+Keeping common phrases intact helps match queries more accurately, particularly for frequent word combinations, and improves search precision by reducing the number of irrelevant matches.
+
+When using this filter, you must carefully select and maintain the `common_words` list.
+{: .warning}
+
+## Parameters
+
+The `common_grams` token filter can be configured with the following parameters.
+
+Parameter | Required/Optional | Data type | Description
+:--- | :--- | :--- | :---
+`common_words` | Required | List of strings | A list of words that should be treated as words that commonly appear together. These words will be used to generate common grams. If the `common_words` parameter is an empty list, the `common_grams` token filter becomes a no-op filter, meaning that it doesn't modify the input tokens at all.
+`ignore_case` | Optional | Boolean | Indicates whether the filter should ignore case differences when matching common words. Default is `false`.
+`query_mode` | Optional | Boolean | When set to `true`, the following rules are applied:<br>- Unigrams that are generated from `common_words` are not included in the output.<br>- Bigrams in which a non-common word is followed by a common word are retained in the output.<br>- Unigrams of non-common words are excluded if they are immediately followed by a common word.<br>- If a non-common word appears at the end of the text and is preceded by a common word, its unigram is not included in the output.
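+
+To see how `query_mode` changes the output, you can define two otherwise identical filters and compare their `_analyze` results. The following request is a minimal sketch; the index and filter names are illustrative:
+
+```json
+PUT /common_grams_compare
+{
+  "settings": {
+    "analysis": {
+      "filter": {
+        "grams_query_mode": {
+          "type": "common_grams",
+          "common_words": ["a", "in", "for"],
+          "query_mode": true
+        },
+        "grams_index_mode": {
+          "type": "common_grams",
+          "common_words": ["a", "in", "for"]
+        }
+      },
+      "analyzer": {
+        "query_mode_analyzer": {
+          "tokenizer": "standard",
+          "filter": ["lowercase", "grams_query_mode"]
+        },
+        "index_mode_analyzer": {
+          "tokenizer": "standard",
+          "filter": ["lowercase", "grams_index_mode"]
+        }
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+Running `_analyze` with each analyzer on the same text shows that the index-mode analyzer emits both the unigrams and the bigrams of common words, while the query-mode analyzer suppresses the redundant unigrams.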
+
+
+## Example
+
+The following example request creates a new index named `my_common_grams_index` and configures an analyzer with the `common_grams` filter:
+
+```json
+PUT /my_common_grams_index
+{
+  "settings": {
+    "analysis": {
+      "filter": {
+        "my_common_grams_filter": {
+          "type": "common_grams",
+          "common_words": ["a", "in", "for"],
+          "ignore_case": true,
+          "query_mode": true
+        }
+      },
+      "analyzer": {
+        "my_analyzer": {
+          "type": "custom",
+          "tokenizer": "standard",
+          "filter": [
+            "lowercase",
+            "my_common_grams_filter"
+          ]
+        }
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Generated tokens
+
+Use the following request to examine the tokens generated using the analyzer:
+
+```json
+GET /my_common_grams_index/_analyze
+{
+  "analyzer": "my_analyzer",
+  "text": "A quick black cat jumps over the lazy dog in the park"
+}
+```
+{% include copy-curl.html %}
+
+The response contains the generated tokens:
+
+```json
+{
+  "tokens": [
+    {"token": "a_quick","start_offset": 0,"end_offset": 7,"type": "gram","position": 0},
+    {"token": "quick","start_offset": 2,"end_offset": 7,"type": "<ALPHANUM>","position": 1},
+    {"token": "black","start_offset": 8,"end_offset": 13,"type": "<ALPHANUM>","position": 2},
+    {"token": "cat","start_offset": 14,"end_offset": 17,"type": "<ALPHANUM>","position": 3},
+    {"token": "jumps","start_offset": 18,"end_offset": 23,"type": "<ALPHANUM>","position": 4},
+    {"token": "over","start_offset": 24,"end_offset": 28,"type": "<ALPHANUM>","position": 5},
+    {"token": "the","start_offset": 29,"end_offset": 32,"type": "<ALPHANUM>","position": 6},
+    {"token": "lazy","start_offset": 33,"end_offset": 37,"type": "<ALPHANUM>","position": 7},
+    {"token": "dog_in","start_offset": 38,"end_offset": 44,"type": "gram","position": 8},
+    {"token": "in_the","start_offset": 42,"end_offset": 48,"type": "gram","position": 9},
+    {"token": "the","start_offset": 45,"end_offset": 48,"type": "<ALPHANUM>","position": 10},
+    {"token": "park","start_offset": 49,"end_offset": 53,"type": "<ALPHANUM>","position": 11}
+  ]
+}
+```
+
diff --git a/_analyzers/token-filters/index.md b/_analyzers/token-filters/index.md
index 80e8916bef..b982d69ecb 100644
--- a/_analyzers/token-filters/index.md
+++ b/_analyzers/token-filters/index.md
@@ -4,6 +4,8 @@ title: Token filters
 nav_order: 70
 has_children: true
 has_toc: false
+redirect_from:
+  - /analyzers/token-filters/index/
 ---
 
 # Token filters
@@ -14,11 +16,11 @@ The following table lists all token filters that OpenSearch supports.
 
 Token filter | Underlying Lucene token filter| Description
 [`apostrophe`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/apostrophe/) | [ApostropheFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/tr/ApostropheFilter.html) | In each token containing an apostrophe, the `apostrophe` token filter removes the apostrophe itself and all characters following it.
-`asciifolding` | [ASCIIFoldingFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html) | Converts alphabetic, numeric, and symbolic characters.
+[`asciifolding`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/asciifolding/) | [ASCIIFoldingFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html) | Converts alphabetic, numeric, and symbolic characters.
 `cjk_bigram` | [CJKBigramFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/cjk/CJKBigramFilter.html) | Forms bigrams of Chinese, Japanese, and Korean (CJK) tokens.
-`cjk_width` | [CJKWidthFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/cjk/CJKWidthFilter.html) | Normalizes Chinese, Japanese, and Korean (CJK) tokens according to the following rules:<br>- Folds full-width ASCII character variants into the equivalent basic Latin characters.<br>- Folds half-width Katakana character variants into the equivalent Kana characters.
-`classic` | [ClassicFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/classic/ClassicFilter.html) | Performs optional post-processing on the tokens generated by the classic tokenizer. Removes possessives (`'s`) and removes `.` from acronyms.
-`common_grams` | [CommonGramsFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/commongrams/CommonGramsFilter.html) | Generates bigrams for a list of frequently occurring terms. The output contains both single terms and bigrams.
+[`cjk_width`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/cjk-width/) | [CJKWidthFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/cjk/CJKWidthFilter.html) | Normalizes Chinese, Japanese, and Korean (CJK) tokens according to the following rules:<br>- Folds full-width ASCII character variants into their equivalent basic Latin characters.<br>- Folds half-width katakana character variants into their equivalent kana characters.
+[`classic`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/classic) | [ClassicFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/classic/ClassicFilter.html) | Performs optional post-processing on the tokens generated by the classic tokenizer. Removes possessives (`'s`) and removes `.` from acronyms.
+[`common_grams`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/common_gram/) | [CommonGramsFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/commongrams/CommonGramsFilter.html) | Generates bigrams for a list of frequently occurring terms. The output contains both single terms and bigrams.
 `conditional` | [ConditionalTokenFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/ConditionalTokenFilter.html) | Applies an ordered list of token filters to tokens that match the conditions provided in a script.
 [`decimal_digit`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/decimal-digit/) | [DecimalDigitFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/core/DecimalDigitFilter.html) | Converts all digits in the Unicode decimal number general category to basic Latin digits (0--9).
 `delimited_payload` | [DelimitedPayloadTokenFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/payloads/DelimitedPayloadTokenFilter.html) | Separates a token stream into tokens with corresponding payloads, based on a provided delimiter. A token consists of all characters before the delimiter, and a payload consists of all characters after the delimiter. For example, if the delimiter is `|`, then for the string `foo|bar`, `foo` is the token and `bar` is the payload.
diff --git a/_analyzers/tokenizers/index.md b/_analyzers/tokenizers/index.md
index d401851f60..e5ac796c12 100644
--- a/_analyzers/tokenizers/index.md
+++ b/_analyzers/tokenizers/index.md
@@ -4,6 +4,8 @@ title: Tokenizers
 nav_order: 60
 has_children: false
 has_toc: false
+redirect_from:
+  - /analyzers/tokenizers/index/
 ---
 
 # Tokenizers
diff --git a/_api-reference/analyze-apis.md b/_api-reference/analyze-apis.md
index ac8e9e249f..5a63f665d9 100644
--- a/_api-reference/analyze-apis.md
+++ b/_api-reference/analyze-apis.md
@@ -22,7 +22,7 @@ If you use the Security plugin, you must have the `manage index` privilege. If y
 
 ## Path and HTTP methods
 
-```
+```json
 GET /_analyze
 GET /{index}/_analyze
 POST /_analyze
@@ -81,7 +81,7 @@ text | String or Array of Strings | Text to analyze. If you provide an array of
 
 [Set a token limit](#set-a-token-limit)
 
-#### Analyze array of text strings
+### Analyze array of text strings
 
 When you pass an array of strings to the `text` field, it is analyzed as a multi-value field.
 
@@ -145,7 +145,7 @@ The previous request returns the following fields:
 }
 ````
 
-#### Apply a built-in analyzer
+### Apply a built-in analyzer
 
 If you omit the `index` path parameter, you can apply any of the built-in analyzers to the text string.
 
@@ -190,7 +190,7 @@ The previous request returns the following fields:
 }
 ````
 
-#### Apply a custom analyzer
+### Apply a custom analyzer
 
 You can create your own analyzer and specify it in an analyze request.
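 
 For example, a request in the following form applies an analyzer defined in the target index's settings (a sketch; the index and analyzer names are illustrative):
 
 ```json
 GET /my-index/_analyze
 {
   "analyzer": "my_custom_analyzer",
   "text": "OpenSearch text analysis"
 }
 ```
 {% include copy-curl.html %}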
@@ -244,7 +244,7 @@ The previous request returns the following fields:
 }
 ````
 
-#### Apply a custom transient analyzer
+### Apply a custom transient analyzer
 
 You can build a custom transient analyzer from tokenizers, token filters, or character filters. Use the `filter` parameter to specify token filters.
 
@@ -373,7 +373,7 @@ The previous request returns the following fields:
 }
 ````
 
-#### Specify an index
+### Specify an index
 
 You can analyze text using an index's default analyzer, or you can specify a different analyzer.
 
@@ -446,7 +446,7 @@ The previous request returns the following fields:
 }
 ````
 
-#### Derive the analyzer from an index field
+### Derive the analyzer from an index field
 
 You can pass text and a field in the index. The API looks up the field's analyzer and uses it to analyze the text.
 
@@ -493,7 +493,7 @@ The previous request returns the following fields:
 }
 ````
 
-#### Specify a normalizer
+### Specify a normalizer
 
 Instead of using a keyword field, you can use the normalizer associated with the index. A normalizer causes the analysis to produce a single token.
 
@@ -557,7 +557,7 @@ The previous request returns the following fields:
 }
 ````
 
-#### Get token details
+### Get token details
 
 You can obtain additional details for all tokens by setting the `explain` attribute to `true`.
 
@@ -640,7 +640,7 @@ The previous request returns the following fields:
 }
 ````
 
-#### Set a token limit
+### Set a token limit
 
 You can set a limit to the number of tokens generated. Setting a lower value reduces a node's memory usage. The default value is 10000.
 
@@ -659,7 +659,7 @@ PUT /books2
 
 The preceding request is an index API rather than an analyze API. See [Dynamic index-level index settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index-settings/#dynamic-index-level-index-settings) for additional details.
 {: .note}
 
-### Response fields
+## Response body fields
 
 The text analysis endpoints return the following response fields.
diff --git a/_api-reference/cat/cat-aliases.md b/_api-reference/cat/cat-aliases.md
index 2d5c5c300a..950d497351 100644
--- a/_api-reference/cat/cat-aliases.md
+++ b/_api-reference/cat/cat-aliases.md
@@ -18,17 +18,13 @@ The CAT aliases operation lists the mapping of aliases to indexes, plus routing
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cat/aliases/<alias>
 GET _cat/aliases
 ```
+{% include copy-curl.html %}
 
-
-## URL parameters
-
-All CAT aliases URL parameters are optional.
-
-In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters:
+## Query parameters
 
 Parameter | Type | Description
 :--- | :--- | :---
diff --git a/_api-reference/cat/cat-allocation.md b/_api-reference/cat/cat-allocation.md
index 085a755dc1..a57c861a4b 100644
--- a/_api-reference/cat/cat-allocation.md
+++ b/_api-reference/cat/cat-allocation.md
@@ -17,16 +17,12 @@ The CAT allocation operation lists the allocation of disk space for indexes and
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cat/allocation?v
 GET _cat/allocation/<node_name>
 ```
 
-## URL parameters
-
-All CAT allocation URL parameters are optional.
-
-In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters:
+## Query parameters
 
 Parameter | Type | Description
 :--- | :--- | :---
diff --git a/_api-reference/cat/cat-cluster_manager.md b/_api-reference/cat/cat-cluster_manager.md
index d81e334009..1b75074e12 100644
--- a/_api-reference/cat/cat-cluster_manager.md
+++ b/_api-reference/cat/cat-cluster_manager.md
@@ -17,15 +17,11 @@ The CAT cluster manager operation lists information that helps identify the elec
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cat/cluster_manager
 ```
 
-## URL parameters
-
-All CAT cluster manager URL parameters are optional.
-
-In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters:
+## Query parameters
 
 Parameter | Type | Description
 :--- | :--- | :---
diff --git a/_api-reference/cat/cat-count.md b/_api-reference/cat/cat-count.md
index 8d0b4fbad2..94a422d061 100644
--- a/_api-reference/cat/cat-count.md
+++ b/_api-reference/cat/cat-count.md
@@ -18,15 +18,11 @@ The CAT count operation lists the number of documents in your cluster.
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cat/count?v
 GET _cat/count/<index>?v
 ```
 
-## URL parameters
-
-All CAT count URL parameters are optional. You can specify any of the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index).
-
 ## Example requests
 
 ```json
diff --git a/_api-reference/cat/cat-field-data.md b/_api-reference/cat/cat-field-data.md
index 05c720b952..3012bbbfe9 100644
--- a/_api-reference/cat/cat-field-data.md
+++ b/_api-reference/cat/cat-field-data.md
@@ -17,16 +17,12 @@ The CAT Field Data operation lists the memory size used by each field per node.
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cat/fielddata?v
 GET _cat/fielddata/<fields>?v
 ```
 
-## URL parameters
-
-All CAT fielddata URL parameters are optional.
-
-In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameter:
+## Query parameters
 
 Parameter | Type | Description
 :--- | :--- | :---
diff --git a/_api-reference/cat/cat-health.md b/_api-reference/cat/cat-health.md
index 1c400916ad..0e4b784693 100644
--- a/_api-reference/cat/cat-health.md
+++ b/_api-reference/cat/cat-health.md
@@ -18,14 +18,11 @@ The CAT health operation lists the status of the cluster, how long the cluster h
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cat/health?v
 ```
-{% include copy-curl.html %}
 
-## URL parameters
-
-All CAT health URL parameters are optional.
+## Query parameters
 
 Parameter | Type | Description
 :--- | :--- | :---
@@ -39,6 +36,7 @@ The following example request give cluster health information for the past 5 day
 ```json
 GET _cat/health?v&time=5d
 ```
+{% include copy-curl.html %}
 
 ## Example response
diff --git a/_api-reference/cat/cat-indices.md b/_api-reference/cat/cat-indices.md
index 16c57e5791..4bbdde573c 100644
--- a/_api-reference/cat/cat-indices.md
+++ b/_api-reference/cat/cat-indices.md
@@ -1,6 +1,6 @@
 ---
 layout: default
-title: CAT indices operation
+title: CAT indices
 parent: CAT API
 nav_order: 25
 has_children: false
@@ -17,16 +17,12 @@ The CAT indices operation lists information related to indexes, that is, how muc
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cat/indices/<index>
 GET _cat/indices
 ```
 
-## URL parameters
-
-All URL parameters are optional.
-
-In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index/), you can specify the following parameters:
+## Query parameters
 
 Parameter | Type | Description
 :--- | :--- | :---
@@ -40,14 +36,14 @@ expand_wildcards | Enum | Expands wildcard expressions to concrete indexes. Comb
 
 ## Example requests
 
-```
+```json
 GET _cat/indices?v
 ```
 {% include copy-curl.html %}
 
 To limit the information to a specific index, add the index name after your query.
 
-```
+```json
 GET _cat/indices/<index>?v
 ```
 {% include copy-curl.html %}
@@ -66,3 +62,7 @@ GET _cat/indices/index1,index2,index3
 health | status | index | uuid | pri | rep | docs.count | docs.deleted | store.size | pri.store.size
 green | open | movies | UZbpfERBQ1-3GSH2bnM3sg | 1 | 1 | 1 | 0 | 7.7kb | 3.8kb
 ```
+
+## Limiting the response size
+
+To limit the number of indexes returned, configure the `cat.indices.response.limit.number_of_indices` setting. For more information, see [Cluster-level CAT response limit settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/cluster-settings/#cluster-level-cat-response-limit-settings).
\ No newline at end of file
diff --git a/_api-reference/cat/cat-nodeattrs.md b/_api-reference/cat/cat-nodeattrs.md
index b09e164698..62471f3960 100644
--- a/_api-reference/cat/cat-nodeattrs.md
+++ b/_api-reference/cat/cat-nodeattrs.md
@@ -17,15 +17,11 @@ The CAT nodeattrs operation lists the attributes of custom nodes.
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cat/nodeattrs
 ```
 
-## URL parameters
-
-All CAT nodeattrs URL parameters are optional.
-
-In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters:
+## Query parameters
 
 Parameter | Type | Description
 :--- | :--- | :---
@@ -36,7 +32,7 @@ cluster_manager_timeout | Time | The amount of time to wait for a connection to
 
 The following example request returns attributes about custom nodes:
 
-```
+```json
 GET _cat/nodeattrs?v
 ```
 {% include copy-curl.html %}
diff --git a/_api-reference/cat/cat-nodes.md b/_api-reference/cat/cat-nodes.md
index 5e7238a0d0..d20393a251 100644
--- a/_api-reference/cat/cat-nodes.md
+++ b/_api-reference/cat/cat-nodes.md
@@ -19,11 +19,11 @@ A few important node metrics are `pid`, `name`, `cluster_manager`, `ip`, `port`,
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cat/nodes
 ```
 
-## URL parameters
+## Query parameters
 
 All CAT nodes URL parameters are optional.
 
@@ -41,7 +41,7 @@ include_unloaded_segments | Boolean | Whether to include information from segmen
 
 The following example request lists node level information:
 
-```
+```json
 GET _cat/nodes?v
 ```
 {% include copy-curl.html %}
diff --git a/_api-reference/cat/cat-pending-tasks.md b/_api-reference/cat/cat-pending-tasks.md
index ea224670ac..b047dd5d62 100644
--- a/_api-reference/cat/cat-pending-tasks.md
+++ b/_api-reference/cat/cat-pending-tasks.md
@@ -18,15 +18,11 @@ The CAT pending tasks operation lists the progress of all pending tasks, includi
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cat/pending_tasks
 ```
 
-## URL parameters
-
-All CAT nodes URL parameters are optional.
-
-In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters:
+## Query parameters
 
 Parameter | Type | Description
 :--- | :--- | :---
@@ -38,7 +34,7 @@ time | Time | Specify the units for time. For example, `5d` or `7h`.
For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/).
 
 The following example request lists the progress of all pending node tasks:
 
-```
+```json
 GET _cat/pending_tasks?v
 ```
 {% include copy-curl.html %}
diff --git a/_api-reference/cat/cat-plugins.md b/_api-reference/cat/cat-plugins.md
index 358eb70fbf..45866e8ebd 100644
--- a/_api-reference/cat/cat-plugins.md
+++ b/_api-reference/cat/cat-plugins.md
@@ -18,15 +18,15 @@ The CAT plugins operation lists the names, components, and versions of the insta
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cat/plugins
 ```
 
-## URL parameters
+## Query parameters
 
-All CAT plugins URL parameters are optional.
+All parameters are optional.
 
-In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters:
+In addition to the [common parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters:
 
 Parameter | Type | Description
 :--- | :--- | :---
@@ -37,7 +37,7 @@ cluster_manager_timeout | Time | The amount of time to wait for a connection to
 
 The following example request lists all installed plugins:
 
-```
+```json
 GET _cat/plugins?v
 ```
 {% include copy-curl.html %}
diff --git a/_api-reference/cat/cat-recovery.md b/_api-reference/cat/cat-recovery.md
index 8f251a94e0..fc29e14ac6 100644
--- a/_api-reference/cat/cat-recovery.md
+++ b/_api-reference/cat/cat-recovery.md
@@ -18,15 +18,11 @@ The CAT recovery operation lists all completed and ongoing index and shard recov
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cat/recovery
 ```
 
-## URL parameters
-
-All CAT recovery URL parameters are optional.
-
-In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters:
+## Query parameters
 
 Parameter | Type | Description
 :--- | :--- | :---
@@ -37,14 +33,14 @@ time | Time | Specify the units for time. For example, `5d` or `7h`. For more in
 
 ## Example requests
 
-```
+```json
 GET _cat/recovery?v
 ```
 {% include copy-curl.html %}
 
 To see only the recoveries of a specific index, add the index name after your query.
 
-```
+```json
 GET _cat/recovery/<index>?v
 ```
 {% include copy-curl.html %}
diff --git a/_api-reference/cat/cat-repositories.md b/_api-reference/cat/cat-repositories.md
index f0fc4bb622..c197ee5c6c 100644
--- a/_api-reference/cat/cat-repositories.md
+++ b/_api-reference/cat/cat-repositories.md
@@ -17,15 +17,12 @@ The CAT repositories operation lists all snapshot repositories for a cluster.
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cat/repositories
 ```
 
-## URL parameters
-
-All CAT repositories URL parameters are optional.
+## Query parameters
 
-In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters:
 
 Parameter | Type | Description
 :--- | :--- | :---
@@ -36,7 +33,7 @@ cluster_manager_timeout | Time | The amount of time to wait for a connection to
 
 The following example request lists all snapshot repositories in the cluster:
 
-```
+```json
 GET _cat/repositories?v
 ```
 {% include copy-curl.html %}
diff --git a/_api-reference/cat/cat-segment-replication.md b/_api-reference/cat/cat-segment-replication.md
index 5900b97a7c..e943d0a451 100644
--- a/_api-reference/cat/cat-segment-replication.md
+++ b/_api-reference/cat/cat-segment-replication.md
@@ -24,16 +24,12 @@ GET /_cat/segment_replication/<index>
 
 ## Path parameters
 
-The following table lists the available optional path parameter.
-
 Parameter | Type | Description
 :--- | :--- | :---
 `index` | String | The name of the index, or a comma-separated list or wildcard expression of index names used to filter results. If this parameter is not provided, the response contains information about all indexes in the cluster.
 
 ## Query parameters
 
-The CAT segment replication API operation supports the following optional query parameters.
-
 Parameter | Data type | Description
 :--- |:-----------| :---
 `active_only` | Boolean | If `true`, the response only includes active segment replications. Defaults to `false`.
diff --git a/_api-reference/cat/cat-segments.md b/_api-reference/cat/cat-segments.md
index cd9eda38be..76696d3886 100644
--- a/_api-reference/cat/cat-segments.md
+++ b/_api-reference/cat/cat-segments.md
@@ -18,15 +18,11 @@ The cat segments operation lists Lucene segment-level information for each index
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cat/segments
 ```
 
-## URL parameters
-
-All CAT segments URL parameters are optional.
-
-In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters:
+## Query parameters
 
 Parameter | Type | Description
 :--- | :--- | :---
@@ -35,21 +31,21 @@ cluster_manager_timeout | Time | The amount of time to wait for a connection to
 
 ## Example requests
 
-```
+```json
 GET _cat/segments?v
 ```
 {% include copy-curl.html %}
 
 To see only the information about segments of a specific index, add the index name after your query.
 
-```
+```json
 GET _cat/segments/<index>?v
 ```
 {% include copy-curl.html %}
 
 If you want to get information for more than one index, separate the indexes with commas:
 
-```
+```json
 GET _cat/segments/index1,index2,index3
 ```
 {% include copy-curl.html %}
@@ -61,3 +57,7 @@ index | shard | prirep | ip | segment | generation | docs.count | docs.deleted |
 movies | 0 | p | 172.18.0.4 | _0 | 0 | 1 | 0 | 3.5kb | 1364 | true | true | 8.7.0 | true
 movies | 0 | r | 172.18.0.3 | _0 | 0 | 1 | 0 | 3.5kb | 1364 | true | true | 8.7.0 | true
 ```
+
+## Limiting the response size
+
+To limit the number of indexes returned, configure the `cat.segments.response.limit.number_of_indices` setting. For more information, see [Cluster-level CAT response limit settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/cluster-settings/#cluster-level-cat-response-limit-settings).
\ No newline at end of file
diff --git a/_api-reference/cat/cat-shards.md b/_api-reference/cat/cat-shards.md
index b07f11aca3..c9677cb0ed 100644
--- a/_api-reference/cat/cat-shards.md
+++ b/_api-reference/cat/cat-shards.md
@@ -18,21 +18,22 @@ The CAT shards operation lists the state of all primary and replica shards and h
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cat/shards
 ```
 
-## URL parameters
+## Query parameters
 
-All cat shards URL parameters are optional.
+All parameters are optional.
 
-In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters:
+In addition to the [common parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters:
 
 Parameter | Type | Description
 :--- | :--- | :---
 bytes | Byte size | Specify the units for byte size. For example, `7kb` or `6gb`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/).
 local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`.
cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. +cancel_after_time_interval | Time | The amount of time after which the shard request will be canceled. Default is `-1`. time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). ## Example requests @@ -65,3 +66,7 @@ index | shard | prirep | state | docs | store | ip | | node plugins | 0 | p | STARTED | 0 | 208b | 172.18.0.4 | odfe-node1 plugins | 0 | r | STARTED | 0 | 208b | 172.18.0.3 | odfe-node2 ``` + +## Limiting the response size + +To limit the number of shards returned, configure the `cat.shards.response.limit.number_of_shards` setting. For more information, see [Cluster-level CAT response limit settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/cluster-settings/#cluster-level-cat-response-limit-settings). \ No newline at end of file diff --git a/_api-reference/cat/cat-snapshots.md b/_api-reference/cat/cat-snapshots.md index 2e1bd514bf..71c3b3f75d 100644 --- a/_api-reference/cat/cat-snapshots.md +++ b/_api-reference/cat/cat-snapshots.md @@ -18,15 +18,11 @@ The CAT snapshots operation lists all snapshots for a repository. ## Path and HTTP methods -``` +```json GET _cat/snapshots ``` -## URL parameters - -All CAT snapshots URL parameters are optional. - -In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameter: +## Query parameters Parameter | Type | Description :--- | :--- | :--- diff --git a/_api-reference/cat/cat-tasks.md b/_api-reference/cat/cat-tasks.md index 7a71b592e7..5419d5c647 100644 --- a/_api-reference/cat/cat-tasks.md +++ b/_api-reference/cat/cat-tasks.md @@ -17,15 +17,11 @@ The CAT tasks operation lists the progress of all tasks currently running on you ## Path and HTTP methods -``` +```json GET _cat/tasks ``` -## URL parameters - -All CAT tasks URL parameters are optional. - -In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters: +## Query parameters Parameter | Type | Description :--- | :--- | :--- diff --git a/_api-reference/cat/cat-templates.md b/_api-reference/cat/cat-templates.md index ba47ae711d..90b7d43fc7 100644 --- a/_api-reference/cat/cat-templates.md +++ b/_api-reference/cat/cat-templates.md @@ -18,16 +18,11 @@ The CAT templates operation lists the names, patterns, order numbers, and versio ## Path and HTTP methods -``` +```json GET _cat/templates ``` -{% include copy-curl.html %} - -## URL parameters -All CAT templates URL parameters are optional. 
-
-In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters:
+## Query parameters
 
 Parameter | Type | Description
 :--- | :--- | :---
@@ -38,14 +33,14 @@ cluster_manager_timeout | Time | The amount of time to wait for a connection to
 
 The following example request returns information about all templates:
 
-```
+```json
 GET _cat/templates?v
 ```
 {% include copy-curl.html %}
 
 If you want to get information for a specific template or pattern:
 
-```
+```json
 GET _cat/templates/<template_name_or_pattern>
 ```
 {% include copy-curl.html %}
diff --git a/_api-reference/cat/cat-thread-pool.md b/_api-reference/cat/cat-thread-pool.md
index de24052175..3171ae830e 100644
--- a/_api-reference/cat/cat-thread-pool.md
+++ b/_api-reference/cat/cat-thread-pool.md
@@ -17,15 +17,11 @@ The CAT thread pool operation lists the active, queued, and rejected threads of
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cat/thread_pool
 ```
 
-## URL parameters
-
-All CAT thread pool URL parameters are optional.
-
-In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters:
+## Query parameters
 
 Parameter | Type | Description
 :--- | :--- | :---
@@ -36,21 +32,21 @@ cluster_manager_timeout | Time | The amount of time to wait for a connection to
 
 The following example request gives information about thread pools on all nodes:
 
-```
+```json
 GET _cat/thread_pool?v
 ```
 {% include copy-curl.html %}
 
 If you want to get information for more than one thread pool, separate the thread pool names with commas:
 
-```
+```json
 GET _cat/thread_pool/thread_pool_name_1,thread_pool_name_2,thread_pool_name_3
 ```
 {% include copy-curl.html %}
 
 If you want to limit the information to a specific thread pool, add the thread pool name after your query:
 
-```
+```json
 GET _cat/thread_pool/<thread_pool_name>?v
 ```
 {% include copy-curl.html %}
diff --git a/_api-reference/cluster-api/cluster-allocation.md b/_api-reference/cluster-api/cluster-allocation.md
index 4ec6e27f2b..2f8bd9799c 100644
--- a/_api-reference/cluster-api/cluster-allocation.md
+++ b/_api-reference/cluster-api/cluster-allocation.md
@@ -17,18 +17,16 @@ The most basic cluster allocation explain request finds an unassigned shard and
 
 If you add some options, you can instead get information on a specific shard, including why OpenSearch assigned it to its current node.
 
-
 ## Path and HTTP methods
 
-```
+```json
 GET _cluster/allocation/explain
 POST _cluster/allocation/explain
 ```
 
+## Query parameters
 
-## URL parameters
-
-All cluster allocation explain parameters are optional.
+All parameters are optional.
 
 Parameter | Type | Description
 :--- | :--- | :---
 include_yes_decisions | Boolean | OpenSearch makes a series of yes or no decisions when trying to allocate a shard. If this parameter is `true`, OpenSearch includes the (many) `yes` decisions in its response. Default is `false`.
 include_disk_info | Boolean | Whether to include information about disk usage in the response. Default is `false`.
 
-## Request body
+## Request body fields
 
 All cluster allocation explain fields are optional.
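 
 For example, the following request explains allocation for the first primary shard of a specific index (a sketch; the index name is illustrative):
 
 ```json
 GET _cluster/allocation/explain
 {
   "index": "movies",
   "shard": 0,
   "primary": true
 }
 ```
 {% include copy-curl.html %}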
 
diff --git a/_api-reference/cluster-api/cluster-awareness.md b/_api-reference/cluster-api/cluster-awareness.md
index 18259b9a98..8c162f214c 100644
--- a/_api-reference/cluster-api/cluster-awareness.md
+++ b/_api-reference/cluster-api/cluster-awareness.md
@@ -17,7 +17,7 @@ To control the distribution of search or HTTP traffic, you can use the weights p
 
 ## Path and HTTP methods
 
-```
+```json
 PUT /_cluster/routing/awareness/<attribute>/weights
 GET /_cluster/routing/awareness/<attribute>/weights?local
 GET /_cluster/routing/awareness/<attribute>/weights
@@ -29,7 +29,7 @@ Parameter | Type | Description
 :--- | :--- | :---
 attribute | String | The name of the awareness attribute, usually `zone`. The attribute name must match the values listed in the request body when assigning weights to zones.
 
-## Request body parameters
+## Request body fields
 
 Parameter | Type | Description
 :--- | :--- | :---
@@ -51,11 +51,12 @@ In the following example request body, `zone_1` and `zone_2` receive 50 requests
 }
 ```
 
-## Example: Weighted round robin search
+## Example requests
+
+### Weighted round robin search
 
 The following example request creates a round robin shard allocation for search traffic by using an undefined ratio:
 
-#### Request
 
 ```json
 PUT /_cluster/routing/awareness/zone/weights
@@ -71,27 +72,37 @@ PUT /_cluster/routing/awareness/zone/weights
 ```
 {% include copy-curl.html %}
 
-#### Response
-```
-{
-  "acknowledged": true
-}
-```
+### Getting weights for all zones
+
+The following example request gets weights for all zones.
 
+```json
+GET /_cluster/routing/awareness/zone/weights
+```
+{% include copy-curl.html %}
 
-## Example: Getting weights for all zones
-The following example request gets weights for all zones.
+### Deleting weights
 
-#### Request
+You can remove your weight ratio for each zone using the `DELETE` method:
 
 ```json
-GET /_cluster/routing/awareness/zone/weights
+DELETE /_cluster/routing/awareness/zone/weights
 ```
 {% include copy-curl.html %}
 
-#### Response
+## Example responses
+
+OpenSearch typically responds with the following when the weights are successfully set:
+
+```json
+{
+  "acknowledged": true
+}
+```
+
+### Getting weights for all zones
 
 OpenSearch responds with the weight of each zone:
 
@@ -106,26 +117,7 @@ OpenSearch responds with the weight of each zone:
 }
 ```
-
-## Example: Deleting weights
-
-You can remove your weight ratio for each zone using the `DELETE` method.
-#### Request
-
-```json
-DELETE /_cluster/routing/awareness/zone/weights
-```
-{% include copy-curl.html %}
-
-#### Response
-
-```json
-{
-  "_version":1
-}
-```
 
 ## Next steps
diff --git a/_api-reference/cluster-api/cluster-decommission.md b/_api-reference/cluster-api/cluster-decommission.md
index c707e5390a..1cf17c5b6b 100644
--- a/_api-reference/cluster-api/cluster-decommission.md
+++ b/_api-reference/cluster-api/cluster-decommission.md
@@ -18,27 +18,27 @@ The cluster decommission operation adds support decommissioning based on awarene
 
 For more information about allocation awareness, see [Shard allocation awareness]({{site.url}}{{site.baseurl}}//opensearch/cluster/#shard-allocation-awareness).
-
-## HTTP and Path methods
+## Path and HTTP methods
 
-```
+```json
 PUT /_cluster/decommission/awareness/{awareness_attribute_name}/{awareness_attribute_value}
 GET /_cluster/decommission/awareness/{awareness_attribute_name}/_status
 DELETE /_cluster/decommission/awareness
 ```
 
-## URL parameters
+## Path parameters
 
 Parameter | Type | Description
 :--- | :--- | :---
 awareness_attribute_name | String | The name of the awareness attribute, usually `zone`.
 awareness_attribute_value | String | The value of the awareness attribute. For example, if you have shards allocated in two different zones, you can give each zone a value of `zone-a` or `zone-b`. The cluster decommission operation decommissions the zone listed in the method.
 
+## Example requests
 
-## Example: Decommissioning and recommissioning a zone
+### Decommissioning and recommissioning a zone
 
 You can use the following example requests to decommission and recommission a zone:
 
-#### Request
 
 The following example request decommissions `zone-a`:
 
@@ -54,27 +54,29 @@ DELETE /_cluster/decommission/awareness
 ```
 {% include copy-curl.html %}
 
-#### Example response
+### Getting zone decommission status
 
+The following example request returns the decommission status of all zones.
 
 ```json
-{
-  "acknowledged": true
-}
+GET /_cluster/decommission/awareness/zone/_status
 ```
+{% include copy-curl.html %}
 
-## Example: Getting zone decommission status
+## Example responses
 
-The following example requests returns the decommission status of all zones.
-
-#### Request
+The following example response shows a successful zone decommission:
 
 ```json
-GET /_cluster/decommission/awareness/zone/_status
+{
+  "acknowledged": true
+}
 ```
-{% include copy-curl.html %}
 
-#### Example response
+### Getting zone decommission status
+
+The following example response returns the decommission status of all zones:
 
 ```json
 {
diff --git a/_api-reference/cluster-api/cluster-health.md b/_api-reference/cluster-api/cluster-health.md
index 7081a1a587..df8f3f24e3 100644
--- a/_api-reference/cluster-api/cluster-health.md
+++ b/_api-reference/cluster-api/cluster-health.md
@@ -20,11 +20,19 @@ To get the status of a specific index, provide the index name.
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cluster/health
 GET _cluster/health/<index>
 ```
 
+## Path parameters
+
+The following table lists the available path parameters. All path parameters are optional.
+
+Parameter | Data type | Description
+:--- | :--- | :---
+<index-name> | String | Limits health reporting to a specific index. Can be a single index or a comma-separated list of index names.
+
 ## Query parameters
 
 The following table lists the available query parameters. All query parameters are optional.
@@ -90,7 +98,7 @@ The response contains cluster health information:
 }
 ```
 
-## Response fields
+## Response body fields
 
 The following table lists all response fields.
diff --git a/_api-reference/cluster-api/cluster-settings.md b/_api-reference/cluster-api/cluster-settings.md
index ec682ecdbd..1e0977c56a 100644
--- a/_api-reference/cluster-api/cluster-settings.md
+++ b/_api-reference/cluster-api/cluster-settings.md
@@ -16,14 +16,14 @@ The cluster settings operation lets you check the current settings for your clus
 
 ## Path and HTTP methods
 
-```
+```json
 GET _cluster/settings
 PUT _cluster/settings
 ```
 
 ## Path parameters
 
-All cluster setting parameters are optional.
+All parameters are optional.
 
 Parameter | Data type | Description
 :--- | :--- | :---
@@ -32,9 +32,9 @@ include_defaults (GET only) | Boolean | Whether to include default settings as p
 cluster_manager_timeout | Time unit | The amount of time to wait for a response from the cluster manager node. Default is `30 seconds`.
 timeout (PUT only) | Time unit | The amount of time to wait for a response from the cluster. Default is `30 seconds`.
 
-## Request fields
+## Request body fields
 
-The GET operation has no request body options. All cluster setting field parameters are optional.
+The GET operation has no request body fields. All cluster setting field parameters are optional.
 
 Not all cluster settings can be updated using the cluster settings API. You will receive the error message `"setting [cluster.some.setting], not dynamically updateable"` when trying to configure these settings through the API.
 {: .note }
diff --git a/_api-reference/cluster-api/cluster-stats.md b/_api-reference/cluster-api/cluster-stats.md
index fb0ade2c6b..0a1b348a67 100644
--- a/_api-reference/cluster-api/cluster-stats.md
+++ b/_api-reference/cluster-api/cluster-stats.md
@@ -21,19 +21,68 @@ The cluster stats API operation returns statistics about your cluster.
 
 ```json
 GET _cluster/stats
 GET _cluster/stats/nodes/<node-filters>
+GET _cluster/stats/<metric>/nodes/<node-filters>
+GET _cluster/stats/<metric>/<index_metric>/nodes/<node-filters>
 ```
 
-## URL parameters
+## Path parameters
 
-All cluster stats parameters are optional.
+All parameters are optional.
 
 Parameter | Type | Description
 :--- | :--- | :---
 <node-filters> | List | A comma-separated list of [node filters]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/index/#node-filters) that OpenSearch uses to filter results.
+metric | String | A comma-separated list of [metric groups](#metric-groups), for example, `jvm,fs`. Default is all metric groups.
+index_metric | String | A comma-separated list of [index metric groups](#index-metric-groups), for example, `docs,store`. Default is all index metrics.
 
- Although the `master` node is now called `cluster_manager` for version 2.0, we retained the `master` field for backwards compatibility. If you have a node that has either a `master` role or a `cluster_manager` role, the `count` increases for both fields by 1. To see an example node count increase, see the Response sample.
- {: .note }
+Although the term `master` was deprecated in favor of `cluster_manager` in OpenSearch 2.0, the `master` field was retained for backward compatibility. If you have a node that has either a `master` role or a `cluster_manager` role, the `count` increases for both fields by 1. For an example node count increase, see the [example response](#example-response).
+{: .note }
+
+### Metric groups
+
+The following table lists all available metric groups.
+
+Metric | Description
+:--- |:----
+`indices` | Statistics about indexes in the cluster.
+`os` | Statistics about the host OS, including load and memory.
+`process` | Statistics about processes, including open file descriptors and CPU usage.
+`jvm` | Statistics about the JVM, including heap usage and threads.
+`fs` | Statistics about file system usage.
+`plugins` | Statistics about OpenSearch plugins integrated with the nodes.
+`network_types` | A list of the transport and HTTP networks connected to the nodes.
+`discovery_type` | The method used by the nodes to find other nodes in the cluster.
+`packaging_types` | Information about each node's OpenSearch distribution.
+`ingest` | Statistics about ingest pipelines.
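+
+For example, a request in the following form limits the returned statistics to the `jvm` and `fs` metric groups for all nodes (a sketch based on the path format shown previously):
+
+```json
+GET _cluster/stats/jvm,fs/nodes/_all
+```
+{% include copy-curl.html %}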
+
+### Index metric groups
+
+To filter the information returned for the `indices` metric, you can use specific `index_metric` values. These values are only supported when using the following query types:
+
+```json
+GET _cluster/stats/_all/<index_metric>/_nodes/<node-filters>
+GET _cluster/stats/indices/<index_metric>/_nodes/<node-filters>
+```
+
+The following index metrics are supported:
+
+- `shards`
+- `docs`
+- `store`
+- `fielddata`
+- `query_cache`
+- `completion`
+- `segments`
+- `mappings`
+- `analysis`
+
+For example, the following query requests statistics for `docs` and `segments`:
+
+```json
+GET _cluster/stats/indices/docs,segments/_nodes/_all
+```
+{% include copy-curl.html %}
 
 ## Example request
 
@@ -491,32 +540,32 @@ GET _cluster/stats/nodes/_cluster_manager
 
 Field | Description
 :--- | :---
-nodes | How many nodes returned in the response.
-cluster_name | The cluster's name.
-cluster_uuid | The cluster's uuid.
-timestamp | The Unix epoch time of when the cluster was last refreshed.
-status | The cluster's health status.
-indices | Statistics about the indexes in the cluster.
-indices.count | How many indexes are in the cluster.
-indices.shards | Information about the cluster's shards.
-indices.docs | How many documents are still in the cluster and how many documents are deleted.
-indices.store | Information about the cluster's storage.
-indices.fielddata | Information about the cluster's field data
-indices.query_cache | Data about the cluster's query cache.
-indices.completion | How many bytes in memory are used to complete operations.
-indices.segments | Information about the cluster's segments, which are small Lucene indexes.
-indices.mappings | Mappings within the cluster.
-indices.analysis | Information about analyzers used in the cluster.
-nodes | Statistics about the nodes in the cluster.
-nodes.count | How many nodes were returned from the request.
-nodes.versions | OpenSearch's version number.
-nodes.os | Information about the operating systems used in the nodes.
-nodes.process | The processes the returned nodes use.
-nodes.jvm | Statistics about the Java Virtual Machines in use.
-nodes.fs | The nodes' file storage.
-nodes.plugins | The OpenSearch plugins integrated within the nodes.
-nodes.network_types | The transport and HTTP networks within the nodes.
-nodes.discovery_type | The method the nodes use to find other nodes within the cluster.
-nodes.packaging_types | Information about the nodes' OpenSearch distribution.
-nodes.ingest | Information about the nodes' ingest pipelines/nodes, if there are any.
-total_time_spent | The total amount of download and upload time spent across all shards in the cluster when downloading or uploading from the remote store.
+`nodes` | The number of nodes returned in the response.
+`cluster_name` | The cluster's name.
+`cluster_uuid` | The cluster's UUID.
+`timestamp` | The Unix epoch time indicating when the cluster was last refreshed.
+`status` | The cluster's health status.
+`indices` | Statistics about the indexes in the cluster.
+`indices.count` | The number of indexes in the cluster.
+`indices.shards` | Information about the cluster's shards.
+`indices.docs` | The number of documents remaining in the cluster and the number of documents that were deleted.
+`indices.store` | Information about the cluster's storage.
+`indices.fielddata` | Information about the cluster's field data.
+`indices.query_cache` | Data about the cluster's query cache.
+`indices.completion` | The number of bytes in memory that were used to complete operations.
+`indices.segments` | Information about the cluster's segments, which are small Lucene indexes.
+`indices.mappings` | Information about mappings in the cluster.
+`indices.analysis` | Information about analyzers used in the cluster.
+`nodes` | Statistics about the nodes in the cluster.
+`nodes.count` | The number of nodes returned by the request.
+`nodes.versions` | The OpenSearch version number for each node.
+`nodes.os` | Information about the operating systems used by the nodes.
+`nodes.process` | A list of processes used by each node.
+`nodes.jvm` | Statistics about the JVMs in use.
+`nodes.fs` | Information about the nodes' file storage.
+`nodes.plugins` | A list of the OpenSearch plugins integrated with the nodes.
+`nodes.network_types` | A list of the transport and HTTP networks connected to the nodes.
+`nodes.discovery_type` | A list of methods used by the nodes to find other nodes in the cluster.
+`nodes.packaging_types` | Information about each node's OpenSearch distribution.
+`nodes.ingest` | Information about the nodes' ingest pipelines, if any.
+`total_time_spent` | The total amount of download and upload time spent across all shards in the cluster when downloading or uploading from the remote store.
diff --git a/_api-reference/count.md b/_api-reference/count.md
index 2ac336eeb0..048dad9609 100644
--- a/_api-reference/count.md
+++ b/_api-reference/count.md
@@ -13,7 +13,35 @@ redirect_from:
 
 The count API gives you quick access to the number of documents that match a query. You can also use it to check the document count of an index, data stream, or cluster.
 
-## Example
+
+## Path and HTTP methods
+
+```json
+GET /_count/<index>
+POST /_count/<index>
+```
+
+
+## Query parameters
+
+All parameters are optional.
+
+Parameter | Type | Description
+:--- | :--- | :---
+`allow_no_indices` | Boolean | If `false`, the request returns an error if any wildcard expression or index alias targets any closed or missing indexes. Default is `false`.
+`analyzer` | String | The analyzer to use in the query string.
+`analyze_wildcard` | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is `false`.
+`default_operator` | String | Indicates whether the default operator for a string query should be `AND` or `OR`. Default is `OR`.
+`df` | String | The default field in case a field prefix is not provided in the query string.
+`expand_wildcards` | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indexes), `closed` (match closed, non-hidden indexes), `hidden` (match hidden indexes), and `none` (deny wildcard expressions). Default is `open`.
+`ignore_unavailable` | Boolean | Specifies whether to include missing or closed indexes in the response. Default is `false`.
+`lenient` | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is `false`.
+`min_score` | Float | Include only documents with a minimum `_score` value in the result.
+`routing` | String | Value used to route the operation to a specific shard.
+`preference` | String | Specifies which shard or node OpenSearch should perform the count operation on.
+`terminate_after` | Integer | The maximum number of documents OpenSearch should process before terminating the request.
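+
+For example, a request in the following form counts only documents with a `_score` of at least `1` and stops early after processing `100` documents (a sketch; the index name is illustrative):
+
+```json
+GET /opensearch_dashboards_sample_data_logs/_count?min_score=1&terminate_after=100
+```
+{% include copy-curl.html %}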
+ +## Example requests To see the number of documents that match a query: @@ -64,35 +92,7 @@ GET _count Alternatively, you could use the [cat indexes]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-indices/) and [cat count]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-count/) APIs to see the number of documents per index or data stream. {: .note } - -## Path and HTTP methods - -``` -GET /_count/ -POST /_count/ -``` - - -## URL parameters - -All count parameters are optional. - -Parameter | Type | Description -:--- | :--- | :--- -`allow_no_indices` | Boolean | If false, the request returns an error if any wildcard expression or index alias targets any closed or missing indexes. Default is `false`. -`analyzer` | String | The analyzer to use in the query string. -`analyze_wildcard` | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is `false`. -`default_operator` | String | Indicates whether the default operator for a string query should be `AND` or `OR`. Default is `OR`. -`df` | String | The default field in case a field prefix is not provided in the query string. -`expand_wildcards` | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indexes), `closed` (match closed, non-hidden indexes), `hidden` (match hidden indexes), and `none` (deny wildcard expressions). Default is `open`. -`ignore_unavailable` | Boolean | Specifies whether to include missing or closed indexes in the response. Default is `false`. -`lenient` | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is `false`. -`min_score` | Float | Include only documents with a minimum `_score` value in the result. -`routing` | String | Value used to route the operation to a specific shard. -`preference` | String | Specifies which shard or node OpenSearch should perform the count operation on. -`terminate_after` | Integer | The maximum number of documents OpenSearch should process before terminating the request. - -## Response +## Example response ```json { diff --git a/_api-reference/document-apis/bulk-streaming.md b/_api-reference/document-apis/bulk-streaming.md new file mode 100644 index 0000000000..c127eab527 --- /dev/null +++ b/_api-reference/document-apis/bulk-streaming.md @@ -0,0 +1,81 @@ +--- +layout: default +title: Streaming bulk +parent: Document APIs +nav_order: 25 +redirect_from: + - /opensearch/rest-api/document-apis/bulk/streaming/ +--- + +# Streaming bulk +**Introduced 2.17.0** +{: .label .label-purple } + +This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/9065). +{: .warning} + +The streaming bulk operation lets you add, update, or delete multiple documents by streaming the request and getting the results as a streaming response. In comparison to the traditional [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/), streaming ingestion eliminates the need to estimate the batch size (which is affected by the cluster operational state at any given time) and naturally applies backpressure between many clients and the cluster. 
The streaming works over HTTP/2 or HTTP/1.1 (using chunked transfer encoding), depending on the capabilities of the clients and the cluster.
+
+The default HTTP transport method does not support streaming. You must install the [`transport-reactor-netty4`]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/network-settings/#selecting-the-transport) HTTP transport plugin and use it as the default HTTP transport layer. Both the `transport-reactor-netty4` plugin and the Streaming Bulk API are experimental.
+{: .note}
+
+## Path and HTTP methods
+
+```json
+POST _bulk/stream
+POST <index>/_bulk/stream
+```
+
+If you specify the index in the path, then you don't need to include it in the [request body chunks]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/#request-body).
+
+OpenSearch also accepts PUT requests to the `_bulk/stream` path, but we highly recommend using POST. The accepted usage of PUT---adding or replacing a single resource on a given path---doesn't make sense for streaming bulk requests.
+{: .note }
+
+
+## Query parameters
+
+The following table lists the available query parameters. All query parameters are optional.
+
+Parameter | Data type | Description
+:--- | :--- | :---
+`pipeline` | String | The pipeline ID for preprocessing documents.
+`refresh` | Enum | Whether to refresh the affected shards after performing the indexing operations. Default is `false`. `true` causes the changes to show up in search results immediately but degrades cluster performance. `wait_for` waits for a refresh. Requests take longer to return, but cluster performance isn't degraded.
+`require_alias` | Boolean | Set to `true` to require that all actions target an index alias rather than an index. Default is `false`.
+`routing` | String | Routes the request to the specified shard.
+`timeout` | Time | How long to wait for the request to return. Default is `1m`.
+`type` | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using the `_doc` type for all indexes.
+`wait_for_active_shards` | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is `1` (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have 2 replicas distributed across 2 additional nodes in order for the request to succeed.
+`batch_interval` | Time | Specifies how long bulk operations should be accumulated into a batch before the batch is sent to data nodes.
+`batch_size` | Integer | Specifies how many bulk operations should be accumulated into a batch before the batch is sent to data nodes. Default is `1`.
+{% comment %}_source | List | asdf
+`_source_excludes` | List | asdf
+`_source_includes` | List | asdf{% endcomment %}
+
+## Request body fields
+
+The Streaming Bulk API request body is fully compatible with the [Bulk API request body]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/#request-body), where each bulk operation (create/index/update/delete) is sent as a separate chunk.
+
+## Example request
+
+```json
+curl -X POST "http://localhost:9200/_bulk/stream" -H "Transfer-Encoding: chunked" -H "Content-Type: application/json" -d'
+{ "delete": { "_index": "movies", "_id": "tt2229499" } }
+{ "index": { "_index": "movies", "_id": "tt1979320" } }
+{ "title": "Rush", "year": 2013 }
+{ "create": { "_index": "movies", "_id": "tt1392214" } }
+{ "title": "Prisoners", "year": 2013 }
+{ "update": { "_index": "movies", "_id": "tt0816711" } }
+{ "doc" : { "title": "World War Z" } }
+'
+```
+{% include copy.html %}
+
+## Example response
+
+Depending on the batch settings, each streamed response chunk may report the results of one or many (batch) bulk operations. For example, for the preceding request with no batching (default), the streaming response may appear as follows:
+
+```json
+{"took": 11, "errors": false, "items": [ { "index": {"_index": "movies", "_id": "tt1979320", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 1, "_primary_term": 1, "status": 201 } } ] }
+{"took": 2, "errors": true, "items": [ { "create": { "_index": "movies", "_id": "tt1392214", "status": 409, "error": { "type": "version_conflict_engine_exception", "reason": "[tt1392214]: version conflict, document already exists (current version [1])", "index": "movies", "shard": "0", "index_uuid": "yhizhusbSWmP0G7OJnmcLg" } } } ] }
+{"took": 4, "errors": true, "items": [ { "update": { "_index": "movies", "_id": "tt0816711", "status": 404, "error": { "type": "document_missing_exception", "reason": "[_doc][tt0816711]: document missing", "index": "movies", "shard": "0", "index_uuid": "yhizhusbSWmP0G7OJnmcLg" } } } ] }
+```
diff --git a/_api-reference/document-apis/bulk.md b/_api-reference/document-apis/bulk.md
index 0475aa573d..2a2a5ef8a2 100644
--- a/_api-reference/document-apis/bulk.md
+++ b/_api-reference/document-apis/bulk.md
@@ -17,25 +17,11 @@ The bulk operation lets you add, update, or delete multiple documents in a singl
 Beginning in OpenSearch 2.9, when indexing documents using the bulk operation, the document `_id` must be 512 bytes or less in size.
{: .note}

-## Example
-
-```json
-POST _bulk
-{ "delete": { "_index": "movies", "_id": "tt2229499" } }
-{ "index": { "_index": "movies", "_id": "tt1979320" } }
-{ "title": "Rush", "year": 2013 }
-{ "create": { "_index": "movies", "_id": "tt1392214" } }
-{ "title": "Prisoners", "year": 2013 }
-{ "update": { "_index": "movies", "_id": "tt0816711" } }
-{ "doc" : { "title": "World War Z" } }
-
-```
-{% include copy-curl.html %}

## Path and HTTP methods

-```
+```json
POST _bulk
POST <index>/_bulk
```
@@ -46,23 +32,23 @@ OpenSearch also accepts PUT requests to the `_bulk` path, but we highly recommen
{: .note }


-## URL parameters
+## Query parameters

-All bulk URL parameters are optional.
+All parameters are optional.

Parameter | Type | Description
:--- | :--- | :---
pipeline | String | The pipeline ID for preprocessing documents.
-refresh | Enum | Whether to refresh the affected shards after performing the indexing operations. Default is `false`. `true` makes the changes show up in search results immediately, but hurts cluster performance. `wait_for` waits for a refresh. Requests take longer to return, but cluster performance doesn't suffer.
+refresh | Enum | Whether to refresh the affected shards after performing the indexing operations. Default is `false`. `true` causes the changes to show up in search results immediately but degrades cluster performance. `wait_for` waits for a refresh.
Requests take longer to return, but cluster performance isn't degraded. require_alias | Boolean | Set to `true` to require that all actions target an index alias rather than an index. Default is `false`. routing | String | Routes the request to the specified shard. -timeout | Time | How long to wait for the request to return. Default `1m`. -type | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using a type of `_doc` for all indexes. -wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed. +timeout | Time | How long to wait for the request to return. Default is `1m`. +type | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using the `_doc` type for all indexes. +wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is `1` (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have 2 replicas distributed across 2 additional nodes in order for the request to succeed. batch_size | Integer | **(Deprecated)** Specifies the number of documents to be batched and sent to an ingest pipeline to be processed together. Default is `2147483647` (documents are ingested by an ingest pipeline all at once). If the bulk request doesn't explicitly specify an ingest pipeline or the index doesn't have a default ingest pipeline, then this parameter is ignored. Only documents with `create`, `index`, or `update` actions can be grouped into batches. {% comment %}_source | List | asdf -_source_excludes | list | asdf -_source_includes | list | asdf{% endcomment %} +_source_excludes | List | asdf +_source_includes | List | asdf{% endcomment %} ## Request body @@ -81,7 +67,7 @@ The optional JSON document doesn't need to be minified---spaces are fine---but i All actions support the same metadata: `_index`, `_id`, and `_require_alias`. If you don't provide an ID, OpenSearch generates one automatically, which can make it challenging to update the document at a later time. -- Create +### Create Creates a document if it doesn't already exist and returns an error otherwise. The next line must include a JSON document: @@ -90,49 +76,64 @@ All actions support the same metadata: `_index`, `_id`, and `_require_alias`. If { "title": "Prisoners", "year": 2013 } ``` -- Delete +### Delete - This action deletes a document if it exists. If the document doesn't exist, OpenSearch doesn't return an error but instead returns `not_found` under `result`. Delete actions don't require documents on the next line: +This action deletes a document if it exists. If the document doesn't exist, OpenSearch doesn't return an error but instead returns `not_found` under `result`. 
Delete actions don't require documents on the next line: - ```json - { "delete": { "_index": "movies", "_id": "tt2229499" } } - ``` +```json +{ "delete": { "_index": "movies", "_id": "tt2229499" } } +``` -- Index +### Index - Index actions create a document if it doesn't yet exist and replace the document if it already exists. The next line must include a JSON document: +Index actions create a document if it doesn't yet exist and replace the document if it already exists. The next line must include a JSON document: - ```json - { "index": { "_index": "movies", "_id": "tt1979320" } } - { "title": "Rush", "year": 2013} - ``` +```json +{ "index": { "_index": "movies", "_id": "tt1979320" } } +{ "title": "Rush", "year": 2013} +``` -- Update +### Update - By default, this action updates existing documents and returns an error if the document doesn't exist. The next line must include a full or partial JSON document, depending on how much of the document you want to update: +By default, this action updates existing documents and returns an error if the document doesn't exist. The next line must include a full or partial JSON document, depending on how much of the document you want to update: - ```json - { "update": { "_index": "movies", "_id": "tt0816711" } } - { "doc" : { "title": "World War Z" } } - ``` +```json +{ "update": { "_index": "movies", "_id": "tt0816711" } } +{ "doc" : { "title": "World War Z" } } +``` - To upsert a document, specify `doc_as_upsert` as `true`. If a document exists, it is updated; if it does not exist, a new document is indexed with the parameters specified in the `doc` field: +### Upsert - - Upsert - ```json - { "update": { "_index": "movies", "_id": "tt0816711" } } - { "doc" : { "title": "World War Z" }, "doc_as_upsert": true } - ``` +To upsert a document, specify `doc_as_upsert` as `true`. If a document exists, it is updated; if it does not exist, a new document is indexed with the parameters specified in the `doc` field: + +```json +{ "update": { "_index": "movies", "_id": "tt0816711" } } +{ "doc" : { "title": "World War Z" }, "doc_as_upsert": true } +``` - You can specify a script for more complex document updates by defining the script with the `source` or `id` from a document: +### Script +You can specify a script for more complex document updates by defining the script with the `source` or `id` from a document: +```json +{ "update": { "_index": "movies", "_id": "tt0816711" } } +{ "script" : { "source": "ctx._source.title = \"World War Z\"" } } +``` - - Script - ```json - { "update": { "_index": "movies", "_id": "tt0816711" } } - { "script" : { "source": "ctx._source.title = \"World War Z\"" } } - ``` +## Example request + +```json +POST _bulk +{ "delete": { "_index": "movies", "_id": "tt2229499" } } +{ "index": { "_index": "movies", "_id": "tt1979320" } } +{ "title": "Rush", "year": 2013 } +{ "create": { "_index": "movies", "_id": "tt1392214" } } +{ "title": "Prisoners", "year": 2013 } +{ "update": { "_index": "movies", "_id": "tt0816711" } } +{ "doc" : { "title": "World War Z" } } + +``` +{% include copy-curl.html %} ## Example response diff --git a/_api-reference/document-apis/delete-by-query.md b/_api-reference/document-apis/delete-by-query.md index 64da909aad..a55617e145 100644 --- a/_api-reference/document-apis/delete-by-query.md +++ b/_api-reference/document-apis/delete-by-query.md @@ -13,33 +13,24 @@ redirect_from: You can include a query as part of your delete request so OpenSearch deletes all documents that match that query. 
-## Example
+## Path and HTTP methods

```json
-POST sample-index1/_delete_by_query
-{
-  "query": {
-    "match": {
-      "movie-length": "124"
-    }
-  }
-}
+POST <index>/_delete_by_query
```
-{% include copy-curl.html %}

-## Path and HTTP methods
+## Path parameters

-```
-POST <index>/_delete_by_query
-```
+Parameter | Type | Description
+:--- | :--- | :---
+<index> | String | Name or list of the data streams, indexes, or aliases to delete from. Supports wildcards. If left blank, OpenSearch searches all indexes.

-## URL parameters
+## Query parameters

-All URL parameters are optional.
+All parameters are optional.

Parameter | Type | Description
:--- | :--- | :--- | :---
-<index> | String | Name or list of the data streams, indexes, or aliases to delete from. Supports wildcards. If left blank, OpenSearch searches all indexes.
allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indexes. Default is `true`.
analyzer | String | The analyzer to use in the query string.
analyze_wildcard | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is `false`.
@@ -74,7 +65,7 @@ wait_for_active_shards | String | The number of shards that must be active befor
wait_for_completion | Boolean | Setting this parameter to false indicates to OpenSearch it should not wait for completion and perform this request asynchronously. Asynchronous requests run in the background, and you can use the [Tasks]({{site.url}}{{site.baseurl}}/api-reference/tasks) API to monitor progress.


-## Request body
+## Request body fields

To search your index for specific documents, you must include a [query]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index) in the request body that OpenSearch uses to match documents. If you don't use a query, OpenSearch treats your delete request as a simple [delete document operation]({{site.url}}{{site.baseurl}}/api-reference/document-apis/delete-document).

@@ -88,6 +79,21 @@ To search your index for specific documents, you must include a [query]({{site.u
}
```

+## Example request
+
+```json
+POST sample-index1/_delete_by_query
+{
+  "query": {
+    "match": {
+      "movie-length": "124"
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+
## Example response
```json
{
diff --git a/_api-reference/document-apis/delete-document.md b/_api-reference/document-apis/delete-document.md
index ece99a28ca..85ce4bd79b 100644
--- a/_api-reference/document-apis/delete-document.md
+++ b/_api-reference/document-apis/delete-document.md
@@ -13,25 +13,23 @@ redirect_from:

If you no longer need a document in your index, you can use the delete document API operation to delete it.

-## Example
-
-```
-DELETE /sample-index1/_doc/1
-```
-{% include copy-curl.html %}
-
## Path and HTTP methods

-```
+```json
DELETE /<index>/_doc/<_id>
```

-## URL parameters
+## Path parameters

Parameter | Type | Description | Required
:--- | :--- | :--- | :---
<index> | String | The index to delete from. | Yes
<_id> | String | The ID of the document to delete. | Yes
+
+## Query parameters
+
+Parameter | Type | Description | Required
+:--- | :--- | :--- | :---
if_seq_no | Integer | Only perform the delete operation if the document's version number matches the specified number. | No
if_primary_term | Integer | Only perform the delete operation if the document has the specified primary term. | No
refresh | Enum | If true, OpenSearch refreshes shards to make the delete operation available to search results.
Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`. | No
@@ -41,6 +39,13 @@ version | Integer | The version of the document to delete, which must match the
version_type | Enum | Retrieves a specifically typed document. Available options are `external` (retrieve the document if the specified version number is greater than the document's current version) and `external_gte` (retrieve the document if the specified version number is greater than or equal to the document's current version). For example, to delete version 3 of a document, use `/_doc/1?version=3&version_type=external`. | No
wait_for_active_shards | String | The number of active shards that must be available before OpenSearch processes the delete request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the operation to succeed. | No

+## Example request
+
+```json
+DELETE /sample-index1/_doc/1
+```
+{% include copy-curl.html %}
+
## Example response

```json
diff --git a/_api-reference/document-apis/get-documents.md b/_api-reference/document-apis/get-documents.md
index 232e9083c7..1a6bc73a12 100644
--- a/_api-reference/document-apis/get-documents.md
+++ b/_api-reference/document-apis/get-documents.md
@@ -30,6 +30,13 @@ GET <index>/_source/<_id>
HEAD <index>/_source/<_id>
```

+## Path parameters
+
+Parameter | Type | Description | Required
+:--- | :--- | :--- | :---
+<index> | String | The index to retrieve the document from. | Yes
+<_id> | String | The ID of the document to retrieve. | Yes
+
## Query parameters

All query parameters are optional.

@@ -151,5 +158,5 @@ _seq_no | The sequence number assigned when the document is indexed.
primary_term | The primary term assigned when the document is indexed.
found | Whether the document exists.
_routing | The shard that the document is routed to. If the document is not routed to a particular shard, this field is omitted.
-_source | Contains the document's data if `found` is true. If `_source` is set to false or `stored_fields` is set to true in the URL parameters, this field is omitted.
+_source | Contains the document's data if `found` is true. If `_source` is set to false or `stored_fields` is set to true in the parameters, this field is omitted.
_fields | Contains the document's data that's stored in the index. Only returned if both `stored_fields` and `found` are true.
diff --git a/_api-reference/document-apis/index-document.md b/_api-reference/document-apis/index-document.md
index a506e2d9d8..d195a0662e 100644
--- a/_api-reference/document-apis/index-document.md
+++ b/_api-reference/document-apis/index-document.md
@@ -13,19 +13,10 @@ redirect_from:

You can use the `Index document` operation to add a single document to your index.

-## Example
-
-```json
-PUT sample-index/_doc/1
-{
-  "Description": "To be or not to be, that is the question."
-}
-```
-{% include copy-curl.html %}

## Path and HTTP methods

-```
+```json
PUT <index>/_doc/<_id>
POST <index>/_doc

@@ -49,45 +40,20 @@ To test the Document APIs, add a document by following these steps:
3. In the **Management** section, choose **Dev Tools**.
4. Enter a command, and then select the green triangle play button to send the request. The following are some example commands.
-### Create a sample-index
-```json
-PUT /sample-index
-```
-{% include copy-curl.html %}
-
-### Example PUT request
-
-```json
-PUT /sample_index/_doc/1
-{
-  "name": "Example",
-  "price": 29.99,
-  "description": "To be or not to be, that is the question"
-}
-```
-{% include copy-curl.html %}
-
-### Example POST request
-```json
-POST /sample_index/_doc
-{
-  "name": "Another Example",
-  "price": 19.99,
-  "description": "We are such stuff as dreams are made on"
-}
+## Path parameters

-```
-{% include copy-curl.html %}
+Parameter | Type | Description | Required
+:--- | :--- | :--- | :---
+<index> | String | Name of the index. | Yes
+<id> | String | A unique identifier to attach to the document. To automatically generate an ID, use `POST <index>/_doc` in your request instead of PUT. | No

-## URL parameters
+## Query parameters

-In your request, you must specify the index you want to add your document to. If the index doesn't already exist, OpenSearch automatically creates the index and adds in your document. All other URL parameters are optional.
+In your request, you must specify the index you want to add your document to. If the index doesn't already exist, OpenSearch automatically creates the index and adds your document to it. All other parameters are optional.

Parameter | Type | Description | Required
:--- | :--- | :--- | :---
-<index> | String | Name of the index. | Yes
-<_id> | String | A unique identifier to attach to the document. To automatically generate an ID, use `POST /doc` in your request instead of PUT. | No
if_seq_no | Integer | Only perform the index operation if the document has the specified sequence number. | No
if_primary_term | Integer | Only perform the index operation if the document has the specified primary term.| No
op_type | Enum | Specifies the type of operation to complete with the document. Valid values are `create` (index a document only if it doesn't exist) and `index`. If a document ID is included in the request, then the default is `index`. Otherwise, the default is `create`. | No
@@ -100,17 +66,38 @@ version_type | Enum | Assigns a specific type to the document. Valid options are
wait_for_active_shards | String | The number of active shards that must be available before OpenSearch processes the request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the operation to succeed. | No
require_alias | Boolean | Specifies whether the target index must be an index alias. Default is `false`. | No

-## Request body
+## Example requests
+
+The following example requests add a sample document to an index named `sample_index`:
+
+
+### Example PUT request
+
+```json
+PUT /sample_index/_doc/1
+{
+  "name": "Example",
+  "price": 29.99,
+  "description": "To be or not to be, that is the question"
+}
+```
+{% include copy-curl.html %}

-Your request body must contain the information you want to index.
+### Example POST request

```json
+POST /sample_index/_doc
{
-  "Description": "This is just a sample document"
+  "name": "Another Example",
+  "price": 19.99,
+  "description": "We are such stuff as dreams are made on"
}
+
```
+{% include copy-curl.html %}

## Example response
+
```json
{
  "_index": "sample-index",
diff --git a/_api-reference/document-apis/multi-get.md b/_api-reference/document-apis/multi-get.md
index b267b8f3ac..acd69a7b7e 100644
--- a/_api-reference/document-apis/multi-get.md
+++ b/_api-reference/document-apis/multi-get.md
@@ -15,20 +15,25 @@ The multi-get operation allows you to run multiple GET operations in one request

## Path and HTTP methods

-```
+```json
GET _mget
GET <index>/_mget
POST _mget
POST <index>/_mget
```

-## URL parameters
-
-All multi-get URL parameters are optional.
+## Path parameters

Parameter | Type | Description
:--- | :--- | :--- | :---
<index> | String | Name of the index to retrieve documents from.
+
+## Query parameters
+
+All parameters are optional.
+
+Parameter | Type | Description
+:--- | :--- | :---
preference | String | Specifies the nodes or shards OpenSearch should execute the multi-get operation on. Default is `random`.
realtime | Boolean | Specifies whether the operation should run in realtime. If false, the operation waits for the index to refresh to analyze the source to retrieve data, which makes the operation near-realtime. Default is `true`.
refresh | Boolean | If true, OpenSearch refreshes shards to make the multi-get operation available to search results. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`.
@@ -147,5 +152,5 @@ _seq_no | The sequence number assigned when the document is indexed.
primary_term | The primary term assigned when the document is indexed.
found | Whether the document exists.
_routing | The shard that the document is routed to. If the document is not routed to a particular shard, this field is omitted.
-_source | Contains the document's data if `found` is true. If `_source` is set to false or `stored_fields` is set to true in the URL parameters, this field is omitted.
+_source | Contains the document's data if `found` is true. If `_source` is set to false or `stored_fields` is set to true in the parameters, this field is omitted.
_fields | Contains the document's data that's stored in the index. Only returned if both `stored_fields` and `found` are true.
diff --git a/_api-reference/document-apis/reindex.md b/_api-reference/document-apis/reindex.md
index 8ac1c48be4..65df81777e 100644
--- a/_api-reference/document-apis/reindex.md
+++ b/_api-reference/document-apis/reindex.md
@@ -14,30 +14,16 @@ redirect_from:

The reindex document API operation lets you copy all or a subset of your data from a source index into a destination index.

-## Example
-
-```json
-POST /_reindex
-{
-   "source":{
-      "index":"my-source-index"
-   },
-   "dest":{
-      "index":"my-destination-index"
-   }
-}
-```
-{% include copy-curl.html %}

## Path and HTTP methods

-```
+```json
POST /_reindex
```

-## URL parameters
+## Query parameters

-All URL parameters are optional.
+All parameters are optional.

Parameter | Type | Description
:--- | :--- | :---
@@ -81,6 +67,21 @@ pipeline | Which ingest pipeline to utilize during the reindex.
script | A script that OpenSearch uses to apply transformations to the data during the reindex operation.
lang | The scripting language. Valid options are `painless`, `expression`, `mustache`, and `java`.
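+
+As an illustrative sketch of the `script` and `lang` options listed above (the index names and the script itself are hypothetical), a reindex request can transform documents while copying them:
+
+```json
+POST /_reindex
+{
+  "source": {
+    "index": "my-source-index"
+  },
+  "dest": {
+    "index": "my-destination-index"
+  },
+  "script": {
+    "lang": "painless",
+    "source": "ctx._source.year += 1"
+  }
+}
+```
+{% include copy-curl.html %}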
+## Example request
+
+```json
+POST /_reindex
+{
+   "source":{
+      "index":"my-source-index"
+   },
+   "dest":{
+      "index":"my-destination-index"
+   }
+}
+```
+{% include copy-curl.html %}
+
## Example response
```json
{
diff --git a/_api-reference/document-apis/update-by-query.md b/_api-reference/document-apis/update-by-query.md
index 09b3bd599f..64df8c901b 100644
--- a/_api-reference/document-apis/update-by-query.md
+++ b/_api-reference/document-apis/update-by-query.md
@@ -13,40 +13,25 @@ redirect_from:

You can include a query and a script as part of your update request so OpenSearch can run the script to update all of the documents that match the query.

-## Example
+
+## Path and HTTP methods

```json
-POST test-index1/_update_by_query
-{
-  "query": {
-    "term": {
-      "oldValue": 10
-    }
-  },
-  "script" : {
-    "source": "ctx._source.oldValue += params.newValue",
-    "lang": "painless",
-    "params" : {
-      "newValue" : 20
-    }
-  }
-}
+POST <index1>, <index2>/_update_by_query
```
-{% include copy-curl.html %}

-## Path and HTTP methods
+## Path parameters

-```
-POST <index1>, <index2>/_update_by_query
-```
+Parameter | Type | Description
+:--- | :--- | :---
+<index> | String | Comma-separated list of indexes to update. To update all indexes, use * or omit this parameter.

-## URL parameters
+## Query parameters

-All URL parameters are optional.
+All parameters are optional.

Parameter | Type | Description
:--- | :--- | :--- | :---
-<index> | String | Comma-separated list of indexes to update. To update all indexes, use * or omit this parameter.
allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indexes. Default is `true`.
analyzer | String | Analyzer to use in the query string.
analyze_wildcard | Boolean | Whether the update operation should include wildcard and prefix queries in the analysis. Default is `false`.
@@ -81,7 +66,7 @@ version | Boolean | Whether to include the document version as a match.
wait_for_active_shards | String | The number of shards that must be active before OpenSearch executes the operation. Valid values are `all` or any integer up to the total number of shards in the index. Default is 1, which is the primary shard.
wait_for_completion | boolean | When set to `false`, the response body includes a task ID and OpenSearch executes the operation asynchronously. The task ID can be used to check the status of the task or to cancel the task. Default is set to `true`.

-## Request body
+## Request body fields

To update your indexes and documents by query, you must include a [query]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index) and a script in the request body that OpenSearch can run to update your documents. If you don't specify a query, then every document in the index gets updated.
@@ -102,6 +87,27 @@ To update your indexes and documents by query, you must include a [query]({{site
}
```

+## Example requests
+
+```json
+POST test-index1/_update_by_query
+{
+  "query": {
+    "term": {
+      "oldValue": 10
+    }
+  },
+  "script" : {
+    "source": "ctx._source.oldValue += params.newValue",
+    "lang": "painless",
+    "params" : {
+      "newValue" : 20
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
## Example response
```json
{
diff --git a/_api-reference/document-apis/update-document.md b/_api-reference/document-apis/update-document.md
index 3f951b5adf..ff17940cdb 100644
--- a/_api-reference/document-apis/update-document.md
+++ b/_api-reference/document-apis/update-document.md
@@ -11,45 +11,26 @@ redirect_from:
 **Introduced 1.0**
{: .label .label-purple }

-If you need to update a document's fields in your index, you can use the update document API operation. You can do so by specifying the new data you want to be in your index or by including a script in your request body, which OpenSearch runs to update the document. By default, the update operation only updates a document that exists in the index. If a document does not exist, the API returns an error. To _upsert_ a document (update the document that exists or index a new one), use the [upsert](#upsert) operation.
+If you need to update a document's fields in your index, you can use the update document API operation. You can do so by specifying the new data you want to be in your index or by including a script in your request body, which OpenSearch runs to update the document. By default, the update operation only updates a document that exists in the index. If a document does not exist, the API returns an error. To _upsert_ a document (update the document that exists or index a new one), use the [upsert](#using-the-upsert-operation) operation.

-## Example
-
-```json
-POST /sample-index1/_update/1
-{
-  "doc": {
-    "first_name" : "Bruce",
-    "last_name" : "Wayne"
-  }
-}
-```
-{% include copy-curl.html %}
-
-## Script example
-
-```json
-POST /test-index1/_update/1
-{
-  "script" : {
-    "source": "ctx._source.secret_identity = \"Batman\""
-  }
-}
-```
-{% include copy-curl.html %}

## Path and HTTP methods

-```
+```json
POST <index>/_update/<_id>
```

-## URL parameters
+## Path parameters

Parameter | Type | Description | Required
:--- | :--- | :--- | :---
<index> | String | Name of the index. | Yes
<_id> | String | The ID of the document to update. | Yes
+
+## Query parameters
+
+Parameter | Type | Description | Required
+:--- | :--- | :--- | :---
if_seq_no | Integer | Only perform the update operation if the document has the specified sequence number. | No
if_primary_term | Integer | Perform the update operation if the document has the specified primary term. | No
lang | String | Language of the script. Default is `painless`. | No
@@ -63,7 +44,7 @@ _source_includes | List | A comma-separated list of source fields to include in
timeout | Time | How long to wait for a response from the cluster. | No
wait_for_active_shards | String | The number of active shards that must be available before OpenSearch processes the update request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the operation to succeed. | No

-## Request body
+## Request body fields

Your request body must contain the information with which you want to update your document.
If you only want to replace certain fields in your document, your request body must include a `doc` object containing the fields that you want to update:

@@ -90,7 +71,34 @@ You can also use a script to tell OpenSearch how to update your document:
}
```

-## Upsert
+## Example requests
+
+### Update a document
+
+```json
+POST /sample-index1/_update/1
+{
+  "doc": {
+    "first_name" : "Bruce",
+    "last_name" : "Wayne"
+  }
+}
+```
+{% include copy-curl.html %}
+
+### Update a document with a script
+
+```json
+POST /test-index1/_update/1
+{
+  "script" : {
+    "source": "ctx._source.secret_identity = \"Batman\""
+  }
+}
+```
+{% include copy-curl.html %}
+
+### Using the upsert operation

Upsert is an operation that conditionally either updates an existing document or inserts a new one based on information in the object.

@@ -109,6 +117,7 @@ POST /sample-index1/_update/1
  }
}
```
+{% include copy-curl.html %}

Consider an index that contains the following document:

@@ -123,6 +132,7 @@ Consider an index that contains the following document:
  }
}
```
+{% include copy-curl.html %}

After the upsert operation, the document's `first_name` and `last_name` fields are updated:

@@ -137,6 +147,7 @@ After the upsert operation, the document's `first_name` and `last_name` fields a
  }
}
```
+{% include copy-curl.html %}

If the document does not exist in the index, a new document is indexed with the fields specified in the `upsert` object:

@@ -151,6 +162,7 @@ If the document does not exist in the index, a new document is indexed with the
  }
}
```
+{% include copy-curl.html %}

You can also add `doc_as_upsert` to the request and set it to `true` to use the information in the `doc` field for performing the upsert operation:

@@ -165,6 +177,7 @@ POST /sample-index1/_update/1
  "doc_as_upsert": true
}
```
+{% include copy-curl.html %}

Consider an index that contains the following document:

@@ -179,6 +192,7 @@ Consider an index that contains the following document:
  }
}
```
+{% include copy-curl.html %}

After the upsert operation, the document's `first_name` and `last_name` fields are updated and an `age` field is added. If the document does not exist in the index, a new document is indexed with the fields specified in the `upsert` object. In both cases, the document is as follows:

@@ -194,8 +208,10 @@ After the upsert operation, the document's `first_name` and `last_name` fields a
  }
}
```
+{% include copy-curl.html %}

## Example response
+
```json
{
  "_index": "sample-index1",
diff --git a/_api-reference/explain.md b/_api-reference/explain.md
index 8c2b757945..0591c5bb52 100644
--- a/_api-reference/explain.md
+++ b/_api-reference/explain.md
@@ -18,7 +18,40 @@ The explain API is an expensive operation in terms of both resources and time. O
{: .warning }


-## Example
+## Path and HTTP methods
+
+```json
+GET <index>/_explain/<id>
+POST <index>/_explain/<id>
+```
+
+## Path parameters
+
+Parameter | Type | Description | Required
+:--- | :--- | :--- | :---
+`<index>` | String | Name of the index. You can only specify a single index. | Yes
+`<id>` | String | A unique identifier to attach to the document. | Yes
+
+## Query parameters
+
+You must specify the index and document ID. All other parameters are optional.
+
+Parameter | Type | Description | Required
+:--- | :--- | :--- | :---
+`analyzer` | String | The analyzer to use in the query string. | No
+`analyze_wildcard` | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is `false`.
| No +`default_operator` | String | Indicates whether the default operator for a string query should be AND or OR. Default is OR. | No +`df` | String | The default field in case a field prefix is not provided in the query string. | No +`lenient` | Boolean | Specifies whether OpenSearch should ignore format-based query failures (for example, querying a text field for an integer). Default is `false`. | No +`preference` | String | Specifies a preference of which shard to retrieve results from. Available options are `_local`, which tells the operation to retrieve results from a locally allocated shard replica, and a custom string value assigned to a specific shard replica. By default, OpenSearch executes the explain operation on random shards. | No +`q` | String | Query in the Lucene query string syntax. | No +`stored_fields` | Boolean | If true, the operation retrieves document fields stored in the index rather than the document’s `_source`. Default is `false`. | No +`routing` | String | Value used to route the operation to a specific shard. | No +`_source` | String | Whether to include the `_source` field in the response body. Default is `true`. | No +`_source_excludes` | String | A comma-separated list of source fields to exclude in the query response. | No +`_source_includes` | String | A comma-separated list of source fields to include in the query response. | No + +## Example requests To see the explain output for all results, set the `explain` flag to `true` either in the URL or in the body of the request: @@ -48,35 +81,7 @@ POST opensearch_dashboards_sample_data_ecommerce/_explain/EVz1Q3sBgg5eWQP6RSte ``` {% include copy-curl.html %} -## Path and HTTP methods - -``` -GET /_explain/ -POST /_explain/ -``` - -## URL parameters - -You must specify the index and document ID. All other URL parameters are optional. - -Parameter | Type | Description | Required -:--- | :--- | :--- | :--- -`` | String | Name of the index. You can only specify a single index. | Yes -`<_id>` | String | A unique identifier to attach to the document. | Yes -`analyzer` | String | The analyzer to use in the query string. | No -`analyze_wildcard` | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is `false`. | No -`default_operator` | String | Indicates whether the default operator for a string query should be AND or OR. Default is OR. | No -`df` | String | The default field in case a field prefix is not provided in the query string. | No -`lenient` | Boolean | Specifies whether OpenSearch should ignore format-based query failures (for example, querying a text field for an integer). Default is `false`. | No -`preference` | String | Specifies a preference of which shard to retrieve results from. Available options are `_local`, which tells the operation to retrieve results from a locally allocated shard replica, and a custom string value assigned to a specific shard replica. By default, OpenSearch executes the explain operation on random shards. | No -`q` | String | Query in the Lucene query string syntax. | No -`stored_fields` | Boolean | If true, the operation retrieves document fields stored in the index rather than the document’s `_source`. Default is `false`. | No -`routing` | String | Value used to route the operation to a specific shard. | No -`_source` | String | Whether to include the `_source` field in the response body. Default is `true`. | No -`_source_excludes` | String | A comma-separated list of source fields to exclude in the query response. 
| No -`_source_includes` | String | A comma-separated list of source fields to include in the query response. | No - -## Response +## Example response ```json { diff --git a/_api-reference/index-apis/alias.md b/_api-reference/index-apis/alias.md index ebd7bdedfd..c3ddf76911 100644 --- a/_api-reference/index-apis/alias.md +++ b/_api-reference/index-apis/alias.md @@ -15,45 +15,22 @@ redirect_from: An alias is a virtual pointer that you can use to reference one or more indexes. Creating and updating aliases are atomic operations, so you can reindex your data and point an alias at it without any downtime. -## Example - -```json -POST _aliases -{ - "actions": [ - { - "add": { - "index": "movies", - "alias": "movies-alias1" - } - }, - { - "remove": { - "index": "old-index", - "alias": "old-index-alias" - } - } - ] -} -``` -{% include copy-curl.html %} - ## Path and HTTP methods -``` +```json POST _aliases ``` -## URL parameters +## Query parameters -All alias parameters are optional. +All parameters are optional. Parameter | Data Type | Description :--- | :--- | :--- cluster_manager_timeout | Time | The amount of time to wait for a response from the cluster manager node. Default is `30s`. timeout | Time | The amount of time to wait for a response from the cluster. Default is `30s`. -## Request body +## Request body fields In your request body, you need to specify what action to take, the alias name, and the index you want to associate with the alias. Other fields are optional. @@ -75,6 +52,29 @@ routing | String | Used to assign a custom value to a shard for specific operati index_routing | String | Assigns a custom value to a shard only for index operations. | No search_routing | String | Assigns a custom value to a shard only for search operations. | No +## Example request + +```json +POST _aliases +{ + "actions": [ + { + "add": { + "index": "movies", + "alias": "movies-alias1" + } + }, + { + "remove": { + "index": "old-index", + "alias": "old-index-alias" + } + } + ] +} +``` +{% include copy-curl.html %} + ## Example response ```json diff --git a/_api-reference/index-apis/clear-index-cache.md b/_api-reference/index-apis/clear-index-cache.md index 9bf873301d..6227a29960 100644 --- a/_api-reference/index-apis/clear-index-cache.md +++ b/_api-reference/index-apis/clear-index-cache.md @@ -15,6 +15,12 @@ The clear cache API operation clears the caches of one or more indexes. For data If you use the Security plugin, you must have the `manage index` privileges. {: .note} +## Path and HTTP methods + +```json +POST //_cache/clear +``` + ## Path parameters | Parameter | Data type | Description | @@ -117,7 +123,7 @@ The `POST /books,hockey/_cache/clear` request returns the following fields: } ``` -## Response fields +## Response body fields The `POST /books,hockey/_cache/clear` request returns the following response fields: diff --git a/_api-reference/index-apis/clone.md b/_api-reference/index-apis/clone.md index c1496cbaf8..36592a28b5 100644 --- a/_api-reference/index-apis/clone.md +++ b/_api-reference/index-apis/clone.md @@ -13,27 +13,9 @@ redirect_from: The clone index API operation clones all data in an existing read-only index into a new index. The new index cannot already exist. 
-## Example
-
-```json
-PUT /sample-index1/_clone/cloned-index1
-{
-  "settings": {
-    "index": {
-      "number_of_shards": 2,
-      "number_of_replicas": 1
-    }
-  },
-  "aliases": {
-    "sample-alias1": {}
-  }
-}
-```
-{% include copy-curl.html %}
-
## Path and HTTP methods

-```
+```json
POST /<source-index>/_clone/<target-index>
PUT /<source-index>/_clone/<target-index>
```

@@ -48,14 +30,19 @@ OpenSearch indexes have the following naming restrictions:
  `:`, `"`, `*`, `+`, `/`,
  `\`, `|`, `?`, `#`, `>`, or `<`


-## URL parameters
-
-Your request must include the source and target indexes. All other clone index parameters are optional.
+## Path parameters

Parameter | Type | Description
:--- | :--- | :---
<source-index> | String | The source index to clone.
<target-index> | String | The index to create and add cloned data to.
+
+## Query parameters
+
+Your request must include the source and target indexes. All other clone index parameters are optional.
+
+Parameter | Type | Description
+:--- | :--- | :---
wait_for_active_shards | String | The number of active shards that must be available before OpenSearch processes the request. Default is 1 (only the primary shard). Set to all or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the operation to succeed.
cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`.
timeout | Time | How long to wait for the request to return. Default is `30s`.
diff --git a/_api-reference/index-apis/close-index.md b/_api-reference/index-apis/close-index.md
index 865d17d90a..ecad7d18cc 100644
--- a/_api-reference/index-apis/close-index.md
+++ b/_api-reference/index-apis/close-index.md
@@ -13,26 +13,25 @@ redirect_from:

The close index API operation closes an index. Once an index is closed, you cannot add data to it or search for any data within the index.

-#### Example
+
+## Path and HTTP methods

```json
-POST /sample-index/_close
+POST /<index>/_close
```
-{% include copy-curl.html %}

-## Path and HTTP methods
+## Path parameters

-```
-POST /<index>/_close
-```
+Parameter | Type | Description
+:--- | :--- | :---
+<index> | String | The index to close. Can be a comma-separated list of multiple index names. Use `_all` or * to close all indexes.

-## URL parameters
+## Query parameters

All parameters are optional.

Parameter | Type | Description
:--- | :--- | :---
-<index-name> | String | The index to close. Can be a comma-separated list of multiple index names. Use `_all` or * to close all indexes.
allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indexes. Default is `true`.
expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are all (match all indexes), open (match open indexes), closed (match closed indexes), hidden (match hidden indexes), and none (do not accept wildcard expressions). Default is `open`.
ignore_unavailable | Boolean | If true, OpenSearch does not search for missing or closed indexes. Default is `false`.
@@ -40,6 +39,13 @@ wait_for_active_shards | String | Specifies the number of active shards that mus
cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`.
timeout | Time | How long to wait for a response from the cluster. Default is `30s`.
+## Example requests
+
+```json
+POST /sample-index/_close
+```
+{% include copy-curl.html %}
+
## Example response

```json
diff --git a/_api-reference/index-apis/component-template.md b/_api-reference/index-apis/component-template.md
index bafdfa95c7..fa73e64c94 100644
--- a/_api-reference/index-apis/component-template.md
+++ b/_api-reference/index-apis/component-template.md
@@ -40,7 +40,7 @@ Parameter | Data type | Description
`cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`.
`timeout` | Time | The amount of time for the operation to wait for a response. Default is `30s`.

-## Request fields
+## Request body fields

The following options can be used in the request body to customize the index template.
diff --git a/_api-reference/index-apis/create-index-template.md b/_api-reference/index-apis/create-index-template.md
index 2a92e3f4c4..c2f4228c8e 100644
--- a/_api-reference/index-apis/create-index-template.md
+++ b/_api-reference/index-apis/create-index-template.md
@@ -31,7 +31,7 @@ Parameter | Data type | Description
`create` | Boolean | When true, the API cannot replace or update any existing index templates. Default is `false`.
`cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`.

-## Request body options
+## Request body fields

The following options can be used in the request body to customize the index template.

@@ -45,7 +45,7 @@ Parameter | Type | Description
`priority` | Integer | A number that determines which index templates take precedence during the creation of a new index or data stream. OpenSearch chooses the template with the highest priority. When no priority is given, the template is assigned a `0`, signifying the lowest priority. Optional.
`template` | Object | The template that includes the `aliases`, `mappings`, or `settings` for the index. For more information, see [#template]. Optional.
`version` | Integer | The version number used to manage index templates. Version numbers are not automatically set by OpenSearch. Optional.
+`context` | Object | (Experimental) The `context` parameter provides use-case-specific predefined templates that can be applied to an index. Among all settings and mappings declared for a template, context templates hold the highest priority. For more information, see [index-context]({{site.url}}{{site.baseurl}}/im-plugin/index-context/).

### Template
diff --git a/_api-reference/index-apis/create-index.md b/_api-reference/index-apis/create-index.md
index 2f4c1041bc..f10450bb28 100644
--- a/_api-reference/index-apis/create-index.md
+++ b/_api-reference/index-apis/create-index.md
@@ -18,8 +18,8 @@ When creating an index, you can specify its mappings, settings, and aliases.

## Path and HTTP methods

-```
+```json
PUT <index>
```

## Index naming restrictions
@@ -50,7 +50,7 @@ timeout | Time | How long to wait for the request to return. Default is `30s`.

## Request body

-As part of your request, you can optionally specify [index settings]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/), [mappings]({{site.url}}{{site.baseurl}}/field-types/index/), and [aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/) for your newly created index.
+As part of your request, you can optionally specify [index settings]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/), [mappings]({{site.url}}{{site.baseurl}}/field-types/index/), [aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/), and [index context]({{site.url}}{{site.baseurl}}/im-plugin/index-context/).

## Example request
diff --git a/_api-reference/index-apis/dangling-index.md b/_api-reference/index-apis/dangling-index.md
index 9d40687f9f..f44a9dc4d4 100644
--- a/_api-reference/index-apis/dangling-index.md
+++ b/_api-reference/index-apis/dangling-index.md
@@ -15,21 +15,24 @@ After a node joins a cluster, dangling indexes occur if any shards exist in the

List dangling indexes:

-```
+```json
GET /_dangling
```
+{% include copy-curl.html %}

Import a dangling index:

-```
+```json
POST /_dangling/<index-uuid>
```
+{% include copy-curl.html %}

Delete a dangling index:

-```
+```json
DELETE /_dangling/<index-uuid>
```
+{% include copy-curl.html %}

## Path parameters

@@ -49,31 +52,29 @@ accept_data_loss | Boolean | Must be set to `true` for an `import` or `delete` b
timeout | Time units | The amount of time to wait for a response. If no response is received in the defined time period, an error is returned. Default is `30` seconds.
cluster_manager_timeout | Time units | The amount of time to wait for a connection to the cluster manager. If no response is received in the defined time period, an error is returned. Default is `30` seconds.

-## Examples
-
-The following are example requests and a example response.
+## Example requests

-#### Sample list
+### Sample list

````bash
GET /_dangling
````
{% include copy-curl.html %}

-#### Sample import
+### Sample import

````bash
POST /_dangling/msdjernajxAT23RT-BupMB?accept_data_loss=true
````
{% include copy-curl.html %}

-#### Sample delete
+### Sample delete

````bash
DELETE /_dangling/msdjernajxAT23RT-BupMB?accept_data_loss=true
````

-#### Example response body
+## Example response

````json
{
diff --git a/_api-reference/index-apis/delete-index.md b/_api-reference/index-apis/delete-index.md
index ad00eb7eca..af0bd292fc 100644
--- a/_api-reference/index-apis/delete-index.md
+++ b/_api-reference/index-apis/delete-index.md
@@ -13,19 +13,13 @@ redirect_from:

If you no longer need an index, you can use the delete index API operation to delete it.

-## Example
+## Path and HTTP methods

```json
-DELETE /sample-index
-```
-{% include copy-curl.html %}
-
-## Path and HTTP methods
-```
DELETE /<index>
```

-## URL parameters
+## Query parameters

All parameters are optional.

@@ -37,6 +31,13 @@ ignore_unavailable | Boolean | If true, OpenSearch does not include missing or c
cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`.
timeout | Time | How long to wait for the response to return. Default is `30s`.

+## Example request
+
+```json
+DELETE /sample-index
+```
+{% include copy-curl.html %}
+
## Example response

```json
diff --git a/_api-reference/index-apis/exists.md b/_api-reference/index-apis/exists.md
index 351e2f2088..fb1a4d79c6 100644
--- a/_api-reference/index-apis/exists.md
+++ b/_api-reference/index-apis/exists.md
@@ -13,20 +13,14 @@ redirect_from:

The index exists API operation returns whether or not an index already exists.

-## Example
-
-```json
-HEAD /sample-index
-```
-{% include copy-curl.html %}

## Path and HTTP methods

-```
+```json
HEAD /<index>
```

-## URL parameters
+## Query parameters

All parameters are optional.
@@ -40,6 +34,13 @@ ignore_unavailable | Boolean | If true, OpenSearch does not search for missing or closed indexes. Default is `false`.
 
 local | Boolean | Whether to return information from only the local node instead of from the cluster manager node. Default is `false`.
 
+## Example request
+
+```json
+HEAD /sample-index
+```
+{% include copy-curl.html %}
+
 ## Example response
 
 The index exists API operation returns only one of two possible response codes: `200` -- the index exists, and `404` -- the index does not exist.
diff --git a/_api-reference/index-apis/flush.md b/_api-reference/index-apis/flush.md
index fb97e43900..e464a42cad 100644
--- a/_api-reference/index-apis/flush.md
+++ b/_api-reference/index-apis/flush.md
@@ -18,7 +18,7 @@ OpenSearch automatically performs flushes in the background based on conditions like memory use.
 
 The Flush API supports the following paths:
 
-```
+```json
 GET /_flush
 POST /_flush
 GET /{index}/_flush
@@ -35,7 +35,7 @@ The following table lists the available path parameters. All path parameters are optional.
 
 ## Query parameters
 
-The Flush API supports the following query parameters.
+All parameters are optional.
 
 | Parameter | Data type | Description |
 | :--- | :--- | :--- |
@@ -45,21 +45,26 @@ The Flush API supports the following query parameters.
 | `ignore_unavailable` | Boolean | When `true`, OpenSearch ignores missing or closed indexes. If `false`, OpenSearch returns an error if the force merge operation encounters missing or closed indexes. Default is `false`. |
 | `wait_if_ongoing` | Boolean | When `true`, the Flush API does not run while another flush request is active. When `false`, OpenSearch returns an error if another flush request is active. Default is `true`. |
 
-## Example request: Flush a specific index
+## Example requests
+
+### Flush a specific index
 
 The following example flushes an index named `shakespeare`:
 
-```
+```json
 POST /shakespeare/_flush
 ```
+{% include copy-curl.html %}
 
-## Example request: Flush all indexes
+### Flush all indexes
 
 The following example flushes all indexes in a cluster:
 
-```
+```json
 POST /_flush
 ```
+{% include copy-curl.html %}
 
 ## Example response
 
diff --git a/_api-reference/index-apis/force-merge.md b/_api-reference/index-apis/force-merge.md
index ce7501ebe3..8316c72937 100644
--- a/_api-reference/index-apis/force-merge.md
+++ b/_api-reference/index-apis/force-merge.md
@@ -11,6 +11,13 @@ nav_order: 37
 
 The force merge API operation forces a merge on the shards of one or more indexes. For a data stream, the API forces a merge on the shards of the stream's backing index.
 
+## Path and HTTP methods
+
+```json
+POST /_forcemerge
+POST /<index>/_forcemerge
+```
+
 ## The merge operation
 
 In OpenSearch, a shard is a Lucene index, which consists of _segments_ (or segment files). Segments store the indexed data. Periodically, smaller segments are merged into larger ones and the larger segments become immutable. Merging reduces the overall number of segments on each shard and frees up disk space.
@@ -45,12 +52,6 @@ When you force merge multiple indexes, the merge operation is executed on each shard of a node sequentially.
 
 It can be useful to force merge data streams in order to manage a data stream's backing indexes, especially after a rollover operation. Time-based indexes receive indexing requests only during a specified time period. Once that time period has elapsed and the index receives no more write requests, you can force merge segments of all index shards into one segment. Searches on single-segment shards are more efficient because they use simpler data structures.
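+
+As a minimal illustration (the data stream name `my-data-stream` is a placeholder; `max_num_segments` is the standard force merge option), the following request merges the segments of each backing index shard into a single segment:
+
+```json
+POST /my-data-stream/_forcemerge?max_num_segments=1
+```
+{% include copy-curl.html %}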
-## Path and HTTP methods
-
-```json
-POST /_forcemerge
-POST /<index>/_forcemerge
-```
 
 ## Path parameters
 
@@ -135,7 +136,7 @@ POST /.testindex-logs/_forcemerge?primary_only=true
 }
 ```
 
-## Response fields
+## Response body fields
 
 The following table lists all response fields.
 
diff --git a/_api-reference/index-apis/get-index.md b/_api-reference/index-apis/get-index.md
index e2d2d85c65..78fe7bcd94 100644
--- a/_api-reference/index-apis/get-index.md
+++ b/_api-reference/index-apis/get-index.md
@@ -13,20 +13,24 @@ redirect_from:
 
 You can use the get index API operation to return information about an index.
 
-## Example
+
+## Path and HTTP methods
 
 ```json
-GET /sample-index
+GET /<index>
 ```
-{% include copy-curl.html %}
 
-## Path and HTTP methods
-
-```
-GET /<index>
-```
+## Path parameters
 
-## URL parameters
+The following table lists the available path parameters. All path parameters are optional.
+
+| Parameter | Data type | Description |
+| :--- | :--- | :--- |
+| `<index>` | String | A comma-separated list of indexes, data streams, or index aliases to which the operation is applied. Supports wildcard expressions (`*`). Use `_all` or `*` to specify all indexes and data streams in a cluster. |
+
+## Query parameters
 
 All parameters are optional.
 
@@ -40,6 +44,12 @@ ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indexes in the response.
 local | Boolean | Whether to return information from only the local node instead of from the cluster manager node. Default is `false`.
 cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`.
 
+## Example request
+
+```json
+GET /sample-index
+```
+{% include copy-curl.html %}
+
 ## Example response
 
 ```json
diff --git a/_api-reference/index-apis/get-settings.md b/_api-reference/index-apis/get-settings.md
index 94cb4a7c6c..4eebb43272 100644
--- a/_api-reference/index-apis/get-settings.md
+++ b/_api-reference/index-apis/get-settings.md
@@ -14,29 +14,28 @@ redirect_from:
 
 The get settings API operation returns all the settings in your index.
 
-## Example
-
-```json
-GET /sample-index1/_settings
-```
-{% include copy-curl.html %}
 
 ## Path and HTTP methods
 
-```
+```json
 GET /_settings
 GET /<target-index>/_settings
 GET /<target-index>/_settings/<setting>
 ```
 
-## URL parameters
-
-All get settings parameters are optional.
+## Path parameters
 
 Parameter | Data type | Description
 :--- | :--- | :---
 <target-index> | String | The index to get settings from. Can be a comma-separated list to get settings from multiple indexes, or use `_all` to return settings from all indexes within the cluster.
 <setting> | String | Filter to return specific settings.
+
+## Query parameters
+
+All get settings parameters are optional.
+
+Parameter | Data type | Description
+:--- | :--- | :---
 allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indexes. Default is `true`.
 expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are `all` (match all indexes), `open` (match open indexes), `closed` (match closed indexes), `hidden` (match hidden indexes), and `none` (do not accept wildcard expressions), which must be used with `open`, `closed`, or both. Default is `open`.
 flat_settings | Boolean | Whether to return settings in the flat form, which can improve readability, especially for heavily nested settings. For example, the flat form of “index”: { “creation_date”: “123456789” } is “index.creation_date”: “123456789”.
@@ -45,6 +44,13 @@ ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indexes in the response.
 
 local | Boolean | Whether to return information from the local node only instead of the cluster manager node. Default is `false`.
 cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`.
 
+## Example request
+
+```json
+GET /sample-index1/_settings
+```
+{% include copy-curl.html %}
+
 ## Example response
 
 ```json
diff --git a/_api-reference/index-apis/open-index.md b/_api-reference/index-apis/open-index.md
index 0d8ef62282..3011507697 100644
--- a/_api-reference/index-apis/open-index.md
+++ b/_api-reference/index-apis/open-index.md
@@ -13,26 +13,25 @@ redirect_from:
 
 The open index API operation opens a closed index, letting you add or search for data within the index.
 
-## Example
+
+## Path and HTTP methods
 
 ```json
-POST /sample-index/_open
+POST /<index>/_open
 ```
-{% include copy-curl.html %}
 
-## Path and HTTP methods
+## Path parameters
 
-```
-POST /<index>/_open
-```
+Parameter | Type | Description
+:--- | :--- | :---
+<index> | String | The index to open. Can be a comma-separated list of multiple index names. Use `_all` or * to open all indexes.
 
-## URL parameters
+## Query parameters
 
 All parameters are optional.
 
 Parameter | Type | Description
 :--- | :--- | :---
-<index-name> | String | The index to open. Can be a comma-separated list of multiple index names. Use `_all` or * to open all indexes.
 allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indexes. Default is `true`.
 expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are all (match all indexes), open (match open indexes), closed (match closed indexes), hidden (match hidden indexes), and none (do not accept wildcard expressions). Default is `open`.
 ignore_unavailable | Boolean | If true, OpenSearch does not search for missing or closed indexes. Default is `false`.
@@ -42,6 +41,13 @@ timeout | Time | How long to wait for a response from the cluster. Default is `30s`.
 wait_for_completion | Boolean | When set to `false`, the request returns immediately instead of after the operation is finished. To monitor the operation status, use the [Tasks API]({{site.url}}{{site.baseurl}}/api-reference/tasks/) with the task ID returned by the request. Default is `true`.
 task_execution_timeout | Time | The explicit task execution timeout. Only useful when wait_for_completion is set to `false`. Default is `1h`.
 
+## Example request
+
+```json
+POST /sample-index/_open
+```
+{% include copy-curl.html %}
+
 ## Example response
 
 ```json
diff --git a/_api-reference/index-apis/put-mapping.md b/_api-reference/index-apis/put-mapping.md
index f7d9321d33..26bfbae0d9 100644
--- a/_api-reference/index-apis/put-mapping.md
+++ b/_api-reference/index-apis/put-mapping.md
@@ -17,17 +17,45 @@ If you want to create or add mappings and fields to an index, you can use the put mapping API operation.
 
 You can't use this operation to update mappings that already map to existing data in the index. You must first create a new index with your desired mappings, and then use the [reindex API operation]({{site.url}}{{site.baseurl}}/opensearch/reindex-data) to map all the documents from your old index to the new index. If you don't want any downtime while you re-index your indexes, you can use [aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias).
+## Path and HTTP methods
 
-## Required path parameter
+```json
+PUT /<target-index>/_mapping
+PUT /<target-index1>,<target-index2>/_mapping
+```
+
+## Path parameters
 
 The only required path parameter is the index with which to associate the mapping. If you don't specify an index, you will get an error. You can specify a single index, or multiple indexes separated by a comma as follows:
 
-```
+```json
 PUT /<target-index>/_mapping
 PUT /<target-index1>,<target-index2>/_mapping
 ```
 
-## Required request body field
+## Query parameters
+
+Optionally, you can add query parameters to make a more specific request. For example, to skip any missing or closed indexes in the response, you can add the `ignore_unavailable` query parameter to your request as follows:
+
+```json
+PUT /sample-index/_mapping?ignore_unavailable
+```
+
+The following table defines the put mapping query parameters:
+
+Parameter | Data type | Description
+:--- | :--- | :---
+allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indexes. Default is `true`.
+expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are `all` (match all indexes), `open` (match open indexes), `closed` (match closed indexes), `hidden` (match hidden indexes), and `none` (do not accept wildcard expressions), which must be used with `open`, `closed`, or both. Default is `open`.
+ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indexes in the response.
+cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`.
+timeout | Time | How long to wait for the response to return. Default is `30s`.
+write_index_only | Boolean | Whether OpenSearch should apply mapping updates only to the write index.
+
+## Request body fields
+
+### properties
 
 The request body must contain `properties`, which has all of the mappings that you want to create or update.
 
@@ -44,8 +72,6 @@ The request body must contain `properties`, which has all of the mappings that you want to create or update.
 }
 ```
 
-## Optional request body fields
-
 ### dynamic
 
 You can make the document structure match the structure of the index mapping by setting the `dynamic` request body field to `strict`, as seen in the following example:
@@ -61,26 +87,8 @@ You can make the document structure match the structure of the index mapping by setting the `dynamic` request body field to `strict`, as seen in the following example:
 }
 ```
 
-## Optional query parameters
-
-Optionally, you can add query parameters to make a more specific request. For example, to skip any missing or closed indexes in the response, you can add the `ignore_unavailable` query parameter to your request as follows:
-
-```json
-PUT /sample-index/_mapping?ignore_unavailable
-```
-
-The following table defines the put mapping query parameters:
-
-Parameter | Data type | Description
-:--- | :--- | :---
-allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indexes. Default is `true`.
-expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are `all` (match all indexes), `open` (match open indexes), `closed` (match closed indexes), `hidden` (match hidden indexes), and `none` (do not accept wildcard expressions), which must be used with `open`, `closed`, or both. Default is `open`.
-ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indexes in the response.
-cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`.
-timeout | Time | How long to wait for the response to return. Default is `30s`.
-write_index_only | Boolean | Whether OpenSearch should apply mapping updates only to the write index.
 
-#### Sample Request
+## Example request
 
 The following request creates a new mapping for the `sample-index` index:
 
@@ -100,7 +108,7 @@ PUT /sample-index/_mapping
 ```
 {% include copy-curl.html %}
 
-#### Sample Response
+## Example response
 
 Upon success, the response returns `"acknowledged": true`.
 
diff --git a/_api-reference/index-apis/recover.md b/_api-reference/index-apis/recover.md
index dc2df1e5a2..41f071cf6c 100644
--- a/_api-reference/index-apis/recover.md
+++ b/_api-reference/index-apis/recover.md
@@ -28,14 +28,14 @@ The Recovery API reports solely on completed recoveries for shard copies present in the cluster.
 
 ```json
 GET /_recovery
-GET /<index>/recovery/
+GET /<index>/_recovery/
 ```
 
 ## Path parameters
 
 Parameter | Data type | Description
 :--- | :---
-`index-name` | String | A comma-separated list of indexes, data streams, or index aliases to which the operation is applied. Supports wildcard expressions (`*`). Use `_all` or `*` to specify all indexes and data streams in a cluster. |
+`index` | String | A comma-separated list of indexes, data streams, or index aliases to which the operation is applied. Supports wildcard expressions (`*`). Use `_all` or `*` to specify all indexes and data streams in a cluster. |
 
 ## Query parameters
 
@@ -48,24 +48,6 @@ Parameter | Data type | Description
 `detailed` | Boolean | When `true`, provides detailed information about shard recoveries. Default is `false`.
 `index` | String | A comma-separated list or wildcard expression of index names used to limit the request.
 
-## Response fields
-
-The API responds with the following information about the recovery shard.
-
-Parameter | Data type | Description
-:--- | :--- | :---
-`id` | Integer | The ID of the shard.
-`type` | String | The recovery source for the shard. Returned values include:<br>
- `EMPTY_STORE`: An empty store. Indicates a new primary shard or the forced allocation of an empty primary shard using the Cluster Reroute API.
- `EXISTING_STORE`: The store of an existing primary shard. Indicates that the recovery is related to node startup or the allocation of an existing primary shard.
- `LOCAL_SHARDS`: Shards belonging to another index on the same node. Indicates that the recovery is related to a clone, shrink, or split operation.
- `PEER`: A primary shard on another node. Indicates that the recovery is related to shard replication.
- `SNAPSHOT`: A snapshot. Indicates that the recovery is related to a snapshot restore operation. -`STAGE` | String | The recovery stage. Returned values can include:
- `INIT`: Recovery has not started.
- `INDEX`: Reading index metadata and copying bytes from the source to the destination.
- `VERIFY_INDEX`: Verifying the integrity of the index.
- `TRANSLOG`: Replaying the transaction log.
- `FINALIZE`: Cleanup.
- `DONE`: Complete. -`primary` | Boolean | When `true`, the shard is a primary shard. -`start_time` | String | The timestamp indicating when the recovery started. -`stop_time` | String | The timestamp indicating when the recovery completed. -`total_time_in_millis` | String | The total amount of time taken to recover a shard, in milliseconds. -`source` | Object | The recovery source. This can include a description of the repository (if the recovery is from a snapshot) or a description of the source node. -`target` | Object | The destination node. -`index` | Object | Statistics about the physical index recovery. -`translog` | Object | Statistics about the translog recovery. - `start` | Object | Statistics about the amount of time taken to open and start the index. ## Example requests @@ -289,3 +271,22 @@ The following response returns detailed recovery information about an index name } } ``` + +## Response body fields + +The API responds with the following information about the recovery shard. + +Parameter | Data type | Description +:--- | :--- | :--- +`id` | Integer | The ID of the shard. +`type` | String | The recovery source for the shard. Returned values include:
- `EMPTY_STORE`: An empty store. Indicates a new primary shard or the forced allocation of an empty primary shard using the Cluster Reroute API.
- `EXISTING_STORE`: The store of an existing primary shard. Indicates that the recovery is related to node startup or the allocation of an existing primary shard.
- `LOCAL_SHARDS`: Shards belonging to another index on the same node. Indicates that the recovery is related to a clone, shrink, or split operation.
- `PEER`: A primary shard on another node. Indicates that the recovery is related to shard replication.
- `SNAPSHOT`: A snapshot. Indicates that the recovery is related to a snapshot restore operation.
+`stage` | String | The recovery stage. Returned values can include:<br>
- `INIT`: Recovery has not started.
- `INDEX`: Reading index metadata and copying bytes from the source to the destination.
- `VERIFY_INDEX`: Verifying the integrity of the index.
- `TRANSLOG`: Replaying the transaction log.
- `FINALIZE`: Cleanup.
- `DONE`: Complete. +`primary` | Boolean | When `true`, the shard is a primary shard. +`start_time` | String | The timestamp indicating when the recovery started. +`stop_time` | String | The timestamp indicating when the recovery completed. +`total_time_in_millis` | String | The total amount of time taken to recover a shard, in milliseconds. +`source` | Object | The recovery source. This can include a description of the repository (if the recovery is from a snapshot) or a description of the source node. +`target` | Object | The destination node. +`index` | Object | Statistics about the physical index recovery. +`translog` | Object | Statistics about the translog recovery. + `start` | Object | Statistics about the amount of time taken to open and start the index. diff --git a/_api-reference/index-apis/refresh.md b/_api-reference/index-apis/refresh.md index 4d75060087..917ca5d9a9 100644 --- a/_api-reference/index-apis/refresh.md +++ b/_api-reference/index-apis/refresh.md @@ -48,22 +48,24 @@ The following table lists the available query parameters. All query parameters a | `expand_wildcards` | String | The type of index that the wildcard patterns can match. If the request targets data streams, this argument determines whether the wildcard expressions match any hidden data streams. Supports comma-separated values, such as `open,hidden`. Valid values are `all`, `open`, `closed`, `hidden`, and `none`. +## Example requests -#### Example: Refresh several data streams or indexes +### Refresh several data streams or indexes The following example request refreshes two indexes named `my-index-A` and `my-index-B`: -``` +```json POST /my-index-A,my-index-B/_refresh ``` {% include copy-curl.html %} -#### Example: Refresh all data streams and indexes in a cluster +### Refresh all data streams and indexes in a cluster The following request refreshes all data streams and indexes in a cluster: -``` +```json POST /_refresh ``` +{% include copy-curl.html %} diff --git a/_api-reference/index-apis/rollover.md b/_api-reference/index-apis/rollover.md index 722dfe196c..db30a5d7bf 100644 --- a/_api-reference/index-apis/rollover.md +++ b/_api-reference/index-apis/rollover.md @@ -61,9 +61,9 @@ Parameter | Type | Description `timeout` | Time | The amount of time to wait for a response. Default is `30s`. `wait_for_active_shards` | String | The number of active shards that must be available before OpenSearch processes the request. Default is `1` (only the primary shard). You can also set to `all` or a positive integer. Values greater than `1` require replicas. For example, if you specify a value of `3`, then the index must have two replicas distributed across two additional nodes in order for the operation to succeed. -## Request body +## Request body fields -The following request body parameters are supported. +The following request body fields are supported. ### `alias` diff --git a/_api-reference/index-apis/segment.md b/_api-reference/index-apis/segment.md index 0ecee63e77..b9625d3b34 100644 --- a/_api-reference/index-apis/segment.md +++ b/_api-reference/index-apis/segment.md @@ -15,7 +15,7 @@ The Segment API provides details about the Lucene segments within index shards a ## Path and HTTP methods ```json -GET //_segments +GET //_segments GET /_segments ``` @@ -29,7 +29,7 @@ Parameter | Data type | Description ## Query parameters -The Segment API supports the following optional query parameters. +All query parameters are optional. 
Parameter | Data type | Description :--- | :--- | :--- @@ -38,21 +38,6 @@ Parameter | Data type | Description `ignore_unavailable` | Boolean | When `true`, OpenSearch ignores missing or closed indexes. If `false`, OpenSearch returns an error if the force merge operation encounters missing or closed indexes. Default is `false`. `verbose` | Boolean | When `true`, provides information about Lucene's memory usage. Default is `false`. -## Response body fields - -Parameter | Data type | Description - :--- | :--- | :--- -`` | String | The name of the segment used to create internal file names in the shard directory. -`generation` | Integer | The generation number, such as `0`, incremented for each written segment and used to name the segment. -`num_docs` | Integer | The number of documents, obtained from Lucene. Nested documents are counted separately from their parents. Deleted documents, as well as recently indexed documents that are not yet assigned to a segment, are excluded. -`deleted_docs` | Integer | The number of deleted documents, obtained from Lucene, which may not match the actual number of delete operations performed. Recently deleted documents that are not yet assigned to a segment are excluded. Deleted documents are automatically merged when appropriate. OpenSearch will occasionally delete extra documents in order to track recent shard operations. -`size_in_bytes` | Integer | The amount of disk space used by the segment, for example, `50kb`. -`memory_in_bytes` | Integer | The amount of segment data, measured in bytes, that is kept in memory to facilitate efficient search operations, such as `1264`. A value of `-1` indicates that OpenSearch was unable to compute this number. -`committed` | Boolean | When `true`, the segments are synced to disk. Segments synced to disk can survive a hard reboot. If `false`, then uncommitted segment data is stored in the transaction log as well so that changes can be replayed at the next startup. -`search` | Boolean | When `true`, segment search is enabled. When `false`, the segment may have already been written to disk and require a refresh in order to be searchable. -`version` | String | The Lucene version used to write the segment. -`compound` | Boolean | When `true`, indicates that Lucene merged all segment files into one file in order to save any file descriptions. -`attributes` | Object | Shows if high compression was enabled. ## Example requests @@ -119,3 +104,19 @@ GET /_segments } ``` +## Response body fields + +Parameter | Data type | Description + :--- | :--- | :--- +`` | String | The name of the segment used to create internal file names in the shard directory. +`generation` | Integer | The generation number, such as `0`, incremented for each written segment and used to name the segment. +`num_docs` | Integer | The number of documents, obtained from Lucene. Nested documents are counted separately from their parents. Deleted documents, as well as recently indexed documents that are not yet assigned to a segment, are excluded. +`deleted_docs` | Integer | The number of deleted documents, obtained from Lucene, which may not match the actual number of delete operations performed. Recently deleted documents that are not yet assigned to a segment are excluded. Deleted documents are automatically merged when appropriate. OpenSearch will occasionally delete extra documents in order to track recent shard operations. +`size_in_bytes` | Integer | The amount of disk space used by the segment, for example, `50kb`. 
+`memory_in_bytes` | Integer | The amount of segment data, measured in bytes, that is kept in memory to facilitate efficient search operations, such as `1264`. A value of `-1` indicates that OpenSearch was unable to compute this number.
+`committed` | Boolean | When `true`, the segments are synced to disk. Segments synced to disk can survive a hard reboot. If `false`, then uncommitted segment data is stored in the transaction log as well so that changes can be replayed at the next startup.
+`search` | Boolean | When `true`, segment search is enabled. When `false`, the segment may have already been written to disk and require a refresh in order to be searchable.
+`version` | String | The Lucene version used to write the segment.
+`compound` | Boolean | When `true`, indicates that Lucene merged all segment files into one file in order to save any file descriptors.
+`attributes` | Object | Shows whether high compression was enabled.
+
diff --git a/_api-reference/index-apis/shrink-index.md b/_api-reference/index-apis/shrink-index.md
index 17b7c4dff6..e3e1c67155 100644
--- a/_api-reference/index-apis/shrink-index.md
+++ b/_api-reference/index-apis/shrink-index.md
@@ -13,25 +13,10 @@ redirect_from:
 
 The shrink index API operation moves all of your data in an existing index into a new index with fewer primary shards.
 
-## Example
-
-```json
-POST /my-old-index/_shrink/my-new-index
-{
-  "settings": {
-    "index.number_of_replicas": 4,
-    "index.number_of_shards": 3
-  },
-  "aliases":{
-    "new-index-alias": {}
-  }
-}
-```
-{% include copy-curl.html %}
 
 ## Path and HTTP methods
 
-```
+```json
 POST /<index-name>/_shrink/<target-index>
 PUT /<index-name>/_shrink/<target-index>
 ```
@@ -44,27 +29,32 @@ When creating new indexes with this operation, remember that OpenSearch indexes have the following naming restrictions:
 
 `:`, `"`, `*`, `+`, `/`, `\`, `|`, `?`, `#`, `>`, or `<`
 
-## URL parameters
-
-The shrink index API operation requires you to specify both the source index and the target index. All other parameters are optional.
+## Path parameters
 
 Parameter | Type | Description
 :--- | :--- | :---
 <index-name> | String | The index to shrink.
 <target-index> | String | The target index to shrink the source index into.
+
+## Query parameters
+
+The shrink index API operation requires you to specify both the source index and the target index. All other parameters are optional.
+
+Parameter | Type | Description
+:--- | :--- | :---
 wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the request. Default is 1 (only the primary shard). Set to all or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed.
 cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`.
 timeout | Time | How long to wait for the request to return a response. Default is `30s`.
 wait_for_completion | Boolean | When set to `false`, the request returns immediately instead of after the operation is finished. To monitor the operation status, use the [Tasks API]({{site.url}}{{site.baseurl}}/api-reference/tasks/) with the task ID returned by the request. Default is `true`.
 task_execution_timeout | Time | The explicit task execution timeout. Only useful when wait_for_completion is set to `false`. Default is `1h`.
 
-## Request body
+## Request body fields
 
 You can use the request body to configure some index settings for the target index. All fields are optional.
 Field | Type | Description
 :--- | :--- | :---
-alias | Object | Sets an alias for the target index. Can have the fields `filter`, `index_routing`, `is_hidden`, `is_write_index`, `routing`, or `search_routing`. See [Index Aliases]({{site.url}}{{site.baseurl}}/api-reference/alias/#request-body).
+alias | Object | Sets an alias for the target index. Can have the fields `filter`, `index_routing`, `is_hidden`, `is_write_index`, `routing`, or `search_routing`. See [Index Aliases]({{site.url}}{{site.baseurl}}/api-reference/alias/#request-body-fields).
 settings | Object | Index settings you can apply to your target index. See [Index Settings]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/).
 [max_shard_size](#the-max_shard_size-parameter) | Bytes | Specifies the maximum size of a primary shard in the target index. Because `max_shard_size` conflicts with the `index.number_of_shards` setting, you cannot set both of them at the same time.
 
@@ -84,4 +74,20 @@ The minimum number of primary shards for the target index is 1.
 
 ## Index codec considerations
 
-For index codec considerations, see [Index codecs]({{site.url}}{{site.baseurl}}/im-plugin/index-codecs/#splits-and-shrinks).
\ No newline at end of file
+For index codec considerations, see [Index codecs]({{site.url}}{{site.baseurl}}/im-plugin/index-codecs/#splits-and-shrinks).
+
+## Example request
+
+```json
+POST /my-old-index/_shrink/my-new-index
+{
+  "settings": {
+    "index.number_of_replicas": 4,
+    "index.number_of_shards": 3
+  },
+  "aliases":{
+    "new-index-alias": {}
+  }
+}
+```
+{% include copy-curl.html %}
\ No newline at end of file
diff --git a/_api-reference/index-apis/split.md b/_api-reference/index-apis/split.md
index ad13bffbba..b3db4c3340 100644
--- a/_api-reference/index-apis/split.md
+++ b/_api-reference/index-apis/split.md
@@ -33,7 +33,7 @@ PUT /sample-index1/_split/split-index1
 
 ## Path and HTTP methods
 
-```
+```json
 POST /<source-index>/_split/<target-index>
 PUT /<source-index>/_split/<target-index>
 ```
@@ -48,7 +48,14 @@ OpenSearch indexes have the following naming restrictions:
 
 `:`, `"`, `*`, `+`, `/`, `\`, `|`, `?`, `#`, `>`, or `<`
 
-## URL parameters
+## Path parameters
+
+Parameter | Type | Description
+:--- | :--- | :---
+<source-index> | String | The source index to split.
+<target-index> | String | The index to create.
+
+## Query parameters
 
 Your request must include the source and target indexes. All split index parameters are optional.
 
diff --git a/_api-reference/index-apis/stats.md b/_api-reference/index-apis/stats.md
index 7310298594..728fe7751f 100644
--- a/_api-reference/index-apis/stats.md
+++ b/_api-reference/index-apis/stats.md
@@ -825,6 +825,6 @@ By default, the returned statistics are aggregated in the `primaries` and `total` aggregations.
 
 ```
 
-## Response fields
+## Response body fields
 
 For information about response fields, see [Nodes Stats API response fields]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/#indices).
 
diff --git a/_api-reference/index-apis/update-alias.md b/_api-reference/index-apis/update-alias.md
index f32d34025e..c069703bf3 100644
--- a/_api-reference/index-apis/update-alias.md
+++ b/_api-reference/index-apis/update-alias.md
@@ -10,7 +10,7 @@ nav_order: 5
 **Introduced 1.0**
 {: .label .label-purple }
 
-The Create or Update Alias API adds a data stream or index to an alias or updates the settings for an existing alias. For more alias API operations, see [Index aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/).
+The Create or Update Alias API adds one or more indexes to an alias or updates the settings for an existing alias.
 For more alias API operations, see [Index aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/).
 
 The Create or Update Alias API is distinct from the [Alias API]({{site.url}}{{site.baseurl}}/opensearch/rest-api/alias/), which supports the addition and removal of aliases and the removal of alias indexes. In contrast, the following API only supports adding or updating an alias without updating the index itself. Each API also uses different request body parameters.
 {: .note}
 
@@ -35,7 +35,7 @@ PUT /_alias
 
 | Parameter | Type | Description
 | :--- | :--- | :---
-| `target` | String | A comma-delimited list of data streams and indexes. Wildcard expressions (`*`) are supported. To target all data streams and indexes in a cluster, use `_all` or `*`. Optional. |
+| `target` | String | A comma-delimited list of indexes. Wildcard expressions (`*`) are supported. To target all indexes in a cluster, use `_all` or `*`. Optional. |
 | `alias-name` | String | The alias name to be created or updated. Optional. |
 
 ## Query parameters
@@ -53,7 +53,7 @@ In the request body, you can specify the index name, the alias name, and the settings for the alias.
 
 Field | Type | Description
 :--- | :--- | :--- | :---
-`index` | String | A comma-delimited list of data streams or indexes that you want to associate with the alias. If this field is set, it will override the index name specified in the URL path.
+`index` | String | A comma-delimited list of indexes that you want to associate with the alias. If this field is set, it will override the index name specified in the URL path.
 `alias` | String | The name of the alias. If this field is set, it will override the alias name specified in the URL path.
 `is_write_index` | Boolean | Specifies whether the index should be a write index. An alias can only have one write index at a time. If a write request is submitted to an alias that links to multiple indexes, then OpenSearch runs the request only on the write index.
 `routing` | String | Assigns a custom value to a shard for specific operations.
diff --git a/_api-reference/index-apis/update-settings.md b/_api-reference/index-apis/update-settings.md
index 9fc9f01f85..18caf2185d 100644
--- a/_api-reference/index-apis/update-settings.md
+++ b/_api-reference/index-apis/update-settings.md
@@ -15,24 +15,18 @@ You can use the update settings API operation to update index-level settings.
 
 Aside from the static and dynamic index settings, you can also update individual plugins' settings. To get the full list of updatable settings, run `GET /_settings?include_defaults=true`.
 
-## Example
+
+## Path and HTTP methods
 
 ```json
-PUT /sample-index1/_settings
-{
-  "index.plugins.index_state_management.rollover_skip": true,
-  "index": {
-    "number_of_replicas": 4
-  }
-}
+PUT /<index>/_settings
 ```
-{% include copy-curl.html %}
 
-## Path and HTTP methods
+## Path parameters
 
-```
-PUT /<index>/_settings
-```
+Parameter | Type | Description
+:--- | :--- | :---
+<index> | String | The index to update. Can be a comma-separated list of multiple index names. Use `_all` or `*` to specify all indexes.
 
 ## Query parameters
 
@@ -59,7 +53,20 @@ The request body must contain all of the index settings that you want to update.
 }
 ```
 
-## Response
+## Example request
+
+```json
+PUT /sample-index1/_settings
+{
+  "index.plugins.index_state_management.rollover_skip": true,
+  "index": {
+    "number_of_replicas": 4
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Example response
 
 ```json
 {
diff --git a/_api-reference/list/index.md b/_api-reference/list/index.md
new file mode 100644
index 0000000000..b3f694157c
--- /dev/null
+++ b/_api-reference/list/index.md
@@ -0,0 +1,175 @@
+---
+layout: default
+title: List API
+nav_order: 45
+has_children: true
+---
+
+# List APIs
+**Introduced 2.18**
+{: .label .label-purple }
+
+The List API retrieves statistics about indexes and shards in a paginated format. This streamlines the task of processing responses that include many indexes.
+
+The List API supports two operations:
+
+- [List indices]({{site.url}}{{site.baseurl}}/api-reference/list/list-indices/)
+- [List shards]({{site.url}}{{site.baseurl}}/api-reference/list/list-shards/)
+
+## Shared query parameters
+
+All List API operations support the following optional query parameters.
+
+Parameter | Description
+:--- | :--- |
+`v` | Provides verbose output by adding headers to the columns. It also adds some formatting to help align each of the columns. All examples in this section include the `v` parameter.
+`help` | Lists the default and other available headers for a given operation.
+`h` | Limits the output to specific headers.
+`format` | The format in which to return the result. Valid values are `json`, `yaml`, `cbor`, and `smile`.
+`s` | Sorts the output by the specified columns.
+
+## Examples
+
+The following examples show how to use the optional query parameters to customize all List API responses.
+
+
+### Get verbose output
+
+To query indexes and their statistics with a verbose output that includes all column headings in the response, use the `v` query parameter, as shown in the following example.
+
+#### Request
+
+```json
+GET _list/indices?v
+```
+{% include copy-curl.html %}
+
+#### Response
+
+```json
+health status index uuid pri rep docs.count docs.deleted
+green open .kibana_1 - - - -
+yellow open sample-index-1 - - - -
+next_token null
+```
+
+
+### Get all available headers
+
+To see all the available headers, use the `help` parameter with the following syntax:
+
+```json
+GET _list/<operation>?help
+```
+{% include copy-curl.html %}
+
+#### Request
+
+The following example list indices operation returns all the available headers:
+
+```json
+GET _list/indices?help
+```
+{% include copy-curl.html %}
+
+#### Response
+
+The response lists the available headers and their aliases:
+
+```json
+health | h | current health status
+status | s | open/close status
+index | i,idx | index name
+uuid | id,uuid | index uuid
+pri | p,shards.primary,shardsPrimary | number of primary shards
+rep | r,shards.replica,shardsReplica | number of replica shards
+docs.count | dc,docsCount | available docs
+```
+
+### Get a subset of headers
+
+To limit the output to a subset of headers, use the `h` parameter with the following syntax:
+
+```json
+GET _list/<operation>?h=<header>,<header>&v
+```
+{% include copy-curl.html %}
+
+For any operation, you can determine which headers are available by using the `help` parameter and then using the `h` parameter to limit the output to only a subset of headers.
+
+#### Request
+
+The following example limits the indexes in the response to only the index name and health status headers:
+
+```json
+GET _list/indices?h=health,index
+```
+{% include copy-curl.html %}
+
+#### Response
+
+```json
+green .kibana_1
+yellow sample-index-1
+next_token null
+```
+
+
+### Sort by a header
+
+To sort the output on a single page by a header, use the `s` parameter with the following syntax:
+
+```json
+GET _list/<operation>?s=<header>,<header>
+```
+{% include copy-curl.html %}
+
+#### Request
+
+The following example request sorts indexes by index name:
+
+```json
+GET _list/indices?s=h,i
+```
+{% include copy-curl.html %}
+
+#### Response
+
+```json
+green sample-index-2
+yellow sample-index-1
+next_token null
+```
+
+### Retrieve data in JSON format
+
+By default, List APIs return data in a `text/plain` format. Other supported formats are [YAML](https://yaml.org/), [CBOR](https://cbor.io/), and [Smile](https://github.com/FasterXML/smile-format-specification).
+
+To retrieve data in the JSON format, use the `format=json` parameter with the following syntax:
+
+```json
+GET _list/<operation>?format=json
+```
+{% include copy-curl.html %}
+
+If you use the Security plugin, ensure you have the appropriate permissions.
+{: .note }
+
+#### Request
+
+```json
+GET _list/indices?format=json
+```
+{% include copy-curl.html %}
+
+#### Response
+
+The response contains data in JSON format:
+
+```json
+{"next_token":null,"indices":[{"health":"green","status":"-","index":".kibana_1","uuid":"-","pri":"-","rep":"-","docs.count":"-","docs.deleted":"-","store.size":"-","pri.store.size":"-"},{"health":"yellow","status":"-","index":"sample-index-1","uuid":"-","pri":"-","rep":"-","docs.count":"-","docs.deleted":"-","store.size":"-","pri.store.size":"-"}]}
+```
+
diff --git a/_api-reference/list/list-indices.md b/_api-reference/list/list-indices.md
new file mode 100644
index 0000000000..618413d35c
--- /dev/null
+++ b/_api-reference/list/list-indices.md
@@ -0,0 +1,83 @@
+---
+layout: default
+title: List indices
+parent: List API
+nav_order: 25
+has_children: false
+---
+
+# List indices
+**Introduced 2.18**
+{: .label .label-purple }
+
+The list indices operation provides the following index information in a paginated format:
+
+- The amount of disk space used by the index.
+- The number of shards contained in the index.
+- The index's health status.
+
+## Path and HTTP methods
+
+```json
+GET _list/indices
+GET _list/indices/<index>
+```
+
+## Query parameters
+
+Parameter | Type | Description
+:--- | :--- | :---
+`bytes` | Byte size | Specifies the units for the byte size, for example, `7kb` or `6gb`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/).
+`health` | String | Limits indexes based on their health status. Supported values are `green`, `yellow`, and `red`.
+`include_unloaded_segments` | Boolean | Whether to include information from segments not loaded into memory. Default is `false`.
+`cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`.
+`pri` | Boolean | Whether to return information only from the primary shards. Default is `false`.
+`time` | Time | Specifies the time units, for example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/).
+`expand_wildcards` | Enum | Expands wildcard expressions to concrete indexes. Combine multiple values with commas. Supported values are `all`, `open`, `closed`, `hidden`, and `none`.
 Default is `open`.
+`next_token` | String | Fetches the next page of indexes. When `null`, only provides the first page of indexes. Default is `null`.
+`size` | Integer | The maximum number of indexes to be displayed on a single page. The number of indexes on a single page of the response is not always equal to the specified `size`. Default is `500`. Minimum is `1` and maximum value is `5000`.
+`sort` | String | The order in which the indexes are displayed. If `desc`, then the most recently created indexes are displayed first. If `asc`, then the oldest indexes are displayed first. Default is `asc`.
+
+When using the `next_token` query parameter, use the token produced by the response to see the next page of indexes. After the API returns `null`, all available indexes have been returned.
+{: .tip }
+
+
+## Example requests
+
+To get information for all the indexes, use the following query, specifying the `next_token` value received in the previous response until it is `null`:
+
+```json
+GET _list/indices/?v&next_token=token
+```
+
+To limit the information to a specific index, add the index name after your query, as shown in the following example:
+
+```json
+GET _list/indices/<index>?v
+```
+{% include copy-curl.html %}
+
+To get information about more than one index, separate the indexes with commas, as shown in the following example:
+
+```json
+GET _list/indices/index1,index2,index3?v&next_token=token
+```
+{% include copy-curl.html %}
+
+
+## Example response
+
+**Plain text format**
+
+```json
+health | status | index | uuid | pri | rep | docs.count | docs.deleted | store.size | pri.store.size
+green | open | movies | UZbpfERBQ1-3GSH2bnM3sg | 1 | 1 | 1 | 0 | 7.7kb | 3.8kb
+next_token MTcyOTE5NTQ5NjM5N3wub3BlbnNlYXJjaC1zYXAtbG9nLXR5cGVzLWNvbmZpZw==
+```
+
+**JSON format**
+
+```json
+{"next_token":"MTcyOTE5NTQ5NjM5N3wub3BlbnNlYXJjaC1zYXAtbG9nLXR5cGVzLWNvbmZpZw==","indices":[{"health":"green","status":"open","index":"movies","uuid":"UZbpfERBQ1-3GSH2bnM3sg","pri":"1","rep":"1","docs.count":"1","docs.deleted":"0","store.size":"7.7kb","pri.store.size":"3.8kb"}]}
+```
diff --git a/_api-reference/list/list-shards.md b/_api-reference/list/list-shards.md
new file mode 100644
index 0000000000..7111aeb0f2
--- /dev/null
+++ b/_api-reference/list/list-shards.md
@@ -0,0 +1,78 @@
+---
+layout: default
+title: List shards
+parent: List API
+nav_order: 20
+---
+
+# List shards
+**Introduced 2.18**
+{: .label .label-purple }
+
+The list shards operation outputs, in a paginated format, the state of all primary and replica shards and how they are distributed.
+
+## Path and HTTP methods
+
+```json
+GET _list/shards
+GET _list/shards/<index>
+```
+
+## Query parameters
+
+All parameters are optional.
+
+Parameter | Type | Description
+:--- | :--- | :---
+`bytes` | Byte size | Specifies the byte size units, for example, `7kb` or `6gb`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/).
+`local` | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`.
+`cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`.
+`cancel_after_time_interval` | Time | The amount of time after which the shard request is canceled. Default is `-1` (no timeout).
+`time` | Time | Specifies the time units, for example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/).
+`next_token` | String | Fetches the next page of indexes. When `null`, only provides the first page of indexes. Default is `null`.
+`size` | Integer | The maximum number of indexes to be displayed on a single page. The number of indexes on a single page of the response is not always equal to the specified `size`. Default and minimum value is `2000`. Maximum value is `20000`.
+`sort` | String | The order in which the indexes are displayed. If `desc`, then the most recently created indexes are displayed first. If `asc`, then the oldest indexes are displayed first. Default is `asc`.
+
+When using the `next_token` query parameter, use the token produced by the response to see the next page of indexes. After the API returns `null`, all available indexes have been returned.
+{: .tip }
+
+## Example requests
+
+To get information for all the indexes and shards, use the following query, specifying the `next_token` value received in the previous response until it is `null`:
+
+```json
+GET _list/shards/?v&next_token=token
+```
+
+To limit the information to a specific index, add the index name after your query, again specifying the `next_token` value received in the previous response until it is `null`:
+
+```json
+GET _list/shards/<index>?v&next_token=token
+```
+{% include copy-curl.html %}
+
+If you want to get information for more than one index, separate the indexes with commas, as shown in the following example:
+
+```json
+GET _list/shards/index1,index2,index3?v&next_token=token
+```
+{% include copy-curl.html %}
+
+## Example response
+
+**Plain text format**
+
+```json
+index | shard | prirep | state | docs | store | ip | | node
+plugins | 0 | p | STARTED | 0 | 208b | 172.18.0.4 | odfe-node1
+plugins | 0 | r | STARTED | 0 | 208b | 172.18.0.3 | odfe-node2
+....
+....
+next_token MTcyOTE5NTQ5NjM5N3wub3BlbnNlYXJjaC1zYXAtbG9nLXR5cGVzLWNvbmZpZw==
+```
+
+**JSON format**
+
+```json
+{"next_token":"MTcyOTE5NTQ5NjM5N3wub3BlbnNlYXJjaC1zYXAtbG9nLXR5cGVzLWNvbmZpZw==","shards":[{"index":"plugins","shard":"0","prirep":"p","state":"STARTED","docs":"0","store":"208B","ip":"172.18.0.4","node":"odfe-node1"},{"index":"plugins","shard":"0","prirep":"r","state":"STARTED","docs":"0","store":"208B","ip":"172.18.0.3","node":"odfe-node2"}]}
```
diff --git a/_api-reference/msearch-template.md b/_api-reference/msearch-template.md
index 316cc134ff..fdebf5bed1 100644
--- a/_api-reference/msearch-template.md
+++ b/_api-reference/msearch-template.md
@@ -15,33 +15,17 @@ The Multi-search Template API runs multiple search template requests in a single API request.
 
 The Multi-search Template API uses the following paths:
 
-```
+```json
 GET /_msearch/template
 POST /_msearch/template
 GET /{index}/_msearch/template
 POST /{index}/_msearch/template
 ```
 
-## Request body
-
-The multi-search template request body follows this pattern, similar to the [Multi-search API]({{site.url}}{{site.baseurl}}/api-reference/multi-search/) pattern:
-
-```
-Metadata\n
-Query\n
-Metadata\n
-Query\n
-
-```
-
-- Metadata lines include options, such as which indexes to search and the type of search.
-- Query lines use [query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/).
-
-Like the [bulk]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) operation, the JSON doesn't need to be minified---spaces are fine---but it does need to be on a single line. OpenSearch uses newline characters to parse multi-search requests and requires that the request body end with a newline character.
- -## URL parameters and metadata options - -All multi-search template URL parameters are optional. Some can also be applied per search as part of each metadata line. +All parameters are optional. Some can also be applied per search as part of each metadata line. Parameter | Type | Description | Supported in metadata :--- | :--- | :--- | :--- @@ -57,9 +41,9 @@ rest_total_hits_as_int | String | Specifies whether the `hits.total` property is search_type | String | Affects the relevance score. Valid options are `query_then_fetch` and `dfs_query_then_fetch`. `query_then_fetch` scores documents using term and document frequencies for a single shard (faster, less accurate), whereas `dfs_query_then_fetch` uses term and document frequencies across all shards (slower, more accurate). Default is `query_then_fetch`. | Yes typed_keys | Boolean | Specifies whether to prefix aggregation names with their internal types in the response. Default is `false`. | No -## Metadata-only options +### Metadata-only options -Some options can't be applied as URL parameters to the entire request. Instead, you can apply them per search as part of each metadata line. All are optional. +Some options can't be applied as parameters to the entire request. Instead, you can apply them per search as part of each metadata line. All are optional. Option | Type | Description :--- | :--- | :--- @@ -68,11 +52,26 @@ preference | String | Specifies the nodes or shards on which you want to perform request_cache | Boolean | Specifies whether to cache results, which can improve latency for repeated searches. Default is to use the `index.requests.cache.enable` setting for the index (which defaults to `true` for new indexes). routing | String | Comma-separated custom routing values, for example, `"routing": "value1,value2,value3"`. -## Example +## Request body + +The multi-search template request body follows this pattern, similar to the [Multi-search API]({{site.url}}{{site.baseurl}}/api-reference/multi-search/) pattern: -The following example `msearch/template` API request runs queries against a single index using multiple templates named `line_search_template` and `play_search_template`: +``` +Metadata\n +Query\n +Metadata\n +Query\n -### Request +``` + +- Metadata lines include options, such as which indexes to search and the type of search. +- Query lines use [query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/). + +Like the [bulk]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) operation, the JSON doesn't need to be minified---spaces are fine---but it does need to be on a single line. OpenSearch uses newline characters to parse multi-search requests and requires that the request body end with a newline character. 
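+
+As an illustrative sketch, a two-search template body might look like the following; the index name `books` and the template parameter names (`text_entry`, `play_name`) are placeholders, and the `id` values refer to previously stored search templates (note the required trailing newline):
+
+```json
+{ "index": "books" }
+{ "id": "line_search_template", "params": { "text_entry": "All the world", "size": 5 } }
+{ "index": "books" }
+{ "id": "play_search_template", "params": { "play_name": "Hamlet" } }
+
+```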
+ +## Example request + +The following example `msearch/template` API request runs queries against a single index using multiple templates named `line_search_template` and `play_search_template`: ```json GET _msearch/template @@ -83,7 +82,7 @@ GET _msearch/template ``` {% include copy-curl.html %} -### Response +## Example response OpenSearch returns an array with the results of each search in the same order as in the multi-search template request: @@ -107,7 +106,8 @@ OpenSearch returns an array with the results of each search in the same order as }, "max_score": null, "hits": [] - } + }, + "status": 200 }, { "took": 3, @@ -125,7 +125,8 @@ OpenSearch returns an array with the results of each search in the same order as }, "max_score": null, "hits": [] - } + }, + "status": 200 } ] } diff --git a/_api-reference/multi-search.md b/_api-reference/multi-search.md index ff04b2d075..d8ac41ecec 100644 --- a/_api-reference/multi-search.md +++ b/_api-reference/multi-search.md @@ -13,19 +13,6 @@ redirect_from: As the name suggests, the multi-search operation lets you bundle multiple search requests into a single request. OpenSearch then executes the searches in parallel, so you get back the response more quickly compared to sending one request per search. OpenSearch executes each search independently, so the failure of one doesn't affect the others. -## Example - -```json -GET _msearch -{ "index": "opensearch_dashboards_sample_data_logs"} -{ "query": { "match_all": {} }, "from": 0, "size": 10} -{ "index": "opensearch_dashboards_sample_data_ecommerce", "search_type": "dfs_query_then_fetch"} -{ "query": { "match_all": {} } } - -``` -{% include copy-curl.html %} - - ## Path and HTTP methods The Multi-search API uses the following paths: @@ -37,27 +24,10 @@ POST _msearch POST /_msearch ``` -## Request body - -The multi-search request body follows this pattern: - -``` -Metadata\n -Query\n -Metadata\n -Query\n - -``` - -- Metadata lines include options, such as which indexes to search and the type of search. -- Query lines use the [query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/). - -Just like the [bulk]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) operation, the JSON doesn't need to be minified---spaces are fine---but it does need to be on a single line. OpenSearch uses newline characters to parse multi-search requests and requires that the request body end with a newline character. - -## URL parameters and metadata options +## Query parameters and metadata options -All multi-search URL parameters are optional. Some can also be applied per-search as part of each metadata line. +All parameters are optional. Some can also be applied per-search as part of each metadata line. Parameter | Type | Description | Supported in metadata line :--- | :--- | :--- @@ -80,7 +50,7 @@ From the REST API specification: A threshold that enforces a pre-filter round tr ## Metadata-only options -Some options can't be applied as URL parameters to the entire request. Instead, you can apply them per-search as part of each metadata line. All are optional. +Some options can't be applied as parameters to the entire request. Instead, you can apply them per-search as part of each metadata line. All are optional. Option | Type | Description :--- | :--- | :--- @@ -89,12 +59,28 @@ preference | String | The nodes or shards that you'd like to perform the search. request_cache | Boolean | Whether to cache results, which can improve latency for repeat searches. 
Default is to use the `index.requests.cache.enable` setting for the index (which defaults to `true` for new indexes).
 routing | String | Comma-separated custom routing values, for example, `"routing": "value1,value2,value3"`.
 
+## Request body
+
+The multi-search request body follows this pattern:
+
+```
+Metadata\n
+Query\n
+Metadata\n
+Query\n
+
+```
+
+- Metadata lines include options, such as which indexes to search and the type of search.
+- Query lines use the [query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/).
+
+Just like the [bulk]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) operation, the JSON doesn't need to be minified---spaces are fine---but it does need to be on a single line. OpenSearch uses newline characters to parse multi-search requests and requires that the request body end with a newline character.
+
 
-## Example
+## Example request
 
 The following example `msearch` API request runs queries against multiple indexes:
 
-### Request
 
 ```json
 GET _msearch
@@ -107,7 +93,7 @@ GET _msearch
 
 {% include copy-curl.html %}
 
 
-### Response
+## Example response
 
 OpenSearch returns an array with the results of each search in the same order as the multi-search request.
 
diff --git a/_api-reference/nodes-apis/nodes-hot-threads.md b/_api-reference/nodes-apis/nodes-hot-threads.md
index f5e014dd6d..5339903d1e 100644
--- a/_api-reference/nodes-apis/nodes-hot-threads.md
+++ b/_api-reference/nodes-apis/nodes-hot-threads.md
@@ -11,12 +11,6 @@ nav_order: 30
 
 The nodes hot threads endpoint provides information about busy JVM threads for selected cluster nodes. It provides a unique view of the activity of each node.
 
-#### Example
-
-```json
-GET /_nodes/hot_threads
-```
-{% include copy-curl.html %}
 
 ## Path and HTTP methods
 
diff --git a/_api-reference/nodes-apis/nodes-info.md b/_api-reference/nodes-apis/nodes-info.md
index a8953505ff..a8767cafce 100644
--- a/_api-reference/nodes-apis/nodes-info.md
+++ b/_api-reference/nodes-apis/nodes-info.md
@@ -18,25 +18,10 @@ The nodes info API represents mostly static information about your cluster's nodes, including:
 
 - Thread pools settings
 - Installed plugins
 
-## Example
-
-To get information about all nodes in a cluster, use the following query:
-
-```json
-GET /_nodes
-```
-{% include copy-curl.html %}
-
-To get thread pool information about the cluster manager node only, use the following query:
-
-```json
-GET /_nodes/master:true/thread_pool
-```
-{% include copy-curl.html %}
 
 ## Path and HTTP methods
 
-```bash
+```json
 GET /_nodes
 GET /_nodes/<nodeId>
 GET /_nodes/<metrics>
@@ -88,6 +73,13 @@ GET /_nodes/cluster_manager:true/process,transport
 ```
 {% include copy-curl.html %}
 
+To get thread pool information about the cluster manager node only, use the following query:
+
+```json
+GET /_nodes/master:true/thread_pool
+```
+{% include copy-curl.html %}
+
 ## Example response
 
 The response contains the metric groups specified in the `<metrics>` request parameter (in this case, `process` and `transport`):
@@ -136,7 +128,7 @@ The response contains the metric groups specified in the `<metrics>` request parameter (in this case, `process` and `transport`):
 }
 ```
 
-## Response fields
+## Response body fields
 
 The response contains the basic node identification and build info for every node matching the `<nodeId>` request parameter. The following table lists the response fields.
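+
+For orientation, a trimmed response might look like the following sketch (the node ID, names, addresses, and version shown are placeholders):
+
+```json
+{
+  "_nodes": { "total": 1, "successful": 1, "failed": 0 },
+  "cluster_name": "opensearch-cluster",
+  "nodes": {
+    "tk_U7GMCRkCG4FoOvusrng": {
+      "name": "opensearch-node1",
+      "transport_address": "172.18.0.3:9300",
+      "host": "172.18.0.3",
+      "ip": "172.18.0.3",
+      "version": "2.18.0",
+      "roles": ["cluster_manager", "data", "ingest"]
+    }
+  }
+}
+```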
diff --git a/_api-reference/nodes-apis/nodes-reload-secure.md b/_api-reference/nodes-apis/nodes-reload-secure.md
index b4be66ddd4..1ed0eabde4 100644
--- a/_api-reference/nodes-apis/nodes-reload-secure.md
+++ b/_api-reference/nodes-apis/nodes-reload-secure.md
@@ -13,7 +13,7 @@ The nodes reload secure settings endpoint allows you to change secure settings o
 
 ## Path and HTTP methods
 
-```
+```json
 POST _nodes/reload_secure_settings
 POST _nodes/<nodeId>/reload_secure_settings
 ```
@@ -26,7 +26,7 @@ Parameter | Type | Description
 :--- | :--- | :---
 nodeId | String | A comma-separated list of nodeIds used to filter results. Supports [node filters]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/index/#node-filters). Defaults to `_all`.
 
-## Request fields
+## Request body fields
 
 The request may include an optional object containing the password for the OpenSearch keystore.
 
diff --git a/_api-reference/nodes-apis/nodes-stats.md b/_api-reference/nodes-apis/nodes-stats.md
index 2ccea54390..604b89969b 100644
--- a/_api-reference/nodes-apis/nodes-stats.md
+++ b/_api-reference/nodes-apis/nodes-stats.md
@@ -788,7 +788,7 @@ Select the arrow to view the example response.
 
 ```
 
-## Response fields
+## Response body fields
 
 The following table lists all response fields.
 
@@ -893,24 +893,27 @@ search.suggest_total | Integer | The total number of shard suggest operations.
 search.suggest_time_in_millis | Integer | The total amount of time for all shard suggest operations, in milliseconds.
 search.suggest_current | Integer | The number of shard suggest operations that are currently running.
 search.request | Object | Statistics about coordinator search operations for the node.
+search.request.took.time_in_millis | Integer | The total amount of time taken for all search requests, in milliseconds.
+search.request.took.current | Integer | The number of search requests that are currently running.
+search.request.took.total | Integer | The total number of search requests completed.
 search.request.dfs_pre_query.time_in_millis | Integer | The total amount of time for all coordinator depth-first search (DFS) prequery operations, in milliseconds.
 search.request.dfs_pre_query.current | Integer | The number of coordinator DFS prequery operations that are currently running.
-search.request.dfs_pre_query.total | Integer | The total number of coordinator DFS prequery operations.
+search.request.dfs_pre_query.total | Integer | The total number of coordinator DFS prequery operations completed.
 search.request.query.time_in_millis | Integer | The total amount of time for all coordinator query operations, in milliseconds.
 search.request.query.current | Integer | The number of coordinator query operations that are currently running.
-search.request.query.total | Integer | The total number of coordinator query operations.
+search.request.query.total | Integer | The total number of coordinator query operations completed.
 search.request.fetch.time_in_millis | Integer | The total amount of time for all coordinator fetch operations, in milliseconds.
 search.request.fetch.current | Integer | The number of coordinator fetch operations that are currently running.
-search.request.fetch.total | Integer | The total number of coordinator fetch operations.
+search.request.fetch.total | Integer | The total number of coordinator fetch operations completed.
 search.request.dfs_query.time_in_millis | Integer | The total amount of time for all coordinator DFS prequery operations, in milliseconds.
search.request.dfs_query.current | Integer | The number of coordinator DFS prequery operations that are currently running.
-search.request.dfs_query.total | Integer | The total number of coordinator DFS prequery operations.
+search.request.dfs_query.total | Integer | The total number of coordinator DFS prequery operations completed.
 search.request.expand.time_in_millis | Integer | The total amount of time for all coordinator expand operations, in milliseconds.
 search.request.expand.current | Integer | The number of coordinator expand operations that are currently running.
-search.request.expand.total | Integer | The total number of coordinator expand operations.
+search.request.expand.total | Integer | The total number of coordinator expand operations completed.
 search.request.can_match.time_in_millis | Integer | The total amount of time for all coordinator match operations, in milliseconds.
 search.request.can_match.current | Integer | The number of coordinator match operations that are currently running.
-search.request.can_match.total | Integer | The total number of coordinator match operations.
+search.request.can_match.total | Integer | The total number of coordinator match operations completed.
 merges | Object | Statistics about merge operations for the node.
 merges.current | Integer | The number of merge operations that are currently running.
 merges.current_docs | Integer | The number of document merges that are currently running.
diff --git a/_api-reference/nodes-apis/nodes-usage.md b/_api-reference/nodes-apis/nodes-usage.md
index 355b7f8ff2..1101b2989a 100644
--- a/_api-reference/nodes-apis/nodes-usage.md
+++ b/_api-reference/nodes-apis/nodes-usage.md
@@ -13,7 +13,7 @@ The nodes usage endpoint returns low-level information about REST action usage o
 
 ## Path and HTTP methods
 
-```
+```json
 GET _nodes/usage
 GET _nodes/<nodeId>/usage
 GET _nodes/usage/<metric>
diff --git a/_api-reference/profile.md b/_api-reference/profile.md
index 4f8c69db9c..e54ffabe15 100644
--- a/_api-reference/profile.md
+++ b/_api-reference/profile.md
@@ -18,6 +18,18 @@ The Profile API provides timing information about the execution of individual co
 The Profile API is a resource-consuming operation that adds overhead to search operations.
 {: .warning}
 
+## Path and HTTP methods
+
+```json
+GET /testindex/_search
+{
+  "profile": true,
+  "query" : {
+    "match" : { "title" : "wind" }
+  }
+}
+```
+
 ## Concurrent segment search
 
 Starting in OpenSearch 2.12, [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/) allows each shard-level request to search segments in parallel during the query phase. The Profile API response contains several additional fields with statistics about _slices_.
 
@@ -26,7 +38,9 @@ A slice is the unit of work that can be executed by a thread. Each query can be
 In general, the max/min/avg slice time captures statistics across all slices for a timing type. For example, when profiling aggregations, the `max_slice_time_in_nanos` field in the `aggregations` section shows the maximum time consumed by the aggregation operation and its children across all slices.
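 Concurrent segment search is enabled through the `search.concurrent_segment_search.enabled` dynamic setting, which is described on the concurrent segment search page linked previously. As a minimal sketch (assuming your cluster version supports that setting), the following request enables it cluster-wide so that profiled queries include slice statistics:
 
 ```json
 PUT _cluster/settings
 {
   "persistent": {
     "search.concurrent_segment_search.enabled": true
   }
 }
 ```
 {% include copy-curl.html %}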
-## Example request: Non-concurrent search +## Example requests + +### Non-concurrent search To use the Profile API, include the `profile` parameter set to `true` in the search request sent to the `_search` endpoint: @@ -70,7 +84,54 @@ The response contains an additional `time` field with human-readable units, for The Profile API response is verbose, so if you're running the request through the `curl` command, include the `?pretty` query parameter to make the response easier to understand. {: .tip} -#### Example response +### Aggregations + +To profile aggregations, send an aggregation request and provide the `profile` parameter set to `true`. + +#### Global aggregation + +```json +GET /opensearch_dashboards_sample_data_ecommerce/_search +{ + "profile": "true", + "size": 0, + "query": { + "match": { "manufacturer": "Elitelligence" } + }, + "aggs": { + "all_products": { + "global": {}, + "aggs": { + "avg_price": { "avg": { "field": "taxful_total_price" } } + } + }, + "elitelligence_products": { "avg": { "field": "taxful_total_price" } } + } +} +``` +{% include copy-curl.html %} + +#### Non-global aggregation + +```json +GET /opensearch_dashboards_sample_data_ecommerce/_search +{ + "size": 0, + "aggs": { + "avg_taxful_total_price": { + "avg": { + "field": "taxful_total_price" + } + } + } +} +``` +{% include copy-curl.html %} + + +## Example response + +### Non-concurrent search The response contains profiling information: @@ -221,7 +282,7 @@ The response contains profiling information: ``` -#### Example response: Concurrent segment search +### Concurrent segment search The following is an example response for a concurrent segment search with three segment slices: @@ -435,7 +496,7 @@ The following is an example response for a concurrent segment search with three ``` -## Response fields +## Response body fields The response includes the following fields. @@ -522,34 +583,9 @@ Reason | Description `aggregation` | A collector for aggregations that is run against the specified query scope. OpenSearch uses a single `aggregation` collector to collect documents for all aggregations. `global_aggregation` | A collector that is run against the global query scope. Global scope is different from a specified query scope, so in order to collect the entire dataset, a `match_all` query must be run. -## Aggregations - -To profile aggregations, send an aggregation request and provide the `profile` parameter set to `true`. 
- -#### Example request: Global aggregation - -```json -GET /opensearch_dashboards_sample_data_ecommerce/_search -{ - "profile": "true", - "size": 0, - "query": { - "match": { "manufacturer": "Elitelligence" } - }, - "aggs": { - "all_products": { - "global": {}, - "aggs": { - "avg_price": { "avg": { "field": "taxful_total_price" } } - } - }, - "elitelligence_products": { "avg": { "field": "taxful_total_price" } } - } -} -``` -{% include copy-curl.html %} +### Aggregation responses -#### Example response: Global aggregation +#### Response: Global aggregation The response contains profiling information: @@ -804,24 +840,7 @@ The response contains profiling information: ``` -#### Example request: Non-global aggregation - -```json -GET /opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "avg_taxful_total_price": { - "avg": { - "field": "taxful_total_price" - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response: Non-global aggregation +#### Response: Non-global aggregation The response contains profiling information: @@ -966,13 +985,13 @@ The response contains profiling information: ``` -### Response fields +#### Response body fields The `aggregations` array contains aggregation objects with the following fields. Field | Data type | Description :--- | :--- | :--- -`type` | String | The aggregator type. In the [non-global aggregation example response](#example-response-non-global-aggregation), the aggregator type is `AvgAggregator`. [Global aggregation example response](#example-request-global-aggregation) contains a `GlobalAggregator` with an `AvgAggregator` child. +`type` | String | The aggregator type. In the [non-global aggregation example response](#response-non-global-aggregation), the aggregator type is `AvgAggregator`. [Global aggregation example response](#response-global-aggregation) contains a `GlobalAggregator` with an `AvgAggregator` child. `description` | String | Contains a Lucene explanation of the aggregation. Helps differentiate aggregations with the same type. `time_in_nanos` | Long | The total elapsed time for this aggregation, in nanoseconds. For concurrent segment search, `time_in_nanos` is the total amount of time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). [`breakdown`](#the-breakdown-object-1) | Object | Contains timing statistics about low-level Lucene execution. @@ -982,7 +1001,7 @@ Field | Data type | Description `min_slice_time_in_nanos` |Long |The minimum amount of time taken by any slice to run an aggregation, in nanoseconds. This field is included only if you enable concurrent segment search. `avg_slice_time_in_nanos` |Long |The average amount of time taken by any slice to run an aggregation, in nanoseconds. This field is included only if you enable concurrent segment search. -### The `breakdown` object +#### The `breakdown` object The `breakdown` object represents the timing statistics about low-level Lucene execution, broken down by method. Each field in the `breakdown` object represents an internal Lucene method executed within the aggregation. Timings are listed in wall-clock nanoseconds and are not normalized. The `breakdown` timings are inclusive of all child times. The `breakdown` object is comprised of the following fields. All fields contain integer values. 
diff --git a/_api-reference/rank-eval.md b/_api-reference/rank-eval.md
index 881ff3a22b..61c80be592 100644
--- a/_api-reference/rank-eval.md
+++ b/_api-reference/rank-eval.md
@@ -12,7 +12,7 @@ The [rank]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/rank/) 
 
 ## Path and HTTP methods
 
-```
+```json
 GET /_rank_eval
 POST /_rank_eval
 ```
@@ -28,7 +28,7 @@ allow_no_indices | Boolean | Defaults to `true`. When set to `false` the respons
 expand_wildcards | String | Expand wildcard expressions for indexes that are `open`, `closed`, `hidden`, `none`, or `all`.
 search_type | String | Set search type to either `query_then_fetch` or `dfs_query_then_fetch`.
 
-## Request fields
+## Request body fields
 
 The request body must contain at least one parameter.
 
diff --git a/_api-reference/remote-info.md b/_api-reference/remote-info.md
index ac2971f294..25e032a9d5 100644
--- a/_api-reference/remote-info.md
+++ b/_api-reference/remote-info.md
@@ -17,12 +17,11 @@ The response is more comprehensive and useful than a call to `_cluster/settings`
 
 ## Path and HTTP methods
 
-```
+```json
 GET _remote/info
 ```
-{% include copy-curl.html %}
 
-## Response
+## Example response
 
 ```json
 {
diff --git a/_api-reference/render-template.md b/_api-reference/render-template.md
index 409fde5e4a..db2caa9cef 100644
--- a/_api-reference/render-template.md
+++ b/_api-reference/render-template.md
@@ -10,7 +10,7 @@ The Render Template API renders a [search template]({{site.url}}{{site.baseurl}} 
 
 ## Paths and HTTP methods
 
-```
+```json
 GET /_render/template
 POST /_render/template
 GET /_render/template/<id>
 POST /_render/template/<id>
 ```
@@ -25,7 +25,7 @@ The Render Template API supports the following optional path parameter.
 | :--- | :--- | :--- |
 | `id` | String | The ID of the search template to render. |
 
-## Request options
+## Request body fields
 
 The following options are supported in the request body of the Render Template API.
 
diff --git a/_api-reference/script-apis/create-stored-script.md b/_api-reference/script-apis/create-stored-script.md
index 0a915cd836..d28c504a51 100644
--- a/_api-reference/script-apis/create-stored-script.md
+++ b/_api-reference/script-apis/create-stored-script.md
@@ -34,7 +34,7 @@ All parameters are optional.
 | cluster_manager_timeout | Time | Amount of time to wait for a connection to the cluster manager. Defaults to 30 seconds. |
 | timeout | Time | The period of time to wait for a response. If a response is not received before the timeout value, the request fails and returns an error. Defaults to 30 seconds.|
 
-## Request fields
+## Request body fields
 
 | Field | Data type | Description
 :--- | :--- | :---
@@ -49,7 +49,7 @@ All parameters are optional.
 
 ## Example request
 
-The sample uses an index called `books` with the following documents:
+The following example request uses an index called `books` with the following documents:
 
 ````json
 {"index":{"_id":1}}
 {"name":"book1","author":"Faustine","ratings":[4,3,5]}
 {"index":{"_id":2}}
 {"name":"book2","author":"Amit","ratings":[5,5,5]}
 {"index":{"_id":3}}
 {"name":"book3","author":"Gilroy","ratings":[2,1,5]}
 ````
 
+### Creating a Painless script
+
 The following request creates the Painless script `my-first-script`. It sums the ratings for each book and displays the sum in the output.
````json @@ -96,10 +98,16 @@ curl -XPUT "http://opensearch:9200/_scripts/my-first-script" -H 'Content-Type: a {% include copy.html %} -The following request creates the Painless script `my-first-script`, which sums the ratings for each book and displays the sum in the output: +See [Execute Painless stored script]({{site.url}}{{site.baseurl}}/api-reference/script-apis/exec-stored-script/) for information about running the script. + +### Creating or updating a stored script with parameters + +The Painless script supports `params` to pass variables to the script. + +The following request creates the Painless script `multiplier-script`. The request sums the ratings for each book, multiplies the summed value by the `multiplier` parameter, and displays the result in the output: ````json -PUT _scripts/my-first-script +PUT _scripts/multiplier-script { "script": { "lang": "painless", @@ -108,15 +116,13 @@ PUT _scripts/my-first-script for (int i = 0; i < doc['ratings'].length; ++i) { total += doc['ratings'][i]; } - return total; + return total * params['multiplier']; """ } } ```` {% include copy-curl.html %} -See [Execute Painless stored script]({{site.url}}{{site.baseurl}}/api-reference/script-apis/exec-stored-script/) for information about running the script. - ## Example response The `PUT _scripts/my-first-script` request returns the following field: @@ -130,43 +136,5 @@ The `PUT _scripts/my-first-script` request returns the following field: To determine whether the script was successfully created, use the [Get stored script]({{site.url}}{{site.baseurl}}/api-reference/script-apis/get-stored-script/) API, passing the script name as the `script` path parameter. {: .note} -### Response fields - -| Field | Data type | Description | -:--- | :--- | :--- -| acknowledged | Boolean | Whether the request was received. | - -## Creating or updating a stored script with parameters - -The Painless script supports `params` to pass variables to the script. - -#### Example - -The following request creates the Painless script `multiplier-script`. The request sums the ratings for each book, multiplies the summed value by the `multiplier` parameter, and displays the result in the output: - -````json -PUT _scripts/multiplier-script -{ - "script": { - "lang": "painless", - "source": """ - int total = 0; - for (int i = 0; i < doc['ratings'].length; ++i) { - total += doc['ratings'][i]; - } - return total * params['multiplier']; - """ - } -} -```` -{% include copy-curl.html %} - -### Example response -The `PUT _scripts/multiplier-script` request returns the following field: -````json -{ - "acknowledged" : true -} -```` \ No newline at end of file diff --git a/_api-reference/script-apis/delete-script.md b/_api-reference/script-apis/delete-script.md index fe9c272acc..22c2a3f394 100644 --- a/_api-reference/script-apis/delete-script.md +++ b/_api-reference/script-apis/delete-script.md @@ -9,7 +9,13 @@ nav_order: 4 **Introduced 1.0** {: .label .label-purple } -Deletes a stored script +Deletes a stored script. + +## Path and HTTP methods + +```json +DELETE _scripts/my-script +``` ## Path parameters @@ -47,7 +53,7 @@ The `DELETE _scripts/my-first-script` request returns the following field: To determine whether the stored script was successfully deleted, use the [Get stored script]({{site.url}}{{site.baseurl}}/api-reference/script-apis/get-stored-script/) API, passing the script name as the `script` path parameter. 
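For example, a request like the following (reusing the `my-first-script` name from the preceding examples) lets you check whether the script still exists:

```json
GET _scripts/my-first-script
```
{% include copy-curl.html %}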
-## Response fields +## Response body fields The request returns the following response fields: diff --git a/_api-reference/script-apis/exec-script.md b/_api-reference/script-apis/exec-script.md index b6476be980..cd31ad92f4 100644 --- a/_api-reference/script-apis/exec-script.md +++ b/_api-reference/script-apis/exec-script.md @@ -18,7 +18,7 @@ GET /_scripts/painless/_execute POST /_scripts/painless/_execute ``` -## Request fields +## Request body fields | Field | Description | :--- | :--- @@ -54,7 +54,7 @@ The response contains the average of two script parameters: } ``` -## Response fields +## Response body fields | Field | Description | :--- | :--- diff --git a/_api-reference/script-apis/exec-stored-script.md b/_api-reference/script-apis/exec-stored-script.md index a7de3b5274..31102c23dd 100644 --- a/_api-reference/script-apis/exec-stored-script.md +++ b/_api-reference/script-apis/exec-stored-script.md @@ -13,7 +13,22 @@ Runs a stored script written in the Painless language. OpenSearch provides several ways to run a script; the following sections show how to run a script by passing script information in the request body of a `GET /_search` request. -## Request fields +## Path and HTTP methods + +```json +GET books/_search +{ + "script_fields": { + "total_ratings": { + "script": { + "id": "my-first-script" + } + } + } +} +``` + +## Request field options | Field | Data type | Description | :--- | :--- | :--- @@ -104,7 +119,7 @@ The `GET books/_search` request returns the following fields: } ```` -## Response fields +## Response body fields | Field | Data type | Description | :--- | :--- | :--- diff --git a/_api-reference/script-apis/get-script-contexts.md b/_api-reference/script-apis/get-script-contexts.md index 85421128a1..fa45ce95ad 100644 --- a/_api-reference/script-apis/get-script-contexts.md +++ b/_api-reference/script-apis/get-script-contexts.md @@ -13,9 +13,9 @@ Retrieves all contexts for stored scripts. ## Example request -````json +```json GET _script_context -```` +``` {% include copy-curl.html %} ## Example response @@ -547,7 +547,7 @@ The `GET _script_context` request returns the following fields: } ```` -## Response fields +## Response body fields The `GET _script_context` request returns the following response fields: diff --git a/_api-reference/script-apis/get-script-language.md b/_api-reference/script-apis/get-script-language.md index 76414d52ea..a7bb7e7b51 100644 --- a/_api-reference/script-apis/get-script-language.md +++ b/_api-reference/script-apis/get-script-language.md @@ -89,7 +89,7 @@ The `GET _script_language` request returns the available contexts for each langu } ``` -## Response fields +## Response body fields The request contains the following response fields. diff --git a/_api-reference/script-apis/get-stored-script.md b/_api-reference/script-apis/get-stored-script.md index d7987974d3..341bfc046e 100644 --- a/_api-reference/script-apis/get-stored-script.md +++ b/_api-reference/script-apis/get-stored-script.md @@ -11,6 +11,12 @@ nav_order: 3 Retrieves a stored script. 
+## Path and HTTP methods
+
+```json
+GET _scripts/my-first-script
+```
+
 ## Path parameters
 
 | Parameter | Data type | Description |
@@ -53,7 +59,7 @@ The `GET _scripts/my-first-script` request returns the following fields:
 } 
 ````
 
-## Response fields
+## Response body fields
 
 The `GET _scripts/my-first-script` request returns the following response fields:
 
diff --git a/_api-reference/scroll.md b/_api-reference/scroll.md
index b940c90d86..770697d4e4 100644
--- a/_api-reference/scroll.md
+++ b/_api-reference/scroll.md
@@ -17,7 +17,32 @@ To use the `scroll` operation, add a `scroll` parameter to the request header wi
 Because search contexts consume a lot of memory, we suggest you don't use the `scroll` operation for frequent user queries. Instead, use the `sort` parameter with the `search_after` parameter to scroll responses for user queries.
 {: .note }
 
-## Example
+## Path and HTTP methods
+
+```json
+GET _search/scroll
+POST _search/scroll
+GET _search/scroll/<scroll_id>
+POST _search/scroll/<scroll_id>
+```
+
+## Path parameters
+
+Parameter | Type | Description
+:--- | :--- | :---
+scroll_id | String | The scroll ID for the search.
+
+## Query parameters
+
+All scroll parameters are optional.
+
+Parameter | Type | Description
+:--- | :--- | :---
+scroll | Time | Specifies the amount of time the search context is maintained.
+scroll_id | String | The scroll ID for the search.
+rest_total_hits_as_int | Boolean | Whether the `hits.total` property is returned as an integer (`true`) or an object (`false`). Default is `false`.
+
+## Example requests
 
 To set the number of results that you want returned for each batch, use the `size` parameter:
 
@@ -84,28 +109,6 @@ DELETE _search/scroll/_all
 
 The `scroll` operation corresponds to a specific timestamp. It doesn't consider documents added after that timestamp as potential results.
 
-
-## Path and HTTP methods
-
-```
-GET _search/scroll
-POST _search/scroll
-```
-```
-GET _search/scroll/
-POST _search/scroll/
-```
-
-## URL parameters
-
-All scroll parameters are optional.
-
-Parameter | Type | Description
-:--- | :--- | :---
-scroll | Time | Specifies the amount of time the search context is maintained.
-scroll_id | String | The scroll ID for the search.
-rest_total_hits_as_int | Boolean | Whether the `hits.total` property is returned as an integer (`true`) or an object (`false`). Default is `false`.
-
 ## Example response
 
 ```json
diff --git a/_api-reference/search.md b/_api-reference/search.md
index 777f48354e..df6992912e 100644
--- a/_api-reference/search.md
+++ b/_api-reference/search.md
@@ -12,33 +12,19 @@ redirect_from:
 
 The Search API operation lets you execute a search request to search your cluster for data.
 
-## Example
-
-```json
-GET /movies/_search
-{
-  "query": {
-    "match": {
-      "text_entry": "I am the night"
-    }
-  }
-}
-```
-{% include copy-curl.html %}
-
 ## Path and HTTP methods
 
-```
-GET //_search
+```json
+GET /<index>/_search
 GET /_search
-POST //_search
+POST /<index>/_search
 POST /_search
 ```
 
-## URL Parameters
+## Query parameters
 
-All URL parameters are optional.
+All parameters are optional.
 
 Parameter | Type | Description
 :--- | :--- | :---
@@ -124,7 +110,22 @@ terminate_after | Integer | The maximum number of documents OpenSearch should pr
 timeout | Time | How long to wait for a response. Default is no timeout.
 version | Boolean | Whether to include the document version in the response.
-## Response body
+## Example request
+
+```json
+GET /movies/_search
+{
+  "query": {
+    "match": {
+      "text_entry": "I am the night"
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+
+## Response body fields
 
 ```json
 {
diff --git a/_api-reference/snapshots/cleanup-snapshot-repository.md b/_api-reference/snapshots/cleanup-snapshot-repository.md
new file mode 100644
index 0000000000..18e8f35f81
--- /dev/null
+++ b/_api-reference/snapshots/cleanup-snapshot-repository.md
@@ -0,0 +1,63 @@
+---
+layout: default
+title: Cleanup Snapshot Repository
+parent: Snapshot APIs
+nav_order: 11
+---
+
+# Cleanup Snapshot Repository
+Introduced 1.0
+{: .label .label-purple }
+
+The Cleanup Snapshot Repository API clears a snapshot repository of data no longer referenced by any existing snapshot.
+
+## Path and HTTP methods
+
+```json
+POST /_snapshot/<repository>/_cleanup
+```
+
+
+## Path parameters
+
+| Parameter | Data type | Description |
+| :--- | :--- | :--- |
+| `repository` | String | The name of the snapshot repository. |
+
+## Query parameters
+
+The following table lists the available query parameters. All query parameters are optional.
+
+| Parameter | Data type | Description |
+| :--- | :--- | :--- |
+| `cluster_manager_timeout` | Time | The amount of time to wait for a response from the cluster manager node. Formerly called `master_timeout`. Optional. Default is 30 seconds. |
+| `timeout` | Time | The amount of time to wait for the operation to complete. Optional.|
+
+## Example request
+
+The following request removes all stale data from the repository `my_backup`:
+
+```json
+POST /_snapshot/my_backup/_cleanup
+```
+{% include copy-curl.html %}
+
+
+## Example response
+
+```json
+{
+  "results":{
+    "deleted_bytes":40,
+    "deleted_blobs":8
+  }
+}
+```
+
+## Response body fields
+
+| Field | Data type | Description |
+| :--- | :--- | :--- |
+| `deleted_bytes` | Integer | The number of bytes made available in the snapshot after data deletion. |
+| `deleted_blobs` | Integer | The number of binary large objects (BLOBs) cleared from the repository by the request. |
+
diff --git a/_api-reference/snapshots/create-repository.md b/_api-reference/snapshots/create-repository.md
index ca4c04114c..40a35973e8 100644
--- a/_api-reference/snapshots/create-repository.md
+++ b/_api-reference/snapshots/create-repository.md
@@ -21,9 +21,9 @@ For instructions on creating a repository, see [Register repository]({{site.url}
 
 ## Path and HTTP methods
 
-```
-POST /_snapshot/my-first-repo/
-PUT /_snapshot/my-first-repo/
+```json
+POST /_snapshot/<repository>/
+PUT /_snapshot/<repository>/
 ```
 
 ## Path parameters
@@ -38,11 +38,20 @@ Request parameters depend on the type of repository: `fs` or `s3`.
 
 ### Common parameters
 
-The following table lists parameters that can be used with both the `fs` and `s3` repositories. 
+The following table lists parameters that can be used with both the `fs` and `s3` repositories.
 
 Request field | Description
 :--- | :---
 `prefix_mode_verification` | When enabled, adds a hashed value of a random seed to the prefix for repository verification. For remote-store-enabled clusters, you can add the `setting.prefix_mode_verification` setting to the node attributes for the supplied repository. This field works with both new and existing repositories. Optional.
+`shard_path_type` | Controls the path structure of shard-level blobs. Supported values are `FIXED`, `HASHED_PREFIX`, and `HASHED_INFIX`. For more information about each value, see [shard_path_type values](#shard_path_type-values). Default is `FIXED`. Optional.
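+
+Assuming `shard_path_type` is supplied alongside the other repository settings at registration time, a minimal sketch looks like the following. It uses the `HASHED_PREFIX` value described in the next section; the repository name and `location` value are illustrative placeholders, not values defined by this API:
+
+```json
+PUT /_snapshot/my-fs-repository
+{
+  "type": "fs",
+  "settings": {
+    "location": "/mnt/snapshots",
+    "shard_path_type": "HASHED_PREFIX"
+  }
+}
+```
+{% include copy-curl.html %}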
+
+#### shard_path_type values
+
+The following values are supported in the `shard_path_type` setting:
+
+- `FIXED`: Keeps the path structure in the existing hierarchical manner, such as `<ROOT>/<BASE_PATH>/indices/<INDEX_UUID>/0/`.
+- `HASHED_PREFIX`: Prepends a hashed prefix at the start of the path for each unique shard ID, for example, `<HASH>/<ROOT>/<BASE_PATH>/indices/<INDEX_UUID>/0/`.
+- `HASHED_INFIX`: Appends a hashed prefix after the base path for each unique shard ID, for example, `<ROOT>/<BASE_PATH>/<HASH>/indices/<INDEX_UUID>/0/`. The hash method used is `FNV_1A_COMPOSITE_1`, which uses the `FNV1a` hash function and generates a custom-encoded 64-bit hash value that scales well with most remote store options. `FNV1a` takes the most significant 6 bits to create a URL-safe Base64 character and the next 14 bits to create a binary string.
 
 ### fs repository
 
@@ -54,6 +63,7 @@ Request field | Description
 `max_restore_bytes_per_sec` | The maximum rate at which snapshots restore. Default is 40 MB per second (`40m`). Optional.
 `max_snapshot_bytes_per_sec` | The maximum rate at which snapshots take. Default is 40 MB per second (`40m`). Optional.
 `remote_store_index_shallow_copy` | Boolean | Determines whether the snapshots of the remote store indexes are captured as a shallow copy. Default is `false`.
+`shallow_snapshot_v2` | Boolean | Determines whether the snapshots of the remote store indexes are captured as a [shallow copy v2]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability/#shallow-snapshot-v2). Default is `false`.
 `readonly` | Whether the repository is read-only. Useful when migrating from one cluster (`"readonly": false` when registering) to another cluster (`"readonly": true` when registering). Optional.
 
@@ -73,6 +83,7 @@ Request field | Description
 `max_snapshot_bytes_per_sec` | The maximum rate at which snapshots take. Default is 40 MB per second (`40m`). Optional.
 `readonly` | Whether the repository is read-only. Useful when migrating from one cluster (`"readonly": false` when registering) to another cluster (`"readonly": true` when registering). Optional.
 `remote_store_index_shallow_copy` | Boolean | Whether the snapshot of the remote store indexes is captured as a shallow copy. Default is `false`.
+`shallow_snapshot_v2` | Boolean | Determines whether the snapshots of the remote store indexes are captured as a [shallow copy v2]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability/#shallow-snapshot-v2). Default is `false`.
 `server_side_encryption` | Whether to encrypt snapshot files in the S3 bucket. This setting uses AES-256 with S3-managed keys. See [Protecting data using server-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html). Default is `false`. Optional.
 `storage_class` | Specifies the [S3 storage class](https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html) for the snapshots files. Default is `standard`. Do not use the `glacier` and `deep_archive` storage classes. Optional.
 
diff --git a/_api-reference/snapshots/create-snapshot.md b/_api-reference/snapshots/create-snapshot.md
index d4c9ef8219..45e5a28b55 100644
--- a/_api-reference/snapshots/create-snapshot.md
+++ b/_api-reference/snapshots/create-snapshot.md
@@ -26,7 +26,7 @@ POST /_snapshot/<repository>/<snapshot>
 
 Parameter | Data type | Description
 :--- | :--- | :---
-repository | String | Repostory name to contain the snapshot. |
+repository | String | Repository name to store the snapshot.
| snapshot | String | Name of the snapshot to create. |
 
 ## Query parameters
 
 Parameter | Data type | Description
 :--- | :--- | :---
 wait_for_completion | Boolean | Whether to wait for snapshot creation to complete before continuing. If you include this parameter, the snapshot definition is returned after completion. |
 
-## Request fields
+## Request body fields
 
 The request body is optional.
 
@@ -48,7 +48,7 @@ Field | Data type | Description
 
 ## Example requests
 
-##### Request without a body
+### Request without a body
 
 The following request creates a snapshot called `my-first-snapshot` in an S3 repository called `my-s3-repository`. A request body is not included because it is optional.
 
@@ -57,7 +57,7 @@ POST _snapshot/my-s3-repository/my-first-snapshot
 ```
 {% include copy-curl.html %}
 
-##### Request with a body
+### Request with a body
 
 You can also add a request body to include or exclude certain indices or specify other settings:
 
@@ -87,7 +87,7 @@ Upon success, the response content depends on whether you include the `wait_for_
 
 To verify that the snapshot was created, use the [Get snapshot]({{site.url}}{{site.baseurl}}/api-reference/snapshots/get-snapshot) API, passing the snapshot name as the `snapshot` path parameter.
 {: .note}
 
-##### `wait_for_completion` included
+### `wait_for_completion` included
 
 The snapshot definition is returned.
 
@@ -125,7 +125,7 @@ The snapshot definition is returned.
 } 
 ```
 
-#### Response fields
+## Response body fields
 
 | Field | Data type | Description |
 | :--- | :--- | :--- |
@@ -144,4 +144,5 @@ The snapshot definition is returned.
 | failures | array | Failures, if any, that occurred during snapshot creation. |
 | shards | object | Total number of shards created along with number of successful and failed shards. |
 | state | string | Snapshot status. Possible values: `IN_PROGRESS`, `SUCCESS`, `FAILED`, `PARTIAL`. |
-| remote_store_index_shallow_copy | Boolean | Whether the snapshot of the remote store indexes is captured as a shallow copy. Default is `false`. | \ No newline at end of file
+| remote_store_index_shallow_copy | Boolean | Whether the snapshots of the remote store indexes are captured as a shallow copy. Default is `false`. |
+| pinned_timestamp | long | A timestamp (in milliseconds) pinned by the snapshot for the implicit locking of remote store files referenced by the snapshot. | \ No newline at end of file
diff --git a/_api-reference/snapshots/delete-snapshot-repository.md b/_api-reference/snapshots/delete-snapshot-repository.md
index 1fadc21207..2649c3c90d 100644
--- a/_api-reference/snapshots/delete-snapshot-repository.md
+++ b/_api-reference/snapshots/delete-snapshot-repository.md
@@ -9,11 +9,17 @@ nav_order: 3
 **Introduced 1.0**
 {: .label .label-purple }
 
- Deletes a snapshot repository configuration.
+Deletes a snapshot repository configuration.
 
- A repository in OpenSearch is simply a configuration that maps a repository name to a type (file system or s3 repository) along with other information depending on the type. The configuration is backed by a file system location or an s3 bucket. When you invoke the API, the physical file system or s3 bucket itself is not deleted. Only the configuration is deleted.
 
- To learn more about repositories, see [Register or update snapshot repository]({{site.url}}{{site.baseurl}}/api-reference/snapshots/create-repository).
+A repository in OpenSearch is simply a configuration that maps a repository name to a type (file system or s3 repository) along with other information depending on the type. The configuration is backed by a file system location or an s3 bucket. When you invoke the API, the physical file system or s3 bucket itself is not deleted. Only the configuration is deleted.
+
+To learn more about repositories, see [Register or update snapshot repository]({{site.url}}{{site.baseurl}}/api-reference/snapshots/create-repository).
+
+## Path and HTTP methods
+
+```json
+DELETE _snapshot/<repository>
+```
 
 ## Path parameters
 
diff --git a/_api-reference/snapshots/delete-snapshot.md b/_api-reference/snapshots/delete-snapshot.md
index d231adf74a..faed3b92d0 100644
--- a/_api-reference/snapshots/delete-snapshot.md
+++ b/_api-reference/snapshots/delete-snapshot.md
@@ -17,11 +17,17 @@ Deletes a snapshot from a repository.
 
 * To view a list of your snapshots, see [cat snapshots]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-snapshots).
 
+## Path and HTTP methods
+
+```json
+DELETE _snapshot/<repository>/<snapshot>
+```
+
 ## Path parameters
 
 Parameter | Data type | Description
 :--- | :--- | :---
-repository | String | Repostory that contains the snapshot. |
+repository | String | Repository that contains the snapshot. |
 snapshot | String | Snapshot to delete. |
 
 ## Example request
diff --git a/_api-reference/snapshots/get-snapshot-repository.md b/_api-reference/snapshots/get-snapshot-repository.md
index 6617106059..1098cd544a 100644
--- a/_api-reference/snapshots/get-snapshot-repository.md
+++ b/_api-reference/snapshots/get-snapshot-repository.md
@@ -16,6 +16,12 @@ To learn more about repositories, see [Register repository]({{site.url}}{{site.b
 
 You can also get details about a snapshot during and after snapshot creation. See [Get snapshot status]({{site.url}}{{site.baseurl}}/api-reference/snapshots/get-snapshot-status/).
 {: .note}
 
+## Path and HTTP methods
+
+```json
+GET /_snapshot/<repository>
+```
+
 ## Path parameters
 
 | Parameter | Data type | Description |
@@ -54,7 +60,7 @@ Upon success, the response returns repository information. This sample is for an
 } 
 ````
 
-## Response fields
+## Response body fields
 
 | Field | Data type | Description |
 | :--- | :--- | :--- |
diff --git a/_api-reference/snapshots/get-snapshot-status.md b/_api-reference/snapshots/get-snapshot-status.md
index c7f919bcb3..8675c23886 100644
--- a/_api-reference/snapshots/get-snapshot-status.md
+++ b/_api-reference/snapshots/get-snapshot-status.md
@@ -16,14 +16,21 @@ To learn about snapshot creation, see [Create snapshot]({{site.url}}{{site.baseu
 
 If you use the Security plugin, you must have the `monitor_snapshot`, `create_snapshot`, or `manage cluster` privileges.
 {: .note}
 
+## Path and HTTP methods
+
+```json
+GET _snapshot/<repository>/<snapshot>/_status
+```
+
 ## Path parameters
 
 Path parameters are optional.
 
 | Parameter | Data type | Description |
 :--- | :--- | :---
-| repository | String | Repository containing the snapshot. |
-| snapshot | String | Snapshot to return. |
+| repository | String | The repository containing the snapshot. |
+| snapshot | List | The snapshot(s) to return. |
+| index | List | The indexes to include in the response. |
 
 Three request variants provide flexibility:
 
 * `GET _snapshot/<repository>/_status` returns all currently running snapshots in the specified repository. This is the preferred variant.
 
-* `GET _snapshot///_status` returns detailed status information for a specific snapshot in the specified repository, regardless of whether it's currently running or not.
+* `GET _snapshot/<repository>/<snapshot>/_status` returns detailed status information for specific snapshots in the specified repository, regardless of whether they're currently running.
+
+* `GET /_snapshot/<repository>/<snapshot>/<index>/_status` returns detailed status information only for the specified indexes in a specific snapshot in the specified repository. Note that this endpoint works only for indexes belonging to a specific snapshot.
+
+Snapshot API calls only work if the total number of shards across the requested resources, such as snapshots and indexes created from snapshots, is smaller than the limit specified by the following cluster setting:
+
+- `snapshot.max_shards_allowed_in_status_api` (Dynamic, integer): The maximum number of shards that can be included in the Snapshot Status API response. Default value is `200000`. Not applicable for [shallow snapshots v2]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability/#shallow-snapshot-v2), where the total number and sizes of files are returned as 0.
+
 
-Using the API to return state for other than currently running snapshots can be very costly for (1) machine machine resources and (2) processing time if running in the cloud. For each snapshot, each request causes file reads from all a snapshot's shards.
+Using the API to return the state of snapshots that are not currently running can be very costly in terms of both machine resources and processing time when querying data in the cloud. For each snapshot, each request causes a file read of all of the snapshot's shards.
 {: .warning}
 
-## Request fields
+## Request body fields
 
 | Field | Data type | Description |
 :--- | :--- | :---
-| ignore_unavailable | Boolean | How to handles requests for unavailable snapshots. If `false`, the request returns an error for unavailable snapshots. If `true`, the request ignores unavailable snapshots, such as those that are corrupted or temporarily cannot be returned. Defaults to `false`.|
+| ignore_unavailable | Boolean | How to handle requests for unavailable snapshots and indexes. If `false`, the request returns an error for unavailable snapshots and indexes. If `true`, the request ignores unavailable snapshots and indexes, such as those that are corrupted or temporarily cannot be returned. Default is `false`.|
 
 ## Example request
 
@@ -369,31 +383,31 @@ The `GET _snapshot/my-opensearch-repo/my-first-snapshot/_status` request returns
 } 
 ````
 
-## Response fields
+## Response body fields
 
 | Field | Data type | Description |
 :--- | :--- | :---
 | repository | String | Name of repository that contains the snapshot. |
 | snapshot | String | Snapshot name. |
-| uuid | String | Snapshot Universally unique identifier (UUID). |
+| uuid | String | A snapshot's universally unique identifier (UUID). |
 | state | String | Snapshot's current status. See [Snapshot states](#snapshot-states). |
 | include_global_state | Boolean | Whether the current cluster state is included in the snapshot. |
 | shards_stats | Object | Snapshot's shard counts. See [Shard stats](#shard-stats). |
-| stats | Object | Details of files included in the snapshot. `file_count`: number of files. `size_in_bytes`: total of all fie sizes. See [Snapshot file stats](#snapshot-file-stats). |
+| stats | Object | Information about files included in the snapshot. `file_count`: number of files. `size_in_bytes`: total size of all files. See [Snapshot file stats](#snapshot-file-stats). |
 | index | list of Objects | List of objects that contain information about the indices in the snapshot.
See [Index objects](#index-objects).| -##### Snapshot states +### Snapshot states | State | Description | :--- | :--- | -| FAILED | The snapshot terminated in an error and no data was stored. | +| FAILED | The snapshot terminated in an error and no data was stored. | | IN_PROGRESS | The snapshot is currently running. | | PARTIAL | The global cluster state was stored, but data from at least one shard was not stored. The `failures` property of the [Create snapshot]({{site.url}}{{site.baseurl}}/api-reference/snapshots/create-snapshot) response contains additional details. | | SUCCESS | The snapshot finished and all shards were stored successfully. | -##### Shard stats +### Shard stats -All property values are Integers. +All property values are integers. | Property | Description | :--- | :--- | @@ -404,7 +418,7 @@ All property values are Integers. | failed | Number of shards that failed to be included in the snapshot. | | total | Total number of shards included in the snapshot. | -##### Snapshot file stats +### Snapshot file stats | Property | Type | Description | :--- | :--- | :--- | @@ -414,10 +428,10 @@ All property values are Integers. | start_time_in_millis | Long | Time (in milliseconds) when snapshot creation began. | | time_in_millis | Long | Total time (in milliseconds) that the snapshot took to complete. | -##### Index objects +### Index objects | Property | Type | Description | :--- | :--- | :--- | | shards_stats | Object | See [Shard stats](#shard-stats). | | stats | Object | See [Snapshot file stats](#snapshot-file-stats). | -| shards | list of Objects | List of objects containing information about the shards that include the snapshot. OpenSearch returns the following properties about the shards.

**stage**: Current state of shards in the snapshot. Shard states are:<br><br>* DONE: Number of shards in the snapshot that were successfully stored in the repository.<br><br>* FAILURE: Number of shards in the snapshot that were not successfully stored in the repository.<br><br>* FINALIZE: Number of shards in the snapshot that are in the finalizing stage of being stored in the repository.<br><br>* INIT: Number of shards in the snapshot that are in the initializing stage of being stored in the repository.<br><br>* STARTED: Number of shards in the snapshot that are in the started stage of being stored in the repository.<br><br>**stats**: See [Snapshot file stats](#snapshot-file-stats).<br><br>**total**: Total number and size of files referenced by the snapshot.<br><br>**start_time_in_millis**: Time (in milliseconds) when snapshot creation began.<br><br>**time_in_millis**: Total time (in milliseconds) that the snapshot took to complete. |
+| shards | List of objects | Contains information about the shards included in the snapshot. OpenSearch returns the following properties about the shard:<br><br>**stage**: The current state of shards in the snapshot. Shard states are:<br><br>* DONE: The number of shards in the snapshot that were successfully stored in the repository.<br><br>* FAILURE: The number of shards in the snapshot that were not successfully stored in the repository.<br><br>* FINALIZE: The number of shards in the snapshot that are in the finalizing stage of being stored in the repository.<br><br>* INIT: The number of shards in the snapshot that are in the initializing stage of being stored in the repository.<br><br>* STARTED: The number of shards in the snapshot that are in the started stage of being stored in the repository.<br><br>**stats**: See [Snapshot file stats](#snapshot-file-stats).<br><br>**total**: The total number and sizes of files referenced by the snapshot.<br><br>**start_time_in_millis**: The time (in milliseconds) when snapshot creation began.<br><br>**time_in_millis**: The total amount of time (in milliseconds) that the snapshot took to complete. |
diff --git a/_api-reference/snapshots/get-snapshot.md b/_api-reference/snapshots/get-snapshot.md
index ac55c0370f..148f9e8ff2 100644
--- a/_api-reference/snapshots/get-snapshot.md
+++ b/_api-reference/snapshots/get-snapshot.md
@@ -11,6 +11,12 @@ nav_order: 6
 
 Retrieves information about a snapshot.
 
+## Path and HTTP methods
+
+```json
+GET _snapshot/<repository>/<snapshot>
+```
+
 ## Path parameters
 
 | Parameter | Data type | Description |
@@ -73,7 +79,7 @@ Upon success, the response returns snapshot information:
 ] } ````
-## Response fields
+## Response body fields
 
 | Field | Data type | Description |
 | :--- | :--- | :--- |
diff --git a/_api-reference/snapshots/restore-snapshot.md b/_api-reference/snapshots/restore-snapshot.md
index cdb9948c28..b22c371134 100644
--- a/_api-reference/snapshots/restore-snapshot.md
+++ b/_api-reference/snapshots/restore-snapshot.md
@@ -19,11 +19,17 @@ Restores a snapshot of a cluster or specified data streams and indices. If open
 indexes with the same name that you want to restore already exist in the cluster, you must close, delete, or rename the indexes. See [Example request](#example-request) for information about renaming an index. See [Close index]({{site.url}}{{site.baseurl}}/api-reference/index-apis/close-index) for information about closing an index.
 {: .note}
 
+## Path and HTTP methods
+
+```json
+POST _snapshot/<repository>/<snapshot>/_restore
+```
+
 ## Path parameters
 
 | Parameter | Data type | Description |
 :--- | :--- | :---
-repository | String | Repository containing the snapshot to restore. |
+| repository | String | Repository containing the snapshot to restore. |
 | snapshot | String | Snapshot to restore. |
 
 ## Query parameters
 
 Parameter | Data type | Description
 :--- | :--- | :---
 wait_for_completion | Boolean | Whether to wait for snapshot restoration to complete before continuing. |
 
-### Request fields
+## Request body fields
 
 All request body parameters are optional.
 
 | index_settings | String | A comma-delimited list of settings to add or change in all restored indices. Use this parameter to override index settings during snapshot restoration. For data streams, these index settings are applied to the restored backing indices. |
 | indices | String | A comma-delimited list of data streams and indices to restore from the snapshot. Multi-index syntax is supported. By default, a restore operation includes all data streams and indices in the snapshot. If this argument is provided, the restore operation only includes the data streams and indices that you specify. |
 | partial | Boolean | How the restore operation will behave if indices in the snapshot do not have all primary shards available. If `false`, the entire restore operation fails if any indices in the snapshot do not have all primary shards available.<br><br>If `true`, allows the restoration of a partial snapshot of indices with unavailable shards. Only shards that were successfully included in the snapshot are restored. All missing shards are recreated as empty. By default, the entire restore operation fails if one or more indices included in the snapshot do not have all primary shards available. To change this behavior, set `partial` to `true`. Defaults to `false`. |
-| rename_pattern | String | The pattern to apply to restored data streams and indices. Data streams and indices matching the rename pattern will be renamed according to `rename_replacement`.<br><br>The rename pattern is applied as defined by the regular expression that supports referencing the original text.<br><br>The request fails if two or more data streams or indices are renamed into the same name. If you rename a restored data stream, its backing indices are also renamed. For example, if you rename the logs data stream to `recovered-logs`, the backing index `.ds-logs-1` is renamed to `.ds-recovered-logs-1`.<br><br>If you rename a restored stream, ensure an index template matches the new stream name. If there are no matching index template names, the stream cannot roll over and new backing indices are not created.|
-| rename_replacement | String | The rename replacement string. See `rename_pattern` for more information.|
+| rename_pattern | String | The pattern to apply to the restored data streams and indexes. Data streams and indexes matching the rename pattern will be renamed according to the `rename_replacement` setting.<br><br>The rename pattern is applied as defined by the regular expression that supports referencing the original text.<br><br>The request fails if two or more data streams or indexes are renamed to the same name. If you rename a restored data stream, its backing indexes are also renamed. For example, if you rename the logs data stream to `recovered-logs`, the backing index `.ds-logs-1` is renamed to `.ds-recovered-logs-1`.<br><br>If you rename a restored stream, ensure an index template matches the new stream name. If there are no matching index template names, the stream cannot roll over, and new backing indexes are not created.|
+| rename_replacement | String | The rename replacement string.|
+| rename_alias_pattern | String | The pattern to apply to the restored aliases. Aliases matching the rename pattern will be renamed according to the `rename_alias_replacement` setting.<br><br>The rename pattern is applied as defined by the regular expression that supports referencing the original text.<br><br>If two or more aliases are renamed to the same name, these aliases will be merged into one.|
+| rename_alias_replacement | String | The rename replacement string for aliases.|
 | source_remote_store_repository | String | The name of the remote store repository of the source index being restored. If not provided, the Snapshot Restore API will use the repository that was registered when the snapshot was created.
 wait_for_completion | Boolean | Whether to return a response after the restore operation has completed. If `false`, the request returns a response when the restore operation initializes. If `true`, the request returns a response when the restore operation completes. Defaults to `false`. |
 
@@ -92,7 +100,7 @@ Upon success, the response returns the following JSON object:
 
 ````
 Except for the snapshot name, all properties are empty or `0`. This is because any changes made to the volume after the snapshot was generated are lost. However, if you invoke the [Get snapshot]({{site.url}}{{site.baseurl}}/api-reference/snapshots/get-snapshot) API to examine the snapshot, a fully populated snapshot object is returned.
 
-## Response fields
+## Response body fields
 
 | Field | Data type | Description |
 | :--- | :--- | :--- |
@@ -117,4 +125,4 @@ If open indices in a snapshot already exist in a cluster, and you don't delete,
 },
 "status" : 500
 }
-```` \ No newline at end of file
+````
diff --git a/_api-reference/snapshots/verify-snapshot-repository.md b/_api-reference/snapshots/verify-snapshot-repository.md
index e5e6337196..67a006e709 100644
--- a/_api-reference/snapshots/verify-snapshot-repository.md
+++ b/_api-reference/snapshots/verify-snapshot-repository.md
@@ -17,6 +17,12 @@ If verification is successful, the verify snapshot repository API returns a list
 
 If you use the Security plugin, you must have the `manage cluster` privilege.
 {: .note}
 
+## Path and HTTP methods
+
+```json
+POST _snapshot/<repository>/_verify
+```
+
 ## Path parameters
 
 Path parameters are optional.
@@ -70,7 +76,7 @@ In the preceding sample, one node is connected to the snapshot repository. If mo
 }
 ````
 
-## Response fields
+## Response body fields
 
 | Field | Data type | Description |
 :--- | :--- | :---
diff --git a/_api-reference/tasks.md b/_api-reference/tasks.md
index e4ca0b6049..477e720d22 100644
--- a/_api-reference/tasks.md
+++ b/_api-reference/tasks.md
@@ -12,9 +12,9 @@ redirect_from:
 
 A task is any operation you run in a cluster. For example, searching your data collection of books for a title or author name is a task. When you run OpenSearch, a task is automatically created to monitor your cluster's health and performance. For more information about all of the tasks currently executing in your cluster, you can use the `tasks` API operation.
 
-The following request returns information about all of your tasks:
+## Path and HTTP methods
 
-```
+```json
 GET _tasks
 ```
 {% include copy-curl.html %}
@@ -28,6 +28,7 @@ GET _tasks/<task_id>
 
 Note that if a task finishes running, it won't be returned as part of your request. For an example of a task that takes a little longer to finish, you can run the [`_reindex`]({{site.url}}{{site.baseurl}}/opensearch/reindex-data) API operation on a larger document, and then run `tasks`.
 
+## Query parameters
 
 You can also use the following parameters with your query.
 
@@ -42,7 +43,6 @@ Parameter | Data type | Description |
 `timeout` | Time | An explicit operation timeout. (Default: 30 seconds)
 `cluster_manager_timeout` | Time | The time to wait for a connection to the primary node.
(Default: 30 seconds)
 
-For example, this request returns tasks currently running on a node named `opensearch-node1`:
 
 ## Example requests
 
diff --git a/_api-reference/validate.md b/_api-reference/validate.md
new file mode 100644
index 0000000000..6e1470a505
--- /dev/null
+++ b/_api-reference/validate.md
@@ -0,0 +1,205 @@
+---
+layout: default
+title: Validate Query
+nav_order: 87
+---
+
+# Validate Query
+
+You can use the Validate Query API to validate a query without running it. The query can be sent as a path parameter or included in the request body.
+
+## Path and HTTP methods
+
+The Validate Query API contains the following path:
+
+```json
+GET /_validate/query
+```
+
+## Path parameters
+
+All path parameters are optional.
+
+Parameter | Data type | Description
+:--- | :--- | :---
+`index` | String | The index to validate the query against. If you don't specify an index or multiple indexes as part of the URL (or want to override the URL value for an individual search), you can include it here. Examples include `"logs-*"` and `["my-store", "sample_data_ecommerce"]`.
+`query` | Query object | The query using [Query DSL]({{site.url}}{{site.baseurl}}/query-dsl/).
+
+## Query parameters
+
+The following table lists the available query parameters. All query parameters are optional.
+
+Parameter | Data type | Description
+:--- | :--- | :---
+`all_shards` | Boolean | When `true`, validation is run against [all shards](#rewrite-and-all_shards) instead of against one shard per index. Default is `false`.
+`allow_no_indices` | Boolean | Whether to ignore wildcards that don't match any indexes. Default is `true`.
+`allow_partial_search_results` | Boolean | Whether to return partial results if the request encounters an error or times out. Default is `true`.
+`analyzer` | String | The analyzer to use in the query string. This should only be used with the `q` option.
+`analyze_wildcard` | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is `false`.
+`default_operator` | String | Indicates whether the default operator for a string query should be `AND` or `OR`. Default is `OR`.
+`df` | String | The default field if a field prefix is not provided in the query string.
+`expand_wildcards` | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indexes), `closed` (match closed, non-hidden indexes), `hidden` (match hidden indexes), and `none` (deny wildcard expressions). Default is `open`.
+`explain` | Boolean | Whether to return information about how OpenSearch computed the [document's score](#explain). Default is `false`.
+`ignore_unavailable` | Boolean | Specifies whether to include missing or closed indexes in the response and ignores unavailable shards during the search request. Default is `false`.
+`lenient` | Boolean | Specifies whether OpenSearch should ignore format-based query failures (for example, as a result of querying a text field for an integer). Default is `false`.
+`rewrite` | String | Determines how OpenSearch [rewrites](#rewrite) and scores multi-term queries. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. Default is `constant_score`.
+`q` | String | A query in the Lucene string syntax.
+
+## Example request
+
+The following example request uses an index named `hamlet` created using a `bulk` request:
+
+```json
+PUT hamlet/_bulk?refresh
+{"index":{"_id":1}}
+{"user" : { "id": "hamlet" }, "@timestamp" : "2099-11-15T14:12:12", "message" : "To Search or Not To Search"}
+{"index":{"_id":2}}
+{"user" : { "id": "hamlet" }, "@timestamp" : "2099-11-15T14:12:13", "message" : "My dad says that I'm such a ham."}
+```
+{% include copy.html %}
+
+You can then use the Validate Query API to validate an index query, as shown in the following example:
+
+```json
+GET hamlet/_validate/query?q=user.id:hamlet
+```
+{% include copy.html %}
+
+The query can also be sent as a request body, as shown in the following example:
+
+```json
+GET hamlet/_validate/query
+{
+  "query" : {
+    "bool" : {
+      "must" : {
+        "query_string" : {
+          "query" : "*:*"
+        }
+      },
+      "filter" : {
+        "term" : { "user.id" : "hamlet" }
+      }
+    }
+  }
+}
+```
+{% include copy.html %}
+
+
+## Example responses
+
+If the query passes validation, then the response indicates that the query is valid, as shown in the following example response, in which the `valid` parameter is `true`:
+
+```
+{
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "failed": 0
+  },
+  "valid": true
+}
+```
+
+If the query does not pass validation, then OpenSearch indicates that the query is not valid. The following example request query includes a dynamic mapping not configured in the `hamlet` index:
+
+```json
+GET hamlet/_validate/query
+{
+  "query": {
+    "query_string": {
+      "query": "@timestamp:foo",
+      "lenient": false
+    }
+  }
+}
+```
+
+OpenSearch responds with the following, where the `valid` parameter is `false`:
+
+```
+{
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "failed": 0
+  },
+  "valid": false
+}
+```
+
+Certain query parameters can also affect what is included in the response. The following examples show how the [Explain](#explain), [Rewrite](#rewrite), and [all_shards](#rewrite-and-all_shards) query options affect the response.
+
+### Explain
+
+The `explain` option returns information about the query failure in the `explanations` field, as shown in the following example response:
+
+```
+{
+  "valid" : false,
+  "_shards" : {
+    "total" : 1,
+    "successful" : 1,
+    "failed" : 0
+  },
+  "explanations" : [ {
+    "index" : "_shakespeare",
+    "valid" : false,
+    "error" : "shakespeare/IAEc2nIXSSunQA_suI0MLw] QueryShardException[failed to create query:...failed to parse date field [foo]"
+  } ]
+}
+```
+
+
+### Rewrite
+
+When the `rewrite` option is set to `true` in the request, the `explanations` field shows the Lucene query that is executed as a string, as shown in the following response:
+
+```
+{
+  "valid": true,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "failed": 0
+  },
+  "explanations": [
+    {
+      "index": "",
+      "valid": true,
+      "explanation": "((user:hamlet^4.256753 play:hamlet^6.863601 play:romeo^2.8415773 plot:puck^3.4193945 plot:othello^3.8244398 ... 
)~4) -ConstantScore(_id:2) #(ConstantScore(_type:_doc))^0.0" + } + ] +} +``` + + +### Rewrite and all_shards + +When both the `rewrite` and `all_shards` options are set to `true`, the Validate Query API responds with detailed information from all available shards as opposed to only one shard (the default), as shown in the following response: + +``` +{ + "valid": true, + "_shards": { + "total": 1, + "successful": 1, + "failed": 0 + }, + "explanations": [ + { + "index": "my-index-000001", + "shard": 0, + "valid": true, + "explanation": "(user.id:hamlet)^0.6333333" + } + ] +} +``` + + + + + + diff --git a/_automating-configurations/api/create-workflow.md b/_automating-configurations/api/create-workflow.md index 83c0110ac3..610bfe8fab 100644 --- a/_automating-configurations/api/create-workflow.md +++ b/_automating-configurations/api/create-workflow.md @@ -58,7 +58,7 @@ POST /_plugins/_flow_framework/workflow?validation=none ``` {% include copy-curl.html %} -You cannot update a full workflow once it has been provisioned, but you can update fields other than the `workflows` field, such as `name` and `description`: +In a workflow that has not been provisioned, you can update fields other than the `workflows` field. For example, you can update the `name` and `description` fields as follows: ```json PUT /_plugins/_flow_framework/workflow/?update_fields=true @@ -72,16 +72,40 @@ PUT /_plugins/_flow_framework/workflow/?update_fields=true You cannot specify both the `provision` and `update_fields` parameters at the same time. {: .note} +If a workflow has been provisioned, you can update and reprovision the full template: + +```json +PUT /_plugins/_flow_framework/workflow/?reprovision=true +{ + +} +``` + +You can add new steps to the workflow but cannot delete them. Only index setting, search pipeline, and ingest pipeline steps can currently be updated. +{: .note} + +You can create and provision a workflow using a [workflow template]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-templates/) as follows: + +```json +POST /_plugins/_flow_framework/workflow?use_case=&provision=true +{ + "create_connector.credential.key" : "" +} +``` +{% include copy-curl.html %} + The following table lists the available query parameters. All query parameters are optional. User-provided parameters are only allowed if the `provision` parameter is set to `true`. | Parameter | Data type | Description | | :--- | :--- | :--- | | `provision` | Boolean | Whether to provision the workflow as part of the request. Default is `false`. | | `update_fields` | Boolean | Whether to update only the fields included in the request body. Default is `false`. | +| `reprovision` | Boolean | Whether to reprovision the entire template if it has already been provisioned. A complete template must be provided in the request body. Default is `false`. | | `validation` | String | Whether to validate the workflow. Valid values are `all` (validate the template) and `none` (do not validate the template). Default is `all`. | +| `use_case` | String | The name of the [workflow template]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-templates/#supported-workflow-templates) to use when creating the workflow. | | User-provided substitution expressions | String | Parameters matching substitution expressions in the template. Only allowed if `provision` is set to `true`. Optional. 
If `provision` is set to `false`, you can pass these parameters in the [Provision Workflow API query parameters]({{site.url}}{{site.baseurl}}/automating-configurations/api/provision-workflow/#query-parameters). | -## Request fields +## Request body fields The following table lists the available request fields. @@ -89,7 +113,7 @@ The following table lists the available request fields. |:--- |:--- |:--- |:--- | |`name` |String |Required |The name of the workflow. | |`description` |String |Optional |A description of the workflow. | -|`use_case` |String |Optional | A use case, which can be used with the Search Workflow API to find related workflows. In the future, OpenSearch may provide some standard use cases to ease categorization, but currently you can use this field to specify custom values. | +|`use_case` |String |Optional | A user-provided use case, which can be used with the [Search Workflow API]({{site.url}}{{site.baseurl}}/automating-configurations/api/search-workflow/) to find related workflows. You can use this field to specify custom values. This is distinct from the `use_case` query parameter. | |`version` |Object |Optional | A key-value map with two fields: `template`, which identifies the template version, and `compatibility`, which identifies a list of minimum required OpenSearch versions. | |`workflows` |Object |Optional |A map of workflows. Presently, only the `provision` key is supported. The value for the workflow key is a key-value map that includes fields for `user_params` and lists of `nodes` and `edges`. | diff --git a/_automating-configurations/api/index.md b/_automating-configurations/api/index.md index 716e19c41f..78bc4eaede 100644 --- a/_automating-configurations/api/index.md +++ b/_automating-configurations/api/index.md @@ -18,4 +18,6 @@ OpenSearch supports the following workflow APIs: * [Search workflow]({{site.url}}{{site.baseurl}}/automating-configurations/api/search-workflow/) * [Search workflow state]({{site.url}}{{site.baseurl}}/automating-configurations/api/search-workflow-state/) * [Deprovision workflow]({{site.url}}{{site.baseurl}}/automating-configurations/api/deprovision-workflow/) -* [Delete workflow]({{site.url}}{{site.baseurl}}/automating-configurations/api/delete-workflow/) \ No newline at end of file +* [Delete workflow]({{site.url}}{{site.baseurl}}/automating-configurations/api/delete-workflow/) + +For information about workflow access control, see [Workflow template security]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-security/). \ No newline at end of file diff --git a/_automating-configurations/index.md b/_automating-configurations/index.md index 144ad445c8..68742f6149 100644 --- a/_automating-configurations/index.md +++ b/_automating-configurations/index.md @@ -44,3 +44,4 @@ Workflow automation provides the following benefits: - For the workflow step syntax, see [Workflow steps]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-steps/). - For a complete example, see [Workflow tutorial]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-tutorial/). - For configurable settings, see [Workflow settings]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-settings/). +- For information about workflow access control, see [Workflow template security]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-security/). 
\ No newline at end of file diff --git a/_automating-configurations/workflow-security.md b/_automating-configurations/workflow-security.md new file mode 100644 index 0000000000..f3a3d7eeb9 --- /dev/null +++ b/_automating-configurations/workflow-security.md @@ -0,0 +1,93 @@ +--- +layout: default +title: Workflow template security +nav_order: 50 +--- + +# Workflow template security + +In OpenSearch, automated workflow configurations are provided by the Flow Framework plugin. You can use the Security plugin together with the Flow Framework plugin to limit non-admin users to specific actions. For example, you might want some users to only be able to create, update, or delete workflows, while others may only be able to view workflows. + +All Flow Framework indexes are protected as system indexes. Only a superadmin user or an admin user with a TLS certificate can access system indexes. For more information, see [System indexes]({{site.url}}{{site.baseurl}}/security/configuration/system-indices/). + +Security for Flow Framework is set up similarly to [security for anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/security/). + +## Basic permissions + +As an admin user, you can use the Security plugin to assign specific permissions to users based on the APIs they need to access. For a list of supported Flow Framework APIs, see [Workflow APIs]({{site.url}}{{site.baseurl}}/automating-configurations/api/index/). + +The Security plugin has two built-in roles that cover most Flow Framework use cases: `flow_framework_full_access` and `flow_framework_read_access`. For descriptions of each, see [Predefined roles]({{site.url}}{{site.baseurl}}/security/access-control/users-roles#predefined-roles). + +If these roles don't meet your needs, you can assign users individual Flow Framework [permissions]({{site.url}}{{site.baseurl}}/security/access-control/permissions/) to suit your use case. Each action corresponds to an operation in the REST API. For example, the `cluster:admin/opensearch/flow_framework/workflow/search` permission lets you search workflows. + +### Fine-grained access control + +To reduce the chances of unintended users viewing metadata that describes an index, we recommend that administrators enable role-based access control when assigning permissions to the intended user group. For more information, see [Limit access by backend role](#advanced-limit-access-by-backend-role). + +## (Advanced) Limit access by backend role + +Use backend roles to configure fine-grained access to individual workflows based on roles. For example, users in different departments of an organization can view workflows owned by their own department. + +First, make sure your users have the appropriate [backend roles]({{site.url}}{{site.baseurl}}/security/access-control/index/). Backend roles usually come from an [LDAP server]({{site.url}}{{site.baseurl}}/security/configuration/ldap/) or [SAML provider]({{site.url}}{{site.baseurl}}/security/configuration/saml/), but if you use an internal user database, you can [create users manually using the API]({{site.url}}{{site.baseurl}}/security/access-control/api#create-user). + +Next, enable the following setting: + +```json +PUT _cluster/settings +{ + "transient": { + "plugins.flow_framework.filter_by_backend_roles": "true" + } +} +``` +{% include copy-curl.html %} + +Now when users view workflow resources in OpenSearch Dashboards (or make REST API calls), they only see workflows created by users who share at least one backend role. 
+ +For example, consider two users: `alice` and `bob`. + +`alice` has an `analyst` backend role: + +```json +PUT _plugins/_security/api/internalusers/alice +{ + "password": "alice", + "backend_roles": [ + "analyst" + ], + "attributes": {} +} +``` + +`bob` has a `human-resources` backend role: + +```json +PUT _plugins/_security/api/internalusers/bob +{ + "password": "bob", + "backend_roles": [ + "human-resources" + ], + "attributes": {} +} +``` + +Both `alice` and `bob` have full access to the Flow Framework APIs: + +```json +PUT _plugins/_security/api/rolesmapping/flow_framework_full_access +{ + "backend_roles": [], + "hosts": [], + "users": [ + "alice", + "bob" + ] +} +``` + +Because they have different backend roles, `alice` and `bob` cannot view each other's workflows or their results. + +Users without backend roles can still view other users' workflow results if they have `flow_framework_read_access`. This also applies to users who have `flow_framework_full_access` because this permission includes all of the permissions of `flow_framework_read_access`. + +Administrators should inform users that the `flow_framework_read_access` permission allows them to view the results of any workflow in a cluster, including data not directly accessible to them. To limit access to the results of a specific workflow, administrators should apply backend role filters when creating the workflow. This ensures that only users with matching backend roles can access that workflow's results. \ No newline at end of file diff --git a/_automating-configurations/workflow-steps.md b/_automating-configurations/workflow-steps.md index 43685a957a..0f61874be6 100644 --- a/_automating-configurations/workflow-steps.md +++ b/_automating-configurations/workflow-steps.md @@ -75,9 +75,11 @@ You can include the following additional fields in the `user_inputs` field if th You can include the following additional fields in the `previous_node_inputs` field when indicated. -|Field |Data type |Description | -|--- |--- |--- | -|`model_id` |String |The `model_id` is used as an input for several steps. As a special case for the Register Agent step type, if an `llm.model_id` field is not present in the `user_inputs` and not present in `previous_node_inputs`, the `model_id` field from the previous node may be used as a backup for the model ID. | +| Field |Data type | Description | +|-----------------|--- |------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `model_id` |String | The `model_id` is used as an input for several steps. As a special case for the `register_agent` step type, if an `llm.model_id` field is not present in the `user_inputs` and not present in `previous_node_inputs`, then the `model_id` field from the previous node may be used as a backup for the model ID. The `model_id` will also be included in the `parameters` input of the `create_tool` step for the `MLModelTool`. | +| `agent_id` |String | The `agent_id` is used as an input for several steps. The `agent_id` will also be included in the `parameters` input of the `create_tool` step for the `AgentTool`. | +| `connector_id` |String | The `connector_id` is used as an input for several steps. 
The `connector_id` will also be included in the `parameters` input of the `create_tool` step for the `ConnectorTool`. |

## Example workflow steps
diff --git a/_benchmark/glossary.md b/_benchmark/glossary.md
new file mode 100644
index 0000000000..a1d2335b8c
--- /dev/null
+++ b/_benchmark/glossary.md
@@ -0,0 +1,21 @@
+---
+layout: default
+title: Glossary
+nav_order: 100
+---
+
+# OpenSearch Benchmark glossary
+
+The following terms are commonly used in OpenSearch Benchmark:
+
+- **Corpora**: A collection of documents.
+- **Latency**: If `target-throughput` is disabled (has no value or a value of `0`), then latency is equal to service time. If `target-throughput` is enabled (has a value of `1` or greater), then latency is equal to the service time plus the amount of time the request waits in the queue before being sent.
+- **Metric keys**: The metrics stored by OpenSearch Benchmark, based on the configuration in the [metrics record]({{site.url}}{{site.baseurl}}/benchmark/metrics/metric-records/).
+- **Operations**: A list of the API operations performed by a workload.
+- **Pipeline**: A series of steps occurring both before and after running a workload that determines benchmark results.
+- **Schedule**: A list of two or more operations performed in the order in which they appear when a workload is run.
+- **Service time**: The amount of time taken for `opensearch-py`, the primary client for OpenSearch Benchmark, to send a request and receive a response from the OpenSearch cluster. It includes the amount of time taken for the server to process a request as well as for network latency, load balancer overhead, and deserialization/serialization.
+- **Summary report**: A report generated at the end of a test based on the metric keys defined in the workload.
+- **Test**: A single invocation of the OpenSearch Benchmark binary.
+- **Throughput**: The number of operations completed in a given period of time.
+- **Workload**: A collection of one or more benchmarking tests that use a specific document corpus to perform a benchmark against a cluster. The document corpus contains any indexes, data files, or operations invoked when the workload runs.
\ No newline at end of file
diff --git a/_benchmark/quickstart.md b/_benchmark/quickstart.md
index a6bcd59819..9ab6d25c77 100644
--- a/_benchmark/quickstart.md
+++ b/_benchmark/quickstart.md
@@ -116,7 +116,7 @@ You can now run your first benchmark. The following benchmark uses the [percolat

Benchmarks are run using the [`execute-test`]({{site.url}}{{site.baseurl}}/benchmark/commands/execute-test/) command with the following command flags:

-For additional `execute_test` command flags, see the [execute-test]({{site.url}}{{site.baseurl}}/benchmark/commands/execute-test/) reference. Some commonly used options are `--workload-params`, `--exclude-tasks`, and `--include-tasks`.
+For additional `execute-test` command flags, see the [execute-test]({{site.url}}{{site.baseurl}}/benchmark/commands/execute-test/) reference. Some commonly used options are `--workload-params`, `--exclude-tasks`, and `--include-tasks`.
{: .tip}

* `--pipeline=benchmark-only` : Informs OSB that the user wants to provide their own OpenSearch cluster.

@@ -136,7 +136,7 @@ opensearch-benchmark execute-test --pipeline=benchmark-only --workload=percolato
```
{% include copy.html %}

-When the `execute_test` command runs, all tasks and operations in the `percolator` workload run sequentially.
+When the `execute-test` command runs, all tasks and operations in the `percolator` workload run sequentially.
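+
+If you want to run only a subset of the workload, you can filter tasks on the same command line. The following is a minimal sketch, assuming the `percolator` workload defines tasks of type `search` (the filter value is illustrative, not taken from the workload itself):
+
+```
+opensearch-benchmark execute-test --pipeline=benchmark-only --workload=percolator --include-tasks="type:search"
+```
+{% include copy.html %}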
### Validating the test
diff --git a/_benchmark/reference/commands/aggregate.md b/_benchmark/reference/commands/aggregate.md
new file mode 100644
index 0000000000..17612f1164
--- /dev/null
+++ b/_benchmark/reference/commands/aggregate.md
@@ -0,0 +1,77 @@
+---
+layout: default
+title: aggregate
+nav_order: 85
+parent: Command reference
+grand_parent: OpenSearch Benchmark Reference
+redirect_from:
+  - /benchmark/commands/aggregate/
+---
+
+# aggregate
+
+The `aggregate` command combines multiple test executions into a single aggregated result, providing a more streamlined way to conduct and analyze multiple test runs. There are two methods of aggregation:
+
+- [Auto-aggregation](#auto-aggregation)
+- [Manual aggregation](#manual-aggregation)
+
+## Auto-aggregation
+
+The auto-aggregation method runs multiple iterations of benchmark tests and automatically aggregates the results, all within a single command. You can use the flags outlined in this section with the `execute-test` command.
+
+### Usage
+
+The following example runs the `geonames` workload and aggregates the results twice:
+
+```bash
+opensearch-benchmark execute-test --test-iterations=2 --aggregate=true --workload=geonames --target-hosts=127.0.0.1:9200
+```
+{% include copy.html %}
+
+### Auto-aggregation flags
+
+The following new flags can be used to customize the auto-aggregation method:
+
+- `--test-iterations`: Specifies the number of times to run the workload (default is `1`).
+- `--aggregate`: Determines whether to aggregate the results of multiple test executions (default is `true`).
+- `--sleep-timer`: Specifies the number of seconds to sleep before starting the next test execution (default is `5`).
+- `--cancel-on-error`: When set, stops executing tests if an error occurs in one of the test iterations (default is `false`).
+
+## Manual aggregation
+
+You can use the `aggregate` command to manually aggregate results from multiple test executions.
+
+### Usage
+
+To aggregate multiple test executions manually, specify the `test_execution_ids` you would like to aggregate, as shown in the following example:
+
+```bash
+opensearch-benchmark aggregate --test-executions=,,...
+```
+{% include copy.html %}
+
+### Response
+
+OpenSearch Benchmark responds with the following:
+
+```
+ ____ _____ __ ____ __ __
+ / __ \____ ___ ____ / ___/___ ____ ___________/ /_ / __ )___ ____ _____/ /_ ____ ___ ____ ______/ /__
+ / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \ / __ / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
+/ /_/ / /_/ / __/ / / /__/ / __/ /_/ / / / /__/ / / / / /_/ / __/ / / / /__/ / / / / / / / / /_/ / / / ,<
+\____/ .___/\___/_/ /_/____/\___/\__,_/_/ \___/_/ /_/ /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/ /_/|_|
+ /_/
+
+Aggregate test execution ID: aggregate_results_geonames_9aafcfb8-d3b7-4583-864e-4598b5886c4f
+
+-------------------------------
+[INFO] SUCCESS (took 1 seconds)
+-------------------------------
+```
+
+The results will be aggregated into one test execution and stored under the ID shown in the output. The following flags can be used to customize the manual aggregation, as shown in the example after this list:
+
+- `--test-execution-id`: Define a unique ID for the aggregated test execution.
+- `--results-file`: Write the aggregated results to the provided file.
+- `--workload-repository`: Define the repository from which OpenSearch Benchmark will load workloads (default is `default`).
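+
+The following is a minimal sketch showing how these flags might be combined; the test execution IDs, aggregate ID, and results file path are placeholders rather than values from a real run:
+
+```bash
+opensearch-benchmark aggregate --test-executions=<first-id>,<second-id> --test-execution-id=my-aggregated-run --results-file=/tmp/aggregated-results.md
+```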
+ diff --git a/_benchmark/reference/commands/command-flags.md b/_benchmark/reference/commands/command-flags.md index 6520f80803..96948e45c7 100644 --- a/_benchmark/reference/commands/command-flags.md +++ b/_benchmark/reference/commands/command-flags.md @@ -328,3 +328,33 @@ Sets what fraction of randomized query values can be repeated. Takes values betw Sets how many distinct repeatable pair values are generated for each operation when randomization is used. Default is `5000`. This setting does not work when `--randomization-enabled` is not used. + + +## test-iterations + + +Specifies the number of times to run the workload. Default is `1`. + + +## aggregate + + +Determines whether OpenSearch Benchmark should aggregate the results of multiple test executions. + +When set to `true`, OpenSearch Benchmark will combine the results from all iterations into a single aggregated report. When set to `false`, results from each iteration will be reported separately. + +Default is `true`. + + +## sleep-timer + + +Specifies the number of seconds to sleep before starting the next test execution. Default is `5`. + + + +## cancel-on-error + + +When set, this flag instructs OpenSearch Benchmark to stop executing tests if an error occurs in one of the test iterations. Default is `false` (not set). + diff --git a/_benchmark/reference/workloads/corpora.md b/_benchmark/reference/workloads/corpora.md index 0e8d408e9a..f59e2b8a6a 100644 --- a/_benchmark/reference/workloads/corpora.md +++ b/_benchmark/reference/workloads/corpora.md @@ -49,7 +49,7 @@ Each entry in the `documents` array consists of the following options. Parameter | Required | Type | Description :--- | :--- | :--- | :--- `source-file` | Yes | String | The file name containing the corresponding documents for the workload. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a `base_url`, use a compressed file format: `.zip`, `.bz2`, `.gz`, `.tar`, `.tar.gz`, `.tgz`, or `.tar.bz2`. The compressed file must have one JSON file containing the name. -`document-count` | Yes | Integer | The number of documents in the `source-file`, which determines which client indexes correlate to which parts of the document corpus. Each N client receives an Nth of the document corpus. When using a source that contains a document with a parent-child relationship, specify the number of parent documents. +`document-count` | Yes | Integer | The number of documents in the `source-file`, which determines which client indexes correlate to which parts of the document corpus. Each N client receives an Nth of the document corpus. When using a source that contains a document with a parent/child relationship, specify the number of parent documents. `base-url` | No | String | An http(s), Amazon Simple Storage Service (Amazon S3), or Google Cloud Storage URL that points to the root path where OpenSearch Benchmark can obtain the corresponding source file. `source-format` | No | String | Defines the format OpenSearch Benchmark uses to interpret the data file specified in `source-file`. Only `bulk` is supported. `compressed-bytes` | No | Integer | The size, in bytes, of the compressed source file, indicating how much data OpenSearch Benchmark downloads. 
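To make these options concrete, the following is a sketch of a `corpora` entry that uses them; the corpus name, file name, document count, and URL are illustrative placeholders, not a real dataset:

```json
"corpora": [
  {
    "name": "movies",
    "documents": [
      {
        "source-file": "movies-documents.json.bz2",
        "source-format": "bulk",
        "document-count": 1000000,
        "base-url": "https://example.com/corpora/movies",
        "compressed-bytes": 250000000
      }
    ]
  }
]
```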
diff --git a/_benchmark/user-guide/configuring-benchmark.md b/_benchmark/user-guide/install-and-configure/configuring-benchmark.md similarity index 98% rename from _benchmark/user-guide/configuring-benchmark.md rename to _benchmark/user-guide/install-and-configure/configuring-benchmark.md index 2be467d587..59ac13a83c 100644 --- a/_benchmark/user-guide/configuring-benchmark.md +++ b/_benchmark/user-guide/install-and-configure/configuring-benchmark.md @@ -1,10 +1,12 @@ --- layout: default -title: Configuring OpenSearch Benchmark +title: Configuring nav_order: 7 -parent: User guide +grand_parent: User guide +parent: Install and configure redirect_from: - /benchmark/configuring-benchmark/ + - /benchmark/user-guide/configuring-benchmark/ --- # Configuring OpenSearch Benchmark diff --git a/_benchmark/user-guide/install-and-configure/index.md b/_benchmark/user-guide/install-and-configure/index.md new file mode 100644 index 0000000000..c0a48278ad --- /dev/null +++ b/_benchmark/user-guide/install-and-configure/index.md @@ -0,0 +1,12 @@ +--- +layout: default +title: Install and configure +nav_order: 5 +parent: User guide +has_children: true +--- + +# Installing and configuring OpenSearch Benchmark + +This section details how to install and configure OpenSearch Benchmark. + diff --git a/_benchmark/user-guide/installing-benchmark.md b/_benchmark/user-guide/install-and-configure/installing-benchmark.md similarity index 98% rename from _benchmark/user-guide/installing-benchmark.md rename to _benchmark/user-guide/install-and-configure/installing-benchmark.md index 8383cfb2f9..1dd30f9180 100644 --- a/_benchmark/user-guide/installing-benchmark.md +++ b/_benchmark/user-guide/install-and-configure/installing-benchmark.md @@ -1,10 +1,12 @@ --- layout: default -title: Installing OpenSearch Benchmark +title: Installing nav_order: 5 -parent: User guide +grand_parent: User guide +parent: Install and configure redirect_from: - /benchmark/installing-benchmark/ + - /benchmark/user-guide/installing-benchmark/ --- # Installing OpenSearch Benchmark diff --git a/_benchmark/user-guide/distributed-load.md b/_benchmark/user-guide/optimizing-benchmarks/distributed-load.md similarity index 96% rename from _benchmark/user-guide/distributed-load.md rename to _benchmark/user-guide/optimizing-benchmarks/distributed-load.md index 60fc98500f..9729fe4362 100644 --- a/_benchmark/user-guide/distributed-load.md +++ b/_benchmark/user-guide/optimizing-benchmarks/distributed-load.md @@ -2,12 +2,12 @@ layout: default title: Running distributed loads nav_order: 15 -parent: User guide +parent: Optimizing benchmarks +grand_parent: User guide --- # Running distributed loads - OpenSearch Benchmark loads always run on the same machine on which the benchmark was started. However, you can use multiple load drivers to generate additional benchmark testing loads, particularly for large clusters on multiple machines. This tutorial describes how to distribute benchmark loads across multiple machines in a single cluster. 
## System architecture

@@ -64,7 +64,7 @@ With OpenSearch Benchmark running on all three nodes and the worker nodes set to

On **Node 1**, run a benchmark test with the `worker-ips` set to the IP addresses for your worker nodes, as shown in the following example:

```
-opensearch-benchmark execute_test --pipeline=benchmark-only --workload=eventdata --worker-ips=198.52.100.0,198.53.100.0 --target-hosts= --client-options= --kill-running-processes
+opensearch-benchmark execute-test --pipeline=benchmark-only --workload=eventdata --worker-ips=198.52.100.0,198.53.100.0 --target-hosts= --client-options= --kill-running-processes
```

After the test completes, the logs generated by the test appear on your worker nodes.
diff --git a/_benchmark/user-guide/optimizing-benchmarks/index.md b/_benchmark/user-guide/optimizing-benchmarks/index.md
new file mode 100644
index 0000000000..0ea6c1978e
--- /dev/null
+++ b/_benchmark/user-guide/optimizing-benchmarks/index.md
@@ -0,0 +1,11 @@
+---
+layout: default
+title: Optimizing benchmarks
+nav_order: 25
+parent: User guide
+has_children: true
+---
+
+# Optimizing benchmarks
+
+This section details different ways you can optimize OpenSearch Benchmark for your cluster.
\ No newline at end of file
diff --git a/_benchmark/user-guide/target-throughput.md b/_benchmark/user-guide/optimizing-benchmarks/target-throughput.md
similarity index 79%
rename from _benchmark/user-guide/target-throughput.md
rename to _benchmark/user-guide/optimizing-benchmarks/target-throughput.md
index 63832de595..b6c55f96c5 100644
--- a/_benchmark/user-guide/target-throughput.md
+++ b/_benchmark/user-guide/optimizing-benchmarks/target-throughput.md
@@ -2,7 +2,10 @@
 layout: default
 title: Target throughput
 nav_order: 150
-parent: User guide
+parent: Optimizing benchmarks
+grand_parent: User guide
+redirect_from:
+  - /benchmark/user-guide/target-throughput/
---

# Target throughput
@@ -16,13 +19,15 @@ OpenSearch Benchmark has two testing modes, both of which are related to through

## Benchmarking mode

-When you do not specify a `target-throughput`, OpenSearch Benchmark latency tests are performed in *benchmarking mode*. In this mode, the OpenSearch client sends requests to the OpenSearch cluster as fast as possible. After the cluster receives a response from the previous request, OpenSearch Benchmark immediately sends the next request to the OpenSearch client. In this testing mode, latency is identical to service time.
+When `target-throughput` is set to `0`, OpenSearch Benchmark latency tests are performed in *benchmarking mode*. In this mode, the OpenSearch client sends requests to the OpenSearch cluster as fast as possible. After the cluster receives a response from the previous request, OpenSearch Benchmark immediately sends the next request to the OpenSearch client. In this testing mode, latency is identical to service time.
+
+OpenSearch Benchmark issues one request at a time per client. The number of clients is set by the `search-clients` setting in the workload parameters.

## Throughput-throttled mode

-**Throughput** measures the rate at which OpenSearch Benchmark issues requests, assuming that responses will be returned instantaneously. However, users can set a `target-throughput`, which is a common workload parameter that can be set for each test and is measured in operations per second.
+If the `target-throughput` is not set to `0`, then OpenSearch Benchmark issues the next request in accordance with the `target-throughput`, assuming that responses are returned instantaneously.
-OpenSearch Benchmark issues one request at a time for a single-client thread, which is specified as `search-clients` in the workload parameters. If `target-throughput` is set to `0`, then OpenSearch Benchmark issues a request immediately after it receives the response from the previous request. If the `target-throughput` is not set to `0`, then OpenSearch Benchmark issues the next request in accordance with the `target-throughput`, assuming that responses are returned instantaneously.
+**Throughput** measures the rate at which OpenSearch Benchmark issues requests, assuming that responses are returned instantaneously. To configure the request rate, you can set the `target-throughput` workload parameter to the desired number of operations per second for each test.

When you want to simulate the type of traffic you might encounter when deploying a production cluster, set the `target-throughput` in your benchmark test to match the number of requests you estimate that the production cluster might receive. The following examples show how the `target-throughput` setting affects the latency measurement.
diff --git a/_benchmark/user-guide/telemetry.md b/_benchmark/user-guide/telemetry.md
deleted file mode 100644
index d4c40c790a..0000000000
--- a/_benchmark/user-guide/telemetry.md
+++ /dev/null
@@ -1,8 +0,0 @@
----
-layout: default
-title: Enabling telemetry devices
-nav_order: 30
-parent: User guide
----
-
-Telemetry results will not appear in the summary report. To visualize telemetry results, ingest the data into OpenSearch and visualize the data in OpenSearch Dashboards.
\ No newline at end of file
diff --git a/_benchmark/user-guide/understanding-results/index.md b/_benchmark/user-guide/understanding-results/index.md
new file mode 100644
index 0000000000..2122aa0e2e
--- /dev/null
+++ b/_benchmark/user-guide/understanding-results/index.md
@@ -0,0 +1,12 @@
+---
+layout: default
+title: Understanding results
+nav_order: 20
+parent: User guide
+has_children: true
+---
+
+After [running a workload]({{site.url}}{{site.baseurl}}/benchmark/user-guide/working-with-workloads/running-workloads/), OpenSearch Benchmark produces a series of metrics. The following pages describe:
+
+- [How metrics are reported]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-results/summary-reports/)
+- [How to visualize metrics]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-results/telemetry/)
\ No newline at end of file
diff --git a/_benchmark/user-guide/understanding-results.md b/_benchmark/user-guide/understanding-results/summary-reports.md
similarity index 98%
rename from _benchmark/user-guide/understanding-results.md
rename to _benchmark/user-guide/understanding-results/summary-reports.md
index 5b8935a8c7..28578c8c89 100644
--- a/_benchmark/user-guide/understanding-results.md
+++ b/_benchmark/user-guide/understanding-results/summary-reports.md
@@ -1,10 +1,14 @@
 ---
 layout: default
-title: Understanding benchmark results
+title: Summary reports
 nav_order: 22
-parent: User guide
+grand_parent: User guide
+parent: Understanding results
+redirect_from:
+  - /benchmark/user-guide/understanding-results/
 ---

+# Understanding the summary report
At the end of each test run, OpenSearch Benchmark creates a summary of test result metrics like service time, throughput, latency, and more. These metrics provide insights into how the selected workload performed on a benchmarked OpenSearch cluster.
diff --git a/_benchmark/user-guide/understanding-results/telemetry.md b/_benchmark/user-guide/understanding-results/telemetry.md
new file mode 100644
index 0000000000..3548dd4456
--- /dev/null
+++ b/_benchmark/user-guide/understanding-results/telemetry.md
@@ -0,0 +1,21 @@
+---
+layout: default
+title: Enabling telemetry devices
+nav_order: 30
+grand_parent: User guide
+parent: Understanding results
+redirect_from:
+  - /benchmark/user-guide/telemetry
+---
+
+# Enabling telemetry devices
+
+Telemetry results will not appear in the summary report. To visualize telemetry results, ingest the data into OpenSearch and visualize the data in OpenSearch Dashboards.
+
+To view a list of the available telemetry devices, use the command `opensearch-benchmark list telemetry`. After you've selected a [supported telemetry device]({{site.url}}{{site.baseurl}}/benchmark/reference/telemetry/), you can activate the device when running a test with the `--telemetry` command flag. For example, if you want to use the `jfr` device with the `geonames` workload, enter the following command:
+
+```
+opensearch-benchmark execute-test --workload=geonames --telemetry=jfr
+```
+{% include copy.html %}
+
diff --git a/_benchmark/user-guide/understanding-workloads/anatomy-of-a-workload.md b/_benchmark/user-guide/understanding-workloads/anatomy-of-a-workload.md
index 3bf339e4d5..f8e1d90d32 100644
--- a/_benchmark/user-guide/understanding-workloads/anatomy-of-a-workload.md
+++ b/_benchmark/user-guide/understanding-workloads/anatomy-of-a-workload.md
@@ -98,7 +98,7 @@ To create an index, specify its `name`. To add definitions to your index, use th

The `corpora` element requires the name of the index containing the document corpus, for example, `movies`, and a list of parameters that define the document corpora. This list includes the following parameters:

- `source-file`: The file name that contains the workload's corresponding documents. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a `base_url`, use a compressed file format: `.zip`, `.bz2`, `.zst`, `.gz`, `.tar`, `.tar.gz`, `.tgz`, or `.tar.bz2`. The compressed file must include one JSON file containing the name.
-- `document-count`: The number of documents in the `source-file`, which determines which client indexes correlate to which parts of the document corpus. Each N client is assigned an Nth of the document corpus to ingest into the test cluster. When using a source that contains a document with a parent-child relationship, specify the number of parent documents.
+- `document-count`: The number of documents in the `source-file`, which determines which client indexes correlate to which parts of the document corpus. Each N client is assigned an Nth of the document corpus to ingest into the test cluster. When using a source that contains a document with a parent/child relationship, specify the number of parent documents.
- `uncompressed-bytes`: The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs.
- `compressed-bytes`: The size, in bytes, of the source file before decompression. This can help you assess the amount of time needed for the cluster to ingest documents.
diff --git a/_benchmark/user-guide/understanding-workloads/choosing-a-workload.md b/_benchmark/user-guide/understanding-workloads/choosing-a-workload.md index d7ae48ad0a..ae973a7c62 100644 --- a/_benchmark/user-guide/understanding-workloads/choosing-a-workload.md +++ b/_benchmark/user-guide/understanding-workloads/choosing-a-workload.md @@ -18,12 +18,12 @@ Consider the following criteria when deciding which workload would work best for - The cluster's use case. - The data types that your cluster uses compared to the data structure of the documents contained in the workload. Each workload contains an example document so that you can compare data types, or you can view the index mappings and data types in the `index.json` file. -- The query types most commonly used inside your cluster. The `operations/default.json` file contains information about the query types and workload operations. +- The query types most commonly used inside your cluster. The `operations/default.json` file contains information about the query types and workload operations. For a list of common operations, see [Common operations]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-workloads/common-operations/). ## General search clusters -For benchmarking clusters built for general search use cases, start with the `[nyc_taxis]`(https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/nyc_taxis) workload. This workload contains data about the rides taken in yellow taxis in New York City in 2015. +For benchmarking clusters built for general search use cases, start with the [nyc_taxis](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/nyc_taxis) workload. This workload contains data about the rides taken in yellow taxis in New York City in 2015. ## Log data -For benchmarking clusters built for indexing and search with log data, use the [`http_logs`](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/http_logs) workload. This workload contains data about the 1998 World Cup. \ No newline at end of file +For benchmarking clusters built for indexing and search with log data, use the [http_logs](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/http_logs) workload. This workload contains data about the 1998 World Cup. diff --git a/_benchmark/user-guide/understanding-workloads/common-operations.md b/_benchmark/user-guide/understanding-workloads/common-operations.md new file mode 100644 index 0000000000..c9fe15c18c --- /dev/null +++ b/_benchmark/user-guide/understanding-workloads/common-operations.md @@ -0,0 +1,181 @@ +--- +layout: default +title: Common operations +nav_order: 16 +grand_parent: User guide +parent: Understanding workloads +--- + +# Common operations + +[Test procedures]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-workloads/anatomy-of-a-workload#_operations-and-_test-procedures) use a variety of operations, found inside the `operations` directory of a workload. This page details the most common operations found inside OpenSearch Benchmark workloads. + +- [Common operations](#common-operations) + - [bulk](#bulk) + - [create-index](#create-index) + - [delete-index](#delete-index) + - [cluster-health](#cluster-health) + - [refresh](#refresh) + - [search](#search) + + +## bulk + + +The `bulk` operation type allows you to run [bulk](/api-reference/document-apis/bulk/) requests as a task. 
+
+The following example shows a `bulk` operation type with a `bulk-size` of `5000` documents:
+
+```yml
+{
+  "name": "index-append",
+  "operation-type": "bulk",
+  "bulk-size": 5000
+}
+```
+
+
+
+## create-index
+
+
+The `create-index` operation runs the [Create Index API](/api-reference/index-apis/create-index/). It supports the following two modes of index creation:
+
+- Creating all indexes specified in the workload's `indices` section
+- Creating one specific index defined within the operation itself
+
+The following example creates all indexes defined in the `indices` section of the workload. It uses all of the index settings defined in the workload but overrides the number of shards:
+
+```yml
+{
+  "name": "create-all-indices",
+  "operation-type": "create-index",
+  "settings": {
+    "index.number_of_shards": 1
+  },
+  "request-params": {
+    "wait_for_active_shards": "true"
+  }
+}
+```
+
+The following example creates a new index with all index settings specified in the operation body:
+
+```yml
+{
+  "name": "create-an-index",
+  "operation-type": "create-index",
+  "index": "people",
+  "body": {
+    "settings": {
+      "index.number_of_shards": 1
+    },
+    "mappings": {
+      "properties": {
+        "name": {
+          "type": "text"
+        }
+      }
+    }
+  }
+}
+```
+
+
+
+
+## delete-index
+
+
+The `delete-index` operation runs the [Delete Index API](/api-reference/index-apis/delete-index/). As with the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload or delete one or more indexes based on the string passed in the `index` setting.
+
+The following example deletes all indexes found in the `indices` section of the workload:
+
+```yml
+{
+  "name": "delete-all-indices",
+  "operation-type": "delete-index"
+}
+```
+
+The following example deletes all `logs-*` indexes:
+
+```yml
+{
+  "name": "delete-logs",
+  "operation-type": "delete-index",
+  "index": "logs-*",
+  "only-if-exists": false,
+  "request-params": {
+    "expand_wildcards": "all",
+    "allow_no_indices": "true",
+    "ignore_unavailable": "true"
+  }
+}
+```
+
+
+## cluster-health
+
+
+The `cluster-health` operation runs the [Cluster Health API](/api-reference/cluster-api/cluster-health/), which checks the cluster health status and returns the expected status according to the parameters set for `request-params`. If an unexpected cluster health status is returned, the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails.
+
+The following example creates a `cluster-health` operation that checks for a `green` health status on any `logs-*` indexes:
+
+```yml
+{
+  "name": "check-cluster-green",
+  "operation-type": "cluster-health",
+  "index": "logs-*",
+  "request-params": {
+    "wait_for_status": "green",
+    "wait_for_no_relocating_shards": "true"
+  },
+  "retry-until-success": true
+}
+```
+
+
+## refresh
+
+
+The `refresh` operation runs the Refresh API. This operation returns no metadata.
+
+
+The following example refreshes all `logs-*` indexes:
+
+```yml
+{
+  "name": "refresh",
+  "operation-type": "refresh",
+  "index": "logs-*"
+}
+```
+
+
+
+## search
+
+
+The `search` operation runs the [Search API](/api-reference/search/), which you can use to run queries in OpenSearch Benchmark indexes.
+ +The following example runs a `match_all` query inside the `search` operation: + +```yml +{ + "name": "default", + "operation-type": "search", + "body": { + "query": { + "match_all": {} + } + }, + "request-params": { + "_source_include": "some_field", + "analyze_wildcard": "false" + } +} +``` diff --git a/_benchmark/user-guide/understanding-workloads/index.md b/_benchmark/user-guide/understanding-workloads/index.md index 844b565185..6e6d2aa9c1 100644 --- a/_benchmark/user-guide/understanding-workloads/index.md +++ b/_benchmark/user-guide/understanding-workloads/index.md @@ -1,7 +1,7 @@ --- layout: default title: Understanding workloads -nav_order: 7 +nav_order: 10 parent: User guide has_children: true --- diff --git a/_benchmark/user-guide/contributing-workloads.md b/_benchmark/user-guide/working-with-workloads/contributing-workloads.md similarity index 97% rename from _benchmark/user-guide/contributing-workloads.md rename to _benchmark/user-guide/working-with-workloads/contributing-workloads.md index e60f60eaed..74524f36cb 100644 --- a/_benchmark/user-guide/contributing-workloads.md +++ b/_benchmark/user-guide/working-with-workloads/contributing-workloads.md @@ -2,7 +2,10 @@ layout: default title: Sharing custom workloads nav_order: 11 -parent: User guide +grand_parent: User guide +parent: Working with workloads +redirect_from: + - /benchmark/user-guide/contributing-workloads/ --- # Sharing custom workloads diff --git a/_benchmark/user-guide/creating-custom-workloads.md b/_benchmark/user-guide/working-with-workloads/creating-custom-workloads.md similarity index 99% rename from _benchmark/user-guide/creating-custom-workloads.md rename to _benchmark/user-guide/working-with-workloads/creating-custom-workloads.md index ee0dca1ce9..a239c94249 100644 --- a/_benchmark/user-guide/creating-custom-workloads.md +++ b/_benchmark/user-guide/working-with-workloads/creating-custom-workloads.md @@ -2,7 +2,8 @@ layout: default title: Creating custom workloads nav_order: 10 -parent: User guide +grand_parent: User guide +parent: Working with workloads redirect_from: - /benchmark/user-guide/creating-custom-workloads/ - /benchmark/creating-custom-workloads/ @@ -263,7 +264,7 @@ opensearch-benchmark list workloads --workload-path= Use the `opensearch-benchmark execute-test` command to invoke your new workload and run a benchmark test against your OpenSearch cluster, as shown in the following example. Replace `--workload-path` with the path to your custom workload, `--target-host` with the `host:port` pairs for your cluster, and `--client-options` with any authorization options required to access the cluster. ``` -opensearch-benchmark execute_test \ +opensearch-benchmark execute-test \ --pipeline="benchmark-only" \ --workload-path="" \ --target-host="" \ @@ -289,7 +290,7 @@ head -n 1000 -documents.json > -documents-1k.json Then, run `opensearch-benchmark execute-test` with the option `--test-mode`. Test mode runs a quick version of the workload test. 
``` -opensearch-benchmark execute_test \ +opensearch-benchmark execute-test \ --pipeline="benchmark-only" \ --workload-path="" \ --target-host="" \ diff --git a/_benchmark/user-guide/finetine-workloads.md b/_benchmark/user-guide/working-with-workloads/finetune-workloads.md similarity index 97% rename from _benchmark/user-guide/finetine-workloads.md rename to _benchmark/user-guide/working-with-workloads/finetune-workloads.md index 4fc0a284db..d150247ad8 100644 --- a/_benchmark/user-guide/finetine-workloads.md +++ b/_benchmark/user-guide/working-with-workloads/finetune-workloads.md @@ -2,7 +2,10 @@ layout: default title: Fine-tuning custom workloads nav_order: 12 -parent: User guide +grand_parent: User guide +parent: Working with workloads +redirect_from: + - /benchmark/user-guide/finetine-workloads/ --- # Fine-tuning custom workloads diff --git a/_benchmark/user-guide/working-with-workloads/index.md b/_benchmark/user-guide/working-with-workloads/index.md new file mode 100644 index 0000000000..a6acb86b4b --- /dev/null +++ b/_benchmark/user-guide/working-with-workloads/index.md @@ -0,0 +1,16 @@ +--- +layout: default +title: Working with workloads +nav_order: 15 +parent: User guide +has_children: true +--- + +# Working with workloads + +Once you [understand workloads]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-workloads/index/) and have [chosen a workload]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-workloads/choosing-a-workload/) to run your benchmarks with, you can begin working with workloads. + +- [Running workloads]({{site.url}}{{site.baseurl}}/benchmark/user-guide/working-with-workloads/running-workloads/): Learn how to run an OpenSearch Benchmark workload. +- [Creating custom workloads]({{site.url}}{{site.baseurl}}/benchmark/user-guide/working-with-workloads/creating-custom-workloads/): Create a custom workload with your own datasets. +- [Fine-tuning workloads]({{site.url}}{{site.baseurl}}/benchmark/user-guide/working-with-workloads/finetune-workloads/): Fine-tune your custom workload according to the needs of your cluster. +- [Contributing workloads]({{site.url}}{{site.baseurl}}/benchmark/user-guide/working-with-workloads/contributing-workloads/): Contribute your custom workload for the OpenSearch community to use. 
\ No newline at end of file diff --git a/_benchmark/user-guide/running-workloads.md b/_benchmark/user-guide/working-with-workloads/running-workloads.md similarity index 99% rename from _benchmark/user-guide/running-workloads.md rename to _benchmark/user-guide/working-with-workloads/running-workloads.md index 36108eb9c8..534d61f6b9 100644 --- a/_benchmark/user-guide/running-workloads.md +++ b/_benchmark/user-guide/working-with-workloads/running-workloads.md @@ -2,7 +2,10 @@ layout: default title: Running a workload nav_order: 9 -parent: User guide +grand_parent: User guide +parent: Working with workloads +redirect_from: + - /benchmark/user-guide/running-workloads/ --- # Running a workload diff --git a/_clients/index.md b/_clients/index.md index fc8c23d912..a15f0539d2 100644 --- a/_clients/index.md +++ b/_clients/index.md @@ -53,8 +53,8 @@ To view the compatibility matrix for a specific client, see the `COMPATIBILITY.m Client | Recommended version :--- | :--- -[Elasticsearch Java low-level REST client](https://search.maven.org/artifact/org.elasticsearch.client/elasticsearch-rest-client/7.13.4/jar) | 7.13.4 -[Elasticsearch Java high-level REST client](https://search.maven.org/artifact/org.elasticsearch.client/elasticsearch-rest-high-level-client/7.13.4/jar) | 7.13.4 +[Elasticsearch Java low-level REST client](https://central.sonatype.com/artifact/org.elasticsearch.client/elasticsearch-rest-client/7.13.4) | 7.13.4 +[Elasticsearch Java high-level REST client](https://central.sonatype.com/artifact/org.elasticsearch.client/elasticsearch-rest-high-level-client/7.13.4) | 7.13.4 [Elasticsearch Python client](https://pypi.org/project/elasticsearch/7.13.4/) | 7.13.4 [Elasticsearch Node.js client](https://www.npmjs.com/package/@elastic/elasticsearch/v/7.13.0) | 7.13.0 [Elasticsearch Ruby client](https://rubygems.org/gems/elasticsearch/versions/7.13.0) | 7.13.0 diff --git a/_config.yml b/_config.yml index 8a43e2f61a..4ead6344c2 100644 --- a/_config.yml +++ b/_config.yml @@ -5,9 +5,9 @@ baseurl: "/docs/latest" # the subpath of your site, e.g. /blog url: "https://opensearch.org" # the base hostname & protocol for your site, e.g. http://example.com permalink: /:path/ -opensearch_version: '2.16.0' -opensearch_dashboards_version: '2.16.0' -opensearch_major_minor_version: '2.16' +opensearch_version: '2.17.0' +opensearch_dashboards_version: '2.17.0' +opensearch_major_minor_version: '2.17' lucene_version: '9_11_1' # Build settings diff --git a/_dashboards/csp/csp-dynamic-configuration.md b/_dashboards/csp/csp-dynamic-configuration.md index abe80a60c7..794323472b 100644 --- a/_dashboards/csp/csp-dynamic-configuration.md +++ b/_dashboards/csp/csp-dynamic-configuration.md @@ -1,11 +1,11 @@ --- layout: default -title: Configuring CSP rules for `frame-ancestors` +title: Configuring CSP rules for frame ancestors nav_order: 140 has_children: false --- -# Configuring CSP rules for `frame-ancestors` +# Configuring CSP rules for frame ancestors Introduced 2.13 {: .label .label-purple } diff --git a/_dashboards/management/S3-data-source.md b/_dashboards/management/S3-data-source.md index 585edeac81..1a7cc579b0 100644 --- a/_dashboards/management/S3-data-source.md +++ b/_dashboards/management/S3-data-source.md @@ -10,51 +10,41 @@ has_children: true Introduced 2.11 {: .label .label-purple } -Starting with OpenSearch 2.11, you can connect OpenSearch to your Amazon Simple Storage Service (Amazon S3) data source using the OpenSearch Dashboards UI. 
You can then query that data, optimize query performance, define tables, and integrate your S3 data within a single UI.
+You can connect OpenSearch to your Amazon Simple Storage Service (Amazon S3) data source using the OpenSearch Dashboards interface and then query that data, optimize query performance, define tables, and integrate your S3 data.

## Prerequisites

-To connect data from Amazon S3 to OpenSearch using OpenSearch Dashboards, you must have:
+Before connecting a data source, verify that the following requirements are met:

-- Access to Amazon S3 and the [AWS Glue Data Catalog](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.rst#id2).
-- Access to OpenSearch and OpenSearch Dashboards.
-- An understanding of OpenSearch data source and connector concepts. See the [developer documentation](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.rst#introduction) for information about these concepts.
+- You have access to Amazon S3 and the [AWS Glue Data Catalog](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.rst#id2).
+- You have access to OpenSearch and OpenSearch Dashboards.
+- You have an understanding of OpenSearch data source and connector concepts. See the [developer documentation](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.rst#introduction) for more information.

-## Connect your Amazon S3 data source
+## Connect your data source

-To connect your Amazon S3 data source, follow these steps:
+To connect your data source, follow these steps:

-1. From the OpenSearch Dashboards main menu, select **Management** > **Data sources**.
-2. On the **Data sources** page, select **New data source** > **S3**. An example UI is shown in the following image.
+1. From the OpenSearch Dashboards main menu, go to **Management** > **Dashboards Management** > **Data sources**.
+2. On the **Data sources** page, select **Create data source connection** > **Amazon S3**.
+3. On the **Configure Amazon S3 data source** page, enter the data source, authentication details, and permissions.
+4. Select the **Review Configuration** button to verify the connection details.
+5. Select the **Connect to Amazon S3** button to establish a connection.

- Amazon S3 data sources UI

+## Manage your data source

-3. On the **Configure Amazon S3 data source** page, enter the required **Data source details**, **AWS Glue authentication details**, **AWS Glue index store details**, and **Query permissions**. An example UI is shown in the following image.
-
- Amazon S3 configuration UI

-4. Select the **Review Configuration** button and verify the details.
-5. Select the **Connect to Amazon S3** button.

-## Manage your Amazon S3 data source

-Once you've connected your Amazon S3 data source, you can explore your data through the **Manage data sources** tab. The following steps guide you through using this functionality:
+To manage your data source, follow these steps:

1. On the **Manage data sources** tab, choose a data source from the list.
-2. On that data source's page, you can manage the data source, choose a use case, and manage access controls and configurations. An example UI is shown in the following image.
-
- Manage data sources UI

-3. (Optional) Explore the Amazon S3 use cases, including querying your data and optimizing query performance. Go to **Next steps** to learn more about each use case.
+2.
On the page for the data source, you can manage the data source, choose a use case, and configure access controls. +3. (Optional) Explore the Amazon S3 use cases, including querying your data and optimizing query performance. Refer to the [**Next steps**](#next-steps) section to learn more about each use case. ## Limitations -This feature is still under development, including the data integration functionality. For real-time updates, see the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations). +This feature is currently under development, including the data integration functionality. For up-to-date information, refer to the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations). ## Next steps - Learn about [querying your data in Data Explorer]({{site.url}}{{site.baseurl}}/dashboards/management/query-data-source/) through OpenSearch Dashboards. -- Learn about ways to [optimize the query performance of your external data sources]({{site.url}}{{site.baseurl}}/dashboards/management/accelerate-external-data/), such as Amazon S3, through Query Workbench. +- Learn about [optimizing the query performance of your external data sources]({{site.url}}{{site.baseurl}}/dashboards/management/accelerate-external-data/), such as Amazon S3, through Query Workbench. - Learn about [Amazon S3 and AWS Glue Data Catalog](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.rst) and the APIs used with Amazon S3 data sources, including configuration settings and query examples. - Learn about [managing your indexes]({{site.url}}{{site.baseurl}}/dashboards/im-dashboards/index/) through OpenSearch Dashboards. - diff --git a/_dashboards/management/accelerate-external-data.md b/_dashboards/management/accelerate-external-data.md index 00e4600ffd..6d1fa030e4 100644 --- a/_dashboards/management/accelerate-external-data.md +++ b/_dashboards/management/accelerate-external-data.md @@ -12,55 +12,37 @@ Introduced 2.11 {: .label .label-purple } -Query performance can be slow when using external data sources for reasons such as network latency, data transformation, and data volume. You can optimize your query performance by using OpenSearch indexes, such as a skipping index or a covering index. A _skipping index_ uses skip acceleration methods, such as partition, minimum and maximum values, and value sets, to ingest and create compact aggregate data structures. This makes them an economical option for direct querying scenarios. A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality. See the [Flint Index Reference Manual](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md) for comprehensive guidance on this feature's indexing process. +Query performance can be slow when using external data sources for reasons such as network latency, data transformation, and data volume. You can optimize your query performance by using OpenSearch indexes, such as a skipping index or a covering index. + +A _skipping index_ uses skip acceleration methods, such as partition, minimum and maximum values, and value sets, to ingest and create compact aggregate data structures. This makes them an economical option for direct querying scenarios.
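+
+As a point of reference, the following is a minimal sketch of the kind of Flint SQL statement that creating a skipping index generates. The data source, database, table, and column names (`mys3.default.http_logs`, `status`, `year`) are placeholder assumptions; see the Flint Index Reference Manual linked below for the exact syntax:
+
+```sql
+-- Skip irrelevant data blocks using per-file metadata:
+-- a value set of observed status codes and the partition value of year
+CREATE SKIPPING INDEX ON mys3.default.http_logs (
+  status VALUE_SET,
+  year PARTITION
+) WITH (auto_refresh = true)
+```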
+ +A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality. See the [Flint Index Reference Manual](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md) for comprehensive guidance on this feature's indexing process. ## Data sources use case: Accelerate performance To get started with the **Accelerate performance** use case available in **Data sources**, follow these steps: 1. Go to **OpenSearch Dashboards** > **Query Workbench** and select your Amazon S3 data source from the **Data sources** dropdown menu in the upper-left corner. -2. From the left-side navigation menu, select a database. An example using the `http_logs` database is shown in the following image. - - Query Workbench accelerate data UI - +2. From the left-side navigation menu, select a database. 3. View the results in the table and confirm that you have the desired data. 4. Create an OpenSearch index by following these steps: - 1. Select the **Accelerate data** button. A pop-up window appears. An example is shown in the following image. - - Accelerate data pop-up window - + 1. Select the **Accelerate data** button. A pop-up window appears. 2. Enter your details in **Select data fields**. In the **Database** field, select the desired acceleration index: **Skipping index** or **Covering index**. A _skipping index_ uses skip acceleration methods, such as partition, min/max, and value sets, to ingest data using compact aggregate data structures. This makes them an economical option for direct querying scenarios. A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality. - -5. Under **Index settings**, enter the information for your acceleration index. For information about naming, select **Help**. Note that an Amazon S3 table can only have one skipping index at a time. An example is shown in the following image. - - Skipping index settings +5. Under **Index settings**, enter the information for your acceleration index. For information about naming, select **Help**. Note that an Amazon S3 table can only have one skipping index at a time. ### Define skipping index settings -1. Under **Skipping index definition**, select the **Add fields** button to define the skipping index acceleration method and choose the fields you want to add. An example is shown in the following image. - - Skipping index add fields - +1. Under **Skipping index definition**, select the **Add fields** button to define the skipping index acceleration method and choose the fields you want to add. 2. Select the **Copy Query to Editor** button to apply your skipping index settings. -3. View the skipping index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. An example is shown in the following image. - - Run a skippping or covering index UI +3. View the skipping index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. ### Define covering index settings -1. Under **Index settings**, enter a valid index name. Note that each Amazon S3 table can have multiple covering indexes. An example is shown in the following image. - - Covering index settings - -2. 
Once you have added the index name, define the covering index fields by selecting `(add fields here)` under **Covering index definition**. An example is shown in the following image. - - Covering index field naming - +1. Under **Index settings**, enter a valid index name. Note that each Amazon S3 table can have multiple covering indexes. +2. Once you have added the index name, define the covering index fields by selecting `(add fields here)` under **Covering index definition**. 3. Select the **Copy Query to Editor** button to apply your covering index settings. -4. View the covering index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. An example UI is shown in the following image. - - Run index in Query Workbench +4. View the covering index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. ## Limitations -This feature is still under development, so there are some limitations. For real-time updates, see the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations). +This feature is still under development, so there are some limitations. For real-time updates, refer to the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations). diff --git a/_dashboards/management/connect-prometheus.md b/_dashboards/management/connect-prometheus.md new file mode 100644 index 0000000000..ed5545bc56 --- /dev/null +++ b/_dashboards/management/connect-prometheus.md @@ -0,0 +1,53 @@ +--- +layout: default +title: Connecting Prometheus to OpenSearch +parent: Data sources +nav_order: 20 +--- + +# Connecting Prometheus to OpenSearch +Introduced 2.16 +{: .label .label-purple } + +This documentation covers the key steps to connect Prometheus to OpenSearch using the OpenSearch Dashboards interface, including setting up the data source connection, modifying the connection details, and creating an index pattern for the Prometheus data. + +## Prerequisites and permissions + +Before connecting a data source, ensure you have met the [Prerequisites]({{site.url}}{{site.baseurl}}/dashboards/management/data-sources/#prerequisites) and have the necessary [Permissions]({{site.url}}{{site.baseurl}}/dashboards/management/data-sources/#permissions). + +## Create a Prometheus data source connection + +A data source connection specifies the parameters needed to connect to a data source. These parameters form a connection string for the data source. Using OpenSearch Dashboards, you can add new **Prometheus** data source connections or manage existing ones. + +Follow these steps to connect your data source: + +1. From the OpenSearch Dashboards main menu, go to **Management** > **Data sources** > **New data source** > **Prometheus**. + +2. From the **Configure Prometheus data source** section: + + - Under **Data source details**, provide a title and optional description. + - Under **Prometheus data location**, enter the Prometheus URI. + - Under **Authentication details**, select the appropriate authentication method from the dropdown list and enter the required details: + - **Basic authentication**: Enter a username and password. 
+    - **AWS Signature Version 4**: Specify the **Region**, select the OpenSearch service from the **Service Name** list (**Amazon OpenSearch Service** or **Amazon OpenSearch Serverless**), and enter the **Access Key** and **Secret Key**. + - Under **Query permissions**, choose the role needed to search and index data. If you select **Restricted**, an additional field will become available to configure the required role. + +3. Select **Review Configuration** > **Connect to Prometheus** to save your settings. The new connection will appear in the list of data sources. + +## Modify a data source connection + +To modify a data source connection, follow these steps: + +1. Select the desired connection from the list on the **Data sources** main page. This will open the **Connection Details** window. +2. Within the **Connection Details** window, edit the **Title** and **Description** fields. Select the **Save changes** button to apply the changes. +3. To update the **Authentication Method**, choose the method from the dropdown list and enter any necessary credentials. Select **Save changes** to apply the changes. + - To update the **Basic authentication** method, select the **Update stored password** button. Within the pop-up window, enter and confirm the updated password, and then select **Update stored password** to save the changes. To test the connection, select the **Test connection** button. + - To update the **AWS Signature Version 4** authentication method, select the **Update stored AWS credential** button. Within the pop-up window, enter the updated access and secret keys, and then select **Update stored AWS credential** to save the changes. To test the connection, select the **Test connection** button. + +## Delete a data source connection + +To delete the data source connection, select the delete icon ({::nomarkdown}delete icon{:/}). + +## Create an index pattern + +After creating a data source connection, the next step is to create an index pattern for that data source. For more information and a tutorial on index patterns, refer to [Index patterns]({{site.url}}{{site.baseurl}}/dashboards/management/index-patterns/). diff --git a/_dashboards/management/data-sources.md b/_dashboards/management/data-sources.md index fdd4edc150..62d3a5aab2 100644 --- a/_dashboards/management/data-sources.md +++ b/_dashboards/management/data-sources.md @@ -13,67 +13,37 @@ This documentation focuses on using the OpenSeach Dashboards interface to connec ## Prerequisites -The first step in connecting your data sources to OpenSearch is to install OpenSearch and OpenSearch Dashboards on your system. You can follow the installation instructions in the [OpenSearch documentation]({{site.url}}{{site.baseurl}}/install-and-configure/index/) to install these tools. +The first step in connecting your data sources to OpenSearch is to install OpenSearch and OpenSearch Dashboards on your system. Refer to the [installation instructions]({{site.url}}{{site.baseurl}}/install-and-configure/index/) for information. Once you have installed OpenSearch and OpenSearch Dashboards, you can use Dashboards to connect your data sources to OpenSearch and then use Dashboards to manage data sources, create index patterns based on those data sources, run queries against a specific data source, and combine visualizations in one dashboard.
Configuration of the [YAML files]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/#configuration-file) and installation of the `dashboards-observability` and `opensearch-sql` plugins is necessary. For more information, see [OpenSearch plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/). -## Permissions - -To work with data sources in OpenSearch Dashboards, make sure that the user has been assigned the correct cluster-level [data source permission]({{site.url}}{{site.baseurl}}/security/access-control/permissions#data-source-permissions). - - - -## Create a data source connection - -A data source connection specifies the parameters needed to connect to a data source. These parameters form a connection string for the data source. Using Dashboards, you can add new data source connections or manage existing ones. - -The following steps guide you through the basics of creating a data source connection: - -1. From the OpenSearch Dashboards main menu, select **Management** > **Data sources** > **Create data source connection**. The UI is shown in the following image. +To securely store and encrypt data source connections in OpenSearch, you must add the following configuration to the `opensearch.yml` file on all the nodes: - Connecting a data source UI +`plugins.query.datasources.encryption.masterkey: "YOUR_GENERATED_MASTER_KEY_HERE"` -2. Create the data source connection by entering the appropriate information into the **Connection Details** and **Authentication Method** fields. - - - Under **Connection Details**, enter a title and endpoint URL. For this tutorial, use the URL `http://localhost:5601/app/management/opensearch-dashboards/dataSources`. Entering a description is optional. +The key must be 16, 24, or 32 characters. You can use the following command to generate a 24-character key: - - Under **Authentication Method**, select an authentication method from the dropdown list. Once an authentication method is selected, the applicable fields for that method appear. You can then enter the required details. The authentication method options are: - - **No authentication**: No authentication is used to connect to the data source. - - **Username & Password**: A basic username and password are used to connect to the data source. - - **AWS SigV4**: An AWS Signature Version 4 authenticating request is used to connect to the data source. AWS Signature Version 4 requires an access key and a secret key. - - For AWS Signature Version 4 authentication, first specify the **Region**. Next, select the OpenSearch service in the **Service Name** list. The options are **Amazon OpenSearch Service** and **Amazon OpenSearch Serverless**. Lastly, enter the **Access Key** and **Secret Key** for authorization. +`openssl rand -hex 12` - After you have populated the required fields, the **Test connection** and **Create data source** buttons become active. You can select **Test connection** to confirm that the connection is valid. +Generating 12 bytes results in a hexadecimal string that is 12 * 2 = 24 characters. +{: .note} -3. Select **Create data source** to save your settings. The connection is created. The active window returns to the **Data sources** main page, and the new connection appears in the list of data sources. - -4. To delete a data source connection, select the checkbox to the left of the data source **Title** and then select the **Delete 1 connection** button. Selecting multiple checkboxes for multiple connections is supported. 
An example UI is shown in the following image. - - Deleting a data source UI - -### Modify a data source connection - -To make changes to a data source connection, select a connection in the list on the **Data sources** main page. The **Connection Details** window opens. - -To make changes to **Connection Details**, edit one or both of the **Title** and **Description** fields and select **Save changes** in the lower-right corner of the screen. You can also cancel changes here. To change the **Authentication Method**, choose a different authentication method, enter your credentials (if applicable), and then select **Save changes** in the lower-right corner of the screen. The changes are saved. - -When **Username & Password** is the selected authentication method, you can update the password by choosing **Update stored password** next to the **Password** field. In the pop-up window, enter a new password in the first field and then enter it again in the second field to confirm. Select **Update stored password** in the pop-up window. The new password is saved. Select **Test connection** to confirm that the connection is valid. +## Permissions -When **AWS SigV4** is the selected authentication method, you can update the credentials by selecting **Update stored AWS credential**. In the pop-up window, enter a new access key in the first field and a new secret key in the second field. Select **Update stored AWS credential** in the pop-up window. The new credentials are saved. Select **Test connection** in the upper-right corner of the screen to confirm that the connection is valid. +To work with data sources in OpenSearch Dashboards, you must be assigned the correct cluster-level [data source permissions]({{site.url}}{{site.baseurl}}/security/access-control/permissions#data-source-permissions). -To delete the data source connection, select the delete icon ({::nomarkdown}delete icon{:/}). +## Types of data streams -## Create an index pattern +To configure data sources through OpenSearch Dashboards, go to **Management** > **Dashboards Management** > **Data sources**. This flow can be used for OpenSearch data stream connections. See [Configuring and using multiple data sources]({{site.url}}{{site.baseurl}}/dashboards/management/multi-data-sources/). -Once you've created a data source connection, you can create an index pattern for the data source. An _index pattern_ is a template that OpenSearch uses to create indexes for data from the data source. See [Index patterns]({{site.url}}{{site.baseurl}}/dashboards/management/index-patterns/) for more information and a tutorial. +Alternatively, if you are running OpenSearch Dashboards 2.16 or later, go to **Management** > **Data sources**. This flow can be used to connect Amazon Simple Storage Service (Amazon S3) and Prometheus. See [Connecting Amazon S3 to OpenSearch]({{site.url}}{{site.baseurl}}/dashboards/management/S3-data-source/) and [Connecting Prometheus to OpenSearch]({{site.url}}{{site.baseurl}}/dashboards/management/connect-prometheus/) for more information. ## Next steps - Learn about [managing index patterns]({{site.url}}{{site.baseurl}}/dashboards/management/index-patterns/) through OpenSearch Dashboards. - Learn about [indexing data using Index Management]({{site.url}}{{site.baseurl}}/dashboards/im-dashboards/index/) through OpenSearch Dashboards. - Learn about how to connect [multiple data sources]({{site.url}}{{site.baseurl}}/dashboards/management/multi-data-sources/). 
-- Learn about how to [connect OpenSearch and Amazon S3 through OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/management/S3-data-source/). -- Learn about the [Integrations]({{site.url}}{{site.baseurl}}/integrations/index/) tool, which gives you the flexibility to use various data ingestion methods and connect data from the Dashboards UI. - +- Learn about how to connect [OpenSearch and Amazon S3]({{site.url}}{{site.baseurl}}/dashboards/management/S3-data-source/) and [OpenSearch and Prometheus]({{site.url}}{{site.baseurl}}/dashboards/management/connect-prometheus/) using the OpenSearch Dashboards interface. +- Learn about the [Integrations]({{site.url}}{{site.baseurl}}/integrations/index/) plugin, which gives you the flexibility to use various data ingestion methods and connect data to OpenSearch Dashboards. diff --git a/_dashboards/management/query-data-source.md b/_dashboards/management/query-data-source.md index f1496b3e17..a3392c073e 100644 --- a/_dashboards/management/query-data-source.md +++ b/_dashboards/management/query-data-source.md @@ -11,7 +11,7 @@ has_children: false Introduced 2.11 {: .label .label-purple } -This tutorial guides you through using the **Query data** use case for querying and visualizing your Amazon Simple Storage Service (Amazon S3) data using OpenSearch Dashboards. +This tutorial guides you through using the **Query data** use case for querying and visualizing your Amazon Simple Storage Service (Amazon S3) data using OpenSearch Dashboards. ## Prerequisites @@ -22,15 +22,9 @@ You must be using the `opensearch-security` plugin and have the appropriate role To get started, follow these steps: 1. On the **Manage data sources** page, select your data source from the list. -2. On the data source's detail page, select the **Query data** card. This option takes you to the **Observability** > **Logs** page, which is shown in the following image. - - Observability Logs UI - +2. On the data source's detail page, select the **Query data** card. This option takes you to the **Observability** > **Logs** page. 3. Select the **Event Explorer** button. This option creates and saves frequently searched queries and visualizations using [Piped Processing Language (PPL)]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index/) or [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/), which connects to Spark SQL. -4. Select the Amazon S3 data source from the dropdown menu in the upper-left corner. An example is shown in the following image. - - Observability Logs Amazon S3 dropdown menu - +4. Select the Amazon S3 data source from the dropdown menu in the upper-left corner. 5. Enter the query in the **Enter PPL query** field. Note that the default language is SQL. To change the language, select PPL from the dropdown menu. 6. Select the **Search** button. The **Query Processing** message is shown, confirming that your query is being processed. 7. View the results, which are listed in a table on the **Events** tab. On this page, details such as available fields, source, and time are shown in a table format. @@ -40,10 +34,7 @@ To get started, follow these steps: To create visualizations, follow these steps: -1. On the **Explorer** page, select the **Visualizations** tab. An example is shown in the following image. - - Explorer Amazon S3 visualizations UI - +1. On the **Explorer** page, select the **Visualizations** tab. 2. Select **Index data to visualize**. 
This option currently only creates [acceleration indexes]({{site.url}}{{site.baseurl}}/dashboards/management/accelerate-external-data/), which give you views of the data visualizations from the **Visualizations** tab. To create a visualization of your Amazon S3 data, go to **Discover**. See the [Discover documentation]({{site.url}}{{site.baseurl}}/dashboards/discover/index-discover/) for information and a tutorial. ## Use Query Workbench with your Amazon S3 data source @@ -53,14 +44,12 @@ To use Query Workbench with your Amazon S3 data, follow these steps: 1. From the OpenSearch Dashboards main menu, select **OpenSearch Plugins** > **Query Workbench**. -2. From the **Data Sources** dropdown menu in the upper-left corner, choose your Amazon S3 data source. Your data begins loading the databases that are part of your data source. An example is shown in the following image. - - Query Workbench Amazon S3 data loading UI - +2. From the **Data Sources** dropdown menu in the upper-left corner, choose your Amazon S3 data source. The databases that are part of your data source then begin loading. 3. View the databases listed in the left-side navigation menu and select a database to view its details. Any information about acceleration indexes is listed under **Acceleration index destination**. 4. Choose the **Describe Index** button to learn more about how data is stored in that particular index. 5. Choose the **Drop index** button to delete and clear both the OpenSearch index and the Amazon S3 Spark job that refreshes the data. -6. Enter your SQL query and select **Run**. +6. Enter your SQL query and select **Run**. + ## Next steps - Learn about [accelerating the query performance of your external data sources]({{site.url}}{{site.baseurl}}/dashboards/management/accelerate-external-data/). diff --git a/_dashboards/query-workbench.md b/_dashboards/query-workbench.md index 8fe41afcdf..700d6a7340 100644 --- a/_dashboards/query-workbench.md +++ b/_dashboards/query-workbench.md @@ -8,19 +8,14 @@ redirect_from: # Query Workbench -Query Workbench is a tool within OpenSearch Dashboards. You can use Query Workbench to run on-demand [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/sql/index/) and [PPL]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index/) queries, translate queries into their equivalent REST API calls, and view and save results in different [response formats]({{site.url}}{{site.baseurl}}/search-plugins/sql/response-formats/). +You can use Query Workbench in OpenSearch Dashboards to run on-demand [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/sql/index/) and [PPL]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index/) queries, translate queries into their equivalent REST API calls, and view and save results in different [response formats]({{site.url}}{{site.baseurl}}/search-plugins/sql/response-formats/). -A view of the Query Workbench interface within OpenSearch Dashboards is shown in the following image. - -Query Workbench interface within OpenSearch Dashboards - -## Prerequisites - -Before getting started, make sure you have [indexed your data]({{site.url}}{{site.baseurl}}/im-plugin/index/). +Query Workbench does not support delete or update operations through SQL or PPL. Access to data is read-only. +{: .important} -For this tutorial, you can index the following sample documents.
Alternatively, you can use the [OpenSearch Playground](https://playground.opensearch.org/app/opensearch-query-workbench#/), which has preloaded indexes that you can use to try out Query Workbench. +## Prerequisites -To index sample documents, send the following [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) request: +Before getting started with this tutorial, index the sample documents by sending the following [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) request: ```json PUT accounts/_bulk?refresh @@ -35,9 +30,11 @@ PUT accounts/_bulk?refresh ``` {% include copy-curl.html %} -## Running SQL queries within Query Workbench +See [Managing indexes]({{site.url}}{{site.baseurl}}/im-plugin/index/) to learn about indexing your own data. -Follow these steps to learn how to run SQL queries against your OpenSearch data using Query Workbench: +## Running SQL queries within Query Workbench + + The following steps guide you through running SQL queries against OpenSearch data: 1. Access Query Workbench. - To access Query Workbench, go to OpenSearch Dashboards and choose **OpenSearch Plugins** > **Query Workbench** from the main menu. @@ -64,23 +61,15 @@ Follow these steps to learn how to run SQL queries against your OpenSearch data 3. View the results. - View the results in the **Results** pane, which presents the query output in tabular format. You can filter and download the results as needed. - The following image shows the query editor pane and results pane for the preceding SQL query: - - Query Workbench SQL query input and results output panes - 4. Clear the query editor. - Select the **Clear** button to clear the query editor and run a new query. 5. Examine how the query is processed. - - Select the **Explain** button to examine how OpenSearch processes the query, including the steps involved and order of operations. - - The following image shows the explanation of the SQL query that was run in step 2. - - Query Workbench SQL query explanation pane + - Select the **Explain** button to examine how OpenSearch processes the query, including the steps involved and order of operations. ## Running PPL queries within Query Workbench -Follow these steps to learn how to run PPL queries against your OpenSearch data using Query Workbench: +Follow these steps to learn how to run PPL queries against OpenSearch data: 1. Access Query Workbench. - To access Query Workbench, go to OpenSearch Dashboards and choose **OpenSearch Plugins** > **Query Workbench** from the main menu. @@ -100,19 +89,8 @@ Follow these steps to learn how to run PPL queries against your OpenSearch data 3. View the results. - View the results in the **Results** pane, which presents the query output in tabular format. - The following image shows the query editor pane and results pane for the PPL query that was run in step 2: - - Query Workbench PPL query input and results output panes - 4. Clear the query editor. - Select the **Clear** button to clear the query editor and run a new query. 5. Examine how the query is processed. - - Select the **Explain** button to examine how OpenSearch processes the query, including the steps involved and order of operations. - - The following image shows the explanation of the PPL query that was run in step 2. - - Query Workbench PPL query explanation pane - -Query Workbench does not support delete or update operations through SQL or PPL. Access to data is read-only. 
-{: .important} \ No newline at end of file + - Select the **Explain** button to examine how OpenSearch processes the query, including the steps involved and order of operations. diff --git a/_dashboards/visualize/maps-stats-api.md b/_dashboards/visualize/maps-stats-api.md index f81c7e6ac4..7c974a33d9 100644 --- a/_dashboards/visualize/maps-stats-api.md +++ b/_dashboards/visualize/maps-stats-api.md @@ -93,7 +93,7 @@ The following is the response for the preceding request: } ``` -## Response fields +## Response body fields The response contains statistics for the following layer types: diff --git a/_dashboards/visualize/viz-index.md b/_dashboards/visualize/viz-index.md index 75407a6ba5..4bde79d2cc 100644 --- a/_dashboards/visualize/viz-index.md +++ b/_dashboards/visualize/viz-index.md @@ -81,7 +81,7 @@ Region maps show patterns and trends across geographic locations. A region map i ### Markdown -Markdown is a the markup language used in Dashboards to provide context to your data visualizations. Using Markdown, you can display information and instructions along with the visualization. +Markdown is the markup language used in Dashboards to provide context to your data visualizations. Using Markdown, you can display information and instructions along with the visualization. Example coordinate map in OpenSearch Dashboards diff --git a/_data-prepper/common-use-cases/log-analytics.md b/_data-prepper/common-use-cases/log-analytics.md index 30a021b101..ceb26ff5b7 100644 --- a/_data-prepper/common-use-cases/log-analytics.md +++ b/_data-prepper/common-use-cases/log-analytics.md @@ -147,6 +147,6 @@ The following is an example `fluent-bit.conf` file with SSL and basic authentica See the [Data Prepper Log Ingestion Demo Guide](https://github.com/opensearch-project/data-prepper/blob/main/examples/log-ingestion/README.md) for a specific example of Apache log ingestion from `FluentBit -> Data Prepper -> OpenSearch` running through Docker. -In the future, Data Prepper will offer additional sources and processors that will make more complex log analytics pipelines available. Check out the [Data Prepper Project Roadmap](https://github.com/opensearch-project/data-prepper/projects/1) to see what is coming. +In the future, Data Prepper will offer additional sources and processors that will make more complex log analytics pipelines available. Check out the [Data Prepper Project Roadmap](https://github.com/orgs/opensearch-project/projects/221) to see what is coming. If there is a specific source, processor, or sink that you would like to include in your log analytics workflow and is not currently on the roadmap, please bring it to our attention by creating a GitHub issue. Additionally, if you are interested in contributing to Data Prepper, see our [Contributing Guidelines](https://github.com/opensearch-project/data-prepper/blob/main/CONTRIBUTING.md) as well as our [developer guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) and [plugin development guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/plugin_development.md). 
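+
+For orientation, the following is a minimal sketch of a receiving Data Prepper pipeline for the Fluent Bit flow described above. The port, log pattern, host, and index name are placeholder assumptions, and security settings are omitted:
+
+```yaml
+log-pipeline:
+  source:
+    http:
+      port: 2021                        # port that Fluent Bit forwards logs to
+  processor:
+    - grok:
+        match:
+          log: [ "%{COMMONAPACHELOG}" ] # parse Apache access log lines
+  sink:
+    - opensearch:
+        hosts: [ "https://localhost:9200" ]
+        index: apache_logs
+```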
diff --git a/_data-prepper/pipelines/configuration/processors/copy-values.md b/_data-prepper/pipelines/configuration/processors/copy-values.md index f654e6f027..20d6a4a2c7 100644 --- a/_data-prepper/pipelines/configuration/processors/copy-values.md +++ b/_data-prepper/pipelines/configuration/processors/copy-values.md @@ -14,44 +14,126 @@ The `copy_values` processor copies values within an event and is a [mutate event You can configure the `copy_values` processor with the following options. -| Option | Required | Description | -:--- | :--- | :--- -| `entries` | Yes | A list of entries to be copied in an event. | -| `from_key` | Yes | The key of the entry to be copied. | -| `to_key` | Yes | The key of the new entry to be added. | -| `overwrite_if_to_key_exists` | No | When set to `true`, the existing value is overwritten if `key` already exists in the event. The default value is `false`. | +| Option | Required | Type | Description | +:--- | :--- | :--- | :--- +| `entries` | Yes | [entry](#entry) | A list of entries to be copied in an event. See [entry](#entry) for more information. | +| `from_list` | No | String | The key for the list of objects to be copied. | +| `to_list` | No | String | The key for the new list to be added. | +| `overwrite_if_to_list_exists` | No | Boolean | When set to `true`, the existing value is overwritten if the `key` specified by `to_list` already exists in the event. Default is `false`. | + +## entry + +For each entry, you can configure the following options. + +| Option | Required | Type | Description | +:--- | :--- | :--- | :--- +| `from_key` | Yes | String | The key for the entry to be copied. | +| `to_key` | Yes | String | The key for the new entry to be added. | +| `overwrite_if_to_key_exists` | No | Boolean | When set to `true`, the existing value is overwritten if the `key` already exists in the event. Default is `false`. | + ## Usage -To get started, create the following `pipeline.yaml` file: +The following examples show you how to use the `copy_values` processor. + +### Example: Copy values and skip existing fields + +The following example shows you how to configure the processor to copy values and skip existing fields: + +```yaml +... + processor: + - copy_values: + entries: + - from_key: "message1" + to_key: "message2" + - from_key: "message1" + to_key: "message3" +... +``` +{% include copy.html %} + +When the input event contains the following data: + +```json +{"message1": "hello", "message2": "bye"} +``` + +The processor copies "message1" to "message3" but not to "message2" because "message2" already exists. The processed event contains the following data: + +```json +{"message1": "hello", "message2": "bye", "message3": "hello"} +``` + +### Example: Copy values with overwrites + +The following example shows you how to configure the processor to copy values: ```yaml -pipeline: - source: - ... - .... +... processor: - copy_values: entries: - - from_key: "message" - to_key: "newMessage" - overwrite_if_to_key_exists: true - sink: + - from_key: "message1" + to_key: "message2" + overwrite_if_to_key_exists: true + - from_key: "message1" + to_key: "message3" +... ``` {% include copy.html %} -Next, create a log file named `logs_json.log` and replace the `path` in the file source of your `pipeline.yaml` file with that filepath. For more information, see [Configuring Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-data-prepper). 
+When the input event contains the following data: + +```json +{"message1": "hello", "message2": "bye"} +``` -For example, before you run the `copy_values` processor, if the `logs_json.log` file contains the following event record: +The processor copies "message1" to both "message2" and "message3", overwriting the existing value in "message2". The processed event contains the following data: ```json -{"message": "hello"} +{"message1": "hello", "message2": "hello", "message3": "hello"} ``` -When you run this processor, it parses the message into the following output: +### Example: Selectively copy values between two lists of objects + +The following example shows you how to configure the processor to copy values between lists: + +```yaml +... + processor: + - copy_values: + from_list: mylist + to_list: newlist + entries: + - from_key: name + to_key: fruit_name +... +``` +{% include copy.html %} + +When the input event contains the following data: ```json -{"message": "hello", "newMessage": "hello"} +{ + "mylist": [ + {"name": "apple", "color": "red"}, + {"name": "orange", "color": "orange"} + ] +} ``` -If `newMessage` already exists, its existing value is overwritten with `value`. +The processed event contains a `newlist` with selectively copied fields: + +```json +{ + "newlist": [ + {"fruit_name": "apple"}, + {"fruit_name": "orange"} + ], + "mylist": [ + {"name": "apple", "color": "red"}, + {"name": "orange", "color": "orange"} + ] +} +``` diff --git a/_data-prepper/pipelines/configuration/processors/obfuscate.md b/_data-prepper/pipelines/configuration/processors/obfuscate.md index 8d6bf901da..96b03e7405 100644 --- a/_data-prepper/pipelines/configuration/processors/obfuscate.md +++ b/_data-prepper/pipelines/configuration/processors/obfuscate.md @@ -62,6 +62,13 @@ When run, the `obfuscate` processor parses the fields into the following output: Use the following configuration options with the `obfuscate` processor. + + | Parameter | Required | Description | | :--- | :--- | :--- | | `source` | Yes | The source field to obfuscate. | diff --git a/_data-prepper/pipelines/configuration/sinks/s3.md b/_data-prepper/pipelines/configuration/sinks/s3.md index d1413f6ffc..6bae749d38 100644 --- a/_data-prepper/pipelines/configuration/sinks/s3.md +++ b/_data-prepper/pipelines/configuration/sinks/s3.md @@ -173,14 +173,14 @@ When you provide your own Avro schema, that schema defines the final structure o In cases where your data is uniform, you may be able to automatically generate a schema. Automatically generated schemas are based on the first event that the codec receives. The schema will only contain keys from this event, and all keys must be present in all events in order to automatically generate a working schema. Automatically generated schemas make all fields nullable. Use the `include_keys` and `exclude_keys` sink configurations to control which data is included in the automatically generated schema. -Avro fields should use a null [union](https://avro.apache.org/docs/current/specification/#unions) because this will allow missing values. Otherwise, all required fields must be present for each event. Use non-nullable fields only when you are certain they exist. +Avro fields should use a null [union](https://avro.apache.org/docs/1.12.0/specification/#unions) because this will allow missing values. Otherwise, all required fields must be present for each event. Use non-nullable fields only when you are certain they exist. Use the following options to configure the codec. 
Option | Required | Type | Description :--- | :--- | :--- | :--- -`schema` | Yes | String | The Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration). Not required if `auto_schema` is set to true. -`auto_schema` | No | Boolean | When set to `true`, automatically generates the Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration) from the first event. +`schema` | Yes | String | The Avro [schema declaration](https://avro.apache.org/docs/1.12.0/specification/#schema-declaration). Not required if `auto_schema` is set to true. +`auto_schema` | No | Boolean | When set to `true`, automatically generates the Avro [schema declaration](https://avro.apache.org/docs/1.12.0/specification/#schema-declaration) from the first event. ### `ndjson` codec @@ -208,8 +208,8 @@ Use the following options to configure the codec. Option | Required | Type | Description :--- | :--- | :--- | :--- -`schema` | Yes | String | The Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration). Not required if `auto_schema` is set to true. -`auto_schema` | No | Boolean | When set to `true`, automatically generates the Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration) from the first event. +`schema` | Yes | String | The Avro [schema declaration](https://avro.apache.org/docs/1.12.0/specification/#schema-declaration). Not required if `auto_schema` is set to true. +`auto_schema` | No | Boolean | When set to `true`, automatically generates the Avro [schema declaration](https://avro.apache.org/docs/1.12.0/specification/#schema-declaration) from the first event. ### Setting a schema with Parquet diff --git a/_data-prepper/pipelines/configuration/sources/documentdb.md b/_data-prepper/pipelines/configuration/sources/documentdb.md index c453b60a39..d3dd31edcb 100644 --- a/_data-prepper/pipelines/configuration/sources/documentdb.md +++ b/_data-prepper/pipelines/configuration/sources/documentdb.md @@ -3,7 +3,7 @@ layout: default title: documentdb parent: Sources grand_parent: Pipelines -nav_order: 2 +nav_order: 10 --- # documentdb diff --git a/_data-prepper/pipelines/configuration/sources/dynamo-db.md b/_data-prepper/pipelines/configuration/sources/dynamo-db.md index e465f45044..c5a7c8d188 100644 --- a/_data-prepper/pipelines/configuration/sources/dynamo-db.md +++ b/_data-prepper/pipelines/configuration/sources/dynamo-db.md @@ -3,7 +3,7 @@ layout: default title: dynamodb parent: Sources grand_parent: Pipelines -nav_order: 3 +nav_order: 20 --- # dynamodb diff --git a/_data-prepper/pipelines/configuration/sources/http.md b/_data-prepper/pipelines/configuration/sources/http.md index 06933edc1c..574f49e289 100644 --- a/_data-prepper/pipelines/configuration/sources/http.md +++ b/_data-prepper/pipelines/configuration/sources/http.md @@ -3,14 +3,14 @@ layout: default title: http parent: Sources grand_parent: Pipelines -nav_order: 5 +nav_order: 30 redirect_from: - /data-prepper/pipelines/configuration/sources/http-source/ --- # http -The `http` plugin accepts HTTP requests from clients. Currently, `http` only supports the JSON UTF-8 codec for incoming requests, such as `[{"key1": "value1"}, {"key2": "value2"}]`. The following table describes options you can use to configure the `http` source. +The `http` plugin accepts HTTP requests from clients. The following table describes options you can use to configure the `http` source. 
Option | Required | Type | Description :--- | :--- | :--- | :--- @@ -36,6 +36,19 @@ aws_region | Conditionally | String | AWS region used by ACM or Amazon S3. Requi Content will be added to this section.---> +## Ingestion + +Clients should send HTTP `POST` requests to the endpoint `/log/ingest`. + +The `http` protocol only supports the JSON UTF-8 codec for incoming requests, for example, `[{"key1": "value1"}, {"key2": "value2"}]`. + +#### Example: Ingest data with cURL + +The following cURL command can be used to ingest data: + +`curl "http://localhost:2021/log/ingest" --data '[{"key1": "value1"}, {"key2": "value2"}]'` +{% include copy-curl.html %} + ## Metrics The `http` source includes the following metrics. diff --git a/_data-prepper/pipelines/configuration/sources/kafka.md b/_data-prepper/pipelines/configuration/sources/kafka.md index e8452a93c3..ecd7c7eaa0 100644 --- a/_data-prepper/pipelines/configuration/sources/kafka.md +++ b/_data-prepper/pipelines/configuration/sources/kafka.md @@ -3,7 +3,7 @@ layout: default title: kafka parent: Sources grand_parent: Pipelines -nav_order: 6 +nav_order: 40 --- # kafka diff --git a/_data-prepper/pipelines/configuration/sources/kinesis.md b/_data-prepper/pipelines/configuration/sources/kinesis.md new file mode 100644 index 0000000000..659ed2afe1 --- /dev/null +++ b/_data-prepper/pipelines/configuration/sources/kinesis.md @@ -0,0 +1,168 @@ +--- +layout: default +title: kinesis +parent: Sources +grand_parent: Pipelines +nav_order: 45 +--- + +# kinesis + +You can use the Data Prepper `kinesis` source to ingest records from one or more [Amazon Kinesis Data Streams](https://aws.amazon.com/kinesis/data-streams/). + +## Usage + +The following example pipeline specifies Kinesis as a source. The pipeline ingests data from multiple Kinesis data streams named `stream1` and `stream2` and sets the `initial_position` to indicate the starting point for reading the stream records: + +```yaml +version: "2" +kinesis-pipeline: + source: + kinesis: + streams: + - stream_name: "stream1" + initial_position: "LATEST" + - stream_name: "stream2" + initial_position: "LATEST" + aws: + region: "us-west-2" + sts_role_arn: "arn:aws:iam::123456789012:role/my-iam-role" +``` + +## Configuration options + +The `kinesis` source supports the following configuration options. + +Option | Required | Type | Description +:--- |:---------|:---------| :--- +`aws` | Yes | AWS | Specifies the AWS configuration. See [`aws`](#aws). +`acknowledgments` | No | Boolean | When set to `true`, enables the `kinesis` source to receive [end-to-end acknowledgments]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines#end-to-end-acknowledgments) when events are received by OpenSearch sinks. +`streams` | Yes | List | Configures a list of multiple Kinesis data streams that the `kinesis` source uses to read records. You can configure up to four streams. See [Streams](#streams). +`codec` | Yes | Codec | Specifies the [codec](#codec) to apply. +`buffer_timeout` | No | Duration | Sets the amount of time allowed for writing events to the Data Prepper buffer before timeout occurs. Any events that the source cannot write to the buffer during the specified amount of time are discarded. Default is `1s`. +`records_to_accumulate` | No | Integer | Determines the number of messages that accumulate before being written to the buffer. Default is `100`. +`consumer_strategy` | No | String | Selects the consumer strategy to use for ingesting Kinesis data streams. 
The default is `fan-out`, but `polling` can also be used. If `polling` is enabled, additional configuration is required. +`polling` | No | polling | See [polling](#polling). + +### Streams + +You can use the following options in the `streams` array. + +Option | Required | Type | Description +:--- |:---------| :--- | :--- +`stream_name` | Yes | String | Defines the name of each Kinesis data stream. +`initial_position` | No | String | Sets the `initial_position` to determine at what point the `kinesis` source starts reading stream records. Use `LATEST` to start from the most recent record or `EARLIEST` to start from the beginning of the stream. Default is `LATEST`. +`checkpoint_interval` | No | Duration | Sets the interval at which the `kinesis` source periodically checkpoints Kinesis data streams in order to avoid duplicate record processing. Default is `PT2M`. +`compression` | No | String | Specifies the compression format. To decompress records added by a [CloudWatch Logs Subscription Filter](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html) to Kinesis, use the `gzip` compression format. + +## codec + +The `codec` determines how the `kinesis` source parses each Kinesis stream record. To improve performance and efficiency, you can use [codec combinations]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/codec-processor-combinations/) with certain processors. + +### `json` codec + +The `json` codec parses each Kinesis stream record as a JSON array and then creates a Data Prepper event for each object in the array. It can be used to parse nested CloudWatch events into individual log entries. +You can use the following options to configure the `json` codec. + +Option | Required | Type | Description +:--- | :--- |:--------| :--- +`key_name` | No | String | The name of the input field from which to extract the JSON array and create Data Prepper events. +`include_keys` | No | List | The list of input fields to be extracted and added as additional fields in the Data Prepper event. +`include_keys_metadata` | No | List | The list of input fields to be extracted and added to the Data Prepper event metadata object. + +### `newline` codec + +The `newline` codec parses each Kinesis stream record as a single log event, making it ideal for processing single-line records. It also works well with the [`parse_json` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/parse-json/) to parse each line. + +You can use the following options to configure the `newline` codec. + +Option | Required | Type | Description +:--- | :--- |:--------| :--- +`skip_lines` | No | Integer | Sets the number of lines to skip before creating events. You can use this configuration to skip common header rows. Default is `0`. +`header_destination` | No | String | Defines a key value to assign to the header line of the stream event. If this option is specified, then each event will contain a `header_destination` field. + +### polling + +When the `consumer_strategy` is set to `polling`, the `kinesis` source uses a polling-based approach to read records from the Kinesis data streams, instead of the default `fan-out` approach. + +Option | Required | Type | Description +:--- | :--- |:--------| :--- +`max_polling_records` | No | Integer | Sets the number of records to fetch from Kinesis during a single call. +`idle_time_between_reads` | No | Duration | Defines the amount of idle time between calls.
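+
+The following is a minimal sketch of a `kinesis` source that uses the `polling` consumer strategy, based on the options described above. The stream name, Region, role ARN, and polling values are placeholder assumptions:
+
+```yaml
+version: "2"
+kinesis-polling-pipeline:
+  source:
+    kinesis:
+      consumer_strategy: "polling"
+      polling:
+        max_polling_records: 100       # number of records fetched per call
+        idle_time_between_reads: "1s"  # pause between calls
+      streams:
+        - stream_name: "stream1"
+      aws:
+        region: "us-west-2"
+        sts_role_arn: "arn:aws:iam::123456789012:role/my-iam-role"
+```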
+ +### aws + +You can use the following options in the `aws` configuration. + +Option | Required | Type | Description +:--- | :--- | :--- | :--- +`region` | No | String | Sets the AWS Region to use for credentials. Defaults to the [standard SDK behavior for determining the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html). +`sts_role_arn` | No | String | Defines the AWS Security Token Service (AWS STS) role to assume for requests to Amazon Kinesis Data Streams and Amazon DynamoDB. Defaults to `null`, which uses the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html). +`aws_sts_header_overrides` | No | Map | Defines a map of header overrides that the AWS Identity and Access Management (IAM) role assumes for the source plugin. + +## Exposed metadata attributes + +The `kinesis` source adds the following metadata to each processed event. You can access the metadata attributes using the [expression syntax `getMetadata` function]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/get-metadata/). + +- `stream_name`: Contains the name of the Kinesis data stream from which the event was obtained. + +## Permissions + +The following minimum permissions are required to run `kinesis` as a source: + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": [ + "kinesis:DescribeStream", + "kinesis:DescribeStreamConsumer", + "kinesis:DescribeStreamSummary", + "kinesis:GetRecords", + "kinesis:GetShardIterator", + "kinesis:ListShards", + "kinesis:ListStreams", + "kinesis:ListStreamConsumers", + "kinesis:RegisterStreamConsumer", + "kinesis:SubscribeToShard" + ], + "Resource": [ + "arn:aws:kinesis:us-east-1:{account-id}:stream/stream1", + "arn:aws:kinesis:us-east-1:{account-id}:stream/stream2" + ] + }, + { + "Sid": "allowCreateTable", + "Effect": "Allow", + "Action": [ + "dynamodb:CreateTable", + "dynamodb:PutItem", + "dynamodb:DescribeTable", + "dynamodb:DeleteItem", + "dynamodb:GetItem", + "dynamodb:Scan", + "dynamodb:UpdateItem", + "dynamodb:Query" + ], + "Resource": [ + "arn:aws:dynamodb:us-east-1:{account-id}:table/kinesis-pipeline" + ] + } + ] +} +``` + +The `kinesis` source uses a DynamoDB table for ingestion coordination among multiple workers, so you need DynamoDB permissions. + +## Metrics + +The `kinesis` source includes the following metrics. + +### Counters + +* `recordsProcessed`: Counts the number of processed stream records. +* `recordProcessingErrors`: Counts the number of stream record processing errors. +* `acknowledgementSetSuccesses`: Counts the number of processed stream records that were successfully added to the sink. +* `acknowledgementSetFailures`: Counts the number of processed stream records that failed to be added to the sink. diff --git a/_data-prepper/pipelines/configuration/sources/opensearch.md b/_data-prepper/pipelines/configuration/sources/opensearch.md index 7cc0b9a36a..1ee2237575 100644 --- a/_data-prepper/pipelines/configuration/sources/opensearch.md +++ b/_data-prepper/pipelines/configuration/sources/opensearch.md @@ -3,7 +3,7 @@ layout: default title: opensearch parent: Sources grand_parent: Pipelines -nav_order: 30 +nav_order: 50 --- # opensearch @@ -135,7 +135,7 @@ Option | Required | Type | Description ### Scheduling -The `scheduling` configuration allows the user to configure how indexes are reprocessed in the source based on the the `index_read_count` and recount time `interval`.
+The `scheduling` configuration allows you to configure how indexes are reprocessed in the source based on the `index_read_count` and recount time `interval`. For example, setting `index_read_count` to `3` with an `interval` of `1h` will result in all indexes being reprocessed 3 times, 1 hour apart. By default, indexes will only be processed once. diff --git a/_data-prepper/pipelines/configuration/sources/otel-logs-source.md b/_data-prepper/pipelines/configuration/sources/otel-logs-source.md index 068369efaf..38095d7d7f 100644 --- a/_data-prepper/pipelines/configuration/sources/otel-logs-source.md +++ b/_data-prepper/pipelines/configuration/sources/otel-logs-source.md @@ -3,7 +3,7 @@ layout: default title: otel_logs_source parent: Sources grand_parent: Pipelines -nav_order: 25 +nav_order: 60 --- # otel_logs_source diff --git a/_data-prepper/pipelines/configuration/sources/otel-metrics-source.md b/_data-prepper/pipelines/configuration/sources/otel-metrics-source.md index bea74a96d3..0e8d377828 100644 --- a/_data-prepper/pipelines/configuration/sources/otel-metrics-source.md +++ b/_data-prepper/pipelines/configuration/sources/otel-metrics-source.md @@ -3,7 +3,7 @@ layout: default title: otel_metrics_source parent: Sources grand_parent: Pipelines -nav_order: 10 +nav_order: 70 --- # otel_metrics_source diff --git a/_data-prepper/pipelines/configuration/sources/otel-trace-source.md b/_data-prepper/pipelines/configuration/sources/otel-trace-source.md index 1be7864c33..de45a5de63 100644 --- a/_data-prepper/pipelines/configuration/sources/otel-trace-source.md +++ b/_data-prepper/pipelines/configuration/sources/otel-trace-source.md @@ -3,7 +3,7 @@ layout: default title: otel_trace_source parent: Sources grand_parent: Pipelines -nav_order: 15 +nav_order: 80 redirect_from: - /data-prepper/pipelines/configuration/sources/otel-trace/ --- diff --git a/_data-prepper/pipelines/configuration/sources/pipeline.md b/_data-prepper/pipelines/configuration/sources/pipeline.md new file mode 100644 index 0000000000..6ba025bd18 --- /dev/null +++ b/_data-prepper/pipelines/configuration/sources/pipeline.md @@ -0,0 +1,31 @@ +--- +layout: default +title: pipeline +parent: Sources +grand_parent: Pipelines +nav_order: 90 +--- + +# pipeline + +Use the `pipeline` source to read from another pipeline. + +## Configuration options + +The `pipeline` source supports the following configuration options. + +| Option | Required | Type | Description | +|:-------|:---------|:-------|:---------------------------------------| +| `name` | Yes | String | The name of the pipeline to read from.
+
+## Usage
+
+The following example configures a pipeline named `sample-pipeline` that uses the `pipeline` source to read from a pipeline named `movies`:
+
+```yaml
+sample-pipeline:
+  source:
+    - pipeline:
+        name: "movies"
+```
+{% include copy.html %}
diff --git a/_data-prepper/pipelines/configuration/sources/s3.md b/_data-prepper/pipelines/configuration/sources/s3.md
index 5a7d9986e5..db92718a36 100644
--- a/_data-prepper/pipelines/configuration/sources/s3.md
+++ b/_data-prepper/pipelines/configuration/sources/s3.md
@@ -3,7 +3,7 @@ layout: default
title: s3 source
parent: Sources
grand_parent: Pipelines
-nav_order: 20
+nav_order: 100
---

# s3 source
diff --git a/_data-prepper/pipelines/configuration/sources/sources.md b/_data-prepper/pipelines/configuration/sources/sources.md
index 811b161e16..682f215517 100644
--- a/_data-prepper/pipelines/configuration/sources/sources.md
+++ b/_data-prepper/pipelines/configuration/sources/sources.md
@@ -3,7 +3,7 @@ layout: default
title: Sources
parent: Pipelines
has_children: true
-nav_order: 20
+nav_order: 110
---

# Sources
diff --git a/_data/versions.json b/_data/versions.json
index 4f7e55c21b..c14e91fa0c 100644
--- a/_data/versions.json
+++ b/_data/versions.json
@@ -1,10 +1,11 @@
{
-  "current": "2.16",
+  "current": "2.17",
  "all": [
-    "2.16",
+    "2.17",
    "1.3"
  ],
  "archived": [
+    "2.16",
    "2.15",
    "2.14",
    "2.13",
@@ -25,7 +26,7 @@
    "1.1",
    "1.0"
  ],
-  "latest": "2.16"
+  "latest": "2.17"
}
diff --git a/_field-types/index.md b/_field-types/index.md
index 7a7e816ada..e9250f409d 100644
--- a/_field-types/index.md
+++ b/_field-types/index.md
@@ -12,43 +12,77 @@ redirect_from:

# Mappings and field types

-You can define how documents and their fields are stored and indexed by creating a _mapping_. The mapping specifies the list of fields for a document. Every field in the document has a _field type_, which defines the type of data the field contains. For example, you may want to specify that the `year` field should be of type `date`. To learn more, see [Supported field types]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/index/).
+Mappings tell OpenSearch how to store and index your documents and their fields. You can specify the data type for each field (for example, `year` as `date`) to make storage and querying more efficient.

-If you're just starting to build out your cluster and data, you may not know exactly how your data should be stored. In those cases, you can use dynamic mappings, which tell OpenSearch to dynamically add data and its fields. However, if you know exactly what types your data falls under and want to enforce that standard, then you can use explicit mappings.
+While [dynamic mappings](#dynamic-mapping) automatically add new data and fields, using explicit mappings is recommended. Explicit mappings let you define the exact structure and data types upfront. This helps maintain data consistency and optimize performance, especially for large datasets or high-volume indexing operations.

-For example, if you want to indicate that `year` should be of type `text` instead of an `integer`, and `age` should be an `integer`, you can do so with explicit mappings. By using dynamic mapping, OpenSearch might interpret both `year` and `age` as integers.
+For example, with explicit mappings, you can ensure that `year` is treated as text and `age` as an integer, rather than having both interpreted as integers by dynamic mapping.

-This section provides an example for how to create an index mapping and how to add a document to it that will get ip_range validated.
-
-#### Table of contents
-1. TOC
-{:toc}
-
-
----

## Dynamic mapping

When you index a document, OpenSearch adds fields automatically with dynamic mapping. You can also explicitly add fields to an index mapping.

-#### Dynamic mapping types
+### Dynamic mapping types

Type | Description
:--- | :---
-null | A `null` field can't be indexed or searched. When a field is set to null, OpenSearch behaves as if that field has no values.
-boolean | OpenSearch accepts `true` and `false` as boolean values. An empty string is equal to `false.`
-float | A single-precision 32-bit floating point number.
-double | A double-precision 64-bit floating point number.
-integer | A signed 32-bit number.
-object | Objects are standard JSON objects, which can have fields and mappings of their own. For example, a `movies` object can have additional properties such as `title`, `year`, and `director`.
-array | Arrays in OpenSearch can only store values of one type, such as an array of just integers or strings. Empty arrays are treated as though they are fields with no values.
-text | A string sequence of characters that represent full-text values.
-keyword | A string sequence of structured characters, such as an email address or ZIP code.
+`null` | A `null` field can't be indexed or searched. When a field is set to null, OpenSearch behaves as if the field has no value.
+`boolean` | OpenSearch accepts `true` and `false` as Boolean values. An empty string is equal to `false`.
+`float` | A single-precision, 32-bit floating-point number.
+`double` | A double-precision, 64-bit floating-point number.
+`integer` | A signed 32-bit number.
+`object` | Objects are standard JSON objects, which can have fields and mappings of their own. For example, a `movies` object can have additional properties such as `title`, `year`, and `director`.
+`array` | OpenSearch does not have a specific array data type. Arrays are represented as a set of values of the same data type (for example, integers or strings) associated with a field. When indexing, you can pass multiple values for a field, and OpenSearch will treat it as an array. Empty arrays are valid and recognized as array fields with zero elements---not as fields with no values. OpenSearch supports querying and filtering arrays, including checking for values, range queries, and array operations like concatenation and intersection. Nested arrays, which may contain complex objects or other arrays, can also be used for advanced data modeling.
+`text` | A string sequence of characters that represent full-text values.
+`keyword` | A string sequence of structured characters, such as an email address or ZIP code.
date detection string | Enabled by default. If new string fields match a date's format, then the string is processed as a `date` field. For example, `date: "2012/03/11"` is processed as a date.
numeric detection string | If disabled, OpenSearch may automatically process numeric values as strings when they should be processed as numbers. When enabled, OpenSearch can process strings into `long`, `integer`, `short`, `byte`, `double`, `float`, `half_float`, `scaled_float`, and `unsigned_long`. Default is disabled.

+### Dynamic templates
+
+Dynamic templates are used to define custom mappings for dynamically added fields based on the data type, field name, or field path. They allow you to define a flexible schema for your data that can automatically adapt to changes in the structure or format of the input data.
+
+You can use the following syntax to define a dynamic mapping template:
+
+```json
+PUT index
+{
+  "mappings": {
+    "dynamic_templates": [
+      {
+        "fields": {
+          "mapping": {
+            "type": "short"
+          },
+          "match_mapping_type": "string",
+          "path_match": "status*"
+        }
+      }
+    ]
+  }
+}
+```
+{% include copy-curl.html %}
+
+This mapping configuration dynamically maps any field with a name starting with `status` (for example, `status_code`) to the `short` data type if the initial value provided during indexing is a string.
+
+### Dynamic mapping parameters
+
+Dynamic templates support the following parameters for matching conditions and mapping rules. Each parameter defaults to `null`.
+
+Parameter | Description
+:--- | :---
+`match_mapping_type` | Specifies the JSON data type (for example, string, long, double, object, binary, Boolean, date) that triggers the mapping.
+`match` | A regular expression used to match field names and apply the mapping.
+`unmatch` | A regular expression used to exclude field names from the mapping.
+`match_pattern` | Determines the pattern matching behavior, either `regex` or `simple`. Default is `simple`.
+`path_match` | Allows you to match nested field paths using a regular expression.
+`path_unmatch` | Excludes nested field paths from the mapping using a regular expression.
+`mapping` | The mapping configuration to apply.
+
## Explicit mapping

-If you know exactly what your field data types need to be, you can specify them in your request body when creating your index.
+If you know exactly which field data types you need to use, then you can specify them in your request body when creating your index, as shown in the following example request:

```json
PUT sample-index1
@@ -62,8 +96,9 @@ PUT sample-index1
  }
}
```
+{% include copy-curl.html %}

-### Response
+#### Response
```json
{
  "acknowledged": true,
@@ -71,8 +106,9 @@ PUT sample-index1
  "index": "sample-index1"
}
```
+{% include copy-curl.html %}

-To add mappings to an existing index or data stream, you can send a request to the `_mapping` endpoint using the `PUT` or `POST` HTTP method:
+To add mappings to an existing index or data stream, you can send a request to the `_mapping` endpoint using the `PUT` or `POST` HTTP method, as shown in the following example request:

```json
POST sample-index1/_mapping
@@ -84,84 +120,29 @@ POST sample-index1/_mapping
  }
}
```
+{% include copy-curl.html %}

You cannot change the mapping of an existing field; you can only modify the field's mapping parameters.
{: .note}

----
-## Mapping example usage
+## Mapping parameters

-The following example shows how to create a mapping to specify that OpenSearch should ignore any documents with malformed IP addresses that do not conform to the [`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/) data type. You accomplish this by setting the `ignore_malformed` parameter to `true`.
+Mapping parameters are used to configure the behavior of index fields. See [Mapping parameters]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/) for more information.

-### Create an index with an `ip` mapping
+## Mapping limit settings

-To create an index, use a PUT request:
+OpenSearch enforces the mapping limits and settings listed in the following table. You can configure these settings based on your requirements.
-
-```json
-PUT /test-index
-{
-  "mappings" : {
-    "properties" : {
-      "ip_address" : {
-        "type" : "ip",
-        "ignore_malformed": true
-      }
-    }
-  }
-}
-```
-
-You can add a document that has a malformed IP address to your index:
-
-```json
-PUT /test-index/_doc/1
-{
-  "ip_address" : "malformed ip address"
-}
-```
-
-This indexed IP address does not throw an error because `ignore_malformed` is set to true.
-
-You can query the index using the following request:
-
-```json
-GET /test-index/_search
-```
+| Setting | Default value | Allowed value | Type | Description |
+| :--- | :--- | :--- | :--- | :--- |
+| `index.mapping.nested_fields.limit` | 50 | [0,) | Dynamic | Limits the maximum number of nested fields that can be defined in an index mapping. |
+| `index.mapping.nested_objects.limit` | 10,000 | [0,) | Dynamic | Limits the maximum number of nested objects that can be created in a single document. |
+| `index.mapping.total_fields.limit` | 1,000 | [0,) | Dynamic | Limits the maximum number of fields that can be defined in an index mapping. |
+| `index.mapping.depth.limit` | 20 | [1,100] | Dynamic | Limits the maximum depth of nested objects and nested fields that can be defined in an index mapping. |
+| `index.mapping.field_name_length.limit` | 50,000 | [1,50000] | Dynamic | Limits the maximum length of field names that can be defined in an index mapping. |
+| `index.mapper.dynamic` | true | {true,false} | Dynamic | Determines whether new fields should be dynamically added to a mapping. |
+
+For an example request that updates one of these dynamic settings, see [Updating a mapping limit setting](#updating-a-mapping-limit-setting).

-The response shows that the `ip_address` field is ignored in the indexed document:
-
-```json
-{
-  "took": 14,
-  "timed_out": false,
-  "_shards": {
-    "total": 1,
-    "successful": 1,
-    "skipped": 0,
-    "failed": 0
-  },
-  "hits": {
-    "total": {
-      "value": 1,
-      "relation": "eq"
-    },
-    "max_score": 1,
-    "hits": [
-      {
-        "_index": "test-index",
-        "_id": "1",
-        "_score": 1,
-        "_ignored": [
-          "ip_address"
-        ],
-        "_source": {
-          "ip_address": "malformed ip address"
-        }
-      }
-    ]
-  }
-}
-```
+

---

## Get a mapping

To get all mappings for one or more indexes, use the following request:

```json
GET /<index>/_mapping
```
+{% include copy-curl.html %}

-In the above request, `<index>` may be an index name or a comma-separated list of index names.
+In the previous request, `<index>` may be an index name or a comma-separated list of index names.

To get all mappings for all indexes, use the following request:

```json
GET _mapping
```
+{% include copy-curl.html %}

To get a mapping for a specific field, provide the index name and the field name:

```json
GET _mapping/field/<fields>
GET /<index>/_mapping/field/<fields>
```
+{% include copy-curl.html %}

-Both `<index>` and `<fields>` can be specified as one value or a comma-separated list.
-
-For example, the following request retrieves the mapping for the `year` and `age` fields in `sample-index1`:
+Both `<index>` and `<fields>` can be specified as either one value or a comma-separated list. For example, the following request retrieves the mapping for the `year` and `age` fields in `sample-index1`:

```json
GET sample-index1/_mapping/field/year,age
```
+{% include copy-curl.html %}

The response contains the specified fields:

```json
@@ -220,3 +203,8 @@ The response contains the specified fields:
  }
}
```
+{% include copy-curl.html %}
+
+## Mappings use cases
+
+See [Mappings use cases]({{site.url}}{{site.baseurl}}/field-types/mappings-use-cases/) for use case examples, including examples of mapping string fields and ignoring malformed IP addresses.
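+
+## Updating a mapping limit setting
+
+The settings listed in [Mapping limit settings](#mapping-limit-settings) are dynamic, so you can update them on an existing index using the Update Settings API. The following request raises the total field limit; the index name and the new limit value are illustrative placeholders:
+
+```json
+PUT /sample-index1/_settings
+{
+  "index.mapping.total_fields.limit": 2000
+}
+```
+{% include copy-curl.html %}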
diff --git a/_field-types/mapping-parameters/analyzer.md b/_field-types/mapping-parameters/analyzer.md new file mode 100644 index 0000000000..32b26da1e0 --- /dev/null +++ b/_field-types/mapping-parameters/analyzer.md @@ -0,0 +1,90 @@ +--- +layout: default +title: Analyzer +parent: Mapping parameters +grand_parent: Mapping and field types +nav_order: 5 +has_children: false +has_toc: false +--- + +# Analyzer + +The `analyzer` mapping parameter is used to define the text analysis process that applies to a text field during both index and search operations. + +The key functions of the `analyzer` mapping parameter are: + +1. **Tokenization:** The analyzer determines how the text is broken down into individual tokens (words, numbers) that can be indexed and searched. Each generated token must not exceed 32,766 bytes in order to avoid indexing failures. + +2. **Normalization:** The analyzer can apply various normalization techniques, such as converting text to lowercase, removing stop words, and stemming/lemmatizing words. + +3. **Consistency:** By defining the same analyzer for both index and search operations, you ensure that the text analysis process is consistent, which helps improve the relevance of search results. + +4. **Customization:** OpenSearch allows you to define custom analyzers by specifying the tokenizer, character filters, and token filters to be used. This gives you fine-grained control over the text analysis process. + +For information about specific analyzer parameters, such as `analyzer`, `search_analyzer`, or `search_quote_analyzer`, see [Search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/). +{: .note} + +------------ + +## Example + +The following example configuration defines a custom analyzer called `my_custom_analyzer`: + +```json +PUT my_index +{ + "settings": { + "analysis": { + "analyzer": { + "my_custom_analyzer": { + "type": "custom", + "tokenizer": "standard", + "filter": [ + "lowercase", + "my_stop_filter", + "my_stemmer" + ] + } + }, + "filter": { + "my_stop_filter": { + "type": "stop", + "stopwords": ["the", "a", "and", "or"] + }, + "my_stemmer": { + "type": "stemmer", + "language": "english" + } + } + } + }, + "mappings": { + "properties": { + "my_text_field": { + "type": "text", + "analyzer": "my_custom_analyzer", + "search_analyzer": "standard", + "search_quote_analyzer": "my_custom_analyzer" + } + } + } +} +``` +{% include copy-curl.html %} + +In this example, the `my_custom_analyzer` uses the standard tokenizer, converts all tokens to lowercase, applies a custom stop word filter, and applies an English stemmer. + +You can then map a text field so that it uses this custom analyzer for both index and search operations: + +```json +"mappings": { + "properties": { + "my_text_field": { + "type": "text", + "analyzer": "my_custom_analyzer" + } + } +} +``` +{% include copy-curl.html %} diff --git a/_field-types/mapping-parameters/boost.md b/_field-types/mapping-parameters/boost.md new file mode 100644 index 0000000000..f1648a861d --- /dev/null +++ b/_field-types/mapping-parameters/boost.md @@ -0,0 +1,50 @@ +--- +layout: default +title: Boost +parent: Mapping parameters +grand_parent: Mapping and field types +nav_order: 10 +has_children: false +has_toc: false +--- + +# Boost + +The `boost` mapping parameter is used to increase or decrease the relevance score of a field during search queries. It allows you to apply more or less weight to specific fields when calculating the overall relevance score of a document. 
+
+The `boost` parameter is applied as a multiplier to the score of a field. For example, if a field has a `boost` value of `2`, then the score contribution of that field is doubled. Conversely, a `boost` value of `0.5` would halve the score contribution of that field.
+
+-----------
+
+## Example
+
+The following is an example of how you can use the `boost` parameter in an OpenSearch mapping:
+
+```json
+PUT my-index1
+{
+  "mappings": {
+    "properties": {
+      "title": {
+        "type": "text",
+        "boost": 2
+      },
+      "description": {
+        "type": "text",
+        "boost": 1
+      },
+      "tags": {
+        "type": "keyword",
+        "boost": 1.5
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+In this example, the `title` field has a boost of `2`, which means that it contributes twice as much to the overall relevance score as the `description` field (which has a boost of `1`). The `tags` field has a boost of `1.5`, so it contributes one and a half times as much as the `description` field.
+
+The `boost` parameter is particularly useful when you want to apply more weight to certain fields. For example, you might want to boost the `title` field more than the `description` field because the title may be a better indicator of the document's relevance.
+
+The `boost` parameter is a multiplicative factor---not an additive one. This means that a field with a higher boost value will have a disproportionately large effect on the overall relevance score compared to fields with lower boost values. When using the `boost` parameter, it is recommended that you start with small values (1.5 or 2) and test the effect on your search results. Overly high boost values can skew the relevance scores and lead to unexpected or undesirable search results.
diff --git a/_field-types/mapping-parameters/coerce.md b/_field-types/mapping-parameters/coerce.md
new file mode 100644
index 0000000000..3cf844897a
--- /dev/null
+++ b/_field-types/mapping-parameters/coerce.md
@@ -0,0 +1,100 @@
+---
+layout: default
+title: Coerce
+parent: Mapping parameters
+grand_parent: Mapping and field types
+nav_order: 15
+has_children: false
+has_toc: false
+---
+
+# Coerce
+
+The `coerce` mapping parameter controls how values are converted to the expected field data type during indexing. This helps ensure that your data is formatted and indexed properly, following the expected field types, which improves the accuracy of your search results.
+
+---
+
+## Examples
+
+The following examples demonstrate how to use the `coerce` mapping parameter.
+
+#### Indexing a document with `coerce` enabled
+
+```json
+PUT products
+{
+  "mappings": {
+    "properties": {
+      "price": {
+        "type": "integer",
+        "coerce": true
+      }
+    }
+  }
+}
+
+PUT products/_doc/1
+{
+  "name": "Product A",
+  "price": "19.99"
+}
+```
+{% include copy-curl.html %}
+
+In this example, the `price` field is defined as an `integer` type with `coerce` set to `true`. When indexing the document, the string value `19.99` is coerced to the integer `19`.
+
+#### Indexing a document with `coerce` disabled
+
+```json
+PUT orders
+{
+  "mappings": {
+    "properties": {
+      "quantity": {
+        "type": "integer",
+        "coerce": false
+      }
+    }
+  }
+}
+
+PUT orders/_doc/1
+{
+  "item": "Widget",
+  "quantity": "10"
+}
+```
+{% include copy-curl.html %}
+
+In this example, the `quantity` field is defined as an `integer` type with `coerce` set to `false`. When indexing the document, the string value `10` is not coerced, and the document is rejected because of the type mismatch.
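+
+For comparison, the same document is indexed successfully when `quantity` is sent as a number rather than a string. The following is an illustrative follow-up request using the same `orders` index:
+
+```json
+PUT orders/_doc/2
+{
+  "item": "Widget",
+  "quantity": 10
+}
+```
+{% include copy-curl.html %}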
+ +#### Setting the index-level coercion setting + +```json +PUT inventory +{ + "settings": { + "index.mapping.coerce": false + }, + "mappings": { + "properties": { + "stock_count": { + "type": "integer", + "coerce": true + }, + "sku": { + "type": "keyword" + } + } + } +} + +PUT inventory/_doc/1 +{ + "sku": "ABC123", + "stock_count": "50" +} +``` +{% include copy-curl.html %} + +In this example, the index-level `index.mapping.coerce` setting is set to `false`, which disables coercion for the index. However, the `stock_count` field overrides this setting and enables coercion for this specific field. diff --git a/_field-types/mapping-parameters/copy-to.md b/_field-types/mapping-parameters/copy-to.md new file mode 100644 index 0000000000..b029f814b5 --- /dev/null +++ b/_field-types/mapping-parameters/copy-to.md @@ -0,0 +1,109 @@ +--- +layout: default +title: Copy_to +parent: Mapping parameters +grand_parent: Mapping and field types +nav_order: 20 +has_children: false +has_toc: false +--- + +# Copy_to + +The `copy_to` parameter allows you to copy the values of multiple fields into a single field. This parameter can be useful if you often search across multiple fields because it allows you to search the group field instead. + +Only the field value is copied and not the terms resulting from the analysis process. The original `_source` field remains unmodified, and the same value can be copied to multiple fields using the `copy_to` parameter. However, recursive copying through intermediary fields is not supported; instead, use `copy_to` directly from the originating field to multiple target fields. + +--- + +## Examples + +The following example uses the `copy_to` parameter to search for products by their name and description and copy those values into a single field: + +```json +PUT my-products-index +{ + "mappings": { + "properties": { + "name": { + "type": "text", + "copy_to": "product_info" + }, + "description": { + "type": "text", + "copy_to": "product_info" + }, + "product_info": { + "type": "text" + }, + "price": { + "type": "float" + } + } + } +} + +PUT my-products-index/_doc/1 +{ + "name": "Wireless Headphones", + "description": "High-quality wireless headphones with noise cancellation", + "price": 99.99 +} + +PUT my-products-index/_doc/2 +{ + "name": "Bluetooth Speaker", + "description": "Portable Bluetooth speaker with long battery life", + "price": 49.99 +} +``` +{% include copy-curl.html %} + +In this example, the values from the `name` and `description` fields are copied into the `product_info` field. 
You can now search for products by querying the `product_info` field, as follows: + +```json +GET my-products-index/_search +{ + "query": { + "match": { + "product_info": "wireless headphones" + } + } +} +``` +{% include copy-curl.html %} + +## Response + +```json +{ + "took": 20, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 1.9061546, + "hits": [ + { + "_index": "my-products-index", + "_id": "1", + "_score": 1.9061546, + "_source": { + "name": "Wireless Headphones", + "description": "High-quality wireless headphones with noise cancellation", + "price": 99.99 + } + } + ] + } +} +``` + diff --git a/_field-types/mapping-parameters/doc-values.md b/_field-types/mapping-parameters/doc-values.md new file mode 100644 index 0000000000..9ea9364c0b --- /dev/null +++ b/_field-types/mapping-parameters/doc-values.md @@ -0,0 +1,46 @@ +--- +layout: default +title: doc_values +parent: Mapping parameters +grand_parent: Mapping and field types +nav_order: 25 +has_children: false +has_toc: false +--- + +# doc_values + +By default, OpenSearch indexes most fields for search purposes. The `doc_values ` parameter enables document-to-term lookups for operations such as sorting, aggregations, and scripting. + +The `doc_values` parameter accepts the following options. + +Option | Description +:--- | :--- +`true` | Enables `doc_values` for the field. Default is `true`. +`false` | Disables `doc_values` for the field. + +The `doc_values` parameter is not supported for use in text fields. + +--- + +## Example: Creating an index with `doc_values` enabled and disabled + +The following example request creates an index with `doc_values` enabled for one field and disabled for another: + +```json +PUT my-index-001 +{ + "mappings": { + "properties": { + "status_code": { + "type": "keyword" + }, + "session_id": { + "type": "keyword", + "doc_values": false + } + } + } +} +``` +{% include copy-curl.html %} diff --git a/_field-types/dynamic.md b/_field-types/mapping-parameters/dynamic.md similarity index 96% rename from _field-types/dynamic.md rename to _field-types/mapping-parameters/dynamic.md index 59f59bfe3d..2d48e98082 100644 --- a/_field-types/dynamic.md +++ b/_field-types/mapping-parameters/dynamic.md @@ -1,18 +1,22 @@ --- layout: default -title: Dynamic parameter -nav_order: 10 +title: Dynamic +parent: Mapping parameters +grand_parent: Mapping and field types +nav_order: 30 +has_children: false +has_toc: false redirect_from: - /opensearch/dynamic/ --- -# Dynamic parameter +# Dynamic The `dynamic` parameter specifies whether newly detected fields can be added dynamically to a mapping. It accepts the parameters listed in the following table. Parameter | Description :--- | :--- -`true` | Specfies that new fields can be added dynamically to the mapping. Default is `true`. +`true` | Specifies that new fields can be added dynamically to the mapping. Default is `true`. `false` | Specifies that new fields cannot be added dynamically to the mapping. If a new field is detected, then it is not indexed or searchable but can be retrieved from the `_source` field. `strict` | Throws an exception. The indexing operation fails when new fields are detected. `strict_allow_templates` | Adds new fields if they match predefined dynamic templates in the mapping. 
diff --git a/_field-types/mapping-parameters/enabled.md b/_field-types/mapping-parameters/enabled.md new file mode 100644 index 0000000000..2695ff7529 --- /dev/null +++ b/_field-types/mapping-parameters/enabled.md @@ -0,0 +1,47 @@ +--- +layout: default +title: enabled +parent: Mapping parameters +grand_parent: Mapping and field types +nav_order: 40 +has_children: false +has_toc: false +--- + +# Enabled + +The `enabled` parameter allows you to control whether OpenSearch parses the contents of a field. This parameter can be applied to the top-level mapping definition and to object fields. + +The `enabled` parameter accepts the following values. + +Parameter | Description +:--- | :--- +`true` | The field is parsed and indexed. Default is `true`. +`false` | The field is not parsed or indexed but is still retrievable from the `_source` field. When `enabled` is set to `false`, OpenSearch stores the field's value in the `_source` field but does not index or parse its contents. This can be useful for fields that you want to store but do not need to search, sort, or aggregate on. + +--- + +## Example: Using the `enabled` parameter + +In the following example request, the `session_data` field is disabled. OpenSearch stores its contents in the `_source` field but does not index or parse them: + +```json +PUT my-index-002 +{ + "mappings": { + "properties": { + "user_id": { + "type": "keyword" + }, + "last_updated": { + "type": "date" + }, + "session_data": { + "type": "object", + "enabled": false + } + } + } +} +``` +{% include copy-curl.html %} diff --git a/_field-types/mapping-parameters/index.md b/_field-types/mapping-parameters/index.md new file mode 100644 index 0000000000..ca5586bb8f --- /dev/null +++ b/_field-types/mapping-parameters/index.md @@ -0,0 +1,28 @@ +--- +layout: default +title: Mapping parameters +nav_order: 75 +has_children: true +has_toc: false +--- + +# Mapping parameters + +Mapping parameters are used to configure the behavior of index fields. For parameter use cases, see a mapping parameter's respective page. + +The following table lists OpenSearch mapping parameters. + +Parameter | Description +:--- | :--- +`analyzer` | Specifies the analyzer used to analyze string fields. Default is the `standard` analyzer, which is a general-purpose analyzer that splits text on white space and punctuation, converts to lowercase, and removes stop words. Allowed values are `standard`, `simple`, and `whitespace`. +`boost` | Specifies a field-level boost factor applied at query time. Allows you to increase or decrease the relevance score of a specific field during search queries. Default boost value is `1.0`, which means no boost is applied. Allowed values are any positive floating-point number. +`coerce` | Controls how values are converted to the expected field data type during indexing. Default value is `true`, which means that OpenSearch tries to coerce the value to the expected value type. Allowed values are `true` or `false`. +`copy_to` | Copies the value of a field to another field. There is no default value for this parameter. Optional. +`doc_values` | Specifies whether a field should be stored on disk to make sorting and aggregation faster. Default value is `true`, which means that the doc values are enabled. Allowed values are a single field name or a list of field names. +`dynamic` | Determines whether new fields should be added dynamically. Default value is `true`, which means that new fields can be added dynamically. Allowed values are `true`, `false`, or `strict`. 
+`enabled` | Specifies whether the field is enabled or disabled. Default value is `true`, which means that the field is enabled. Allowed values are `true` or `false`.
+`format` | Specifies the date format for date fields. There is no default value for this parameter. Allowed values are any valid date format string, such as `yyyy-MM-dd` or `epoch_millis`.
+`ignore_above` | Skips indexing values that exceed the specified length. Default value is `2147483647`, which means that there is no limit on the field value length. Allowed values are any positive integer.
+`ignore_malformed` | Specifies whether malformed values should be ignored. Default value is `false`, which means that malformed values are not ignored. Allowed values are `true` or `false`.
+`index` | Specifies whether a field should be indexed. Default value is `true`, which means that the field is indexed. Allowed values are `true` or `false`.
+`index_options` | Specifies what information should be stored in an index for scoring purposes. Default value is `docs`, which means that only the document numbers are stored in the index. Allowed values are `docs`, `freqs`, `positions`, or `offsets`.
\ No newline at end of file
diff --git a/_field-types/mappings-use-cases.md b/_field-types/mappings-use-cases.md
new file mode 100644
index 0000000000..835e030bab
--- /dev/null
+++ b/_field-types/mappings-use-cases.md
@@ -0,0 +1,122 @@
+---
+layout: default
+title: Mappings use cases
+parent: Mappings and field types
+nav_order: 5
+nav_exclude: true
+---
+
+# Mappings use cases
+
+Mappings provide control over how data is indexed and queried, enabling optimized performance and efficient storage for a range of use cases.
+
+---
+
+## Example: Ignoring malformed IP addresses
+
+The following example shows you how to create a mapping specifying that OpenSearch should ignore malformed IP addresses that do not conform to the [`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/) data type instead of rejecting the entire document. You can accomplish this by setting the `ignore_malformed` parameter to `true`.
+
+### Create an index with an `ip` mapping
+
+To create an index with an `ip` mapping, use a PUT request:
+
+```json
+PUT /test-index
+{
+  "mappings" : {
+    "properties" : {
+      "ip_address" : {
+        "type" : "ip",
+        "ignore_malformed": true
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+Then add a document with a malformed IP address:
+
+```json
+PUT /test-index/_doc/1
+{
+  "ip_address" : "malformed ip address"
+}
+```
+{% include copy-curl.html %}
+
+Because `ignore_malformed` is set to `true`, the malformed `ip_address` value is ignored rather than causing an indexing failure.
You can query the index using the following request:
+
+```json
+GET /test-index/_search
+```
+{% include copy-curl.html %}
+
+#### Response
+
+```json
+{
+  "took": 14,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "test-index",
+        "_id": "1",
+        "_score": 1,
+        "_ignored": [
+          "ip_address"
+        ],
+        "_source": {
+          "ip_address": "malformed ip address"
+        }
+      }
+    ]
+  }
+}
+```
+{% include copy-curl.html %}
+
+---
+
+## Mapping string fields to `text` and `keyword` types
+
+To create an index named `movies1` with a dynamic template that maps all string fields to both the `text` and `keyword` types, you can use the following request:
+
+```json
+PUT movies1
+{
+  "mappings": {
+    "dynamic_templates": [
+      {
+        "strings": {
+          "match_mapping_type": "string",
+          "mapping": {
+            "type": "text",
+            "fields": {
+              "keyword": {
+                "type": "keyword",
+                "ignore_above": 256
+              }
+            }
+          }
+        }
+      }
+    ]
+  }
+}
+```
+{% include copy-curl.html %}
+
+This dynamic template ensures that any string fields in your documents will be indexed as both a full-text `text` type and a `keyword` type.
diff --git a/_field-types/metadata-fields/field-names.md b/_field-types/metadata-fields/field-names.md
new file mode 100644
index 0000000000..b17e94fbb4
--- /dev/null
+++ b/_field-types/metadata-fields/field-names.md
@@ -0,0 +1,43 @@
+---
+layout: default
+title: Field names
+nav_order: 10
+parent: Metadata fields
+---
+
+# Field names
+
+The `_field_names` field indexes field names that contain non-null values. This enables the use of the `exists` query, which can identify documents that either have or do not have non-null values for a specified field.
+
+However, `_field_names` only indexes field names when both `doc_values` and `norms` are disabled. If either `doc_values` or `norms` is enabled, then the `exists` query still functions but will not rely on the `_field_names` field.
+
+## Mapping example
+
+```json
+PUT my-index
+{
+  "mappings": {
+    "_field_names": {
+      "enabled": true
+    },
+    "properties": {
+      "title": {
+        "type": "text",
+        "doc_values": false,
+        "norms": false
+      },
+      "description": {
+        "type": "text",
+        "doc_values": true,
+        "norms": false
+      },
+      "price": {
+        "type": "float",
+        "doc_values": false,
+        "norms": true
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
diff --git a/_field-types/metadata-fields/id.md b/_field-types/metadata-fields/id.md
new file mode 100644
index 0000000000..f66f4b8e13
--- /dev/null
+++ b/_field-types/metadata-fields/id.md
@@ -0,0 +1,86 @@
+---
+layout: default
+title: ID
+nav_order: 20
+parent: Metadata fields
+---
+
+# ID
+
+Each document in OpenSearch has a unique `_id` field. This field is indexed, allowing you to retrieve documents using the GET API or the [`ids` query]({{site.url}}{{site.baseurl}}/query-dsl/term/ids/).
+
+If you do not provide an `_id` value, then OpenSearch automatically generates one for the document.
+
+{: .note}
+
+The following example request creates an index named `test-index1` and adds two documents with different `_id` values:
+
+```json
+PUT test-index1/_doc/1
+{
+  "text": "Document with ID 1"
+}
+
+PUT test-index1/_doc/2?refresh=true
+{
+  "text": "Document with ID 2"
+}
+```
+{% include copy-curl.html %}
+
+You can then query the documents using the `_id` field, as shown in the following example request:
+
+```json
+GET test-index1/_search
+{
+  "query": {
+    "terms": {
+      "_id": ["1", "2"]
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+The response returns both documents with `_id` values of `1` and `2`:
+
+```json
+{
+  "took": 10,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 2,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "test-index1",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "text": "Document with ID 1"
+        }
+      },
+      {
+        "_index": "test-index1",
+        "_id": "2",
+        "_score": 1,
+        "_source": {
+          "text": "Document with ID 2"
+        }
+      }
+    ]
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Limitations of the `_id` field
+
+While the `_id` field can be used in various queries, it is restricted from use in aggregations, sorting, and scripting. If you need to sort or aggregate on the `_id` field, it is recommended that you duplicate the `_id` content into another field with `doc_values` enabled. Refer to [IDs query]({{site.url}}{{site.baseurl}}/query-dsl/term/ids/) for an example.
diff --git a/_field-types/metadata-fields/ignored.md b/_field-types/metadata-fields/ignored.md
new file mode 100644
index 0000000000..e867cfc754
--- /dev/null
+++ b/_field-types/metadata-fields/ignored.md
@@ -0,0 +1,147 @@
+---
+layout: default
+title: Ignored
+nav_order: 25
+parent: Metadata fields
+---
+
+# Ignored
+
+The `_ignored` field helps you manage issues related to malformed data in your documents. This field is used to index and store field names that were ignored during the indexing process as a result of the `ignore_malformed` setting being enabled in the [index mapping]({{site.url}}{{site.baseurl}}/field-types/).
+
+The `_ignored` field allows you to search for and identify documents containing fields that were ignored as well as for the specific field names that were ignored. This can be useful for troubleshooting.
+
+You can query the `_ignored` field using the `term`, `terms`, and `exists` queries, and the results will be included in the search hits.
+
+The `_ignored` field is only populated when the `ignore_malformed` setting is enabled in your index mapping. If `ignore_malformed` is set to `false` (the default value), then malformed fields will cause the entire document to be rejected, and the `_ignored` field will not be populated.
+
+{: .note}
+
+The following example request shows you how to use the `_ignored` field:
+
+```json
+GET _search
+{
+  "query": {
+    "exists": {
+      "field": "_ignored"
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+---
+
+#### Example indexing request with the `_ignored` field
+
+The following example request adds a new document to the `test-ignored` index with `ignore_malformed` set to `true` so that no error is thrown during indexing:
+
+```json
+PUT test-ignored
+{
+  "mappings": {
+    "properties": {
+      "title": {
+        "type": "text"
+      },
+      "length": {
+        "type": "long",
+        "ignore_malformed": true
+      }
+    }
+  }
+}
+
+POST test-ignored/_doc
+{
+  "title": "correct text",
+  "length": "not a number"
+}
+
+GET test-ignored/_search
+{
+  "query": {
+    "exists": {
+      "field": "_ignored"
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+#### Example response
+
+```json
+{
+  "took": 42,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "test-ignored",
+        "_id": "qcf0wZABpEYH7Rw9OT7F",
+        "_score": 1,
+        "_ignored": [
+          "length"
+        ],
+        "_source": {
+          "title": "correct text",
+          "length": "not a number"
+        }
+      }
+    ]
+  }
+}
+```
+
+---
+
+## Searching for a specific ignored field
+
+You can use a `term` query to find documents in which a specific field was ignored, as shown in the following example request:
+
+```json
+GET _search
+{
+  "query": {
+    "term": {
+      "_ignored": "created_at"
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+#### Response
+
+```json
+{
+  "took": 51,
+  "timed_out": false,
+  "_shards": {
+    "total": 45,
+    "successful": 45,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 0,
+      "relation": "eq"
+    },
+    "max_score": null,
+    "hits": []
+  }
+}
+```
diff --git a/_field-types/metadata-fields/index-metadata.md b/_field-types/metadata-fields/index-metadata.md
new file mode 100644
index 0000000000..657f7d62a5
--- /dev/null
+++ b/_field-types/metadata-fields/index-metadata.md
@@ -0,0 +1,86 @@
+---
+layout: default
+title: Index
+nav_order: 25
+parent: Metadata fields
+---
+
+# Index
+
+When querying across multiple indexes, you may need to filter results based on the index into which a document was indexed. The `_index` field matches documents based on their index.
+
+The following example request creates two indexes, `products` and `customers`, and adds a document to each index:
+
+```json
+PUT products/_doc/1
+{
+  "name": "Widget X"
+}
+
+PUT customers/_doc/2
+{
+  "name": "John Doe"
+}
+```
+{% include copy-curl.html %}
+
+You can then query both indexes and filter the results using the `_index` field, as shown in the following example request:
+
+```json
+GET products,customers/_search
+{
+  "query": {
+    "terms": {
+      "_index": ["products", "customers"]
+    }
+  },
+  "aggs": {
+    "index_groups": {
+      "terms": {
+        "field": "_index",
+        "size": 10
+      }
+    }
+  },
+  "sort": [
+    {
+      "_index": {
+        "order": "desc"
+      }
+    }
+  ],
+  "script_fields": {
+    "index_name": {
+      "script": {
+        "lang": "painless",
+        "source": "doc['_index'].value"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+In this example:
+
+- The `query` section uses a `terms` query to match documents from the `products` and `customers` indexes.
+- The `aggs` section performs a `terms` aggregation on the `_index` field, grouping the results by index.
+- The `sort` section sorts the results by the `_index` field in descending order.
+- The `script_fields` section adds a new field called `index_name` to the search results containing the `_index` field value for each document. + +## Querying on the `_index` field + +The `_index` field represents the index into which a document was indexed. You can use this field in your queries to filter, aggregate, sort, or retrieve index information for your search results. + +Because the `_index` field is automatically added to every document, you can use it in your queries like any other field. For example, you can use the `terms` query to match documents from multiple indexes. The following example query returns all documents from the `products` and `customers` indexes: + +```json + { + "query": { + "terms": { + "_index": ["products", "customers"] + } + } +} +``` +{% include copy-curl.html %} diff --git a/_field-types/metadata-fields/index.md b/_field-types/metadata-fields/index.md new file mode 100644 index 0000000000..cdc079e1e5 --- /dev/null +++ b/_field-types/metadata-fields/index.md @@ -0,0 +1,21 @@ +--- +layout: default +title: Metadata fields +nav_order: 90 +has_children: true +has_toc: false +--- + +# Metadata fields + +OpenSearch provides built-in metadata fields that allow you to access information about the documents in an index. These fields can be used in your queries as needed. + +Metadata field | Description +:--- | :--- +`_field_names` | The document fields with non-empty or non-null values. +`_ignored` | The document fields that were ignored during the indexing process due to the presence of malformed data, as specified by the `ignore_malformed` setting. +`_id` | The unique identifier assigned to each document. +`_index` | The index in which the document is stored. +`_meta` | Stores custom metadata or additional information specific to the application or use case. +`_routing` | Allows you to specify a custom value that determines the shard assignment for a document in an OpenSearch cluster. +`_source` | Contains the original JSON representation of the document data. diff --git a/_field-types/metadata-fields/meta.md b/_field-types/metadata-fields/meta.md new file mode 100644 index 0000000000..220d58f106 --- /dev/null +++ b/_field-types/metadata-fields/meta.md @@ -0,0 +1,87 @@ +--- +layout: default +title: Meta +nav_order: 30 +parent: Metadata fields +--- + +# Meta + +The `_meta` field is a mapping property that allows you to attach custom metadata to your index mappings. This metadata can be used by your application to store information relevant to your use case, such as versioning, ownership, categorization, or auditing. + +## Usage + +You can define the `_meta` field when creating a new index or updating an existing index's mapping, as shown in the following example request: + +```json +PUT my-index +{ + "mappings": { + "_meta": { + "application": "MyApp", + "version": "1.2.3", + "author": "John Doe" + }, + "properties": { + "title": { + "type": "text" + }, + "description": { + "type": "text" + } + } + } +} + +``` +{% include copy-curl.html %} + +In this example, three custom metadata fields are added: `application`, `version`, and `author`. These fields can be used by your application to store any relevant information about the index, such as the application it belongs to, the application version, or the author of the index. 
+
+You can update the `_meta` field using the [Put Mapping API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/put-mapping/) operation, as shown in the following example request:
+
+```json
+PUT my-index/_mapping
+{
+  "_meta": {
+    "application": "MyApp",
+    "version": "1.3.0",
+    "author": "Jane Smith"
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Retrieving `_meta` information
+
+You can retrieve the `_meta` information for an index using the [Get Mapping API]({{site.url}}{{site.baseurl}}/field-types/#get-a-mapping) operation, as shown in the following example request:
+
+```json
+GET my-index/_mapping
+```
+{% include copy-curl.html %}
+
+The response returns the full index mapping, including the `_meta` field:
+
+```json
+{
+  "my-index": {
+    "mappings": {
+      "_meta": {
+        "application": "MyApp",
+        "version": "1.3.0",
+        "author": "Jane Smith"
+      },
+      "properties": {
+        "description": {
+          "type": "text"
+        },
+        "title": {
+          "type": "text"
+        }
+      }
+    }
+  }
+}
```
+{% include copy-curl.html %}
diff --git a/_field-types/metadata-fields/routing.md b/_field-types/metadata-fields/routing.md
new file mode 100644
index 0000000000..9064e20c49
--- /dev/null
+++ b/_field-types/metadata-fields/routing.md
@@ -0,0 +1,92 @@
+---
+layout: default
+title: Routing
+nav_order: 35
+parent: Metadata fields
+---
+
+# Routing
+
+OpenSearch uses a hashing algorithm to route documents to specific shards in an index. By default, the document's `_id` field is used as the routing value, but you can also specify a custom routing value for each document.
+
+## Default routing
+
+The following is the default OpenSearch routing formula. The `_routing` value is the document's `_id`.
+
+```
+shard_num = hash(_routing) % num_primary_shards
+```
+
+## Custom routing
+
+You can specify a custom routing value when indexing a document, as shown in the following example request:
+
+```json
+PUT sample-index1/_doc/1?routing=JohnDoe1
+{
+  "title": "This is a document"
+}
+```
+{% include copy-curl.html %}
+
+In this example, the document is routed using the value `JohnDoe1` instead of the default `_id`.
+
+You must provide the same routing value when retrieving, deleting, or updating the document, as shown in the following example request:
+
+```json
+GET sample-index1/_doc/1?routing=JohnDoe1
+```
+{% include copy-curl.html %}
+
+## Querying by routing
+
+You can query documents based on their routing value by using the `_routing` field, as shown in the following example. This query only searches the shard(s) associated with the `JohnDoe1` routing value:
+
+```json
+GET sample-index1/_search
+{
+  "query": {
+    "terms": {
+      "_routing": [ "JohnDoe1" ]
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Required routing
+
+You can make custom routing required for all CRUD operations on an index, as shown in the following example request. If you try to index a document without providing a routing value, OpenSearch will throw an exception.
+
+```json
+PUT sample-index2
+{
+  "mappings": {
+    "_routing": {
+      "required": true
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Routing to specific shards
+
+You can configure an index to route custom values to a subset of shards rather than a single shard. This is done by setting `index.routing_partition_size` at the time of index creation. The formula for calculating the shard is `shard_num = (hash(_routing) + hash(_id) % routing_partition_size) % num_primary_shards`.
+
+The following example request routes documents to one of four shards in the index:
+
+```json
+PUT sample-index3
+{
+  "settings": {
+    "index.routing_partition_size": 4
+  },
+  "mappings": {
+    "_routing": {
+      "required": true
+    }
+  }
+}
```
+{% include copy-curl.html %}
diff --git a/_field-types/metadata-fields/source.md b/_field-types/metadata-fields/source.md
new file mode 100644
index 0000000000..c9e714f43c
--- /dev/null
+++ b/_field-types/metadata-fields/source.md
@@ -0,0 +1,54 @@
+---
+layout: default
+title: Source
+nav_order: 40
+parent: Metadata fields
+---
+
+# Source
+
+The `_source` field contains the original JSON document body that was indexed. While this field is not searchable, it is stored so that the full document can be returned when executing fetch requests, such as `get` and `search`.
+
+## Disabling the field
+
+You can disable the `_source` field by setting the `enabled` parameter to `false`, as shown in the following example request:
+
+```json
+PUT sample-index1
+{
+  "mappings": {
+    "_source": {
+      "enabled": false
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+Disabling the `_source` field can impact the availability of certain features, such as the `update`, `update_by_query`, and `reindex` APIs, as well as the ability to debug queries or aggregations using the original indexed document.
+{: .warning}
+
+## Including or excluding fields
+
+You can selectively control the contents of the `_source` field by using the `includes` and `excludes` parameters. This allows you to prune the stored `_source` field after it is indexed but before it is saved, as shown in the following example request:
+
+```json
+PUT logs
+{
+  "mappings": {
+    "_source": {
+      "includes": [
+        "*.count",
+        "meta.*"
+      ],
+      "excludes": [
+        "meta.description",
+        "meta.other.*"
+      ]
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+The excluded fields are not stored in the `_source`, but you can still search them because the data remains indexed.
diff --git a/_field-types/supported-field-types/alias.md b/_field-types/supported-field-types/alias.md
index 29cc58885c..f1f6ae9ac8 100644
--- a/_field-types/supported-field-types/alias.md
+++ b/_field-types/supported-field-types/alias.md
@@ -10,6 +10,8 @@ redirect_from:
---

# Alias field type
+**Introduced 1.0**
+{: .label .label-purple }

An alias field type creates another name for an existing field. You can use aliases in the [search](#using-aliases-in-search-api-operations) and [field capabilities](#using-aliases-in-field-capabilities-api-operations) API operations, with some [exceptions](#exceptions). To set up an [alias](#alias-field), you need to specify the [original field](#original-field) name in the `path` parameter.
diff --git a/_field-types/supported-field-types/binary.md b/_field-types/supported-field-types/binary.md
index d6974ad4cf..bb257bf7ec 100644
--- a/_field-types/supported-field-types/binary.md
+++ b/_field-types/supported-field-types/binary.md
@@ -10,6 +10,8 @@ redirect_from:
---

# Binary field type
+**Introduced 1.0**
+{: .label .label-purple }

A binary field type contains a binary value in [Base64](https://en.wikipedia.org/wiki/Base64) encoding that is not searchable.

@@ -50,5 +52,5 @@ The following table lists the parameters accepted by binary field types. All par

Parameter | Description
:--- | :---
-`doc_values` | A Boolean value that specifies whether the field should be stored on disk so that it can be used for aggregations, sorting, or scripting. Optional. Default is `true`.
-`store` | A Boolean value that specifies whether the field value should be stored and can be retrieved separately from the _source field. Optional. Default is `false`. \ No newline at end of file +`doc_values` | A Boolean value that specifies whether the field should be stored on disk so that it can be used for aggregations, sorting, or scripting. Optional. Default is `false`. +`store` | A Boolean value that specifies whether the field value should be stored and can be retrieved separately from the _source field. Optional. Default is `false`. diff --git a/_field-types/supported-field-types/boolean.md b/_field-types/supported-field-types/boolean.md index 8233a45ad5..82cfdecf47 100644 --- a/_field-types/supported-field-types/boolean.md +++ b/_field-types/supported-field-types/boolean.md @@ -10,6 +10,8 @@ redirect_from: --- # Boolean field type +**Introduced 1.0** +{: .label .label-purple } A Boolean field type takes `true` or `false` values, or `"true"` or `"false"` strings. You can also pass an empty string (`""`) in place of a `false` value. diff --git a/_field-types/supported-field-types/completion.md b/_field-types/supported-field-types/completion.md index 85c803baa1..e6e392fb6d 100644 --- a/_field-types/supported-field-types/completion.md +++ b/_field-types/supported-field-types/completion.md @@ -11,6 +11,8 @@ redirect_from: --- # Completion field type +**Introduced 1.0** +{: .label .label-purple } A completion field type provides autocomplete functionality through a completion suggester. The completion suggester is a prefix suggester, so it matches the beginning of text only. A completion suggester creates an in-memory data structure, which provides faster lookups but leads to increased memory usage. You need to upload a list of all possible completions into the index before using this feature. diff --git a/_field-types/supported-field-types/constant-keyword.md b/_field-types/supported-field-types/constant-keyword.md index bf1e4afc70..4f9261f1a1 100644 --- a/_field-types/supported-field-types/constant-keyword.md +++ b/_field-types/supported-field-types/constant-keyword.md @@ -8,6 +8,8 @@ grand_parent: Supported field types --- # Constant keyword field type +**Introduced 2.14** +{: .label .label-purple } A constant keyword field uses the same value for all documents in the index. diff --git a/_field-types/supported-field-types/date-nanos.md b/_field-types/supported-field-types/date-nanos.md index 12399a69d4..eb569265fc 100644 --- a/_field-types/supported-field-types/date-nanos.md +++ b/_field-types/supported-field-types/date-nanos.md @@ -8,6 +8,8 @@ grand_parent: Supported field types --- # Date nanoseconds field type +**Introduced 1.0** +{: .label .label-purple } The `date_nanos` field type is similar to the [`date`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/) field type in that it holds a date. However, `date` stores the date in millisecond resolution, while `date_nanos` stores the date in nanosecond resolution. Dates are stored as `long` values that correspond to nanoseconds since the epoch. Therefore, the range of supported dates is approximately 1970--2262. 
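
For example, the following requests map a `date_nanos` field and index a document containing a nanosecond-precision timestamp. This is an illustrative sketch; the index and field names are placeholders:

```json
PUT nanos-index
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date_nanos"
      }
    }
  }
}

PUT nanos-index/_doc/1
{
  "timestamp": "2024-05-05T12:00:00.123456789Z"
}
```
{% include copy-curl.html %}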
diff --git a/_field-types/supported-field-types/date.md b/_field-types/supported-field-types/date.md index fb008d1512..8ca986219b 100644 --- a/_field-types/supported-field-types/date.md +++ b/_field-types/supported-field-types/date.md @@ -11,6 +11,8 @@ redirect_from: --- # Date field type +**Introduced 1.0** +{: .label .label-purple } A date in OpenSearch can be represented as one of the following: diff --git a/_field-types/supported-field-types/derived.md b/_field-types/supported-field-types/derived.md index 2ca00927d1..cb358f47f9 100644 --- a/_field-types/supported-field-types/derived.md +++ b/_field-types/supported-field-types/derived.md @@ -28,11 +28,11 @@ Despite the potential performance impact of query-time computations, the flexibi Currently, derived fields have the following limitations: -- **Aggregation, scoring, and sorting**: Not yet supported. +- **Scoring and sorting**: Not yet supported. +- **Aggregations**: Starting with OpenSearch 2.17, derived fields support most aggregation types. The following aggregations are not supported: geographic (geodistance, geohash grid, geohex grid, geotile grid, geobounds, geocentroid), significant terms, significant text, and scripted metric. - **Dashboard support**: These fields are not displayed in the list of available fields in OpenSearch Dashboards. However, you can still use them for filtering if you know the derived field name. - **Chained derived fields**: One derived field cannot be used to define another derived field. - **Join field type**: Derived fields are not supported for the [join field type]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/join/). -- **Concurrent segment search**: Derived fields are not supported for [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/). We are planning to address these limitations in future versions. @@ -69,7 +69,7 @@ PUT logs } } }, - "client_ip": { + "clientip": { "type": "keyword" } } @@ -541,6 +541,80 @@ The response specifies highlighting in the `url` field: ``` +## Aggregations + +Starting with OpenSearch 2.17, derived fields support most aggregation types. + +Geographic, significant terms, significant text, and scripted metric aggregations are not supported. +{: .note} + +For example, the following request creates a simple `terms` aggregation on the `method` derived field: + +```json +POST /logs/_search +{ + "size": 0, + "aggs": { + "methods": { + "terms": { + "field": "method" + } + } + } +} +``` +{% include copy-curl.html %} + +The response contains the following buckets: + +
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+  "took" : 12,
+  "timed_out" : false,
+  "_shards" : {
+    "total" : 1,
+    "successful" : 1,
+    "skipped" : 0,
+    "failed" : 0
+  },
+  "hits" : {
+    "total" : {
+      "value" : 5,
+      "relation" : "eq"
+    },
+    "max_score" : null,
+    "hits" : [ ]
+  },
+  "aggregations" : {
+    "methods" : {
+      "doc_count_error_upper_bound" : 0,
+      "sum_other_doc_count" : 0,
+      "buckets" : [
+        {
+          "key" : "GET",
+          "doc_count" : 2
+        },
+        {
+          "key" : "POST",
+          "doc_count" : 2
+        },
+        {
+          "key" : "DELETE",
+          "doc_count" : 1
+        }
+      ]
+    }
+  }
+}
+```
+</details>
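Other supported bucket aggregations on derived fields follow the same pattern. The following is an illustrative sketch only, not part of the change above; it assumes a numeric derived field named `size` is defined in the index mapping:

```json
POST /logs/_search
{
  "size": 0,
  "aggs": {
    "response_sizes": {
      "histogram": {
        "field": "size",
        "interval": 1000
      }
    }
  }
}
```
{% include copy-curl.html %}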
+

 ## Performance

 Derived fields are not indexed but are computed dynamically by retrieving values from the `_source` field or doc values. Thus, they run more slowly. To improve performance, try the following:
diff --git a/_field-types/supported-field-types/flat-object.md b/_field-types/supported-field-types/flat-object.md
index 933c385ac5..c9e59710e1 100644
--- a/_field-types/supported-field-types/flat-object.md
+++ b/_field-types/supported-field-types/flat-object.md
@@ -10,6 +10,8 @@ redirect_from:
 ---

 # Flat object field type
+**Introduced 2.7**
+{: .label .label-purple }

 In OpenSearch, you don't have to specify a mapping before indexing documents. If you don't specify a mapping, OpenSearch uses [dynamic mapping]({{site.url}}{{site.baseurl}}/field-types/index#dynamic-mapping) to map every field and its subfields in the document automatically. When you ingest documents such as logs, you may not know every field's subfield name and type in advance. In this case, dynamically mapping all new subfields can quickly lead to a "mapping explosion," where the growing number of fields may degrade the performance of your cluster.
diff --git a/_field-types/supported-field-types/geo-point.md b/_field-types/supported-field-types/geo-point.md
index 0912dc618d..96586d044f 100644
--- a/_field-types/supported-field-types/geo-point.md
+++ b/_field-types/supported-field-types/geo-point.md
@@ -11,6 +11,8 @@ redirect_from:
 ---

 # Geopoint field type
+**Introduced 1.0**
+{: .label .label-purple }

 A geopoint field type contains a geographic point specified by latitude and longitude.
diff --git a/_field-types/supported-field-types/geo-shape.md b/_field-types/supported-field-types/geo-shape.md
index b7b06a0d04..ee98bfca03 100644
--- a/_field-types/supported-field-types/geo-shape.md
+++ b/_field-types/supported-field-types/geo-shape.md
@@ -11,6 +11,8 @@ redirect_from:
 ---

 # Geoshape field type
+**Introduced 1.0**
+{: .label .label-purple }

 A geoshape field type contains a geographic shape, such as a polygon or a collection of geographic points. To index a geoshape, OpenSearch tessellates the shape into a triangular mesh and stores each triangle in a BKD tree. This provides a 10<sup>-7</sup> decimal degree of precision, which represents near-perfect spatial resolution. Performance of this process is mostly impacted by the number of vertices in a polygon you are indexing.
diff --git a/_field-types/supported-field-types/index.md b/_field-types/supported-field-types/index.md
index 7c7b7375f9..5540e73f4e 100644
--- a/_field-types/supported-field-types/index.md
+++ b/_field-types/supported-field-types/index.md
@@ -22,7 +22,7 @@ Boolean | [`boolean`]({{site.url}}{{site.baseurl}}/field-types/supported-field-t
 [Date]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/dates/)| [`date`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/): A date stored in milliseconds.<br>
[`date_nanos`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date-nanos/): A date stored in nanoseconds. IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/): An IP address in IPv4 or IPv6 format. [Range]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/range/) | A range of values (`integer_range`, `long_range`, `double_range`, `float_range`, `date_range`, `ip_range`). -[Object]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/object-fields/)| [`object`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/object/): A JSON object.
[`nested`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/nested/): Used when objects in an array need to be indexed independently as separate documents.
[`flat_object`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/flat-object/): A JSON object treated as a string.
[`join`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/): Establishes a parent-child relationship between documents in the same index. +[Object]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/object-fields/)| [`object`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/object/): A JSON object.
[`nested`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/nested/): Used when objects in an array need to be indexed independently as separate documents.
[`flat_object`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/flat-object/): A JSON object treated as a string.
[`join`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/): Establishes a parent/child relationship between documents in the same index. [String]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/string/)|[`keyword`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/keyword/): Contains a string that is not analyzed.
[`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/): Contains a string that is analyzed.
[`match_only_text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/match-only-text/): A space-optimized version of a `text` field.
[`token_count`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/token-count/): Stores the number of analyzed tokens in a string.
[`wildcard`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/wildcard/): A variation of `keyword` with efficient substring and regular expression matching. [Autocomplete]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/autocomplete/) |[`completion`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/completion/): Provides autocomplete functionality through a completion suggester.
[`search_as_you_type`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/search-as-you-type/): Provides search-as-you-type functionality using both prefix and infix completion. [Geographic]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geographic/)| [`geo_point`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-point/): A geographic point.
[`geo_shape`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-shape/): A geographic shape. @@ -30,6 +30,7 @@ IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/): k-NN vector | [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/): Allows indexing a k-NN vector into OpenSearch and performing different kinds of k-NN search. Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query. Derived | [`derived`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/derived/): Creates new fields dynamically by executing scripts on existing fields. +Star-tree | [`star_tree`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/star-tree/): Precomputes aggregations and stores them in a [star-tree index](https://docs.pinot.apache.org/basics/indexing/star-tree-index), accelerating the performance of aggregation queries. ## Arrays diff --git a/_field-types/supported-field-types/ip.md b/_field-types/supported-field-types/ip.md index cb2a5569c8..99b41e45cd 100644 --- a/_field-types/supported-field-types/ip.md +++ b/_field-types/supported-field-types/ip.md @@ -10,6 +10,8 @@ redirect_from: --- # IP address field type +**Introduced 1.0** +{: .label .label-purple } An ip field type contains an IP address in IPv4 or IPv6 format. diff --git a/_field-types/supported-field-types/join.md b/_field-types/supported-field-types/join.md index c83705f4c3..fd808a65e7 100644 --- a/_field-types/supported-field-types/join.md +++ b/_field-types/supported-field-types/join.md @@ -11,12 +11,14 @@ redirect_from: --- # Join field type +**Introduced 1.0** +{: .label .label-purple } A join field type establishes a parent/child relationship between documents in the same index. ## Example -Create a mapping to establish a parent-child relationship between products and their brands: +Create a mapping to establish a parent/child relationship between products and their brands: ```json PUT testindex1 @@ -59,7 +61,7 @@ PUT testindex1/_doc/1 ``` {% include copy-curl.html %} -When indexing child documents, you have to specify the `routing` query parameter because parent and child documents in the same relation have to be indexed on the same shard. Each child document refers to its parent's ID in the `parent` field. +When indexing child documents, you need to specify the `routing` query parameter because parent and child documents in the same parent/child hierarchy must be indexed on the same shard. For more information, see [Routing]({{site.url}}{{site.baseurl}}/field-types/metadata-fields/routing/). Each child document refers to its parent's ID in the `parent` field. Index two child documents, one for each parent: @@ -325,3 +327,8 @@ PUT testindex1 - Multiple parents are not supported. - You can add a child document to an existing document only if the existing document is already marked as a parent. - You can add a new relation to an existing join field. + +## Next steps + +- Learn about [joining queries]({{site.url}}{{site.baseurl}}/query-dsl/joining/) on join fields. +- Learn more about [retrieving inner hits]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/inner-hits/). 
\ No newline at end of file diff --git a/_field-types/supported-field-types/keyword.md b/_field-types/supported-field-types/keyword.md index eea6cc664b..ca9c8085f6 100644 --- a/_field-types/supported-field-types/keyword.md +++ b/_field-types/supported-field-types/keyword.md @@ -11,6 +11,8 @@ redirect_from: --- # Keyword field type +**Introduced 1.0** +{: .label .label-purple } A keyword field type contains a string that is not analyzed. It allows only exact, case-sensitive matches. diff --git a/_field-types/supported-field-types/knn-vector.md b/_field-types/supported-field-types/knn-vector.md index a2a7137733..da784aeefe 100644 --- a/_field-types/supported-field-types/knn-vector.md +++ b/_field-types/supported-field-types/knn-vector.md @@ -8,6 +8,8 @@ has_math: true --- # k-NN vector field type +**Introduced 1.0** +{: .label .label-purple } The [k-NN plugin]({{site.url}}{{site.baseurl}}/search-plugins/knn/index/) introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors into an OpenSearch index and perform different kinds of k-NN search. The `knn_vector` field is highly configurable and can serve many different k-NN workloads. In general, a `knn_vector` field can be built either by providing a method definition or specifying a model id. @@ -20,8 +22,7 @@ PUT test-index { "settings": { "index": { - "knn": true, - "knn.algo_param.ef_search": 100 + "knn": true } }, "mappings": { @@ -29,14 +30,10 @@ PUT test-index "my_vector": { "type": "knn_vector", "dimension": 3, + "space_type": "l2", "method": { "name": "hnsw", - "space_type": "l2", - "engine": "lucene", - "parameters": { - "ef_construction": 128, - "m": 24 - } + "engine": "faiss" } } } @@ -45,6 +42,92 @@ PUT test-index ``` {% include copy-curl.html %} +## Vector workload modes + +Vector search involves trade-offs between low-latency and low-cost search. Specify the `mode` mapping parameter of the `knn_vector` type to indicate which search mode you want to prioritize. The `mode` dictates the default values for k-NN parameters. You can further fine-tune your index by overriding the default parameter values in the k-NN field mapping. + +The following modes are currently supported. + +| Mode | Default engine | Description | +|:---|:---|:---| +| `in_memory` (Default) | `nmslib` | Prioritizes low-latency search. This mode uses the `nmslib` engine without any quantization applied. It is configured with the default parameter values for vector search in OpenSearch. | +| `on_disk` | `faiss` | Prioritizes low-cost vector search while maintaining strong recall. By default, the `on_disk` mode uses quantization and rescoring to execute a two-pass approach to retrieve the top neighbors. The `on_disk` mode supports only `float` vector types. | + +To create a k-NN index that uses the `on_disk` mode for low-cost search, send the following request: + +```json +PUT test-index +{ + "settings": { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector": { + "type": "knn_vector", + "dimension": 3, + "space_type": "l2", + "mode": "on_disk" + } + } + } +} +``` +{% include copy-curl.html %} + +## Compression levels + +The `compression_level` mapping parameter selects a quantization encoder that reduces vector memory consumption by the given factor. The following table lists the available `compression_level` values. 
+
+| Compression level | Supported engines |
+|:------------------|:-------------------------------|
+| `1x` | `faiss`, `lucene`, and `nmslib` |
+| `2x` | `faiss` |
+| `4x` | `lucene` |
+| `8x` | `faiss` |
+| `16x` | `faiss` |
+| `32x` | `faiss` |
+
+For example, if a `compression_level` of `32x` is passed for a `float32` index of 768-dimensional vectors, the per-vector memory is reduced from `4 * 768 = 3072` bytes to `3072 / 32 = 96` bytes. Internally, binary quantization (which maps a `float` to a `bit`) may be used to achieve this compression.
+
+If you set the `compression_level` parameter, then you cannot specify an `encoder` in the `method` mapping. Compression levels greater than `1x` are only supported for `float` vector types.
+{: .note}
+
+The following table lists the default `compression_level` values for the available workload modes.
+
+| Mode | Default compression level |
+|:------------------|:-------------------------------|
+| `in_memory` | `1x` |
+| `on_disk` | `32x` |
+
+
+To create a vector field with a `compression_level` of `16x`, specify the `compression_level` parameter in the mappings. This parameter overrides the default compression level for the `on_disk` mode from `32x` to `16x`, producing higher recall and accuracy at the expense of a larger memory footprint:
+
+```json
+PUT test-index
+{
+  "settings": {
+    "index": {
+      "knn": true
+    }
+  },
+  "mappings": {
+    "properties": {
+      "my_vector": {
+        "type": "knn_vector",
+        "dimension": 3,
+        "space_type": "l2",
+        "mode": "on_disk",
+        "compression_level": "16x"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
 ## Method definitions

 [Method definitions]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#method-definitions) are used when the underlying [approximate k-NN]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) algorithm does not require training. For example, the following `knn_vector` field specifies that *nmslib*'s implementation of *hnsw* should be used for approximate k-NN search. During indexing, *nmslib* will build the corresponding *hnsw* segment files.

@@ -53,13 +136,13 @@ "my_vector": {
   "type": "knn_vector",
   "dimension": 4,
+  "space_type": "l2",
   "method": {
     "name": "hnsw",
-    "space_type": "l2",
     "engine": "nmslib",
     "parameters": {
-      "ef_construction": 128,
-      "m": 24
+      "ef_construction": 100,
+      "m": 16
     }
   }
 }
@@ -71,6 +154,7 @@ Model IDs are used when the underlying Approximate k-NN algorithm requires a tra
 model contains the information needed to initialize the native library segment files.

 ```json
+"my_vector": {
   "type": "knn_vector",
   "model_id": "my-model"
 }
 ```
@@ -78,28 +162,36 @@ model contains the information needed to initialize the native library segment f
 However, if you intend to use Painless scripting or a k-NN score script, you only need to pass the dimension.

 ```json
+"my_vector": {
   "type": "knn_vector",
   "dimension": 128
 }
 ```

-## Lucene byte vector
+## Byte vectors

-By default, k-NN vectors are `float` vectors, where each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `lucene` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range.
+By default, k-NN vectors are `float` vectors, in which each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `faiss` or `lucene` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range.

-Byte vectors are supported only for the `lucene` engine.
They are not supported for the `nmslib` and `faiss` engines. +Byte vectors are supported only for the `lucene` and `faiss` engines. They are not supported for the `nmslib` engine. {: .note} -In [k-NN benchmarking tests](https://github.com/opensearch-project/k-NN/tree/main/benchmarks/perf-tool), the use of `byte` rather than `float` vectors resulted in a significant reduction in storage and memory usage as well as improved indexing throughput and reduced query latency. Additionally, precision on recall was not greatly affected (note that recall can depend on various factors, such as the [quantization technique](#quantization-techniques) and data distribution). +In [k-NN benchmarking tests](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/vectorsearch), the use of `byte` rather than `float` vectors resulted in a significant reduction in storage and memory usage as well as improved indexing throughput and reduced query latency. Additionally, precision on recall was not greatly affected (note that recall can depend on various factors, such as the [quantization technique](#quantization-techniques) and data distribution). When using `byte` vectors, expect some loss of precision in the recall compared to using `float` vectors. Byte vectors are useful in large-scale applications and use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall. {: .important} - + +When using `byte` vectors with the `faiss` engine, we recommend using [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), which helps to significantly reduce search latencies and improve indexing throughput. +{: .important} + Introduced in k-NN plugin version 2.9, the optional `data_type` parameter defines the data type of a vector. The default value of this parameter is `float`. To use a `byte` vector, set the `data_type` parameter to `byte` when creating mappings for an index: - ```json +### Example: HNSW + +The following example creates a byte vector index with the `lucene` engine and `hnsw` algorithm: + +```json PUT test-index { "settings": { @@ -114,13 +206,13 @@ PUT test-index "type": "knn_vector", "dimension": 3, "data_type": "byte", + "space_type": "l2", "method": { "name": "hnsw", - "space_type": "l2", "engine": "lucene", "parameters": { - "ef_construction": 128, - "m": 24 + "ef_construction": 100, + "m": 16 } } } @@ -130,7 +222,7 @@ PUT test-index ``` {% include copy-curl.html %} -Then ingest documents as usual. Make sure each dimension in the vector is in the supported [-128, 127] range: +After creating the index, ingest documents as usual. Make sure each dimension in the vector is in the supported [-128, 127] range: ```json PUT test-index/_doc/1 @@ -166,9 +258,160 @@ GET test-index/_search ``` {% include copy-curl.html %} +### Example: IVF + +The `ivf` method requires a training step that creates and trains the model used to initialize the native library index during segment creation. For more information, see [Building a k-NN index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model). + +First, create an index that will contain byte vector training data. 
Specify the `byte` data type and make sure that the `dimension` matches the dimension of the model you want to create:
+
+```json
+PUT train-index
+{
+  "mappings": {
+    "properties": {
+      "train-field": {
+        "type": "knn_vector",
+        "dimension": 4,
+        "data_type": "byte"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+Next, ingest training data containing byte vectors into the training index:
+
+```json
+PUT _bulk
+{ "index": { "_index": "train-index", "_id": "1" } }
+{ "train-field": [127, 100, 0, -120] }
+{ "index": { "_index": "train-index", "_id": "2" } }
+{ "train-field": [2, -128, -10, 50] }
+{ "index": { "_index": "train-index", "_id": "3" } }
+{ "train-field": [13, -100, 5, 126] }
+{ "index": { "_index": "train-index", "_id": "4" } }
+{ "train-field": [5, 100, -6, -125] }
+```
+{% include copy-curl.html %}
+
+Then, create and train the model named `byte-vector-model`. The model will be trained using the training data from the `train-field` in the `train-index`. Specify the `byte` data type:
+
+```json
+POST _plugins/_knn/models/byte-vector-model/_train
+{
+  "training_index": "train-index",
+  "training_field": "train-field",
+  "dimension": 4,
+  "description": "model with byte data",
+  "data_type": "byte",
+  "method": {
+    "name": "ivf",
+    "engine": "faiss",
+    "space_type": "l2",
+    "parameters": {
+      "nlist": 1,
+      "nprobes": 1
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+To check the model training status, call the Get Model API:
+
+```json
+GET _plugins/_knn/models/byte-vector-model?filter_path=state
+```
+{% include copy-curl.html %}
+
+Once the training is complete, the `state` changes to `created`.
+
+Next, create an index that will initialize its native library indexes using the trained model:
+
+```json
+PUT test-byte-ivf
+{
+  "settings": {
+    "index": {
+      "knn": true
+    }
+  },
+  "mappings": {
+    "properties": {
+      "my_vector": {
+        "type": "knn_vector",
+        "model_id": "byte-vector-model"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+Ingest the data containing the byte vectors that you want to search into the created index:
+
+```json
+PUT _bulk?refresh=true
+{"index": {"_index": "test-byte-ivf", "_id": "1"}}
+{"my_vector": [7, 10, 15, -120]}
+{"index": {"_index": "test-byte-ivf", "_id": "2"}}
+{"my_vector": [10, -100, 120, -108]}
+{"index": {"_index": "test-byte-ivf", "_id": "3"}}
+{"my_vector": [1, -2, 5, -50]}
+{"index": {"_index": "test-byte-ivf", "_id": "4"}}
+{"my_vector": [9, -7, 45, -78]}
+{"index": {"_index": "test-byte-ivf", "_id": "5"}}
+{"my_vector": [80, -70, 127, -128]}
+```
+{% include copy-curl.html %}
+
+Finally, search the data. Be sure to provide a byte vector in the k-NN vector field:
+
+```json
+GET test-byte-ivf/_search
+{
+  "size": 2,
+  "query": {
+    "knn": {
+      "my_vector": {
+        "vector": [100, -120, 50, -45],
+        "k": 2
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+### Memory estimation
+
+In the best-case scenario, byte vectors require 25% of the memory required by 32-bit vectors.
+
+#### HNSW memory estimation
+
+The memory required for Hierarchical Navigable Small Worlds (HNSW) is estimated to be `1.1 * (dimension + 8 * m)` bytes/vector, where `m` is the maximum number of bidirectional links created for each element during the construction of the graph.
+
+As an example, assume that you have 1 million vectors with a dimension of 256 and an `m` of 16.
The memory requirement can be estimated as follows:
+
+```r
+1.1 * (256 + 8 * 16) * 1,000,000 ~= 0.39 GB
+```
+
+#### IVF memory estimation
+
+The memory required for IVF is estimated to be `1.1 * ((dimension * num_vectors) + (4 * nlist * dimension))` bytes, where `nlist` is the number of buckets to partition vectors into.
+
+As an example, assume that you have 1 million vectors with a dimension of 256 and an `nlist` of 128. The memory requirement can be estimated as follows:
+
+```r
+1.1 * ((256 * 1,000,000) + (4 * 128 * 256)) ~= 0.27 GB
+```
+
+
 ### Quantization techniques

-If your vectors are of the type `float`, you need to first convert them to the `byte` type before ingesting the documents. This conversion is accomplished by _quantizing the dataset_---reducing the precision of its vectors. There are many quantization techniques, such as scalar quantization or product quantization (PQ), which is used in the Faiss engine. The choice of quantization technique depends on the type of data you're using and can affect the accuracy of recall values. The following sections describe the scalar quantization algorithms that were used to quantize the [k-NN benchmarking test](https://github.com/opensearch-project/k-NN/tree/main/benchmarks/perf-tool) data for the [L2](#scalar-quantization-for-the-l2-space-type) and [cosine similarity](#scalar-quantization-for-the-cosine-similarity-space-type) space types. The provided pseudocode is for illustration purposes only.
+If your vectors are of the type `float`, you need to first convert them to the `byte` type before ingesting the documents. This conversion is accomplished by _quantizing the dataset_---reducing the precision of its vectors. There are many quantization techniques, such as scalar quantization or product quantization (PQ), which is used in the Faiss engine. The choice of quantization technique depends on the type of data you're using and can affect the accuracy of recall values. The following sections describe the scalar quantization algorithms that were used to quantize the [k-NN benchmarking test](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/vectorsearch) data for the [L2](#scalar-quantization-for-the-l2-space-type) and [cosine similarity](#scalar-quantization-for-the-cosine-similarity-space-type) space types. The provided pseudocode is for illustration purposes only.

 #### Scalar quantization for the L2 space type
@@ -267,7 +510,7 @@ return Byte(bval)
 ```
 {% include copy.html %}

-## Binary k-NN vectors
+## Binary vectors

 You can reduce memory costs by a factor of 32 by switching from float to binary vectors. Using binary vector indexes can lower operational costs while maintaining high recall performance, making large-scale deployment more economical and efficient.
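To make the factor-of-32 claim concrete (our arithmetic, shown in the same style as the memory estimations above), a `float` vector uses 4 bytes per dimension, while a binary vector uses 1 bit per dimension:

```r
# 768-dimensional float32 vector: 768 * 4 bytes = 3,072 bytes
# 768-dimensional binary vector: 768 / 8 = 96 bytes
3072 / 96 = 32
```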
@@ -305,14 +548,10 @@ PUT /test-binary-hnsw "type": "knn_vector", "dimension": 8, "data_type": "binary", + "space_type": "hamming", "method": { "name": "hnsw", - "space_type": "hamming", - "engine": "faiss", - "parameters": { - "ef_construction": 128, - "m": 24 - } + "engine": "faiss" } } } @@ -535,12 +774,12 @@ POST _plugins/_knn/models/test-binary-model/_train "dimension": 8, "description": "model with binary data", "data_type": "binary", + "space_type": "hamming", "method": { "name": "ivf", "engine": "faiss", - "space_type": "hamming", "parameters": { - "nlist": 1, + "nlist": 16, "nprobes": 1 } } diff --git a/_field-types/supported-field-types/match-only-text.md b/_field-types/supported-field-types/match-only-text.md index fd2c6b5850..534275bd3a 100644 --- a/_field-types/supported-field-types/match-only-text.md +++ b/_field-types/supported-field-types/match-only-text.md @@ -8,6 +8,8 @@ grand_parent: Supported field types --- # Match-only text field type +**Introduced 2.12** +{: .label .label-purple } A `match_only_text` field is a variant of a `text` field designed for full-text search when scoring and positional information of terms within a document are not critical. diff --git a/_field-types/supported-field-types/nested.md b/_field-types/supported-field-types/nested.md index 90d09177d1..4db270c1dc 100644 --- a/_field-types/supported-field-types/nested.md +++ b/_field-types/supported-field-types/nested.md @@ -11,6 +11,8 @@ redirect_from: --- # Nested field type +**Introduced 1.0** +{: .label .label-purple } A nested field type is a special type of [object field type]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object/). @@ -312,3 +314,8 @@ Parameter | Description `include_in_parent` | A Boolean value that specifies whether all fields in the child nested object should also be added to the parent document in flattened form. Default is `false`. `include_in_root` | A Boolean value that specifies whether all fields in the child nested object should also be added to the root document in flattened form. Default is `false`. `properties` | Fields of this object, which can be of any supported type. New properties can be dynamically added to this object if `dynamic` is set to `true`. + +## Next steps + +- Learn about [joining queries]({{site.url}}{{site.baseurl}}/query-dsl/joining/) on nested fields. +- Learn about [retrieving inner hits]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/inner-hits/). \ No newline at end of file diff --git a/_field-types/supported-field-types/object-fields.md b/_field-types/supported-field-types/object-fields.md index 429c5b94c7..e683e70f0d 100644 --- a/_field-types/supported-field-types/object-fields.md +++ b/_field-types/supported-field-types/object-fields.md @@ -19,5 +19,5 @@ Field data type | Description [`object`]({{site.url}}{{site.baseurl}}/field-types/object/) | A JSON object. [`nested`]({{site.url}}{{site.baseurl}}/field-types/nested/) | Used when objects in an array need to be indexed independently as separate documents. [`flat_object`]({{site.url}}{{site.baseurl}}/field-types/flat-object/) | A JSON object treated as a string. -[`join`]({{site.url}}{{site.baseurl}}/field-types/join/) | Establishes a parent-child relationship between documents in the same index. +[`join`]({{site.url}}{{site.baseurl}}/field-types/join/) | Establishes a parent/child relationship between documents in the same index. 
diff --git a/_field-types/supported-field-types/object.md b/_field-types/supported-field-types/object.md index db539a9608..ee50f5af9d 100644 --- a/_field-types/supported-field-types/object.md +++ b/_field-types/supported-field-types/object.md @@ -11,6 +11,8 @@ redirect_from: --- # Object field type +**Introduced 1.0** +{: .label .label-purple } An object field type contains a JSON object (a set of name/value pairs). A value in a JSON object may be another JSON object. It is not necessary to specify `object` as the type when mapping object fields because `object` is the default type. diff --git a/_field-types/supported-field-types/percolator.md b/_field-types/supported-field-types/percolator.md index 92325b6127..2b067cf595 100644 --- a/_field-types/supported-field-types/percolator.md +++ b/_field-types/supported-field-types/percolator.md @@ -10,6 +10,8 @@ redirect_from: --- # Percolator field type +**Introduced 1.0** +{: .label .label-purple } A percolator field type specifies to treat this field as a query. Any JSON object field can be marked as a percolator field. Normally, documents are indexed and searches are run against them. When you use a percolator field, you store a search, and later the percolate query matches documents to that search. diff --git a/_field-types/supported-field-types/range.md b/_field-types/supported-field-types/range.md index 22ae1d619e..1001bae584 100644 --- a/_field-types/supported-field-types/range.md +++ b/_field-types/supported-field-types/range.md @@ -10,6 +10,8 @@ redirect_from: --- # Range field types +**Introduced 1.0** +{: .label .label-purple } The following table lists all range field types that OpenSearch supports. diff --git a/_field-types/supported-field-types/rank.md b/_field-types/supported-field-types/rank.md index a4ec0fac4c..f57c540cf5 100644 --- a/_field-types/supported-field-types/rank.md +++ b/_field-types/supported-field-types/rank.md @@ -10,6 +10,8 @@ redirect_from: --- # Rank field types +**Introduced 1.0** +{: .label .label-purple } The following table lists all rank field types that OpenSearch supports. diff --git a/_field-types/supported-field-types/search-as-you-type.md b/_field-types/supported-field-types/search-as-you-type.md index b9141e6b8e..55774d432a 100644 --- a/_field-types/supported-field-types/search-as-you-type.md +++ b/_field-types/supported-field-types/search-as-you-type.md @@ -11,6 +11,8 @@ redirect_from: --- # Search-as-you-type field type +**Introduced 1.0** +{: .label .label-purple } A search-as-you-type field type provides search-as-you-type functionality using both prefix and infix completion. diff --git a/_field-types/supported-field-types/star-tree.md b/_field-types/supported-field-types/star-tree.md new file mode 100644 index 0000000000..2bfccb6632 --- /dev/null +++ b/_field-types/supported-field-types/star-tree.md @@ -0,0 +1,184 @@ +--- +layout: default +title: Star-tree +nav_order: 61 +parent: Supported field types +--- + +# Star-tree field type + +This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/). +{: .warning} + +A star-tree index precomputes aggregations, accelerating the performance of aggregation queries. +If a star-tree index is configured as part of an index mapping, the star-tree index is created and maintained as data is ingested in real time. 
+
+OpenSearch will automatically use the star-tree index to optimize aggregations if the queried fields are part of star-tree index dimension fields and the aggregations are on star-tree index metric fields. No changes are required in the query syntax or the request parameters.
+
+For more information, see [Star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/).
+
+## Prerequisites
+
+To use a star-tree index, follow the instructions in [Enabling a star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index#enabling-a-star-tree-index).
+
+## Examples
+
+The following examples show how to use a star-tree index.
+
+### Star-tree index mappings
+
+Define star-tree index mappings in the `composite` section in `mappings`.
+
+The following example API request creates a star-tree index named `request_aggs`. To compute metric aggregations for the `request_size` and `latency` fields with queries on the `port` and `status` fields, configure the following mappings:
+
+```json
+PUT logs
+{
+  "settings": {
+    "index.number_of_shards": 1,
+    "index.number_of_replicas": 0,
+    "index.composite_index": true
+  },
+  "mappings": {
+    "composite": {
+      "request_aggs": {
+        "type": "star_tree",
+        "config": {
+          "max_leaf_docs": 10000,
+          "skip_star_node_creation_for_dimensions": [
+            "port"
+          ],
+          "ordered_dimensions": [
+            {
+              "name": "status"
+            },
+            {
+              "name": "port"
+            }
+          ],
+          "metrics": [
+            {
+              "name": "request_size",
+              "stats": [
+                "sum",
+                "value_count",
+                "min",
+                "max"
+              ]
+            },
+            {
+              "name": "latency",
+              "stats": [
+                "sum",
+                "value_count",
+                "min",
+                "max"
+              ]
+            }
+          ]
+        }
+      }
+    },
+    "properties": {
+      "status": {
+        "type": "integer"
+      },
+      "port": {
+        "type": "integer"
+      },
+      "request_size": {
+        "type": "integer"
+      },
+      "latency": {
+        "type": "scaled_float",
+        "scaling_factor": 10
+      }
+    }
+  }
+}
+```
+
+## Star-tree index configuration options
+
+You can customize your star-tree implementation using the following `config` options in the `mappings` section. These options cannot be modified without reindexing.
+
+| Parameter | Description |
+| :--- | :--- |
+| `ordered_dimensions` | A [list of fields](#ordered-dimensions) based on which metrics will be aggregated in a star-tree index. Required. |
+| `metrics` | A [list of metric](#metrics) fields required in order to perform aggregations. Required. |
+| `max_leaf_docs` | The maximum number of star-tree documents that a leaf node can point to. After the maximum number of documents is reached, child nodes will be created based on the unique value of the next field in the `ordered_dimension` (if any). Default is `10000`. A lower value will use more storage but result in faster query performance. Conversely, a higher value will use less storage but result in slower query performance. For more information, see [Star-tree indexing structure]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#star-tree-index-structure). |
+| `skip_star_node_creation_for_dimensions` | A list of dimensions for which a star-tree index will skip star node creation. When `true`, this reduces storage size at the expense of query performance. Default is `false`. For more information about star nodes, see [Star-tree indexing structure]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#star-tree-index-structure). |
+
+
+### Ordered dimensions
+
+The `ordered_dimensions` parameter contains fields based on which metrics will be aggregated in a star-tree index.
The star-tree index will be selected for querying only if all the fields in the query are part of the `ordered_dimensions`.
+
+When using the `ordered_dimensions` parameter, follow these best practices:
+
+- The order of dimensions matters. You can define the dimensions ordered from the highest cardinality to the lowest cardinality for efficient storage and query pruning.
+- Avoid using high-cardinality fields as dimensions. High-cardinality fields adversely affect storage space, indexing throughput, and query performance.
+- Currently, fields supported by the `ordered_dimensions` parameter are all [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`. For more information, see [GitHub issue #15231](https://github.com/opensearch-project/OpenSearch/issues/15231).
+- Support for other field types, such as `keyword` and `ip`, will be added in future versions. For more information, see [GitHub issue #16232](https://github.com/opensearch-project/OpenSearch/issues/16232).
+- A minimum of `2` and a maximum of `10` dimensions are supported per star-tree index.
+
+The `ordered_dimensions` parameter supports the following property.
+
+| Parameter | Required/Optional | Description |
+| :--- | :--- | :--- |
+| `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that the `doc_values` setting is `enabled` for any associated fields. |
+
+
+### Metrics
+
+Configure any metric fields on which you need to perform aggregations. `Metrics` are required as part of a star-tree index configuration.
+
+When using `metrics`, follow these best practices:
+
+- Currently, fields supported by `metrics` are all [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`. For more information, see [GitHub issue #15231](https://github.com/opensearch-project/OpenSearch/issues/15231).
+- Supported metric aggregations include `Min`, `Max`, `Sum`, `Avg`, and `Value_count`.
+  - `Avg` is a derived metric based on `Sum` and `Value_count` and is not indexed when a query is run. The remaining base metrics are indexed.
+- A maximum of `100` base metrics are supported per star-tree index.
+
+If `Min`, `Max`, `Sum`, and `Value_count` are defined as `metrics` for each field, then up to 25 such fields can be configured, as shown in the following example:
+
+```json
+{
+  "metrics": [
+    {
+      "name": "field1",
+      "stats": [
+        "sum",
+        "value_count",
+        "min",
+        "max"
+      ]
+    },
+    ...,
+    {
+      "name": "field25",
+      "stats": [
+        "sum",
+        "value_count",
+        "min",
+        "max"
+      ]
+    }
+  ]
+}
+```
+
+
+#### Properties
+
+The `metrics` parameter supports the following properties.
+
+| Parameter | Required/Optional | Description |
+| :--- | :--- | :--- |
+| `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that the `doc_values` setting is `enabled` for any associated fields. |
+| `stats` | Optional | A list of metric aggregations computed for each field. You can choose between `Min`, `Max`, `Sum`, `Avg`, and `Value_count`.<br>
Default is `Sum` and `Value_count`.
`Avg` is a derived metric statistic that will automatically be supported in queries if `Sum` and `Value_count` are present as part of metric `stats`.
+
+
+## Supported queries and aggregations
+
+For more information about supported queries and aggregations, see [Supported queries and aggregations for a star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#supported-queries-and-aggregations).
+
diff --git a/_field-types/supported-field-types/text.md b/_field-types/supported-field-types/text.md
index 16350c0cb3..b06bec2187 100644
--- a/_field-types/supported-field-types/text.md
+++ b/_field-types/supported-field-types/text.md
@@ -11,6 +11,8 @@ redirect_from:
 ---

 # Text field type
+**Introduced 1.0**
+{: .label .label-purple }

 A `text` field type contains a string that is analyzed. It is used for full-text search because it allows partial matches. Searches for multiple terms can match some but not all of them. Depending on the analyzer, results can be case insensitive, stemmed, have stopwords removed, have synonyms applied, and so on.
diff --git a/_field-types/supported-field-types/token-count.md b/_field-types/supported-field-types/token-count.md
index 6c3445e6a7..11eeff7854 100644
--- a/_field-types/supported-field-types/token-count.md
+++ b/_field-types/supported-field-types/token-count.md
@@ -11,6 +11,8 @@ redirect_from:
 ---

 # Token count field type
+**Introduced 1.0**
+{: .label .label-purple }

 A token count field type stores the number of analyzed tokens in a string.
diff --git a/_field-types/supported-field-types/unsigned-long.md b/_field-types/supported-field-types/unsigned-long.md
index dde8d25dee..4c38cb3090 100644
--- a/_field-types/supported-field-types/unsigned-long.md
+++ b/_field-types/supported-field-types/unsigned-long.md
@@ -8,6 +8,8 @@ has_children: false
 ---

 # Unsigned long field type
+**Introduced 2.8**
+{: .label .label-purple }

 The `unsigned_long` field type is a numeric field type that represents an unsigned 64-bit integer with a minimum value of 0 and a maximum value of 2<sup>64</sup> &minus; 1. In the following example, `counter` is mapped as an `unsigned_long` field:
diff --git a/_field-types/supported-field-types/wildcard.md b/_field-types/supported-field-types/wildcard.md
index c438f35c62..0f8c176135 100644
--- a/_field-types/supported-field-types/wildcard.md
+++ b/_field-types/supported-field-types/wildcard.md
@@ -8,6 +8,8 @@ grand_parent: Supported field types
 ---

 # Wildcard field type
+**Introduced 2.15**
+{: .label .label-purple }

 A `wildcard` field is a variant of a `keyword` field designed for arbitrary substring and regular expression matching.
diff --git a/_field-types/supported-field-types/xy-point.md b/_field-types/supported-field-types/xy-point.md
index 57b6f64758..0d066b9f09 100644
--- a/_field-types/supported-field-types/xy-point.md
+++ b/_field-types/supported-field-types/xy-point.md
@@ -11,6 +11,8 @@ redirect_from:
 ---

 # xy point field type
+**Introduced 2.4**
+{: .label .label-purple }

 An xy point field type contains a point in a two-dimensional Cartesian coordinate system, specified by x and y coordinates. It is based on the Lucene [XYPoint](https://lucene.apache.org/core/9_3_0/core/org/apache/lucene/geo/XYPoint.html) field type. The xy point field type is similar to the [geopoint]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point/) field type, but does not have the range limitations of geopoint. The coordinates of an xy point are single-precision floating-point values.
For information about the range and precision of floating-point values, see [Numeric field types]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/numeric/).
diff --git a/_field-types/supported-field-types/xy-shape.md b/_field-types/supported-field-types/xy-shape.md
index f1c7191240..9dcbafceae 100644
--- a/_field-types/supported-field-types/xy-shape.md
+++ b/_field-types/supported-field-types/xy-shape.md
@@ -11,6 +11,8 @@ redirect_from:
 ---

 # xy shape field type
+**Introduced 2.4**
+{: .label .label-purple }

 An xy shape field type contains a shape, such as a polygon or a collection of xy points. It is based on the Lucene [XYShape](https://lucene.apache.org/core/9_3_0/core/org/apache/lucene/document/XYShape.html) field type. To index an xy shape, OpenSearch tessellates the shape into a triangular mesh and stores each triangle in a BKD tree (a set of balanced k-dimensional trees). This provides a 10<sup>-7</sup> decimal degree of precision, which represents near-perfect spatial resolution.
diff --git a/_getting-started/communicate.md b/_getting-started/communicate.md
index 9960f63b2c..3472270c30 100644
--- a/_getting-started/communicate.md
+++ b/_getting-started/communicate.md
@@ -200,7 +200,7 @@ PUT /students/_doc/1
   "address": "123 Main St."
 }
 ```
-{% include copy.html %}
+{% include copy-curl.html %}

 Alternatively, you can update parts of a document by calling the Update Document API:
diff --git a/_getting-started/ingest-data.md b/_getting-started/ingest-data.md
index 73cf1502f7..866a88e68a 100644
--- a/_getting-started/ingest-data.md
+++ b/_getting-started/ingest-data.md
@@ -50,32 +50,32 @@ Use the following steps to create a sample index and define field mappings for t
    ```
    {% include copy.html %}

-1. Download [ecommerce.json](https://github.com/opensearch-project/documentation-website/blob/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce.json). This file contains the index data formatted so that it can be ingested by the Bulk API:
+1. Download [ecommerce.ndjson](https://github.com/opensearch-project/documentation-website/blob/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce.ndjson). This file contains the index data formatted so that it can be ingested by the Bulk API:

    To use cURL, send the following request:

    ```bash
-   curl -O https://raw.githubusercontent.com/opensearch-project/documentation-website/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce.json
+   curl -O https://raw.githubusercontent.com/opensearch-project/documentation-website/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce.ndjson
    ```
    {% include copy.html %}

    To use wget, send the following request:

    ```
-   wget https://raw.githubusercontent.com/opensearch-project/documentation-website/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce.json
+   wget https://raw.githubusercontent.com/opensearch-project/documentation-website/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce.ndjson
    ```
    {% include copy.html %}

 1. Define the field mappings provided in the mapping file:

    ```bash
-   curl -H "Content-Type: application/x-ndjson" -X PUT "https://localhost:9200/ecommerce" -ku admin:<custom-admin-password> --data-binary "@ecommerce-field_mappings.json"
+   curl -H "Content-Type: application/json" -X PUT "https://localhost:9200/ecommerce" -ku admin:<custom-admin-password> --data-binary "@ecommerce-field_mappings.json"
    ```
    {% include copy.html %}
 1. Upload the documents using the Bulk API:

    ```bash
-   curl -H "Content-Type: application/x-ndjson" -X PUT "https://localhost:9200/ecommerce/_bulk" -ku admin:<custom-admin-password> --data-binary "@ecommerce.json"
+   curl -H "Content-Type: application/x-ndjson" -X PUT "https://localhost:9200/ecommerce/_bulk" -ku admin:<custom-admin-password> --data-binary "@ecommerce.ndjson"
    ```
    {% include copy.html %}
diff --git a/_getting-started/intro.md b/_getting-started/intro.md
index edd178a23f..f5eb24ba2b 100644
--- a/_getting-started/intro.md
+++ b/_getting-started/intro.md
@@ -56,6 +56,7 @@ ID | Name | GPA | Graduation year
 1 | John Doe | 3.89 | 2022
 2 | Jonathan Powers | 3.85 | 2025
 3 | Jane Doe | 3.52 | 2024
+... | | |

 ## Clusters and nodes
diff --git a/_getting-started/search-data.md b/_getting-started/search-data.md
index c6970e7e7b..8e4169fbae 100644
--- a/_getting-started/search-data.md
+++ b/_getting-started/search-data.md
@@ -104,7 +104,7 @@ OpenSearch returns the matching documents:
 }
 ```

-## Response fields
+## Response body fields

 The preceding response contains the following fields.
diff --git a/_im-plugin/index-context.md b/_im-plugin/index-context.md
new file mode 100644
index 0000000000..be0dbd527d
--- /dev/null
+++ b/_im-plugin/index-context.md
@@ -0,0 +1,175 @@
+---
+layout: default
+title: Index context
+nav_order: 14
+redirect_from:
+  - /opensearch/index-context/
+---
+
+# Index context
+
+This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).
+{: .warning}
+
+Index context declares the use case for an index. Using the context information, OpenSearch applies a predetermined set of settings and mappings, which provides the following benefits:
+
+- Optimized performance
+- Settings tuned to your specific use case
+- Accurate mappings and aliases based on [OpenSearch Integrations]({{site.url}}{{site.baseurl}}/integrations/)
+
+The settings and metadata configuration that are applied using component templates are automatically loaded when your cluster starts. Component templates that start with `@abc_template@`, or Application-Based Configuration (ABC) templates, can only be used through a `context` object declaration, in order to prevent configuration issues.
+{: .warning}
+
+
+## Installation
+
+To install the index context feature:
+
+1. Install the `opensearch-system-templates` plugin on all nodes in your cluster using one of the [installation methods]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/#install).
+
+2. Set the feature flag `opensearch.experimental.feature.application_templates.enabled` to `true`. For more information about enabling and disabling feature flags, see [Enabling experimental features]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
+
+3. Set the `cluster.application_templates.enabled` setting to `true`. For instructions on how to configure OpenSearch, see [configuring settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings).
+
+## Using the `context` setting
+
+Use the `context` setting with the Index API to add use-case-specific context.
+
+### Considerations
+
+Consider the following when using the `context` parameter during index creation:
+
+1. If you use the `context` parameter to create an index, you cannot include any settings declared in the index context during index creation or dynamic settings updates.
+2. The index context becomes permanent when set on an index or index template.
+
+When you adhere to these limitations, suggested configurations or mappings are uniformly applied on indexed data within the specified context.
+
+### Examples
+
+The following examples show how to use index context.
+
+
+#### Create an index
+
+The following example request creates an index in which to store metric data by declaring a `metrics` mapping as the context:
+
+```json
+PUT /my-metrics-index
+{
+  "context": {
+    "name": "metrics"
+  }
+}
+```
+{% include copy-curl.html %}
+
+After creation, the context is added to the index and the corresponding settings are applied:
+
+
+**GET request**
+
+```json
+GET /my-metrics-index
+```
+{% include copy-curl.html %}
+
+
+**Response**
+
+```json
+{
+  "my-metrics-index": {
+    "aliases": {},
+    "mappings": {},
+    "settings": {
+      "index": {
+        "codec": "zstd_no_dict",
+        "refresh_interval": "60s",
+        "number_of_shards": "1",
+        "provided_name": "my-metrics-index",
+        "merge": {
+          "policy": "log_byte_size"
+        },
+        "context": {
+          "created_version": "1",
+          "current_version": "1"
+        },
+        ...
+      }
+    },
+    "context": {
+      "name": "metrics",
+      "version": "_latest"
+    }
+  }
+}
+```
+
+
+#### Create an index template
+
+You can also use the `context` parameter when creating an index template. The following example request creates an index template with the context information as `logs`:
+
+```json
+PUT _index_template/my-logs
+{
+  "context": {
+    "name": "logs",
+    "version": "1"
+  },
+  "index_patterns": [
+    "my-logs-*"
+  ]
+}
+```
+{% include copy-curl.html %}
+
+All indexes created using this index template will get the metadata provided by the associated component template. The following request and response show how `context` is added to the template:
+
+**Get index template**
+
+```json
+GET _index_template/my-logs
+```
+{% include copy-curl.html %}
+
+**Response**
+
+```json
+{
+  "index_templates": [
+    {
+      "name": "my-logs",
+      "index_template": {
+        "index_patterns": [
+          "my-logs-*"
+        ],
+        "context": {
+          "name": "logs",
+          "version": "1"
+        }
+      }
+    }
+  ]
+}
+```
+
+If there is a conflict between any settings, mappings, or aliases directly declared by your template and those in the backing component template for the context, the latter takes priority during index creation.
+
+
+## Available context templates
+
+The following templates are available to be used through the `context` parameter as of OpenSearch 2.17:
+
+- `logs`
+- `metrics`
+- `nginx-logs`
+- `amazon-cloudtrail-logs`
+- `amazon-elb-logs`
+- `amazon-s3-logs`
+- `apache-web-logs`
+- `k8s-logs`
+
+For more information about these templates, see the [OpenSearch system templates repository](https://github.com/opensearch-project/opensearch-system-templates/tree/main/src/main/resources/org/opensearch/system/applicationtemplates/v1).
+
+To view the current version of these templates on your cluster, use `GET /_component_template`.
diff --git a/_im-plugin/ism/policies.md b/_im-plugin/ism/policies.md
index e6262e883b..27c37e67ea 100644
--- a/_im-plugin/ism/policies.md
+++ b/_im-plugin/ism/policies.md
@@ -539,7 +539,7 @@ GET _plugins/_rollup/jobs/<rollup_id>/_explain
 }
 ```

-#### Request fields
+#### Request body fields

 Request fields are required when creating an ISM policy. You can reference the [Index rollups API]({{site.url}}{{site.baseurl}}/im-plugin/index-rollups/rollup-api/#create-or-update-an-index-rollup-job) page for request field options.
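For context, the rollup action in an ISM policy nests these request fields under `ism_rollup`. The following is an illustrative sketch only; the field values are placeholders, and the authoritative field list is on the Index rollups API page linked above:

```json
{
  "rollup": {
    "ism_rollup": {
      "description": "Example rollup job",
      "target_index": "rollup-ecommerce",
      "page_size": 200,
      "dimensions": [
        {
          "date_histogram": {
            "source_field": "order_date",
            "fixed_interval": "60m"
          }
        }
      ],
      "metrics": [
        {
          "source_field": "total_price",
          "metrics": [{ "avg": {} }]
        }
      ]
    }
  }
}
```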
diff --git a/_includes/header.html b/_includes/header.html
index b7dce4c317..20d82c451e 100644
--- a/_includes/header.html
+++ b/_includes/header.html
@@ -1,3 +1,71 @@
+{% assign url_parts = page.url | split: "/" %}
+{% if url_parts.size > 0 %}
+
+  {% assign last_url_part = url_parts | last %}
+
+  {% comment %} Does the URL contain a filename, and is it an index.html or not? {% endcomment %}
+  {% if last_url_part contains ".html" %}
+    {% assign url_has_filename = true %}
+    {% if last_url_part == 'index.html' %}
+      {% assign url_filename_is_index = true %}
+    {% else %}
+      {% assign url_filename_is_index = false %}
+    {% endif %}
+  {% else %}
+    {% assign url_has_filename = false %}
+  {% endif %}
+
+  {% comment %}
+  OpenSearchCon URLs require some special consideration, because it's a specialization
+  of the /events URL, which is itself a child of Community; the OpenSearchCon menu is NOT
+  a child of Community.
+  {% endcomment %}
+  {% if page.url contains "opensearchcon" %}
+    {% assign is_conference_page = true %}
+  {% else %}
+    {% assign is_conference_page = false %}
+  {% endif %}
+
+  {% if is_conference_page %}
+    {% comment %}
+    If the page is a conference page and it has a filename, then it's the penultimate
+    path component that has the child menu item of the OpenSearchCon that needs
+    to be marked as in-category. If there's no filename, then reference the ultimate
+    path component.
+    Unless the filename is opensearchcon2023-cfp, because it's a one-off that is not
+    within the /events/opensearchcon/... structure.
+    {% endcomment %}
+    {% if url_has_filename %}
+      {% unless page.url contains 'opensearchcon2023-cfp' %}
+        {% assign url_fragment_index = url_parts | size | minus: 2 %}
+        {% assign url_fragment = url_parts[url_fragment_index] %}
+      {% else %}
+        {% assign url_fragment = 'opensearchcon2023-cfp' %}
+      {% endunless %}
+    {% else %}
+      {% assign url_fragment = last_url_part %}
+    {% endif %}
+  {% else %}
+    {% comment %}
+    If the page is NOT a conference page, the URL has a filename, and the filename
+    is NOT index.html, then refer to the filename without the .html extension.
+    If the filename is index.html, then refer to the penultimate path component.
+    If there is no filename, then refer to the ultimate path component.
+    {% endcomment %}
+    {% if url_has_filename %}
+      {% unless url_filename_is_index %}
+        {% assign url_fragment = last_url_part | replace: '.html', '' %}
+      {% else %}
+        {% assign url_fragment_index = url_parts | size | minus: 2 %}
+        {% assign url_fragment = url_parts[url_fragment_index] %}
+      {% endunless %}
+    {% else %}
+      {% assign url_fragment = last_url_part %}
+    {% endif %}
+  {% endif %}
+{% else %}
+  {% assign url_fragment = '' %}
+{% endif %}
 {% if page.alert %}