diff --git a/_field-types/index.md b/_field-types/index.md index 7a7e816ada..e9250f409d 100644 --- a/_field-types/index.md +++ b/_field-types/index.md @@ -12,43 +12,77 @@ redirect_from: # Mappings and field types -You can define how documents and their fields are stored and indexed by creating a _mapping_. The mapping specifies the list of fields for a document. Every field in the document has a _field type_, which defines the type of data the field contains. For example, you may want to specify that the `year` field should be of type `date`. To learn more, see [Supported field types]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/index/). +Mappings tell OpenSearch how to store and index your documents and their fields. You can specify the data type for each field (for example, `year` as `date`) to make storage and querying more efficient. -If you're just starting to build out your cluster and data, you may not know exactly how your data should be stored. In those cases, you can use dynamic mappings, which tell OpenSearch to dynamically add data and its fields. However, if you know exactly what types your data falls under and want to enforce that standard, then you can use explicit mappings. +While [dynamic mappings](#dynamic-mapping) automatically add new data and fields, using explicit mappings is recommended. Explicit mappings let you define the exact structure and data types upfront. This helps to maintain data consistency and optimize performance, especially for large datasets or high-volume indexing operations. -For example, if you want to indicate that `year` should be of type `text` instead of an `integer`, and `age` should be an `integer`, you can do so with explicit mappings. By using dynamic mapping, OpenSearch might interpret both `year` and `age` as integers. +For example, with explicit mappings, you can ensure that `year` is treated as text and `age` as an integer instead of both being interpreted as integers by dynamic mapping. -This section provides an example for how to create an index mapping and how to add a document to it that will get ip_range validated. - -#### Table of contents -1. TOC -{:toc} - - ---- ## Dynamic mapping When you index a document, OpenSearch adds fields automatically with dynamic mapping. You can also explicitly add fields to an index mapping. -#### Dynamic mapping types +### Dynamic mapping types Type | Description :--- | :--- -null | A `null` field can't be indexed or searched. When a field is set to null, OpenSearch behaves as if that field has no values. -boolean | OpenSearch accepts `true` and `false` as boolean values. An empty string is equal to `false.` -float | A single-precision 32-bit floating point number. -double | A double-precision 64-bit floating point number. -integer | A signed 32-bit number. -object | Objects are standard JSON objects, which can have fields and mappings of their own. For example, a `movies` object can have additional properties such as `title`, `year`, and `director`. -array | Arrays in OpenSearch can only store values of one type, such as an array of just integers or strings. Empty arrays are treated as though they are fields with no values. -text | A string sequence of characters that represent full-text values. -keyword | A string sequence of structured characters, such as an email address or ZIP code. +`null` | A `null` field can't be indexed or searched. When a field is set to null, OpenSearch behaves as if the field has no value. +`boolean` | OpenSearch accepts `true` and `false` as Boolean values. An empty string is equal to `false.` +`float` | A single-precision, 32-bit floating-point number. +`double` | A double-precision, 64-bit floating-point number. +`integer` | A signed 32-bit number. +`object` | Objects are standard JSON objects, which can have fields and mappings of their own. For example, a `movies` object can have additional properties such as `title`, `year`, and `director`. +`array` | OpenSearch does not have a specific array data type. Arrays are represented as a set of values of the same data type (for example, integers or strings) associated with a field. When indexing, you can pass multiple values for a field, and OpenSearch will treat it as an array. Empty arrays are valid and recognized as array fields with zero elements---not as fields with no values. OpenSearch supports querying and filtering arrays, including checking for values, range queries, and array operations like concatenation and intersection. Nested arrays, which may contain complex objects or other arrays, can also be used for advanced data modeling. +`text` | A string sequence of characters that represent full-text values. +`keyword` | A string sequence of structured characters, such as an email address or ZIP code. date detection string | Enabled by default, if new string fields match a date's format, then the string is processed as a `date` field. For example, `date: "2012/03/11"` is processed as a date. numeric detection string | If disabled, OpenSearch may automatically process numeric values as strings when they should be processed as numbers. When enabled, OpenSearch can process strings into `long`, `integer`, `short`, `byte`, `double`, `float`, `half_float`, `scaled_float`, and `unsigned_long`. Default is disabled. +### Dynamic templates + +Dynamic templates are used to define custom mappings for dynamically added fields based on the data type, field name, or field path. They allow you to define a flexible schema for your data that can automatically adapt to changes in the structure or format of the input data. + +You can use the following syntax to define a dynamic mapping template: + +```json +PUT index +{ + "mappings": { + "dynamic_templates": [ + { + "fields": { + "mapping": { + "type": "short" + }, + "match_mapping_type": "string", + "path_match": "status*" + } + } + ] + } +} +``` +{% include copy-curl.html %} + +This mapping configuration dynamically maps any field with a name starting with `status` (for example, `status_code`) to the `short` data type if the initial value provided during indexing is a string. + +### Dynamic mapping parameters + +The `dynamic_templates` support the following parameters for matching conditions and mapping rules. The default value is `null`. + +Parameter | Description | +----------|-------------| +`match_mapping_type` | Specifies the JSON data type (for example, string, long, double, object, binary, Boolean, date) that triggers the mapping. +`match` | A regular expression used to match field names and apply the mapping. +`unmatch` | A regular expression used to exclude field names from the mapping. +`match_pattern` | Determines the pattern matching behavior, either `regex` or `simple`. Default is `simple`. +`path_match` | Allows you to match nested field paths using a regular expression. +`path_unmatch` | Excludes nested field paths from the mapping using a regular expression. +`mapping` | The mapping configuration to apply. + ## Explicit mapping -If you know exactly what your field data types need to be, you can specify them in your request body when creating your index. +If you know exactly which field data types you need to use, then you can specify them in your request body when creating your index, as shown in the following example request: ```json PUT sample-index1 @@ -62,8 +96,9 @@ PUT sample-index1 } } ``` +{% include copy-curl.html %} -### Response +#### Response ```json { "acknowledged": true, @@ -71,8 +106,9 @@ PUT sample-index1 "index": "sample-index1" } ``` +{% include copy-curl.html %} -To add mappings to an existing index or data stream, you can send a request to the `_mapping` endpoint using the `PUT` or `POST` HTTP method: +To add mappings to an existing index or data stream, you can send a request to the `_mapping` endpoint using the `PUT` or `POST` HTTP method, as shown in the following example request: ```json POST sample-index1/_mapping @@ -84,84 +120,29 @@ POST sample-index1/_mapping } } ``` +{% include copy-curl.html %} You cannot change the mapping of an existing field, you can only modify the field's mapping parameters. {: .note} ---- -## Mapping example usage +## Mapping parameters -The following example shows how to create a mapping to specify that OpenSearch should ignore any documents with malformed IP addresses that do not conform to the [`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/) data type. You accomplish this by setting the `ignore_malformed` parameter to `true`. +Mapping parameters are used to configure the behavior of index fields. See [Mappings and field types]({{site.url}}{{site.baseurl}}/field-types/) for more information. -### Create an index with an `ip` mapping +## Mapping limit settings -To create an index, use a PUT request: +OpenSearch has certain mapping limits and settings, such as the settings listed in the following table. Settings can be configured based on your requirements. -```json -PUT /test-index -{ - "mappings" : { - "properties" : { - "ip_address" : { - "type" : "ip", - "ignore_malformed": true - } - } - } -} -``` - -You can add a document that has a malformed IP address to your index: - -```json -PUT /test-index/_doc/1 -{ - "ip_address" : "malformed ip address" -} -``` - -This indexed IP address does not throw an error because `ignore_malformed` is set to true. - -You can query the index using the following request: - -```json -GET /test-index/_search -``` +| Setting | Default value | Allowed value | Type | Description | +|-|-|-|-|-| +| `index.mapping.nested_fields.limit` | 50 | [0,) | Dynamic | Limits the maximum number of nested fields that can be defined in an index mapping. | +| `index.mapping.nested_objects.limit` | 10,000 | [0,) | Dynamic | Limits the maximum number of nested objects that can be created in a single document. | +| `index.mapping.total_fields.limit` | 1,000 | [0,) | Dynamic | Limits the maximum number of fields that can be defined in an index mapping. | +| `index.mapping.depth.limit` | 20 | [1,100] | Dynamic | Limits the maximum depth of nested objects and nested fields that can be defined in an index mapping. | +| `index.mapping.field_name_length.limit` | 50,000 | [1,50000] | Dynamic | Limits the maximum length of field names that can be defined in an index mapping. | +| `index.mapper.dynamic` | true | {true,false} | Dynamic | Determines whether new fields should be dynamically added to a mapping. | -The response shows that the `ip_address` field is ignored in the indexed document: - -```json -{ - "took": 14, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 1, - "relation": "eq" - }, - "max_score": 1, - "hits": [ - { - "_index": "test-index", - "_id": "1", - "_score": 1, - "_ignored": [ - "ip_address" - ], - "_source": { - "ip_address": "malformed ip address" - } - } - ] - } -} -``` +--- ## Get a mapping @@ -170,14 +151,16 @@ To get all mappings for one or more indexes, use the following request: ```json GET /_mapping ``` +{% include copy-curl.html %} -In the above request, `` may be an index name or a comma-separated list of index names. +In the previous request, `` may be an index name or a comma-separated list of index names. To get all mappings for all indexes, use the following request: ```json GET _mapping ``` +{% include copy-curl.html %} To get a mapping for a specific field, provide the index name and the field name: @@ -185,14 +168,14 @@ To get a mapping for a specific field, provide the index name and the field name GET _mapping/field/ GET //_mapping/field/ ``` +{% include copy-curl.html %} -Both `` and `` can be specified as one value or a comma-separated list. - -For example, the following request retrieves the mapping for the `year` and `age` fields in `sample-index1`: +Both `` and `` can be specified as either one value or a comma-separated list. For example, the following request retrieves the mapping for the `year` and `age` fields in `sample-index1`: ```json GET sample-index1/_mapping/field/year,age ``` +{% include copy-curl.html %} The response contains the specified fields: @@ -220,3 +203,8 @@ The response contains the specified fields: } } ``` +{% include copy-curl.html %} + +## Mappings use cases + +See [Mappings use cases]({{site.url}}{{site.baseurl}}/field-types/mappings-use-cases/) for use case examples, including examples of mapping string fields and ignoring malformed IP addresses. diff --git a/_field-types/mappings-use-cases.md b/_field-types/mappings-use-cases.md new file mode 100644 index 0000000000..835e030bab --- /dev/null +++ b/_field-types/mappings-use-cases.md @@ -0,0 +1,122 @@ +--- +layout: default +title: Mappings use cases +parent: Mappings and fields types +nav_order: 5 +nav_exclude: true +--- + +# Mappings use cases + +Mappings provide control over how data is indexed and queried, enabling optimized performance and efficient storage for a range of use cases. + +--- + +## Example: Ignoring malformed IP addresses + +The following example shows you how to create a mapping specifying that OpenSearch should ignore any documents containing malformed IP addresses that do not conform to the [`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/) data type. You can accomplish this by setting the `ignore_malformed` parameter to `true`. + +### Create an index with an `ip` mapping + +To create an index with an `ip` mapping, use a PUT request: + +```json +PUT /test-index +{ + "mappings" : { + "properties" : { + "ip_address" : { + "type" : "ip", + "ignore_malformed": true + } + } + } +} +``` +{% include copy-curl.html %} + +Then add a document with a malformed IP address: + +```json +PUT /test-index/_doc/1 +{ + "ip_address" : "malformed ip address" +} +``` +{% include copy-curl.html %} + +When you query the index, the `ip_address` field will be ignored. You can query the index using the following request: + +```json +GET /test-index/_search +``` +{% include copy-curl.html %} + +#### Response + +```json +{ + "took": 14, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "test-index", + "_id": "1", + "_score": 1, + "_ignored": [ + "ip_address" + ], + "_source": { + "ip_address": "malformed ip address" + } + } + ] + } +} +``` +{% include copy-curl.html %} + +--- + +## Mapping string fields to `text` and `keyword` types + +To create an index named `movies1` with a dynamic template that maps all string fields to both the `text` and `keyword` types, you can use the following request: + +```json +PUT movies1 +{ + "mappings": { + "dynamic_templates": [ + { + "strings": { + "match_mapping_type": "string", + "mapping": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + } + } + } + ] + } +} +``` +{% include copy-curl.html %} + +This dynamic template ensures that any string fields in your documents will be indexed as both a full-text `text` type and a `keyword` type. diff --git a/_field-types/metadata-fields/field-names.md b/_field-types/metadata-fields/field-names.md new file mode 100644 index 0000000000..b17e94fbb4 --- /dev/null +++ b/_field-types/metadata-fields/field-names.md @@ -0,0 +1,43 @@ +--- +layout: default +title: Field names +nav_order: 10 +parent: Metadata fields +--- + +# Field names + +The `_field_names` field indexes field names that contain non-null values. This enables the use of the `exists` query, which can identify documents that either have or do not have non-null values for a specified field. + +However, `_field_names` only indexes field names when both `doc_values` and `norms` are disabled. If either `doc_values` or `norms` are enabled, then the `exists` query still functions but will not rely on the `_field_names` field. + +## Mapping example + +```json +{ + "mappings": { + "_field_names": { + "enabled": "true" + }, + "properties": { + }, + "title": { + "type": "text", + "doc_values": false, + "norms": false + }, + "description": { + "type": "text", + "doc_values": true, + "norms": false + }, + "price": { + "type": "float", + "doc_values": false, + "norms": true + } + } + } +} +``` +{% include copy-curl.html %} diff --git a/_field-types/metadata-fields/id.md b/_field-types/metadata-fields/id.md new file mode 100644 index 0000000000..f66f4b8e13 --- /dev/null +++ b/_field-types/metadata-fields/id.md @@ -0,0 +1,86 @@ +--- +layout: default +title: ID +nav_order: 20 +parent: Metadata fields +--- + +# ID + +Each document in OpenSearch has a unique `_id` field. This field is indexed, allowing you to retrieve documents using the GET API or the [`ids` query]({{site.url}}{{site.baseurl}}/query-dsl/term/ids/). + +If you do not provide an `_id` value, then OpenSearch automatically generates one for the document. +{: .note} + +The following example request creates an index named `test-index1` and adds two documents with different `_id` values: + +```json +PUT test-index1/_doc/1 +{ + "text": "Document with ID 1" +} + +PUT test-index1/_doc/2?refresh=true +{ + "text": "Document with ID 2" +} +``` +{% include copy-curl.html %} + +You can then query the documents using the `_id` field, as shown in the following example request: + +```json +GET test-index1/_search +{ + "query": { + "terms": { + "_id": ["1", "2"] + } + } +} +``` +{% include copy-curl.html %} + +The response returns both documents with `_id` values of `1` and `2`: + +```json +{ + "took": 10, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 2, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "test-index1", + "_id": "1", + "_score": 1, + "_source": { + "text": "Document with ID 1" + } + }, + { + "_index": "test-index1", + "_id": "2", + "_score": 1, + "_source": { + "text": "Document with ID 2" + } + } + ] + } +``` +{% include copy-curl.html %} + +## Limitations of the `_id` field + +While the `_id` field can be used in various queries, it is restricted from use in aggregations, sorting, and scripting. If you need to sort or aggregate on the `_id` field, it is recommended to duplicate the `_id` content into another field with `doc_values` enabled. Refer to [IDs query]({{site.url}}{{site.baseurl}}/query-dsl/term/ids/) for an example. diff --git a/_field-types/metadata-fields/ignored.md b/_field-types/metadata-fields/ignored.md new file mode 100644 index 0000000000..e867cfc754 --- /dev/null +++ b/_field-types/metadata-fields/ignored.md @@ -0,0 +1,147 @@ +--- +layout: default +title: Ignored +nav_order: 25 +parent: Metadata fields +--- + +# Ignored + +The `_ignored` field helps you manage issues related to malformed data in your documents. This field is used to index and store field names that were ignored during the indexing process as a result of the `ignore_malformed` setting being enabled in the [index mapping]({{site.url}}{{site.baseurl}}/field-types/). + +The `_ignored` field allows you to search for and identify documents containing fields that were ignored as well as for the specific field names that were ignored. This can be useful for troubleshooting. + +You can query the `_ignored` field using the `term`, `terms`, and `exists` queries, and the results will be included in the search hits. + +The `_ignored` field is only populated when the `ignore_malformed` setting is enabled in your index mapping. If `ignore_malformed` is set to `false` (the default value), then malformed fields will cause the entire document to be rejected, and the `_ignored` field will not be populated. +{: .note} + +The following example request shows you how to use the `_ignored` field: + +```json +GET _search +{ + "query": { + "exists": { + "field": "_ignored" + } + } +} +``` +{% include copy-curl.html %} + +--- + +#### Example indexing request with the `_ignored` field + +The following example request adds a new document to the `test-ignored` index with `ignore_malformed` set to `true` so that no error is thrown during indexing: + +```json +PUT test-ignored +{ + "mappings": { + "properties": { + "title": { + "type": "text" + }, + "length": { + "type": "long", + "ignore_malformed": true + } + } + } +} + +POST test-ignored/_doc +{ + "title": "correct text", + "length": "not a number" +} + +GET test-ignored/_search +{ + "query": { + "exists": { + "field": "_ignored" + } + } +} +``` +{% include copy-curl.html %} + +#### Example reponse + +```json +{ + "took": 42, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "test-ignored", + "_id": "qcf0wZABpEYH7Rw9OT7F", + "_score": 1, + "_ignored": [ + "length" + ], + "_source": { + "title": "correct text", + "length": "not a number" + } + } + ] + } +} +``` + +--- + +## Ignoring a specified field + +You can use a `term` query to find documents in which a specific field was ignored, as shown in the following example request: + +```json +GET _search +{ + "query": { + "term": { + "_ignored": "created_at" + } + } +} +``` +{% include copy-curl.html %} + +#### Reponse + +```json +{ + "took": 51, + "timed_out": false, + "_shards": { + "total": 45, + "successful": 45, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 0, + "relation": "eq" + }, + "max_score": null, + "hits": [] + } +} +``` diff --git a/_field-types/metadata-fields/index-metadata.md b/_field-types/metadata-fields/index-metadata.md new file mode 100644 index 0000000000..657f7d62a5 --- /dev/null +++ b/_field-types/metadata-fields/index-metadata.md @@ -0,0 +1,86 @@ +--- +layout: default +title: Index +nav_order: 25 +parent: Metadata fields +--- + +# Index + +When querying across multiple indexes, you may need to filter results based on the index into which a document was indexed. The `index` field matches documents based on their index. + +The following example request creates two indexes, `products` and `customers`, and adds a document to each index: + +```json +PUT products/_doc/1 +{ + "name": "Widget X" +} + +PUT customers/_doc/2 +{ + "name": "John Doe" +} +``` +{% include copy-curl.html %} + +You can then query both indexes and filter the results using the `_index` field, as shown in the following example request: + +```json +GET products,customers/_search +{ + "query": { + "terms": { + "_index": ["products", "customers"] + } + }, + "aggs": { + "index_groups": { + "terms": { + "field": "_index", + "size": 10 + } + } + }, + "sort": [ + { + "_index": { + "order": "desc" + } + } + ], + "script_fields": { + "index_name": { + "script": { + "lang": "painless", + "source": "doc['_index'].value" + } + } + } +} +``` +{% include copy-curl.html %} + +In this example: + +- The `query` section uses a `terms` query to match documents from the `products` and `customers` indexes. +- The `aggs` section performs a `terms` aggregation on the `_index` field, grouping the results by index. +- The `sort` section sorts the results by the `_index` field in ascending order. +- The `script_fields` section adds a new field called `index_name` to the search results containing the `_index` field value for each document. + +## Querying on the `_index` field + +The `_index` field represents the index into which a document was indexed. You can use this field in your queries to filter, aggregate, sort, or retrieve index information for your search results. + +Because the `_index` field is automatically added to every document, you can use it in your queries like any other field. For example, you can use the `terms` query to match documents from multiple indexes. The following example query returns all documents from the `products` and `customers` indexes: + +```json + { + "query": { + "terms": { + "_index": ["products", "customers"] + } + } +} +``` +{% include copy-curl.html %} diff --git a/_field-types/metadata-fields/index.md b/_field-types/metadata-fields/index.md new file mode 100644 index 0000000000..cdc079e1e5 --- /dev/null +++ b/_field-types/metadata-fields/index.md @@ -0,0 +1,21 @@ +--- +layout: default +title: Metadata fields +nav_order: 90 +has_children: true +has_toc: false +--- + +# Metadata fields + +OpenSearch provides built-in metadata fields that allow you to access information about the documents in an index. These fields can be used in your queries as needed. + +Metadata field | Description +:--- | :--- +`_field_names` | The document fields with non-empty or non-null values. +`_ignored` | The document fields that were ignored during the indexing process due to the presence of malformed data, as specified by the `ignore_malformed` setting. +`_id` | The unique identifier assigned to each document. +`_index` | The index in which the document is stored. +`_meta` | Stores custom metadata or additional information specific to the application or use case. +`_routing` | Allows you to specify a custom value that determines the shard assignment for a document in an OpenSearch cluster. +`_source` | Contains the original JSON representation of the document data. diff --git a/_field-types/metadata-fields/meta.md b/_field-types/metadata-fields/meta.md new file mode 100644 index 0000000000..220d58f106 --- /dev/null +++ b/_field-types/metadata-fields/meta.md @@ -0,0 +1,87 @@ +--- +layout: default +title: Meta +nav_order: 30 +parent: Metadata fields +--- + +# Meta + +The `_meta` field is a mapping property that allows you to attach custom metadata to your index mappings. This metadata can be used by your application to store information relevant to your use case, such as versioning, ownership, categorization, or auditing. + +## Usage + +You can define the `_meta` field when creating a new index or updating an existing index's mapping, as shown in the following example request: + +```json +PUT my-index +{ + "mappings": { + "_meta": { + "application": "MyApp", + "version": "1.2.3", + "author": "John Doe" + }, + "properties": { + "title": { + "type": "text" + }, + "description": { + "type": "text" + } + } + } +} + +``` +{% include copy-curl.html %} + +In this example, three custom metadata fields are added: `application`, `version`, and `author`. These fields can be used by your application to store any relevant information about the index, such as the application it belongs to, the application version, or the author of the index. + +You can update the `_meta` field using the [Put Mapping API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/put-mapping/) operation, as shown in the following example request: + +```json +PUT my-index/_mapping +{ + "_meta": { + "application": "MyApp", + "version": "1.3.0", + "author": "Jane Smith" + } +} +``` +{% include copy-curl.html %} + +## Retrieving `meta` information + +You can retrieve the `_meta` information for an index using the [Get Mapping API]({{site.url}}{{site.baseurl}}/field-types/#get-a-mapping) operation, as shown in the following example request: + +```json +GET my-index/_mapping +``` +{% include copy-curl.html %} + +The response returns the full index mapping, including the `_meta` field: + +```json +{ + "my-index": { + "mappings": { + "_meta": { + "application": "MyApp", + "version": "1.3.0", + "author": "Jane Smith" + }, + "properties": { + "description": { + "type": "text" + }, + "title": { + "type": "text" + } + } + } + } +} +``` +{% include copy-curl.html %} diff --git a/_field-types/metadata-fields/routing.md b/_field-types/metadata-fields/routing.md new file mode 100644 index 0000000000..9064e20c49 --- /dev/null +++ b/_field-types/metadata-fields/routing.md @@ -0,0 +1,92 @@ +--- +layout: default +title: Routing +nav_order: 35 +parent: Metadata fields +--- + +# Routing + +OpenSearch uses a hashing algorithm to route documents to specific shards in an index. By default, the document's `_id` field is used as the routing value, but you can also specify a custom routing value for each document. + +## Default routing + +The following is the default OpenSearch routing formula. The `_routing` value is the document's `_id`. + +```json +shard_num = hash(_routing) % num_primary_shards +``` + +## Custom routing + +You can specify a custom routing value when indexing a document, as shown in the following example request: + +```json +PUT sample-index1/_doc/1?routing=JohnDoe1 +{ + "title": "This is a document" +} +``` +{% include copy-curl.html %} + +In this example, the document is routed using the value `JohnDoe1` instead of the default `_id`. + +You must provide the same routing value when retrieving, deleting, or updating the document, as shown in the following example request: + +```json +GET sample-index1/_doc/1?routing=JohnDoe1 +``` +{% include copy-curl.html %} + +## Querying by routing + +You can query documents based on their routing value by using the `_routing` field, as shown in the following example. This query only searches the shard(s) associated with the `JohnDoe1` routing value: + +```json +GET sample-index1/_search +{ + "query": { + "terms": { + "_routing": [ "JohnDoe1" ] + } + } +} +``` +{% include copy-curl.html %} + +## Required routing + +You can make custom routing required for all CRUD operations on an index, as shown in the following example request. If you try to index a document without providing a routing value, OpenSearch will throw an exception. + +```json +PUT sample-index2 +{ + "mappings": { + "_routing": { + "required": true + } + } +} +``` +{% include copy-curl.html %} + +## Routing to specific shards + +You can configure an index to route custom values to a subset of shards rather than a single shard. This is done by setting `index.routing_partition_size` at the time of index creation. The formula for calculating the shard is `shard_num = (hash(_routing) + hash(_id)) % routing_partition_size) % num_primary_shards`. + +The following example request routes documents to one of four shards in the index: + +```json +PUT sample-index3 +{ + "settings": { + "index.routing_partition_size": 4 + }, + "mappings": { + "_routing": { + "required": true + } + } +} +``` +{% include copy-curl.html %} diff --git a/_field-types/metadata-fields/source.md b/_field-types/metadata-fields/source.md new file mode 100644 index 0000000000..c9e714f43c --- /dev/null +++ b/_field-types/metadata-fields/source.md @@ -0,0 +1,54 @@ +--- +layout: default +title: Source +nav_order: 40 +parent: Metadata fields +--- + +# Source + +The `_source` field contains the original JSON document body that was indexed. While this field is not searchable, it is stored so that the full document can be returned when executing fetch requests, such as `get` and `search`. + +## Disabling the field + +You can disable the `_source` field by setting the `enabled` parameter to `false`, as shown in the following example request: + +```json +PUT sample-index1 +{ + "mappings": { + "_source": { + "enabled": false + } + } +} +``` +{% include copy-curl.html %} + +Disabling the `_source` field can impact the availability of certain features, such as the `update`, `update_by_query`, and `reindex` APIs, as well as the ability to debug queries or aggregations using the original indexed document. +{: .warning} + +## Including or excluding fields + +You can selectively control the contents of the `_source` field by using the `includes` and `excludes` parameters. This allows you to prune the stored `_source` field after it is indexed but before it is saved, as shown in the following example request: + +```json +PUT logs +{ + "mappings": { + "_source": { + "includes": [ + "*.count", + "meta.*" + ], + "excludes": [ + "meta.description", + "meta.other.*" + ] + } + } +} +``` +{% include copy-curl.html %} + +These fields are not stored in the `_source`, but you can still search them because the data remains indexed.