From 9bb98143bb20dc38a143cd537d161cbf51c16fd3 Mon Sep 17 00:00:00 2001 From: Adam Locke Date: Thu, 14 May 2020 17:23:38 -0400 Subject: [PATCH 1/5] Changes for #52239. --- docs/reference/ingest/processors/kv.asciidoc | 10 +++++---- docs/reference/mapping.asciidoc | 2 +- docs/reference/mapping/types/nested.asciidoc | 23 ++++++++++---------- 3 files changed, 19 insertions(+), 16 deletions(-) diff --git a/docs/reference/ingest/processors/kv.asciidoc b/docs/reference/ingest/processors/kv.asciidoc index 3f4350a3301ed..7d410207896e9 100644 --- a/docs/reference/ingest/processors/kv.asciidoc +++ b/docs/reference/ingest/processors/kv.asciidoc @@ -1,9 +1,8 @@ [[kv-processor]] === KV Processor -This processor helps automatically parse messages (or specific event fields) which are of the foo=bar variety. - -For example, if you have a log message which contains `ip=1.2.3.4 error=REFUSED`, you can parse those automatically by configuring: +This processor helps automatically parse messages (or specific event fields) which are of the `foo=bar` variety. +For example, if you have a log message which contains `ip=1.2.3.4 error=REFUSED`, you can parse those fields automatically by configuring: [source,js] -------------------------------------------------- @@ -17,8 +16,11 @@ For example, if you have a log message which contains `ip=1.2.3.4 error=REFUSED` -------------------------------------------------- // NOTCONSOLE +TIP: Using the KV Processor can result in field names that you cannot control. Consider using the <> datatype instead, which maps an entire object as a single field. +While a flattened object provides only a single field to search on, the object's contents can still be searched using simple queries and aggregations. 
+ [[kv-options]] -.Kv Options +.KV Options [options="header"] |====== | Name | Required | Default | Description diff --git a/docs/reference/mapping.asciidoc b/docs/reference/mapping.asciidoc index 96153f5e2a355..75abb519a772f 100644 --- a/docs/reference/mapping.asciidoc +++ b/docs/reference/mapping.asciidoc @@ -17,7 +17,7 @@ A mapping definition has: <>:: -Meta-fields are used to customize how a document's metadata associated is +Meta-fields are used to customize how a document's associated metadata is treated. Examples of meta-fields include the document's <>, <>, and <> fields. diff --git a/docs/reference/mapping/types/nested.asciidoc b/docs/reference/mapping/types/nested.asciidoc index aa2ae72dd0522..c178c5287bd1b 100644 --- a/docs/reference/mapping/types/nested.asciidoc +++ b/docs/reference/mapping/types/nested.asciidoc @@ -5,14 +5,13 @@ ++++ The `nested` type is a specialised version of the <> datatype -that allows arrays of objects to be indexed in a way that they can be queried +that allows arrays of objects to be indexed in a way that they can be queried independently of each other. ==== How arrays of objects are flattened -Arrays of inner <> do not work the way you may expect. -Lucene has no concept of inner objects, so Elasticsearch flattens object -hierarchies into a simple list of field names and values. For instance, the +Elasticsearch has no concept of inner objects. Therefore, it flattens object +hierarchies into a simple list of field names and values. For instance, consider the following document: [source,console] @@ -35,7 +34,7 @@ PUT my_index/_doc/1 <1> The `user` field is dynamically added as a field of type `object`. 
-would be transformed internally into a document that looks more like this: +The previous document would be transformed internally into a document that looks more like this: [source,js] -------------------------------------------------- @@ -71,10 +70,15 @@ GET my_index/_search ==== Using `nested` fields for arrays of objects If you need to index arrays of objects and to maintain the independence of -each object in the array, you should use the `nested` datatype instead of the -<> datatype. Internally, nested objects index each object in +each object in the array, use the `nested` datatype instead of the +<> datatype. + +TIP: If you consider creating `nested` objects with two `key` and `value` keyword fields, consider using the <> datatype instead. +Because nested documents are indexed as separate documents, they can only be accessed within the scope of the nested query. While a flattened object provides only a single field to search on, the object's contents can still be searched using simple queries and aggregations. + +Internally, nested objects index each object in the array as a separate hidden document, meaning that each nested object can be -queried independently of the others, with the <>: +queried independently of the others with the <>: [source,console] -------------------------------------------------- @@ -230,6 +234,3 @@ settings in place to guard against performance problems: Additional background on these settings, including information on their default values, can be found in <>. - - - From 94f92b2a2b8b0ea9249e3a072a70a3414fd23217 Mon Sep 17 00:00:00 2001 From: Adam Locke Date: Fri, 15 May 2020 16:14:22 -0400 Subject: [PATCH 2/5] Incorporating review feedback from Julie T. Also single-sourcing nested options in the Mapping page and referencing them in the Nested page. 
--- docs/reference/ingest/processors/kv.asciidoc | 3 +- docs/reference/mapping.asciidoc | 42 ++++++++++++-------- docs/reference/mapping/types/nested.asciidoc | 35 ++++++++-------- 3 files changed, 43 insertions(+), 37 deletions(-) diff --git a/docs/reference/ingest/processors/kv.asciidoc b/docs/reference/ingest/processors/kv.asciidoc index 7d410207896e9..e2db150275d13 100644 --- a/docs/reference/ingest/processors/kv.asciidoc +++ b/docs/reference/ingest/processors/kv.asciidoc @@ -16,8 +16,7 @@ For example, if you have a log message which contains `ip=1.2.3.4 error=REFUSED` -------------------------------------------------- // NOTCONSOLE -TIP: Using the KV Processor can result in field names that you cannot control. Consider using the <> datatype instead, which maps an entire object as a single field. -While a flattened object provides only a single field to search on, the object's contents can still be searched using simple queries and aggregations. +TIP: Using the KV Processor can result in field names that you cannot control. Consider using the <> datatype instead, which maps an entire object as a single field and allows for simple searches over its contents. [[kv-options]] .KV Options diff --git a/docs/reference/mapping.asciidoc b/docs/reference/mapping.asciidoc index 75abb519a772f..9b0e621995ea2 100644 --- a/docs/reference/mapping.asciidoc +++ b/docs/reference/mapping.asciidoc @@ -58,17 +58,16 @@ via the <> parameter. [float] === Settings to prevent mappings explosion -Defining too many fields in an index is a condition that can lead to a +Defining too many fields in an index can lead to a mapping explosion, which can cause out of memory errors and difficult -situations to recover from. This problem may be more common than expected. -As an example, consider a situation in which every new document inserted -introduces new fields. This is quite common with dynamic mappings. -Every time a document contains new fields, those will end up in the index's -mappings. 
This isn't worrying for a small amount of data, but it can become a +situations to recover from. + +Consider a situation where every new document inserted +introduces new fields, such as with <>. +Each new field is added to the index mapping, which can become a problem as the mapping grows. -The following settings allow you to limit the number of field mappings that -can be created manually or dynamically, in order to prevent bad documents from -causing a mapping explosion: + +Use the following settings to limit the number of field mappings (created manually or dynamically) and prevent documents from causing a mapping explosion: `index.mapping.total_fields.limit`:: The maximum number of fields in an index. Field and object mappings, as well as @@ -84,26 +83,37 @@ If you increase this setting, we recommend you also increase the <> setting, which limits the maximum number of <> in a query. ==== ++ +[TIP] +==== +If your field mappings contain a large, arbitrary set of keys, consider using the <> datatype. +==== `index.mapping.depth.limit`:: The maximum depth for a field, which is measured as the number of inner objects. For instance, if all fields are defined at the root object level, then the depth is `1`. If there is one object mapping, then the depth is - `2`, etc. The default is `20`. + `2`, etc. Default is `20`. +// tag::nested-fields-limit[] `index.mapping.nested_fields.limit`:: - The maximum number of distinct `nested` mappings in an index, defaults to `50`. + The maximum number of distinct `nested` mappings in an index. The `nested` type should only be used in special cases, when arrays of objects need to be queried independently of each other. To safeguard against poorly designed mappings, this setting + limits the number of unique `nested` types per index. Default is `50`. 
+// end::nested-fields-limit[] +// tag::nested-objects-limit[] `index.mapping.nested_objects.limit`:: - The maximum number of `nested` JSON objects within a single document across - all nested types, defaults to 10000. + The maximum number of nested JSON objects that a single document can contain across all + `nested` types. This limit helps to prevent out of memory errors when a document contains too many nested + objects. Default is `10000`. +// end::nested-objects-limit[] `index.mapping.field_name_length.limit`:: - Setting for the maximum length of a field name. The default value is - Long.MAX_VALUE (no limit). This setting isn't really something that addresses + Setting for the maximum length of a field name. This setting isn't really something that addresses mappings explosion but might still be useful if you want to limit the field length. It usually shouldn't be necessary to set this setting. The default is okay - unless a user starts to add a huge number of fields with really long names. + unless a user starts to add a huge number of fields with really long names. Default is + `Long.MAX_VALUE` (no limit). [float] == Dynamic mapping diff --git a/docs/reference/mapping/types/nested.asciidoc b/docs/reference/mapping/types/nested.asciidoc index c178c5287bd1b..43926c8d39b30 100644 --- a/docs/reference/mapping/types/nested.asciidoc +++ b/docs/reference/mapping/types/nested.asciidoc @@ -8,6 +8,7 @@ The `nested` type is a specialised version of the <> datatype that allows arrays of objects to be indexed in a way that they can be queried independently of each other. +[[nested-arrays-flattening-objects]] ==== How arrays of objects are flattened Elasticsearch has no concept of inner objects. Therefore, it flattens object @@ -73,9 +74,6 @@ If you need to index arrays of objects and to maintain the independence of each object in the array, use the `nested` datatype instead of the <> datatype. 
-TIP: If you consider creating `nested` objects with two `key` and `value` keyword fields, consider using the <> datatype instead. -Because nested documents are indexed as separate documents, they can only be accessed within the scope of the nested query. While a flattened object provides only a single field to search on, the object's contents can still be searched using simple queries and aggregations. - Internally, nested objects index each object in the array as a separate hidden document, meaning that each nested object can be queried independently of the others with the <>: @@ -156,6 +154,8 @@ GET my_index/_search <4> `inner_hits` allow us to highlight the matching nested documents. +[[nested-accessing-documents]] +==== Interacting with `nested` documents Nested documents can be: * queried with the <> query. @@ -165,6 +165,9 @@ Nested documents can be: * sorted with <>. * retrieved and highlighted with <>. +TIP: When ingesting key-value pairs with a large, arbitrary set of keys, you might consider modeling each key-value pair as its own nested document with `key` and `value` fields. Instead, consider using the <> datatype, which maps an entire object as a single field and allows for simple searches over its contents. +Nested documents and queries are typically expensive, so using the `flattened` datatype for this use case is a better option. + [IMPORTANT] ============================================= @@ -211,26 +214,20 @@ document as standard (flat) fields. Defaults to `false`. [float] === Limits on `nested` mappings and objects -As described earlier, each nested object is indexed as a separate document under the hood. -Continuing with the example above, if we indexed a single document containing 100 `user` objects, -then 101 Lucene documents would be created -- one for the parent document, and one for each +As described earlier, each nested object is indexed as a separate document. 
+Continuing with the previous example, if we indexed a single document containing 100 `user` objects, +then 101 Lucene documents would be created: one for the parent document, and one for each nested object. Because of the expense associated with `nested` mappings, Elasticsearch puts settings in place to guard against performance problems: -`index.mapping.nested_fields.limit`:: +include::{docdir}/mapping.asciidoc[tag=nested-fields-limit] - The `nested` type should only be used in special cases, when arrays of objects need to be - queried independently of each other. To safeguard against poorly designed mappings, this setting - limits the number of unique `nested` types per index. In our example, the `user` mapping would - count as only 1 towards this limit. Defaults to 50. +In the previous example, the `user` mapping would count as only 1 towards this limit. -`index.mapping.nested_objects.limit`:: +include::{docdir}/mapping.asciidoc[tag=nested-objects-limit] - This setting limits the number of nested objects that a single document may contain across all - `nested` types, in order to prevent out of memory errors when a document contains too many nested - objects. To illustrate how the setting works, say we added another `nested` type called `comments` - to our example mapping above. Then for each document, the combined number of `user` and `comment` - objects it contains must be below the limit. Defaults to 10000. +To illustrate how this setting works, consider adding another `nested` type called `comments` +to the previous example mapping. For each document, the combined number of `user` and `comment` +objects it contains must be below the limit. -Additional background on these settings, including information on their default values, can be found -in <>. +See <> regarding additional settings for preventing mappings explosion. 
From f5a569358456582a189c6cf2b3d6692e86e35e12 Mon Sep 17 00:00:00 2001 From: Adam Locke Date: Fri, 15 May 2020 17:08:23 -0400 Subject: [PATCH 3/5] Moving tip after the introduction and clarifying limits. --- docs/reference/mapping/types/nested.asciidoc | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/reference/mapping/types/nested.asciidoc b/docs/reference/mapping/types/nested.asciidoc index 43926c8d39b30..8494163427377 100644 --- a/docs/reference/mapping/types/nested.asciidoc +++ b/docs/reference/mapping/types/nested.asciidoc @@ -8,6 +8,9 @@ The `nested` type is a specialised version of the <> datatype that allows arrays of objects to be indexed in a way that they can be queried independently of each other. +TIP: When ingesting key-value pairs with a large, arbitrary set of keys, you might consider modeling each key-value pair as its own nested document with `key` and `value` fields. Instead, consider using the <> datatype, which maps an entire object as a single field and allows for simple searches over its contents. +Nested documents and queries are typically expensive, so using the `flattened` datatype for this use case is a better option. + [[nested-arrays-flattening-objects]] ==== How arrays of objects are flattened @@ -165,9 +168,6 @@ Nested documents can be: * sorted with <>. * retrieved and highlighted with <>. -TIP: When ingesting key-value pairs with a large, arbitrary set of keys, you might consider modeling each key-value pair as its own nested document with `key` and `value` fields. Instead, consider using the <> datatype, which maps an entire object as a single field and allows for simple searches over its contents. -Nested documents and queries are typically expensive, so using the `flattened` datatype for this use case is a better option. - [IMPORTANT] ============================================= @@ -214,7 +214,7 @@ document as standard (flat) fields. Defaults to `false`. 
[float] === Limits on `nested` mappings and objects -As described earlier, each nested object is indexed as a separate document. +As described earlier, each nested object is indexed as a separate Lucene document. Continuing with the previous example, if we indexed a single document containing 100 `user` objects, then 101 Lucene documents would be created: one for the parent document, and one for each nested object. Because of the expense associated with `nested` mappings, Elasticsearch puts From d666bc7e9ec97916647c25c2106237efaffb3dff Mon Sep 17 00:00:00 2001 From: Adam Locke Date: Tue, 19 May 2020 12:58:00 -0400 Subject: [PATCH 4/5] Update docs/reference/mapping.asciidoc Co-authored-by: James Rodewig --- docs/reference/mapping.asciidoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/reference/mapping.asciidoc b/docs/reference/mapping.asciidoc index 9b0e621995ea2..69182b0b9bdfb 100644 --- a/docs/reference/mapping.asciidoc +++ b/docs/reference/mapping.asciidoc @@ -86,7 +86,7 @@ limits the maximum number of <> in a query + [TIP] ==== -If your field mappings contain a large, arbitrary set of keys, consider using the <> datatype. +If your field mappings contain a large, arbitrary set of keys, consider using the <> datatype. 
==== `index.mapping.depth.limit`:: From e856be411b8a5885acac48140785744ccd1f25ff Mon Sep 17 00:00:00 2001 From: Adam Locke Date: Tue, 19 May 2020 13:02:15 -0400 Subject: [PATCH 5/5] Update docs/reference/mapping/types/nested.asciidoc Co-authored-by: James Rodewig --- docs/reference/mapping/types/nested.asciidoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/reference/mapping/types/nested.asciidoc b/docs/reference/mapping/types/nested.asciidoc index 8494163427377..2b7ff123bfe6e 100644 --- a/docs/reference/mapping/types/nested.asciidoc +++ b/docs/reference/mapping/types/nested.asciidoc @@ -8,7 +8,7 @@ The `nested` type is a specialised version of the <> datatype that allows arrays of objects to be indexed in a way that they can be queried independently of each other. -TIP: When ingesting key-value pairs with a large, arbitrary set of keys, you might consider modeling each key-value pair as its own nested document with `key` and `value` fields. Instead, consider using the <> datatype, which maps an entire object as a single field and allows for simple searches over its contents. +TIP: When ingesting key-value pairs with a large, arbitrary set of keys, you might consider modeling each key-value pair as its own nested document with `key` and `value` fields. Instead, consider using the <> datatype, which maps an entire object as a single field and allows for simple searches over its contents. Nested documents and queries are typically expensive, so using the `flattened` datatype for this use case is a better option. [[nested-arrays-flattening-objects]]