From 8df97f6d68d351cfcfc59fcca6baea247c28e8e5 Mon Sep 17 00:00:00 2001 From: Adam Locke Date: Tue, 19 May 2020 15:09:18 -0400 Subject: [PATCH] [DOCS] Add links to `flattened` datatype (#56794) (#56962) * Changes for #52239. * Incorporating review feedback from Julie T. Also single-sourcing nested options in the Mapping page and referencing them in the Nested page. * Moving tip after the introduction and clarifying limits. * Update docs/reference/mapping.asciidoc Co-authored-by: James Rodewig * Update docs/reference/mapping/types/nested.asciidoc Co-authored-by: James Rodewig Co-authored-by: James Rodewig Co-authored-by: James Rodewig --- docs/reference/ingest/processors/kv.asciidoc | 9 ++-- docs/reference/mapping.asciidoc | 44 ++++++++++------- docs/reference/mapping/types/nested.asciidoc | 52 ++++++++++---------- 3 files changed, 57 insertions(+), 48 deletions(-) diff --git a/docs/reference/ingest/processors/kv.asciidoc b/docs/reference/ingest/processors/kv.asciidoc index 3f4350a3301ed..e2db150275d13 100644 --- a/docs/reference/ingest/processors/kv.asciidoc +++ b/docs/reference/ingest/processors/kv.asciidoc @@ -1,9 +1,8 @@ [[kv-processor]] === KV Processor -This processor helps automatically parse messages (or specific event fields) which are of the foo=bar variety. - -For example, if you have a log message which contains `ip=1.2.3.4 error=REFUSED`, you can parse those automatically by configuring: +This processor helps automatically parse messages (or specific event fields) which are of the `foo=bar` variety. +For example, if you have a log message which contains `ip=1.2.3.4 error=REFUSED`, you can parse those fields automatically by configuring: [source,js] -------------------------------------------------- @@ -17,8 +16,10 @@ For example, if you have a log message which contains `ip=1.2.3.4 error=REFUSED` -------------------------------------------------- // NOTCONSOLE +TIP: Using the KV Processor can result in field names that you cannot control. 
Consider using the <> datatype instead, which maps an entire object as a single field and allows for simple searches over its contents. + [[kv-options]] -.Kv Options +.KV Options [options="header"] |====== | Name | Required | Default | Description diff --git a/docs/reference/mapping.asciidoc b/docs/reference/mapping.asciidoc index 96153f5e2a355..69182b0b9bdfb 100644 --- a/docs/reference/mapping.asciidoc +++ b/docs/reference/mapping.asciidoc @@ -17,7 +17,7 @@ A mapping definition has: <>:: -Meta-fields are used to customize how a document's metadata associated is +Meta-fields are used to customize how a document's associated metadata is treated. Examples of meta-fields include the document's <>, <>, and <> fields. @@ -58,17 +58,16 @@ via the <> parameter. [float] === Settings to prevent mappings explosion -Defining too many fields in an index is a condition that can lead to a +Defining too many fields in an index can lead to a mapping explosion, which can cause out of memory errors and difficult -situations to recover from. This problem may be more common than expected. -As an example, consider a situation in which every new document inserted -introduces new fields. This is quite common with dynamic mappings. -Every time a document contains new fields, those will end up in the index's -mappings. This isn't worrying for a small amount of data, but it can become a +situations to recover from. + +Consider a situation where every new document inserted +introduces new fields, such as with <>. +Each new field is added to the index mapping, which can become a problem as the mapping grows. 
-The following settings allow you to limit the number of field mappings that -can be created manually or dynamically, in order to prevent bad documents from -causing a mapping explosion: + +Use the following settings to limit the number of field mappings (created manually or dynamically) and prevent documents from causing a mapping explosion: `index.mapping.total_fields.limit`:: The maximum number of fields in an index. Field and object mappings, as well as @@ -84,26 +83,37 @@ If you increase this setting, we recommend you also increase the <> setting, which limits the maximum number of <> in a query. ==== ++ +[TIP] +==== +If your field mappings contain a large, arbitrary set of keys, consider using the <> datatype. +==== `index.mapping.depth.limit`:: The maximum depth for a field, which is measured as the number of inner objects. For instance, if all fields are defined at the root object level, then the depth is `1`. If there is one object mapping, then the depth is - `2`, etc. The default is `20`. + `2`, etc. Default is `20`. +// tag::nested-fields-limit[] `index.mapping.nested_fields.limit`:: - The maximum number of distinct `nested` mappings in an index, defaults to `50`. + The maximum number of distinct `nested` mappings in an index. The `nested` type should only be used in special cases, when arrays of objects need to be queried independently of each other. To safeguard against poorly designed mappings, this setting + limits the number of unique `nested` types per index. Default is `50`. +// end::nested-fields-limit[] +// tag::nested-objects-limit[] `index.mapping.nested_objects.limit`:: - The maximum number of `nested` JSON objects within a single document across - all nested types, defaults to 10000. + The maximum number of nested JSON objects that a single document can contain across all + `nested` types. This limit helps to prevent out of memory errors when a document contains too many nested + objects. Default is `10000`. 
+// end::nested-objects-limit[] `index.mapping.field_name_length.limit`:: - Setting for the maximum length of a field name. The default value is - Long.MAX_VALUE (no limit). This setting isn't really something that addresses + Setting for the maximum length of a field name. This setting isn't really something that addresses mappings explosion but might still be useful if you want to limit the field length. It usually shouldn't be necessary to set this setting. The default is okay - unless a user starts to add a huge number of fields with really long names. + unless a user starts to add a huge number of fields with really long names. Default is + `Long.MAX_VALUE` (no limit). [float] == Dynamic mapping diff --git a/docs/reference/mapping/types/nested.asciidoc b/docs/reference/mapping/types/nested.asciidoc index aa2ae72dd0522..2b7ff123bfe6e 100644 --- a/docs/reference/mapping/types/nested.asciidoc +++ b/docs/reference/mapping/types/nested.asciidoc @@ -5,14 +5,17 @@ ++++ The `nested` type is a specialised version of the <> datatype -that allows arrays of objects to be indexed in a way that they can be queried +that allows arrays of objects to be indexed in a way that they can be queried independently of each other. +TIP: When ingesting key-value pairs with a large, arbitrary set of keys, you might consider modeling each key-value pair as its own nested document with `key` and `value` fields. Instead, consider using the <> datatype, which maps an entire object as a single field and allows for simple searches over its contents. +Nested documents and queries are typically expensive, so using the `flattened` datatype for this use case is a better option. + +[[nested-arrays-flattening-objects]] ==== How arrays of objects are flattened -Arrays of inner <> do not work the way you may expect. -Lucene has no concept of inner objects, so Elasticsearch flattens object -hierarchies into a simple list of field names and values. 
For instance, the +Elasticsearch has no concept of inner objects. Therefore, it flattens object +hierarchies into a simple list of field names and values. For instance, consider the following document: [source,console] @@ -35,7 +38,7 @@ PUT my_index/_doc/1 <1> The `user` field is dynamically added as a field of type `object`. -would be transformed internally into a document that looks more like this: +The previous document would be transformed internally into a document that looks more like this: [source,js] -------------------------------------------------- @@ -71,10 +74,12 @@ GET my_index/_search ==== Using `nested` fields for arrays of objects If you need to index arrays of objects and to maintain the independence of -each object in the array, you should use the `nested` datatype instead of the -<> datatype. Internally, nested objects index each object in +each object in the array, use the `nested` datatype instead of the +<> datatype. + +Internally, nested objects index each object in the array as a separate hidden document, meaning that each nested object can be -queried independently of the others, with the <>: +queried independently of the others with the <>: [source,console] -------------------------------------------------- @@ -152,6 +157,8 @@ GET my_index/_search <4> `inner_hits` allow us to highlight the matching nested documents. +[[nested-accessing-documents]] +==== Interacting with `nested` documents Nested documents can be: * queried with the <> query. @@ -207,29 +214,20 @@ document as standard (flat) fields. Defaults to `false`. [float] === Limits on `nested` mappings and objects -As described earlier, each nested object is indexed as a separate document under the hood. -Continuing with the example above, if we indexed a single document containing 100 `user` objects, -then 101 Lucene documents would be created -- one for the parent document, and one for each +As described earlier, each nested object is indexed as a separate Lucene document. 
+Continuing with the previous example, if we indexed a single document containing 100 `user` objects, +then 101 Lucene documents would be created: one for the parent document, and one for each nested object. Because of the expense associated with `nested` mappings, Elasticsearch puts settings in place to guard against performance problems: -`index.mapping.nested_fields.limit`:: - - The `nested` type should only be used in special cases, when arrays of objects need to be - queried independently of each other. To safeguard against poorly designed mappings, this setting - limits the number of unique `nested` types per index. In our example, the `user` mapping would - count as only 1 towards this limit. Defaults to 50. - -`index.mapping.nested_objects.limit`:: - - This setting limits the number of nested objects that a single document may contain across all - `nested` types, in order to prevent out of memory errors when a document contains too many nested - objects. To illustrate how the setting works, say we added another `nested` type called `comments` - to our example mapping above. Then for each document, the combined number of `user` and `comment` - objects it contains must be below the limit. Defaults to 10000. +include::{docdir}/mapping.asciidoc[tag=nested-fields-limit] -Additional background on these settings, including information on their default values, can be found -in <>. +In the previous example, the `user` mapping would count as only 1 towards this limit. +include::{docdir}/mapping.asciidoc[tag=nested-objects-limit] +To illustrate how this setting works, consider adding another `nested` type called `comments` +to the previous example mapping. For each document, the combined number of `user` and `comment` +objects it contains must be below the limit. +See <> regarding additional settings for preventing mappings explosion.
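
As an illustration of the two `nested` limits documented in this patch, the request below is a minimal sketch and is not taken from the patch itself. It assumes the `my_index` index from the nested examples already exists and that both settings can be updated on a live index; the values shown are simply the documented defaults (`50` and `10000`), so treat any other values as your own choice.

[source,console]
--------------------------------------------------
PUT my_index/_settings
{
  "index.mapping.nested_fields.limit": 50,
  "index.mapping.nested_objects.limit": 10000
}
--------------------------------------------------

With these values, a single document that contains 100 `user` objects produces 101 Lucene documents and stays well under the `nested_objects` limit, while the `user` mapping counts as 1 toward the `nested_fields` limit.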