[DOCS] Add links to flattened datatype (#56794) #56959

9 changes: 5 additions & 4 deletions docs/reference/ingest/processors/kv.asciidoc
@@ -1,9 +1,8 @@
[[kv-processor]]
=== KV Processor
This processor helps automatically parse messages (or specific event fields) which are of the foo=bar variety.

For example, if you have a log message which contains `ip=1.2.3.4 error=REFUSED`, you can parse those automatically by configuring:
This processor helps automatically parse messages (or specific event fields) which are of the `foo=bar` variety.

For example, if you have a log message which contains `ip=1.2.3.4 error=REFUSED`, you can parse those fields automatically by configuring:

[source,js]
--------------------------------------------------
@@ -17,8 +16,10 @@ For example, if you have a log message which contains `ip=1.2.3.4 error=REFUSED`
--------------------------------------------------
// NOTCONSOLE
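
As a rough illustration of the `foo=bar` parsing described above, the following Python sketch splits a message into fields. It is not the processor's implementation: `parse_kv` is a hypothetical helper, and its `field_split` and `value_split` parameters are assumed here to be plain string separators.

```python
def parse_kv(message, field_split=" ", value_split="="):
    """Split a message into key-value pairs (illustrative sketch only)."""
    fields = {}
    for pair in message.split(field_split):
        # Skip fragments that do not contain the key/value separator.
        if value_split not in pair:
            continue
        # Split on the first separator only, so values may contain it.
        key, value = pair.split(value_split, 1)
        fields[key] = value
    return fields


parsed = parse_kv("ip=1.2.3.4 error=REFUSED")
print(parsed)
```

Every key found in the message becomes a field name, which is why the resulting field names depend on the incoming data.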

TIP: Using the KV Processor can result in field names that you cannot control. Consider using the <<flattened>> datatype instead, which maps an entire object as a single field and allows for simple searches over its contents.

[[kv-options]]
.Kv Options
.KV Options
[options="header"]
|======
| Name | Required | Default | Description
44 changes: 27 additions & 17 deletions docs/reference/mapping.asciidoc
@@ -17,7 +17,7 @@ A mapping definition has:

<<mapping-fields,Meta-fields>>::

Meta-fields are used to customize how a document's metadata associated is
Meta-fields are used to customize how a document's associated metadata is
treated. Examples of meta-fields include the document's
<<mapping-index-field,`_index`>>, <<mapping-id-field,`_id`>>, and
<<mapping-source-field,`_source`>> fields.
@@ -58,17 +58,16 @@ via the <<multi-fields>> parameter.
[float]
=== Settings to prevent mappings explosion

Defining too many fields in an index is a condition that can lead to a
Defining too many fields in an index can lead to a
mapping explosion, which can cause out of memory errors and difficult
situations to recover from. This problem may be more common than expected.
As an example, consider a situation in which every new document inserted
introduces new fields. This is quite common with dynamic mappings.
Every time a document contains new fields, those will end up in the index's
mappings. This isn't worrying for a small amount of data, but it can become a
situations to recover from.

Consider a situation where every new document inserted
introduces new fields, such as with <<dynamic-mapping,dynamic mapping>>.
Each new field is added to the index mapping, which can become a
problem as the mapping grows.
The following settings allow you to limit the number of field mappings that
can be created manually or dynamically, in order to prevent bad documents from
causing a mapping explosion:

Use the following settings to limit the number of field mappings (created manually or dynamically) and prevent documents from causing a mapping explosion:

`index.mapping.total_fields.limit`::
The maximum number of fields in an index. Field and object mappings, as well as
@@ -84,26 +83,37 @@ If you increase this setting, we recommend you also increase the
<<search-settings,`indices.query.bool.max_clause_count`>> setting, which
limits the maximum number of <<query-dsl-bool-query,boolean clauses>> in a query.
====
+
[TIP]
====
If your field mappings contain a large, arbitrary set of keys, consider using the <<flattened,flattened>> datatype.
====

`index.mapping.depth.limit`::
The maximum depth for a field, which is measured as the number of inner
objects. For instance, if all fields are defined at the root object level,
then the depth is `1`. If there is one object mapping, then the depth is
`2`, etc. The default is `20`.
`2`, etc. Default is `20`.

// tag::nested-fields-limit[]
`index.mapping.nested_fields.limit`::
The maximum number of distinct `nested` mappings in an index, defaults to `50`.
The maximum number of distinct `nested` mappings in an index. The `nested` type should only be used in special cases, when arrays of objects need to be queried independently of each other. To safeguard against poorly designed mappings, this setting
limits the number of unique `nested` types per index. Default is `50`.
// end::nested-fields-limit[]

// tag::nested-objects-limit[]
`index.mapping.nested_objects.limit`::
The maximum number of `nested` JSON objects within a single document across
all nested types, defaults to 10000.
The maximum number of nested JSON objects that a single document can contain across all
`nested` types. This limit helps to prevent out of memory errors when a document contains too many nested
objects. Default is `10000`.
// end::nested-objects-limit[]

`index.mapping.field_name_length.limit`::
Setting for the maximum length of a field name. The default value is
Long.MAX_VALUE (no limit). This setting isn't really something that addresses
Setting for the maximum length of a field name. This setting isn't really something that addresses
mappings explosion but might still be useful if you want to limit the field length.
It usually shouldn't be necessary to set this setting. The default is okay
unless a user starts to add a huge number of fields with really long names.
unless a user starts to add a huge number of fields with really long names. Default is
`Long.MAX_VALUE` (no limit).
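
To make the depth rule for `index.mapping.depth.limit` concrete, here is a minimal Python sketch that measures the depth of a mapping's `properties` the way the text describes it (root-level fields give `1`, one object mapping gives `2`). The `mapping_depth` helper and the sample mappings are hypothetical, and this is not how Elasticsearch itself performs the check.

```python
DEFAULT_DEPTH_LIMIT = 20  # default stated in the settings list above


def mapping_depth(properties):
    """Return 1 for root-level fields, plus 1 for each level of object mapping."""
    deepest_child = 0
    for field in properties.values():
        child_properties = field.get("properties")
        if child_properties:
            deepest_child = max(deepest_child, mapping_depth(child_properties))
    return 1 + deepest_child


# Hypothetical mappings used only for illustration.
root_only = {"title": {"type": "text"}, "views": {"type": "integer"}}
one_object = {
    "title": {"type": "text"},
    "user": {"properties": {"first": {"type": "text"}}},
}

root_depth = mapping_depth(root_only)
object_depth = mapping_depth(one_object)
print(root_depth, object_depth, object_depth <= DEFAULT_DEPTH_LIMIT)
```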

[float]
== Dynamic mapping
52 changes: 25 additions & 27 deletions docs/reference/mapping/types/nested.asciidoc
@@ -5,14 +5,17 @@
++++

The `nested` type is a specialised version of the <<object,`object`>> datatype
that allows arrays of objects to be indexed in a way that they can be queried
that allows arrays of objects to be indexed in a way that they can be queried
independently of each other.

TIP: When ingesting key-value pairs with a large, arbitrary set of keys, you might consider modeling each key-value pair as its own nested document with `key` and `value` fields. Instead, consider using the <<flattened,flattened>> datatype, which maps an entire object as a single field and allows for simple searches over its contents.
Nested documents and queries are typically expensive, so using the `flattened` datatype for this use case is a better option.

[[nested-arrays-flattening-objects]]
==== How arrays of objects are flattened

Arrays of inner <<object,`object` fields>> do not work the way you may expect.
Lucene has no concept of inner objects, so Elasticsearch flattens object
hierarchies into a simple list of field names and values. For instance, the
Elasticsearch has no concept of inner objects. Therefore, it flattens object
hierarchies into a simple list of field names and values. For instance, consider the
following document:

[source,console]
@@ -35,7 +38,7 @@ PUT my_index/_doc/1

<1> The `user` field is dynamically added as a field of type `object`.

would be transformed internally into a document that looks more like this:
The previous document would be transformed internally into a document that looks more like this:

[source,js]
--------------------------------------------------
@@ -71,10 +74,12 @@ GET my_index/_search
==== Using `nested` fields for arrays of objects

If you need to index arrays of objects and to maintain the independence of
each object in the array, you should use the `nested` datatype instead of the
<<object,`object`>> datatype. Internally, nested objects index each object in
each object in the array, use the `nested` datatype instead of the
<<object,`object`>> datatype.

Internally, nested objects index each object in
the array as a separate hidden document, meaning that each nested object can be
queried independently of the others, with the <<query-dsl-nested-query,`nested` query>>:
queried independently of the others with the <<query-dsl-nested-query,`nested` query>>:

[source,console]
--------------------------------------------------
@@ -152,6 +157,8 @@ GET my_index/_search
<4> `inner_hits` allow us to highlight the matching nested documents.
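
The difference between flattened `object` arrays and `nested` objects can be sketched in plain Python. This is a simplified model rather than Elasticsearch or Lucene code: the helper functions, the sample document, and the names in it are all hypothetical, and analysis steps such as lowercasing are ignored.

```python
def flatten(doc, prefix=""):
    """Flatten dicts and lists of dicts into dotted field names mapped to value lists."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat


def matches_flattened(doc, criteria):
    """Match each criterion against the flattened field lists (object boundaries are lost)."""
    flat = flatten(doc)
    return all(value in flat.get(field, []) for field, value in criteria.items())


def matches_nested(doc, path, criteria):
    """Match all criteria against one object at a time, as if each were a separate document."""
    return any(
        all(obj.get(field) == value for field, value in criteria.items())
        for obj in doc.get(path, [])
    )


# Hypothetical document with an array of objects.
doc = {
    "group": "fans",
    "user": [{"first": "Ana", "last": "Lee"}, {"first": "Bo", "last": "Kim"}],
}

flat = flatten(doc)
cross_object_flat = matches_flattened(doc, {"user.first": "Ana", "user.last": "Kim"})
cross_object_nested = matches_nested(doc, "user", {"first": "Ana", "last": "Kim"})
same_object_nested = matches_nested(doc, "user", {"first": "Bo", "last": "Kim"})
print(flat)
print(cross_object_flat, cross_object_nested, same_object_nested)
```

In this model, the flattened form matches "Ana" and "Kim" even though they belong to different objects, while the per-object check only matches values that occur together in the same object.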


[[nested-accessing-documents]]
==== Interacting with `nested` documents
Nested documents can be:

* queried with the <<query-dsl-nested-query,`nested`>> query.
@@ -209,29 +216,20 @@ document as standard (flat) fields. Defaults to `false`.
[float]
=== Limits on `nested` mappings and objects

As described earlier, each nested object is indexed as a separate document under the hood.
Continuing with the example above, if we indexed a single document containing 100 `user` objects,
then 101 Lucene documents would be created -- one for the parent document, and one for each
As described earlier, each nested object is indexed as a separate Lucene document.
Continuing with the previous example, if we indexed a single document containing 100 `user` objects,
then 101 Lucene documents would be created: one for the parent document, and one for each
nested object. Because of the expense associated with `nested` mappings, Elasticsearch puts
settings in place to guard against performance problems:

`index.mapping.nested_fields.limit`::

The `nested` type should only be used in special cases, when arrays of objects need to be
queried independently of each other. To safeguard against poorly designed mappings, this setting
limits the number of unique `nested` types per index. In our example, the `user` mapping would
count as only 1 towards this limit. Defaults to 50.

`index.mapping.nested_objects.limit`::

This setting limits the number of nested objects that a single document may contain across all
`nested` types, in order to prevent out of memory errors when a document contains too many nested
objects. To illustrate how the setting works, say we added another `nested` type called `comments`
to our example mapping above. Then for each document, the combined number of `user` and `comment`
objects it contains must be below the limit. Defaults to 10000.
include::{docdir}/mapping.asciidoc[tag=nested-fields-limit]

Additional background on these settings, including information on their default values, can be found
in <<mapping-limit-settings>>.
In the previous example, the `user` mapping would count as only 1 towards this limit.

include::{docdir}/mapping.asciidoc[tag=nested-objects-limit]

To illustrate how this setting works, consider adding another `nested` type called `comments`
to the previous example mapping. For each document, the combined number of `user` and `comment`
objects it contains must be below the limit.
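
The counting described above can be sketched as follows. This is an illustrative Python model under stated assumptions: `count_nested_objects` is a hypothetical helper, it only handles top-level `nested` fields, and treating a count equal to the limit as acceptable is an assumption of the sketch rather than documented behavior.

```python
NESTED_OBJECTS_LIMIT = 10000  # default stated for index.mapping.nested_objects.limit


def count_nested_objects(doc, nested_paths):
    """Count the objects stored under each top-level nested field of one document."""
    total = 0
    for path in nested_paths:
        value = doc.get(path, [])
        total += len(value) if isinstance(value, list) else 1
    return total


# Hypothetical document: 100 `user` objects and 5 `comments` objects.
doc = {
    "user": [{"first": f"user{i}"} for i in range(100)],
    "comments": [{"text": "hi"} for _ in range(5)],
}

user_only_total = count_nested_objects(doc, ["user"])
# One Lucene document for the parent plus one per nested object.
user_only_lucene_docs = 1 + user_only_total

combined_total = count_nested_objects(doc, ["user", "comments"])
# Boundary handling (<=) is an assumption of this sketch.
within_limit = combined_total <= NESTED_OBJECTS_LIMIT
print(user_only_lucene_docs, combined_total, within_limit)
```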

See <<mapping-limit-settings>> regarding additional settings for preventing mappings explosion.