[DOCS] Add links to flattened datatype #56794

Merged
10 changes: 6 additions & 4 deletions docs/reference/ingest/processors/kv.asciidoc
@@ -1,9 +1,8 @@
[[kv-processor]]
=== KV Processor
This processor helps automatically parse messages (or specific event fields) which are of the foo=bar variety.

For example, if you have a log message which contains `ip=1.2.3.4 error=REFUSED`, you can parse those automatically by configuring:
This processor helps automatically parse messages (or specific event fields) which are of the `foo=bar` variety.

For example, if you have a log message which contains `ip=1.2.3.4 error=REFUSED`, you can parse those fields automatically by configuring:

[source,js]
--------------------------------------------------
@@ -17,8 +16,11 @@ For example, if you have a log message which contains `ip=1.2.3.4 error=REFUSED`
--------------------------------------------------
// NOTCONSOLE
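
As a rough illustration of what this kind of parsing does to `ip=1.2.3.4 error=REFUSED`, here is a minimal Python sketch. It is not the Elasticsearch implementation: `parse_kv` is a hypothetical helper, and its `field_split` and `value_split` parameters only mirror the processor options of the same names.

```python
def parse_kv(message, field_split=" ", value_split="="):
    """Split a message into key/value pairs, one output field per key.

    Hypothetical helper for illustration; not Elasticsearch code.
    """
    fields = {}
    for pair in message.split(field_split):
        # Skip fragments that contain no key/value separator.
        if value_split not in pair:
            continue
        # Split on the first separator only, so values may contain "=".
        key, value = pair.split(value_split, 1)
        fields[key] = value
    return fields


parsed = parse_kv("ip=1.2.3.4 error=REFUSED")
print(parsed)  # {'ip': '1.2.3.4', 'error': 'REFUSED'}
```

Every distinct key in the input becomes its own field name, which is why the set of resulting fields depends entirely on the incoming data.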

TIP: Using the KV Processor can result in field names that you cannot control. Consider using the <<flattened>> datatype instead, which maps an entire object as a single field.
While a flattened object provides only a single field to search on, the object's contents can still be searched using simple queries and aggregations.
Contributor

To me the phrase "While a flattened object provides only a single field to search on" could be confusing -- it could suggest that you can only search the root field. What would you think of this tweak: "Using the KV Processor can create a large number of field names that you don't control. Consider using the flattened datatype instead, which maps an entire object as a single field and allows for simple searches over its contents."

Contributor Author

Agreed! I like your edit, which focuses on the positive (what the user can do) of the flattened object, rather than its limitations.


[[kv-options]]
.Kv Options
.KV Options
[options="header"]
|======
| Name | Required | Default | Description
2 changes: 1 addition & 1 deletion docs/reference/mapping.asciidoc
@@ -17,7 +17,7 @@ A mapping definition has:

<<mapping-fields,Meta-fields>>::

Meta-fields are used to customize how a document's metadata associated is
Meta-fields are used to customize how a document's associated metadata is
treated. Examples of meta-fields include the document's
<<mapping-index-field,`_index`>>, <<mapping-id-field,`_id`>>, and
<<mapping-source-field,`_source`>> fields.
23 changes: 12 additions & 11 deletions docs/reference/mapping/types/nested.asciidoc
@@ -5,14 +5,13 @@
++++

The `nested` type is a specialised version of the <<object,`object`>> datatype
that allows arrays of objects to be indexed in a way that they can be queried
that allows arrays of objects to be indexed in a way that they can be queried
independently of each other.

==== How arrays of objects are flattened

Arrays of inner <<object,`object` fields>> do not work the way you may expect.
Lucene has no concept of inner objects, so Elasticsearch flattens object
hierarchies into a simple list of field names and values. For instance, the
Elasticsearch has no concept of inner objects. Therefore, it flattens object
hierarchies into a simple list of field names and values. For instance, consider the
following document:

[source,console]
@@ -35,7 +34,7 @@ PUT my_index/_doc/1

<1> The `user` field is dynamically added as a field of type `object`.

would be transformed internally into a document that looks more like this:
The previous document would be transformed internally into a document that looks more like this:

[source,js]
--------------------------------------------------
@@ -71,10 +70,15 @@ GET my_index/_search
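
The flattening of object arrays described in this file can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `flatten` is a hypothetical helper, the sample document is invented (it is not the one elided from the diff), and real indexing also involves analysis steps that are not modeled here.

```python
def flatten(doc, prefix=""):
    """Flatten nested objects into dotted field names mapped to value lists.

    Hypothetical helper for illustration; not Elasticsearch code.
    """
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        # Treat single values and arrays uniformly.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Merge the inner object's fields under the dotted path.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat


# Invented sample document, used only for this sketch.
doc = {
    "group": "fans",
    "user": [
        {"first": "Ann", "last": "Lee"},
        {"first": "Bob", "last": "Kim"},
    ],
}
flat = flatten(doc)
print(flat)
# {'group': ['fans'], 'user.first': ['Ann', 'Bob'], 'user.last': ['Lee', 'Kim']}

# Once flattened, the link between "Ann" and "Lee" is gone, so a check for
# first == "Ann" AND last == "Kim" succeeds although no such user exists.
flat_match = "Ann" in flat["user.first"] and "Kim" in flat["user.last"]

# Checking each object on its own keeps the pairs together and finds no match.
per_object_match = any(
    u["first"] == "Ann" and u["last"] == "Kim" for u in doc["user"]
)
print(flat_match, per_object_match)  # True False
```

The per-object check at the end is a loose analogy for what keeping each array element as an independent unit preserves.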
==== Using `nested` fields for arrays of objects

If you need to index arrays of objects and to maintain the independence of
each object in the array, you should use the `nested` datatype instead of the
<<object,`object`>> datatype. Internally, nested objects index each object in
each object in the array, use the `nested` datatype instead of the
<<object,`object`>> datatype.

TIP: If you consider creating `nested` objects with two `key` and `value` keyword fields, consider using the <<flattened>> datatype instead.
Contributor

Maybe we could move this tip to the end of the section? I think it comes right in the middle of an important explanation and breaks the continuity.

Contributor Author

@lockewritesdocs, May 15, 2020

I'm considering a new section, "Interacting with nested documents", that covers the ways users can interact with these documents, this new tip, and the existing Important note. For example:

[image]

Because nested documents are indexed as separate documents, they can only be accessed within the scope of the nested query. While a flattened object provides only a single field to search on, the object's contents can still be searched using simple queries and aggregations.
Contributor

@jtibshirani, May 14, 2020

A suggestion for how to restructure this tip:

  • We could first give some context, saying that when ingesting key-value pairs with a large arbitrary set of keys, one technique is to model each pair as its own nested document with key and value fields.
  • Instead, we'd suggest using the flattened datatype, "which maps an entire object as a single field and allows for simple searches over its contents."

One other comment -- instead of describing the downside as "only be accessed within the scope of the nested query", I think it'd be clearer to mention that nested documents and queries are generally expensive, and that the flattened datatype is a better fit for this use case.

Contributor Author

@lockewritesdocs, May 15, 2020

Thanks @jtibshirani -- I made several changes to this Tip that incorporate your feedback. See my latest commit for specific changes.


Internally, nested objects index each object in
the array as a separate hidden document, meaning that each nested object can be
queried independently of the others, with the <<query-dsl-nested-query,`nested` query>>:
queried independently of the others with the <<query-dsl-nested-query,`nested` query>>:

[source,console]
--------------------------------------------------
@@ -230,6 +234,3 @@ settings in place to guard against performance problems:

Additional background on these settings, including information on their default values, can be found
in <<mapping-limit-settings>>.