From 74309933e347012855f3e8cc770116ff9b3586b7 Mon Sep 17 00:00:00 2001
From: Wylie Conlon <wylieconlon@gmail.com>
Date: Thu, 29 Oct 2020 14:53:46 -0400
Subject: [PATCH 1/6] Clarify field data cache behavior

---
 docs/plugins/mapper-annotated-text.asciidoc   |   2 +-
 .../diversified-sampler-aggregation.asciidoc  |   2 +-
 .../significantterms-aggregation.asciidoc     |   2 +-
 docs/reference/cat/fielddata.asciidoc         |   4 +-
 docs/reference/cluster/stats.asciidoc         |   2 +-
 docs/reference/how-to/search-speed.asciidoc   |  15 +-
 .../mapping/fields/id-field.asciidoc          |  14 +-
 docs/reference/mapping/params.asciidoc        |   2 -
 .../params/eager-global-ordinals.asciidoc     |  11 +-
 .../mapping/params/fielddata.asciidoc         | 134 ------------------
 .../mapping/types/parent-join.asciidoc        |   5 +-
 docs/reference/mapping/types/text.asciidoc    | 109 ++++++++++++++
 .../modules/indices/circuit_breaker.asciidoc  |   6 +-
 .../modules/indices/fielddata.asciidoc        |  46 ++++--
 14 files changed, 178 insertions(+), 176 deletions(-)
 delete mode 100644 docs/reference/mapping/params/fielddata.asciidoc

diff --git a/docs/plugins/mapper-annotated-text.asciidoc b/docs/plugins/mapper-annotated-text.asciidoc
index 4a30da47d62c2..a1dd0bd3dd3c9 100644
--- a/docs/plugins/mapper-annotated-text.asciidoc
+++ b/docs/plugins/mapper-annotated-text.asciidoc
@@ -18,7 +18,7 @@ include::install_remove.asciidoc[]
 [[mapper-annotated-text-usage]]
 ==== Using the `annotated-text` field
 
-The `annotated-text` tokenizes text content as per the more common `text` field (see 
+The `annotated-text` tokenizes text content as per the more common <<text, `text` field>> (see 
 "limitations" below) but also injects any marked-up annotation tokens directly into
 the search index:
 
diff --git a/docs/reference/aggregations/bucket/diversified-sampler-aggregation.asciidoc b/docs/reference/aggregations/bucket/diversified-sampler-aggregation.asciidoc
index f49a12ce0eeba..87ee6b62f4b92 100644
--- a/docs/reference/aggregations/bucket/diversified-sampler-aggregation.asciidoc
+++ b/docs/reference/aggregations/bucket/diversified-sampler-aggregation.asciidoc
@@ -178,7 +178,7 @@ Each option will hold up to `shard_size` values in memory while performing de-du
  - hold ordinals of the field as determined by the Lucene index (`global_ordinals`)
  - hold hashes of the field values - with potential for hash collisions (`bytes_hash`)
 
-The default setting is to use `global_ordinals` if this information is available from the Lucene index and reverting to `map` if not.
+The default setting is to use <<eager-global-ordinals,`global_ordinals`>> if this information is available from the Lucene index and to revert to `map` if not.
 The `bytes_hash` setting may prove faster in some cases but introduces the possibility of false positives in de-duplication logic due to the possibility of hash collisions.
 Please note that Elasticsearch will ignore the choice of execution hint if it is not applicable and that there is no backward compatibility guarantee on these hints.
 
diff --git a/docs/reference/aggregations/bucket/significantterms-aggregation.asciidoc b/docs/reference/aggregations/bucket/significantterms-aggregation.asciidoc
index 92ded6243ccca..a9dcfe1532890 100644
--- a/docs/reference/aggregations/bucket/significantterms-aggregation.asciidoc
+++ b/docs/reference/aggregations/bucket/significantterms-aggregation.asciidoc
@@ -550,7 +550,7 @@ A description of the different collection modes can be found in the
 There are different mechanisms by which terms aggregations can be executed:
 
  - by using field values directly in order to aggregate data per-bucket (`map`)
- - by using global ordinals of the field and allocating one bucket per global ordinal (`global_ordinals`)
+ - by using <<eager-global-ordinals,global ordinals>> of the field and allocating one bucket per global ordinal (`global_ordinals`)
 
 Elasticsearch tries to have sensible defaults so this is something that generally doesn't need to be configured.
 
diff --git a/docs/reference/cat/fielddata.asciidoc b/docs/reference/cat/fielddata.asciidoc
index 60acf19095c90..20c6207cf399c 100644
--- a/docs/reference/cat/fielddata.asciidoc
+++ b/docs/reference/cat/fielddata.asciidoc
@@ -4,8 +4,8 @@
 <titleabbrev>cat fielddata</titleabbrev>
 ++++
 
-Returns the amount of heap memory currently used by fielddata on every data node
-in the cluster.
+Returns the amount of heap memory currently used by the
+<<modules-fielddata, field data cache>> on every data node in the cluster.
 
 
 [[cat-fielddata-api-request]]
diff --git a/docs/reference/cluster/stats.asciidoc b/docs/reference/cluster/stats.asciidoc
index b5affc837a7af..3f5790e1ba885 100644
--- a/docs/reference/cluster/stats.asciidoc
+++ b/docs/reference/cluster/stats.asciidoc
@@ -246,7 +246,7 @@ activities.
 
 `fielddata`::
 (object)
-Contains statistics about the field data cache of selected nodes.
+Contains statistics about the <<modules-fielddata, field data cache>> of selected nodes.
 +
 .Properties of `fielddata`
 [%collapsible%open]
diff --git a/docs/reference/how-to/search-speed.asciidoc b/docs/reference/how-to/search-speed.asciidoc
index 79df665127edc..565a427531df5 100644
--- a/docs/reference/how-to/search-speed.asciidoc
+++ b/docs/reference/how-to/search-speed.asciidoc
@@ -303,13 +303,14 @@ may become much worse.
 [discrete]
 === Warm up global ordinals
 
-Global ordinals are a data-structure that is used in order to run
-<<search-aggregations-bucket-terms-aggregation,`terms`>> aggregations on
-<<keyword,`keyword`>> fields. They are loaded lazily in memory because
-Elasticsearch does not know which fields will be used in `terms` aggregations
-and which fields won't. You can tell Elasticsearch to load global ordinals
-eagerly when starting or refreshing a shard by configuring mappings as
-described below:
+<<eager-global-ordinals,Global ordinals>> are a data structure that is used in
+order to increase aggregation speed. They are calculated lazily and stored in
+the JVM heap as part of the <<modules-fielddata, field data cache>>. For fields
+that are heavily used for bucketing aggregations, you can tell {es} to add to
+the cache before requests are received. This should be done carefully because it
+will increase heap usage and delay indexing until the cache is created. This can
+be set dynamically on an existing mapping by setting the
+<<eager-global-ordinals, eager global ordinals>> mappping parameter:
 
 [source,console]
 --------------------------------------------------
diff --git a/docs/reference/mapping/fields/id-field.asciidoc b/docs/reference/mapping/fields/id-field.asciidoc
index 33f1e8eb7178c..8bf53798cd76a 100644
--- a/docs/reference/mapping/fields/id-field.asciidoc
+++ b/docs/reference/mapping/fields/id-field.asciidoc
@@ -33,12 +33,14 @@ GET my-index-000001/_search
 
 <1> Querying on the `_id` field (also see the <<query-dsl-ids-query,`ids` query>>)
 
-The value of the `_id` field is also accessible in aggregations or for sorting,
-but doing so is discouraged as it requires to load a lot of data in memory. In
-case sorting or aggregating on the `_id` field is required, it is advised to
-duplicate the content of the `_id` field in another field that has `doc_values`
-enabled.
-
+The `_id` field is by default not available by default for use with aggregations or sorting.
+To aggregate or sort by the `_id` field, it is recommended to 
+duplicate the `_id` field onto a `keyword` field using the <<copy-to, `copy_to` mapping parameter>>.
+
+It is not recommended to enable `_id` fields to be aggregated using the <<modules-fielddata, in-memory field data cache>>,
+but it is possible. This can be done by <<cluster-update-settings, changing the cluster setting>>
+to `"indices.id_field_data.enabled": true`. Enabling this setting and then aggregating on the `_id`
+field will use significant memory and show deprecation warnings in the logs.
 
 [NOTE]
 ==================================================
diff --git a/docs/reference/mapping/params.asciidoc b/docs/reference/mapping/params.asciidoc
index a3ddbec095342..f0f9e9a41a7a6 100644
--- a/docs/reference/mapping/params.asciidoc
+++ b/docs/reference/mapping/params.asciidoc
@@ -49,8 +49,6 @@ include::params/eager-global-ordinals.asciidoc[]
 
 include::params/enabled.asciidoc[]
 
-include::params/fielddata.asciidoc[]
-
 include::params/format.asciidoc[]
 
 include::params/ignore-above.asciidoc[]
diff --git a/docs/reference/mapping/params/eager-global-ordinals.asciidoc b/docs/reference/mapping/params/eager-global-ordinals.asciidoc
index 4b1ae5f626f71..9f771a3d66745 100644
--- a/docs/reference/mapping/params/eager-global-ordinals.asciidoc
+++ b/docs/reference/mapping/params/eager-global-ordinals.asciidoc
@@ -34,11 +34,12 @@ to be enabled.
 * Operations on parent and child documents from a `join` field, including
 `has_child` queries and `parent` aggregations.
 
-NOTE: The global ordinal mapping is an on-heap data structure. When measuring
-memory usage, Elasticsearch counts the memory from global ordinals as
-'fielddata'. Global ordinals memory is included in the
-<<fielddata-circuit-breaker, fielddata circuit breaker>>, and is returned
-under `fielddata` in the <<cluster-nodes-stats, node stats>> response.
+NOTE: The global ordinal mapping use heap memory as part of the
+<<modules-fielddata, field data cache>>. Aggregations that include high
+cardinality values can use a significant amount of heap memory, and
+could exceed the threshold of the
+<<fielddata-circuit-breaker, field data circuit breaker>>.
+It is recommended to set a specific limit for the field data cache size.
 
 ==== Loading global ordinals
 
diff --git a/docs/reference/mapping/params/fielddata.asciidoc b/docs/reference/mapping/params/fielddata.asciidoc
deleted file mode 100644
index 1faa82a53f310..0000000000000
--- a/docs/reference/mapping/params/fielddata.asciidoc
+++ /dev/null
@@ -1,134 +0,0 @@
-[[fielddata]]
-=== `fielddata`
-
-Most fields are <<mapping-index,indexed>> by default, which makes them
-searchable. Sorting, aggregations, and accessing field values in scripts,
-however, requires a different access pattern from search.
-
-Search needs to answer the question _"Which documents contain this term?"_,
-while sorting and aggregations need to answer a different question: _"What is
-the value of this field for **this** document?"_.
-
-Most fields can use index-time, on-disk <<doc-values,`doc_values`>> for this
-data access pattern, but <<text,`text`>> fields do not support `doc_values`.
-
-Instead, `text` fields use a query-time *in-memory* data structure called
-`fielddata`.  This data structure is built on demand the first time that a
-field is used for aggregations, sorting, or in a script.  It is built by
-reading the entire inverted index for each segment from disk, inverting the
-term ↔︎ document relationship, and storing the result in memory, in the JVM
-heap.
-
-[[fielddata-disabled-text-fields]]
-==== Fielddata is disabled on `text` fields by default
-
-Fielddata can consume a *lot* of heap space, especially when loading high
-cardinality `text` fields.  Once fielddata has been loaded into the heap, it
-remains there for the lifetime of the segment. Also, loading fielddata is an
-expensive process which can cause users to experience latency hits.  This is
-why fielddata is disabled by default.
-
-If you try to sort, aggregate, or access values from a script on a `text`
-field, you will see this exception:
-
-[literal]
-Fielddata is disabled on text fields by default.  Set `fielddata=true` on
-[`your_field_name`] in order to load  fielddata in memory by uninverting the
-inverted index. Note that this can however use significant memory.
-
-[[before-enabling-fielddata]]
-==== Before enabling fielddata
-
-Before you enable fielddata, consider why you are using a `text` field for
-aggregations, sorting, or in a script.  It usually doesn't make sense to do
-so.
-
-A text field is analyzed before indexing so that a value like
-`New York` can be found by searching for `new` or for `york`.  A `terms`
-aggregation on this field will return a `new` bucket and a `york` bucket, when
-you probably want a single bucket called `New York`.
-
-Instead, you should have a `text` field for full text searches, and an
-unanalyzed <<keyword,`keyword`>> field with <<doc-values,`doc_values`>>
-enabled for aggregations, as follows:
-
-[source,console]
----------------------------------
-PUT my-index-000001
-{
-  "mappings": {
-    "properties": {
-      "my_field": { <1>
-        "type": "text",
-        "fields": {
-          "keyword": { <2>
-            "type": "keyword"
-          }
-        }
-      }
-    }
-  }
-}
----------------------------------
-
-<1> Use the `my_field` field for searches.
-<2> Use the `my_field.keyword` field for aggregations, sorting, or in scripts.
-
-[[enable-fielddata-text-fields]]
-==== Enabling fielddata on `text` fields
-
-You can enable fielddata on an existing `text` field using the
-<<indices-put-mapping,PUT mapping API>> as follows:
-
-[source,console]
------------------------------------
-PUT my-index-000001/_mapping
-{
-  "properties": {
-    "my_field": { <1>
-      "type":     "text",
-      "fielddata": true
-    }
-  }
-}
------------------------------------
-// TEST[continued]
-
-<1> The mapping that you specify for `my_field` should consist of the existing
-    mapping for that field, plus the `fielddata` parameter.
-
-[[field-data-filtering]]
-==== `fielddata_frequency_filter`
-
-Fielddata filtering can be used to reduce the number of terms loaded into
-memory, and thus reduce memory usage. Terms can be filtered by _frequency_:
-
-The frequency filter allows you to only load terms whose document frequency falls
-between a `min` and `max` value, which can be expressed an absolute
-number (when the number is bigger than 1.0) or as a percentage
-(eg `0.01` is `1%` and `1.0` is `100%`). Frequency is calculated
-*per segment*. Percentages are based on the number of docs which have a
-value for the field, as opposed to all docs in the segment.
-
-Small segments can be excluded completely by specifying the minimum
-number of docs that the segment should contain with `min_segment_size`:
-
-[source,console]
---------------------------------------------------
-PUT my-index-000001
-{
-  "mappings": {
-    "properties": {
-      "tag": {
-        "type": "text",
-        "fielddata": true,
-        "fielddata_frequency_filter": {
-          "min": 0.001,
-          "max": 0.1,
-          "min_segment_size": 500
-        }
-      }
-    }
-  }
-}
---------------------------------------------------
diff --git a/docs/reference/mapping/types/parent-join.asciidoc b/docs/reference/mapping/types/parent-join.asciidoc
index a33ab33baadf3..6826f155cc4f7 100644
--- a/docs/reference/mapping/types/parent-join.asciidoc
+++ b/docs/reference/mapping/types/parent-join.asciidoc
@@ -120,11 +120,12 @@ PUT my-index-000001/_doc/4?routing=1&refresh
 <2> `answer` is the name of the join for this document
 <3> The parent id of this child document
 
-==== Parent-join and performance.
+==== Parent-join and performance
 
 The join field shouldn't be used like joins in a relation database. In Elasticsearch the key to good performance
 is to de-normalize your data into documents. Each join field, `has_child` or `has_parent` query adds a
-significant tax to your query performance.
+significant tax to your query performance. It also increases the usage of the JVM heap on the
+<<modules-fielddata, field data cache>>.
 
 The only case where the join field makes sense is if your data contains a one-to-many relationship where
 one entity significantly outnumbers the other entity. An example of such case is a use case with products
diff --git a/docs/reference/mapping/types/text.asciidoc b/docs/reference/mapping/types/text.asciidoc
index 9ef0399fd16c3..1d816867a637f 100644
--- a/docs/reference/mapping/types/text.asciidoc
+++ b/docs/reference/mapping/types/text.asciidoc
@@ -141,3 +141,112 @@ The following parameters are accepted by `text` fields:
 <<mapping-field-meta,`meta`>>::
 
     Metadata about the field.
+
+[[fielddata]]
+==== `fielddata`
+
+`text` fields are searchable by default, but by default are not available for
+aggregations, sorting, or scripting. If you try to sort, aggregate, or access
+values from a script on a `text` field, you will see this exception:
+
+[literal]
+Fielddata is disabled on text fields by default.  Set `fielddata=true` on
+[`your_field_name`] in order to load fielddata in memory by uninverting the
+inverted index. Note that this can however use significant memory.
+
+Field data is the only way to access the analyzed tokens from a full text field
+in aggregations, sorting, or scripting. For example, a full text value like `New York`
+is analyzed into the tokens `new` and `york`. Aggregating on these tokens requires field data.
+
+[[before-enabling-fielddata]]
+==== Before enabling fielddata
+
+It usually doesn't make sense to enable fielddata on text fields. Field data
+is kept on the heap in the <<modules-fielddata, field data cache>> because it
+is expensive to calculate. Calculating the field data can cause latency spikes, and
+the increased heap usage can cause cluster performance issues.
+
+Most users who want to do more with text fields use <<multi-fields, multi-field mappings>>
+by having both a `text` field for full text searches, and an
+unanalyzed <<keyword,`keyword`>> field for aggregations, as follows:
+
+[source,console]
+---------------------------------
+PUT my-index-000001
+{
+  "mappings": {
+    "properties": {
+      "my_field": { <1>
+        "type": "text",
+        "fields": {
+          "keyword": { <2>
+            "type": "keyword"
+          }
+        }
+      }
+    }
+  }
+}
+---------------------------------
+
+<1> Use the `my_field` field for searches.
+<2> Use the `my_field.keyword` field for aggregations, sorting, or in scripts.
+
+[[enable-fielddata-text-fields]]
+==== Enabling fielddata on `text` fields
+
+You can enable fielddata on an existing `text` field using the
+<<indices-put-mapping,PUT mapping API>> as follows:
+
+[source,console]
+-----------------------------------
+PUT my-index-000001/_mapping
+{
+  "properties": {
+    "my_field": { <1>
+      "type":     "text",
+      "fielddata": true
+    }
+  }
+}
+-----------------------------------
+// TEST[continued]
+
+<1> The mapping that you specify for `my_field` should consist of the existing
+    mapping for that field, plus the `fielddata` parameter.
+
+[[field-data-filtering]]
+==== `fielddata_frequency_filter`
+
+Fielddata filtering can be used to reduce the number of terms loaded into
+memory, and thus reduce memory usage. Terms can be filtered by _frequency_:
+
+The frequency filter allows you to only load terms whose document frequency falls
+between a `min` and `max` value, which can be expressed as an absolute
+number (when the number is bigger than 1.0) or as a percentage
+(eg `0.01` is `1%` and `1.0` is `100%`). Frequency is calculated
+*per segment*. Percentages are based on the number of docs which have a
+value for the field, as opposed to all docs in the segment.
+
+Small segments can be excluded completely by specifying the minimum
+number of docs that the segment should contain with `min_segment_size`:
+
+[source,console]
+--------------------------------------------------
+PUT my-index-000001
+{
+  "mappings": {
+    "properties": {
+      "tag": {
+        "type": "text",
+        "fielddata": true,
+        "fielddata_frequency_filter": {
+          "min": 0.001,
+          "max": 0.1,
+          "min_segment_size": 500
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
diff --git a/docs/reference/modules/indices/circuit_breaker.asciidoc b/docs/reference/modules/indices/circuit_breaker.asciidoc
index d06b3f27c11c5..2f85996c0d433 100644
--- a/docs/reference/modules/indices/circuit_breaker.asciidoc
+++ b/docs/reference/modules/indices/circuit_breaker.asciidoc
@@ -33,9 +33,9 @@ The parent-level breaker can be configured with the following settings:
 [discrete]
 ==== Field data circuit breaker
 The field data circuit breaker allows Elasticsearch to estimate the amount of
-memory a field will require to be loaded into memory. It can then prevent the
-field data loading by raising an exception. By default the limit is configured
-to 40% of the maximum JVM heap. It can be configured with the following
+memory a field will require to be loaded into the <<modules-fielddata, field data cache>>.
+It can then prevent the field data loading by raising an exception. By default the
+limit is configured to 40% of the maximum JVM heap. It can be configured with the following
 parameters:
 
 [[fielddata-circuit-breaker-limit]]
diff --git a/docs/reference/modules/indices/fielddata.asciidoc b/docs/reference/modules/indices/fielddata.asciidoc
index 5a2bbac9f379d..d3fae03e3aab5 100644
--- a/docs/reference/modules/indices/fielddata.asciidoc
+++ b/docs/reference/modules/indices/fielddata.asciidoc
@@ -1,16 +1,41 @@
 [[modules-fielddata]]
 === Field data cache settings
 
-The field data cache is used mainly when sorting on or computing aggregations
-on a field. It loads all the field values to memory in order to provide fast
-document based access to those values. The field data cache can be
-expensive to build for a field, so its recommended to have enough memory
-to allocate it, and to keep it loaded.
+The field data cache is an in-memory data structure, built on demand
+based on the type of query that is being run. It contains both
+<<fielddata, `fielddata`>> and <<eager-global-ordinals, global ordinals>>,
+which serve similar functions for different types of queries.
+The cache uses the JVM heap, so it is important to monitor its use
+and not to overload your cluster.
 
-The amount of memory used for the field
-data cache can be controlled using `indices.fielddata.cache.size`. Note:
-reloading  the field data which does not fit into your cache will be expensive
-and  perform poorly.
+Other than fields where the cache is built ahead of time, it is populated as needed
+on request. This includes:
+
+* Certain bucket aggregations on `keyword`, `ip`, and `flattened` fields. This
+includes `terms` aggregations, as well as `composite`, `diversified_sampler`,
+and `significant_terms`.
+* Bucket aggregations on `text` fields that have <<fielddata, `fielddata`>>
+ enabled.
+* Bucket aggregations on the <<mapping-id-field, `_id` field>> when it is enabled for aggregation
+* Operations on parent and child documents from a `join` field, including
+`has_child` queries and `parent` aggregations.
+
+[discrete]
+[[fielddata-sizing]]
+==== Cache size
+
+The entries in the cache are expensive to build, so the default behavior is
+to keep the cache loaded in memory
+
+The default cache size is unlimited, causing the cache to grow until it
+reaches the limit set by the <<fielddata-circuit-breaker, field data circuit breaker>>.
+It is recommended to set a cache size limit that is smaller than the circuit breaker
+value. Setting the limit will cause the cache to behave as a least-recently-updated
+cache, only keeping the most recently requested field data.
+
+If the field data circuit breaker is reached, preventing further requests, the
+best option is to manually <<indices-clearcache, clear the cache>>. This will
+allow requests to re-build the cache setting.
 
 `indices.fielddata.cache.size`::
 (<<static-cluster-setting,Static>>)
@@ -24,5 +49,4 @@ absolute value, eg `12GB`. Defaults to unbounded.  Also see
 
 You can monitor memory usage for field data as well as the field data circuit
 breaker using
-<<cluster-nodes-stats,Nodes Stats API>>
-
+<<cluster-nodes-stats,Nodes Stats API>> or the <<cat-fielddata, _cat/fielddata API>>

From 57411deb06d4d3880b63607207062b987024a58f Mon Sep 17 00:00:00 2001
From: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Date: Thu, 29 Oct 2020 15:19:41 -0400
Subject: [PATCH 2/6] Update docs/plugins/mapper-annotated-text.asciidoc

---
 docs/plugins/mapper-annotated-text.asciidoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/plugins/mapper-annotated-text.asciidoc b/docs/plugins/mapper-annotated-text.asciidoc
index a1dd0bd3dd3c9..9307b6aaefe13 100644
--- a/docs/plugins/mapper-annotated-text.asciidoc
+++ b/docs/plugins/mapper-annotated-text.asciidoc
@@ -18,7 +18,7 @@ include::install_remove.asciidoc[]
 [[mapper-annotated-text-usage]]
 ==== Using the `annotated-text` field
 
-The `annotated-text` tokenizes text content as per the more common <<text, `text` field>> (see 
+The `annotated-text` field tokenizes text content as per the more common {ref}/text.html[`text`] field (see
 "limitations" below) but also injects any marked-up annotation tokens directly into
 the search index:
 

From 391ab35414f190530d240a554831b17b48ac0803 Mon Sep 17 00:00:00 2001
From: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Date: Mon, 2 Nov 2020 11:15:07 -0500
Subject: [PATCH 3/6] [DOCS] Add redirect for `fielddata`

---
 docs/reference/mapping/types/text.asciidoc |  4 +--
 docs/reference/redirects.asciidoc          | 42 ++++++++++++++++++++++
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/docs/reference/mapping/types/text.asciidoc b/docs/reference/mapping/types/text.asciidoc
index 1d816867a637f..a31f7b172d645 100644
--- a/docs/reference/mapping/types/text.asciidoc
+++ b/docs/reference/mapping/types/text.asciidoc
@@ -143,7 +143,7 @@ The following parameters are accepted by `text` fields:
     Metadata about the field.
 
 [[fielddata]]
-==== `fielddata`
+==== `fielddata` mapping parameter
 
 `text` fields are searchable by default, but by default are not available for
 aggregations, sorting, or scripting. If you try to sort, aggregate, or access
@@ -216,7 +216,7 @@ PUT my-index-000001/_mapping
     mapping for that field, plus the `fielddata` parameter.
 
 [[field-data-filtering]]
-==== `fielddata_frequency_filter`
+==== `fielddata_frequency_filter` mapping parameter
 
 Fielddata filtering can be used to reduce the number of terms loaded into
 memory, and thus reduce memory usage. Terms can be filtered by _frequency_:
diff --git a/docs/reference/redirects.asciidoc b/docs/reference/redirects.asciidoc
index 2c260b0ecd730..756ea70bd1368 100644
--- a/docs/reference/redirects.asciidoc
+++ b/docs/reference/redirects.asciidoc
@@ -1231,3 +1231,45 @@ See <<elasticsearch-croneval-parameters>>.
 
 The autoscaling decision API has been renamed to capacity,
 see <<autoscaling-get-autoscaling-capacity>>.
+
+[role="exclude",id="caching-heavy-aggregations"]
+=== Caching heavy aggregations
+
+See <<agg-caches>>.
+
+[role="exclude",id="returning-only-agg-results"]
+=== Returning only aggregation results
+
+See <<return-only-agg-results>>.
+
+[role="exclude",id="agg-metadata"]
+=== Aggregation metadata
+
+See <<add-metadata-to-an-agg>>.
+
+[role="exclude",id="returning-aggregation-type"]
+=== Returning the type of the aggregation
+
+See <<return-agg-type>>.
+
+[role="exclude",id="indexing-aggregation-results"]
+=== Indexing aggregation results with transforms
+
+See <<transforms>>.
+
+[role="exclude",id="search-aggregations-matrix"]
+=== Matrix aggregations
+
+See <<search-aggregations-matrix-stats-aggregation>>.
+
+[[search-aggregations-pipeline-movavg-aggregation]]
+=== Moving average aggregation
+
+The moving average aggregation has been removed. Use the
+<<search-aggregations-pipeline-movfn-aggregation,moving function aggregation>>
+instead.
+
+[[fielddata]]
+=== `fielddata` mapping parameter
+
+See <<fielddata-mapping-param>>.

From c121d75cee297a5cabf41e657e1c740c67287b92 Mon Sep 17 00:00:00 2001
From: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Date: Mon, 2 Nov 2020 11:24:22 -0500
Subject: [PATCH 4/6] [DOCS] Add anchor for redirect

---
 docs/reference/mapping/types/text.asciidoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/reference/mapping/types/text.asciidoc b/docs/reference/mapping/types/text.asciidoc
index a31f7b172d645..72970e582913d 100644
--- a/docs/reference/mapping/types/text.asciidoc
+++ b/docs/reference/mapping/types/text.asciidoc
@@ -142,7 +142,7 @@ The following parameters are accepted by `text` fields:
 
     Metadata about the field.
 
-[[fielddata]]
+[[fielddata-mapping-param]]
 ==== `fielddata` mapping parameter
 
 `text` fields are searchable by default, but by default are not available for

From 784de588a4c5cf17b3fa060f78b68fc6c3cd993a Mon Sep 17 00:00:00 2001
From: Wylie Conlon <wylieconlon@gmail.com>
Date: Tue, 3 Nov 2020 14:08:50 -0500
Subject: [PATCH 5/6] Review comments

---
 docs/reference/how-to/search-speed.asciidoc   | 14 +++---
 .../mapping/fields/id-field.asciidoc          | 25 +++++------
 .../params/eager-global-ordinals.asciidoc     |  8 ++--
 .../mapping/types/parent-join.asciidoc        |  3 +-
 .../modules/indices/circuit_breaker.asciidoc  |  9 ++--
 .../modules/indices/fielddata.asciidoc        | 45 ++++++-------------
 6 files changed, 39 insertions(+), 65 deletions(-)

diff --git a/docs/reference/how-to/search-speed.asciidoc b/docs/reference/how-to/search-speed.asciidoc
index 565a427531df5..2503c02cc1e3b 100644
--- a/docs/reference/how-to/search-speed.asciidoc
+++ b/docs/reference/how-to/search-speed.asciidoc
@@ -303,14 +303,14 @@ may become much worse.
 [discrete]
 === Warm up global ordinals
 
-<<eager-global-ordinals,Global ordinals>> are a data structure that is used in
-order to increase aggregation speed. They are calculated lazily and stored in
+<<eager-global-ordinals,Global ordinals>> are a data structure that is used to
+optimize the performance of aggregations. They are calculated lazily and stored in
 the JVM heap as part of the <<modules-fielddata, field data cache>>. For fields
-that are heavily used for bucketing aggregations, you can tell {es} to add to
-the cache before requests are received. This should be done carefully because it
-will increase heap usage and delay indexing until the cache is created. This can
-be set dynamically on an existing mapping by setting the
-<<eager-global-ordinals, eager global ordinals>> mappping parameter:
+that are heavily used for bucketing aggregations, you can tell {es} to construct
+and cache the global ordinals before requests are received. This should be done
+carefully because it will increase heap usage and delay indexing until the global
+ordinals are constructed. This can be set dynamically on an existing mapping by
+setting the <<eager-global-ordinals, eager global ordinals>> mapping parameter:
 
 [source,console]
 --------------------------------------------------
diff --git a/docs/reference/mapping/fields/id-field.asciidoc b/docs/reference/mapping/fields/id-field.asciidoc
index 8bf53798cd76a..16ee7d8619408 100644
--- a/docs/reference/mapping/fields/id-field.asciidoc
+++ b/docs/reference/mapping/fields/id-field.asciidoc
@@ -3,10 +3,14 @@
 
 Each document has an `_id` that uniquely identifies it, which is indexed
 so that documents can be looked up either with the <<docs-get,GET API>> or the
-<<query-dsl-ids-query,`ids` query>>.
+<<query-dsl-ids-query,`ids` query>>. The `_id` can either be assigned at
+indexing time, or a unique `_id` can be generated by {es}. This field is not
+configurable.
 
-The value of the `_id` field is accessible in certain queries (`term`,
-`terms`, `match`, `query_string`, `simple_query_string`).
+The value of the `_id` field is only accessible in certain queries (`term`,
+`terms`, `match`, `query_string`, `simple_query_string`), but is restricted
+from use in aggregations, sorting, or scripting. For those use cases the
+`keyword` type is recommended.
 
 [source,console]
 --------------------------
@@ -16,16 +20,16 @@ PUT my-index-000001/_doc/1
   "text": "Document with ID 1"
 }
 
-PUT my-index-000001/_doc/2?refresh=true
+POST my-index-000001/_doc/?refresh=true
 {
-  "text": "Document with ID 2"
+  "text": "Document with generated ID"
 }
 
 GET my-index-000001/_search
 {
   "query": {
     "terms": {
-      "_id": [ "1", "2" ] <1>
+      "_id": [ "1", "AhcEj3UB1Y-S1MdSrUDG" ] <1>
     }
   }
 }
@@ -33,15 +37,6 @@ GET my-index-000001/_search
 
 <1> Querying on the `_id` field (also see the <<query-dsl-ids-query,`ids` query>>)
 
-The `_id` field is by default not available by default for use with aggregations or sorting.
-To aggregate or sort by the `_id` field, it is recommended to 
-duplicate the `_id` field onto a `keyword` field using the <<copy-to, `copy_to` mapping parameter>>.
-
-It is not recommended to enable `_id` fields to be aggregated using the <<modules-fielddata, in-memory field data cache>>,
-but it is possible. This can be done by <<cluster-update-settings, changing the cluster setting>>
-to `"indices.id_field_data.enabled": true`. Enabling this setting and then aggregating on the `_id`
-field will use significant memory and show deprecation warnings in the logs.
-
 [NOTE]
 ==================================================
 `_id` is limited to 512 bytes in size and larger values will be rejected.
diff --git a/docs/reference/mapping/params/eager-global-ordinals.asciidoc b/docs/reference/mapping/params/eager-global-ordinals.asciidoc
index 9f771a3d66745..c990abd5da9f4 100644
--- a/docs/reference/mapping/params/eager-global-ordinals.asciidoc
+++ b/docs/reference/mapping/params/eager-global-ordinals.asciidoc
@@ -35,11 +35,9 @@ to be enabled.
 `has_child` queries and `parent` aggregations.
 
 NOTE: The global ordinal mapping use heap memory as part of the
-<<modules-fielddata, field data cache>>. Aggregations that include high
-cardinality values can use a significant amount of heap memory, and
-could exceed the threshold of the
-<<fielddata-circuit-breaker, field data circuit breaker>>.
-It is recommended to set a specific limit for the field data cache size.
+<<modules-fielddata, field data cache>>. Aggregations on high cardinality fields
+can use a significant amount of heap memory, and could exceed the threshold
+of the <<fielddata-circuit-breaker, field data circuit breaker>>.
 
 ==== Loading global ordinals
 
diff --git a/docs/reference/mapping/types/parent-join.asciidoc b/docs/reference/mapping/types/parent-join.asciidoc
index 6826f155cc4f7..8a4d1e66b390d 100644
--- a/docs/reference/mapping/types/parent-join.asciidoc
+++ b/docs/reference/mapping/types/parent-join.asciidoc
@@ -124,8 +124,7 @@ PUT my-index-000001/_doc/4?routing=1&refresh
 
 The join field shouldn't be used like joins in a relation database. In Elasticsearch the key to good performance
 is to de-normalize your data into documents. Each join field, `has_child` or `has_parent` query adds a
-significant tax to your query performance. It also increases the usage of the JVM heap on the
-<<modules-fielddata, field data cache>>.
+significant tax to your query performance. It can also trigger <<eager-global-ordinals, global ordinals>> to be built.
 
 The only case where the join field makes sense is if your data contains a one-to-many relationship where
 one entity significantly outnumbers the other entity. An example of such case is a use case with products
diff --git a/docs/reference/modules/indices/circuit_breaker.asciidoc b/docs/reference/modules/indices/circuit_breaker.asciidoc
index 2f85996c0d433..2fd929f85cedb 100644
--- a/docs/reference/modules/indices/circuit_breaker.asciidoc
+++ b/docs/reference/modules/indices/circuit_breaker.asciidoc
@@ -32,11 +32,10 @@ The parent-level breaker can be configured with the following settings:
 [[fielddata-circuit-breaker]]
 [discrete]
 ==== Field data circuit breaker
-The field data circuit breaker allows Elasticsearch to estimate the amount of
-memory a field will require to be loaded into the <<modules-fielddata, field data cache>>.
-It can then prevent the field data loading by raising an exception. By default the
-limit is configured to 40% of the maximum JVM heap. It can be configured with the following
-parameters:
+The field data circuit breaker estimates the heap memory required to load a
+field into the <<modules-fielddata,field data cache>>. If loading the field would
+cause the cache to exceed a predefined memory limit, the circuit breaker stops the
+operation and returns an error.
 
 [[fielddata-circuit-breaker-limit]]
 // tag::fielddata-circuit-breaker-limit-tag[]
diff --git a/docs/reference/modules/indices/fielddata.asciidoc b/docs/reference/modules/indices/fielddata.asciidoc
index d3fae03e3aab5..8795f41d07144 100644
--- a/docs/reference/modules/indices/fielddata.asciidoc
+++ b/docs/reference/modules/indices/fielddata.asciidoc
@@ -1,47 +1,30 @@
 [[modules-fielddata]]
 === Field data cache settings
 
-The field data cache is an in-memory data structure, built on demand
-based on the type of query that is being run. It contains both
-<<fielddata, `fielddata`>> and <<eager-global-ordinals, global ordinals>>,
-which serve similar functions for different types of queries.
-The cache uses the JVM heap, so it is important to monitor its use
-and not to overload your cluster.
-
-Other than fields where the cache is built ahead of time, it is populated as needed
-on request. This includes:
-
-* Certain bucket aggregations on `keyword`, `ip`, and `flattened` fields. This
-includes `terms` aggregations, as well as `composite`, `diversified_sampler`,
-and `significant_terms`.
-* Bucket aggregations on `text` fields that have <<fielddata, `fielddata`>>
- enabled.
-* Bucket aggregations on the <<mapping-id-field, `_id` field>> when it is enabled for aggregation
-* Operations on parent and child documents from a `join` field, including
-`has_child` queries and `parent` aggregations.
+The field data cache contains <<fielddata-mapping-param, field data>> and <<eager-global-ordinals, global ordinals>>,
+which are both used to support aggregations on certain field types.
+Since these are on-heap data structures, it is important to monitor the cache's use.
 
 [discrete]
 [[fielddata-sizing]]
 ==== Cache size
 
 The entries in the cache are expensive to build, so the default behavior is
-to keep the cache loaded in memory
+to keep the cache loaded in memory. The default cache size is unlimited,
+causing the cache to grow until it reaches the limit set by the <<fielddata-circuit-breaker, field data circuit breaker>>. This behavior can be configured.
 
-The default cache size is unlimited, causing the cache to grow until it
-reaches the limit set by the <<fielddata-circuit-breaker, field data circuit breaker>>.
-It is recommended to set a cache size limit that is smaller than the circuit breaker
-value. Setting the limit will cause the cache to behave as a least-recently-updated
-cache, only keeping the most recently requested field data.
+If the cache size limit is set, the cache will begin clearing the least-recently-updated
+entries in the cache. This setting can automatically avoid the circuit breaker limit,
+at the cost of rebuilding the cache as needed.
 
-If the field data circuit breaker is reached, preventing further requests, the
-best option is to manually <<indices-clearcache, clear the cache>>. This will
-allow requests to re-build the cache setting.
+If the circuit breaker limit is reached, further requests that increase the cache
+size will be prevented. In this case you shoul manually <<indices-clearcache, clear the cache>>.
 
 `indices.fielddata.cache.size`::
 (<<static-cluster-setting,Static>>)
-The max size of the field data cache, eg `30%` of node heap space, or an
-absolute value, eg `12GB`. Defaults to unbounded.  Also see
-<<fielddata-circuit-breaker>>.
+The max size of the field data cache, eg `38%` of node heap space, or an
+absolute value, eg `12GB`. Defaults to unbounded. Should be set smaller than the
+<<fielddata-circuit-breaker>>, if set.
 
 [discrete]
 [[fielddata-monitoring]]
@@ -49,4 +32,4 @@ absolute value, eg `12GB`. Defaults to unbounded.  Also see
 
 You can monitor memory usage for field data as well as the field data circuit
 breaker using
-<<cluster-nodes-stats,Nodes Stats API>> or the <<cat-fielddata, _cat/fielddata API>>
+the <<cluster-nodes-stats,nodes stats API>> or the <<cat-fielddata,cat fielddata API>>.

From b83b08036fc0709d1ef2793aa63483cea7e4d6aa Mon Sep 17 00:00:00 2001
From: Julie Tibshirani <julie.tibshirani@elastic.co>
Date: Fri, 20 Nov 2020 11:05:05 -0800
Subject: [PATCH 6/6] Address review comments.

---
 docs/reference/how-to/search-speed.asciidoc   | 18 +++++++++---------
 .../mapping/fields/id-field.asciidoc          | 19 +++++++++++--------
 .../params/eager-global-ordinals.asciidoc     |  6 +++---
 .../modules/indices/fielddata.asciidoc        |  6 +++---
 4 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/docs/reference/how-to/search-speed.asciidoc b/docs/reference/how-to/search-speed.asciidoc
index 2503c02cc1e3b..e51c7fa2b7821 100644
--- a/docs/reference/how-to/search-speed.asciidoc
+++ b/docs/reference/how-to/search-speed.asciidoc
@@ -308,8 +308,8 @@ optimize the performance of aggregations. They are calculated lazily and stored
 the JVM heap as part of the <<modules-fielddata, field data cache>>. For fields
 that are heavily used for bucketing aggregations, you can tell {es} to construct
 and cache the global ordinals before requests are received. This should be done
-carefully because it will increase heap usage and delay indexing until the global
-ordinals are constructed. This can be set dynamically on an existing mapping by
+carefully because it will increase heap usage and can make <<indices-refresh, refreshes>>
+take longer. The option can be updated dynamically on an existing mapping by
 setting the <<eager-global-ordinals, eager global ordinals>> mapping parameter:
 
 [source,console]
@@ -393,19 +393,19 @@ right number of replicas for you is
 
 === Tune your queries with the Profile API
 
-You can also analyse how expensive each component of your queries and 
-aggregations are using the {ref}/search-profile.html[Profile API]. This might 
-allow you to tune your queries to be less expensive, resulting in a positive 
-performance result and reduced load. Also note that Profile API payloads can be 
-easily visualised for better readability in the 
-{kibana-ref}/xpack-profiler.html[Search Profiler], which is a Kibana dev tools 
+You can also analyse how expensive each component of your queries and
+aggregations are using the {ref}/search-profile.html[Profile API]. This might
+allow you to tune your queries to be less expensive, resulting in a positive
+performance result and reduced load. Also note that Profile API payloads can be
+easily visualised for better readability in the
+{kibana-ref}/xpack-profiler.html[Search Profiler], which is a Kibana dev tools
 UI available in all X-Pack licenses, including the free X-Pack Basic license.
 
 Some caveats to the Profile API are that:
 
  - the Profile API as a debugging tool adds significant overhead to search execution and can also have a very verbose output
  - given the added overhead, the resulting took times are not reliable indicators of actual took time, but can be used comparatively between clauses for relative timing differences
- - the Profile API is best for exploring possible reasons behind the most costly clauses of a query but isn't intended for accurately measuring absolute timings of each clause 
+ - the Profile API is best for exploring possible reasons behind the most costly clauses of a query but isn't intended for accurately measuring absolute timings of each clause
 
 [[faster-phrase-queries]]
 === Faster phrase queries with `index_phrases`
diff --git a/docs/reference/mapping/fields/id-field.asciidoc b/docs/reference/mapping/fields/id-field.asciidoc
index 16ee7d8619408..1e963dd6de7d7 100644
--- a/docs/reference/mapping/fields/id-field.asciidoc
+++ b/docs/reference/mapping/fields/id-field.asciidoc
@@ -5,12 +5,10 @@ Each document has an `_id` that uniquely identifies it, which is indexed
 so that documents can be looked up either with the <<docs-get,GET API>> or the
 <<query-dsl-ids-query,`ids` query>>. The `_id` can either be assigned at
 indexing time, or a unique `_id` can be generated by {es}. This field is not
-configurable.
+configurable in the mappings.
 
-The value of the `_id` field is only accessible in certain queries (`term`,
-`terms`, `match`, `query_string`, `simple_query_string`), but is restricted
-from use in aggregations, sorting, or scripting. For those use cases the
-`keyword` type is recommended.
+The value of the `_id` field is accessible in queries such as `term`,
+`terms`, `match`, and `query_string`.
 
 [source,console]
 --------------------------
@@ -20,16 +18,16 @@ PUT my-index-000001/_doc/1
   "text": "Document with ID 1"
 }
 
-POST my-index-000001/_doc/?refresh=true
+PUT my-index-000001/_doc/2?refresh=true
 {
-  "text": "Document with generated ID"
+  "text": "Document with ID 2"
 }
 
 GET my-index-000001/_search
 {
   "query": {
     "terms": {
-      "_id": [ "1", "AhcEj3UB1Y-S1MdSrUDG" ] <1>
+      "_id": [ "1", "2" ] <1>
     }
   }
 }
@@ -37,6 +35,11 @@ GET my-index-000001/_search
 
 <1> Querying on the `_id` field (also see the <<query-dsl-ids-query,`ids` query>>)
 
+The `_id` field is restricted from use in aggregations, sorting, and scripting.
+If you need to sort or aggregate on `_id`, duplicate the content of the
+`_id` field into another field that has
+`doc_values` enabled.
+
 [NOTE]
 ==================================================
 `_id` is limited to 512 bytes in size and larger values will be rejected.
diff --git a/docs/reference/mapping/params/eager-global-ordinals.asciidoc b/docs/reference/mapping/params/eager-global-ordinals.asciidoc
index c990abd5da9f4..76f2f41656469 100644
--- a/docs/reference/mapping/params/eager-global-ordinals.asciidoc
+++ b/docs/reference/mapping/params/eager-global-ordinals.asciidoc
@@ -34,10 +34,10 @@ to be enabled.
 * Operations on parent and child documents from a `join` field, including
 `has_child` queries and `parent` aggregations.
 
-NOTE: The global ordinal mapping use heap memory as part of the
+NOTE: The global ordinal mapping uses heap memory as part of the
 <<modules-fielddata, field data cache>>. Aggregations on high cardinality fields
-can use a significant amount of heap memory, and could exceed the threshold
-of the <<fielddata-circuit-breaker, field data circuit breaker>>.
+can use a lot of memory and trigger the <<fielddata-circuit-breaker, field data
+circuit breaker>>.
 
 ==== Loading global ordinals
 
diff --git a/docs/reference/modules/indices/fielddata.asciidoc b/docs/reference/modules/indices/fielddata.asciidoc
index 8795f41d07144..1383bf74d6d4c 100644
--- a/docs/reference/modules/indices/fielddata.asciidoc
+++ b/docs/reference/modules/indices/fielddata.asciidoc
@@ -18,13 +18,13 @@ entries in the cache. This setting can automatically avoid the circuit breaker l
 at the cost of rebuilding the cache as needed.
 
 If the circuit breaker limit is reached, further requests that increase the cache
-size will be prevented. In this case you shoul manually <<indices-clearcache, clear the cache>>.
+size will be prevented. In this case you should manually <<indices-clearcache, clear the cache>>.
 
 `indices.fielddata.cache.size`::
 (<<static-cluster-setting,Static>>)
 The max size of the field data cache, eg `38%` of node heap space, or an
-absolute value, eg `12GB`. Defaults to unbounded. Should be set smaller than the
-<<fielddata-circuit-breaker>>, if set.
+absolute value, eg `12GB`. Defaults to unbounded. If you choose to set it,
+it should be smaller than the <<fielddata-circuit-breaker>> limit.
 
 [discrete]
 [[fielddata-monitoring]]