Introduce the new convention for multi-fields text indexing to the README. (elastic#140)

* Introduce the new convention for multi-fields text indexing to the README.
* Be a little more explicit in the changelog for elastic#137
ruflin · Oct 24, 2018 · f9d5f01
1 parent 4ec8988
commit f9d5f01
Showing 3 changed files with 70 additions and 48 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -19,6 +19,8 @@ All notable changes to this project will be documented in this file based on the
 * Remove `*.timezone.offset.sec` fields as too specific for ECS at the moment. #134
 * Make the following fields keyword: device.vendor, file.path, file.target_path, http.response.body, network.name, organization.name, url.href, url.path, url.query, user_agent.original
 * Rename `url.host.name` to `url.hostname` to better align with industry convention.
+* Make the following fields keyword: device.vendor, file.path, file.target_path, http.response.body, network.name, organization.name, url.href, url.path, url.query, user_agent.original. #137
+  * The only two fields using `text` indexing at this time are `message` and `error.message`.
 
 ### Bugfixes
 

diff --git a/README.md b/README.md
@@ -458,40 +458,50 @@ Contributions of additional uses cases on top of ECS are welcome.
 
 ### Multi-fields text indexing
 
-ElasticSearch can index text multiple ways:
+Elasticsearch can index text multiple ways:
 
-* [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html) indexing allows for full text search, or searching arbitrary words that
+* [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html)
+  indexing allows for full text search, or searching arbitrary words that
   are part of the field.
-* [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html) indexing allows for much faster
-  [exact match](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html)
-  and [prefix search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html),
+* [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html)
+  indexing allows for much faster
+  [exact match filtering](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html),
+  [prefix search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html),
   and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html)
   (what Kibana visualizations are built on).
 
-In some cases, only one type of indexing makes sense for a field.
+By default, unless your index mapping or index template specifies otherwise
+(as the ECS index template does),
+Elasticsearch indexes a text field as `text` at the canonical field name,
+and indexes it a second time as `keyword`, nested in a multi-field.
 
-However there are cases where both types of indexing can be useful, and we want
-to index both ways.
-As an example, log messages can sometimes be short enough that it makes sense
-to sort them by frequency (that's an aggregation). They can also be long and
-varied enough that full text search can be useful on them.
+Default Elasticsearch convention:
 
-Whenever both types of indexing are helpful, we use multi-fields indexing. The
-convention used is the following:
+* Canonical field: `myfield` is `text`
+* Multi-field: `myfield.keyword` is `keyword`
 
-* `foo`: `text` indexing.
-  The top level of the field (its plain name) is used for full text search.
-* `foo.raw`: `keyword` indexing.
-  The nested field has suffix `.raw` and is what you will use for aggregations.
-  * Performance tip: when filtering your stream in Kibana (or elsewhere), if you
-    are filtering for an exact match or doing a prefix search,
-    both `text` and `keyword` field can be used, but doing so on the `keyword`
-    field (named `.raw`) will be much faster and less memory intensive.
+For monitoring use cases, `keyword` indexing is needed almost exclusively, with
+full text search on very few fields. Given this premise, ECS defaults
+all text indexing to `keyword` at the top level (with very few exceptions).
+Any use case that requires full text search indexing on additional fields
+can simply add a [multi-field](https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html)
+for full text search. Doing so does not conflict with ECS,
+as the canonical field name will remain `keyword` indexed.
 
-**Keyword only fields**
+ECS multi-field convention for text:
 
-The fields that only make sense as type `keyword` are not named `foo.raw`, the
-plain field (`foo`) will be of type `keyword`, with no nested field.
+* Canonical field: `myfield` is `keyword`
+* Multi-field: `myfield.text` is `text`
+
+#### Exceptions
+
+The only exceptions to this convention are fields `message` and `error.message`,
+which are indexed for full text search only, with no multi-field.
+These two fields don't follow the new convention because changing them was deemed
+too big of a breaking change, as both are widely used in Beats.
+
+Any future field indexed for full text search in ECS will, however,
+follow the multi-field convention, where `text` indexing is nested in the multi-field.
 
 ### IDs are keywords not integers
 

diff --git a/docs/implementing.md b/docs/implementing.md
@@ -26,40 +26,50 @@
 
 ### Multi-fields text indexing
 
-ElasticSearch can index text multiple ways:
+Elasticsearch can index text multiple ways:
 
-* [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html) indexing allows for full text search, or searching arbitrary words that
+* [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html)
+  indexing allows for full text search, or searching arbitrary words that
   are part of the field.
-* [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html) indexing allows for much faster
-  [exact match](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html)
-  and [prefix search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html),
+* [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html)
+  indexing allows for much faster
+  [exact match filtering](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html),
+  [prefix search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html),
   and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html)
   (what Kibana visualizations are built on).
 
-In some cases, only one type of indexing makes sense for a field.
+By default, unless your index mapping or index template specifies otherwise
+(as the ECS index template does),
+Elasticsearch indexes a text field as `text` at the canonical field name,
+and indexes it a second time as `keyword`, nested in a multi-field.
 
-However there are cases where both types of indexing can be useful, and we want
-to index both ways.
-As an example, log messages can sometimes be short enough that it makes sense
-to sort them by frequency (that's an aggregation). They can also be long and
-varied enough that full text search can be useful on them.
+Default Elasticsearch convention:
 
-Whenever both types of indexing are helpful, we use multi-fields indexing. The
-convention used is the following:
+* Canonical field: `myfield` is `text`
+* Multi-field: `myfield.keyword` is `keyword`
 
-* `foo`: `text` indexing.
-  The top level of the field (its plain name) is used for full text search.
-* `foo.raw`: `keyword` indexing.
-  The nested field has suffix `.raw` and is what you will use for aggregations.
-  * Performance tip: when filtering your stream in Kibana (or elsewhere), if you
-    are filtering for an exact match or doing a prefix search,
-    both `text` and `keyword` field can be used, but doing so on the `keyword`
-    field (named `.raw`) will be much faster and less memory intensive.
+For monitoring use cases, `keyword` indexing is needed almost exclusively, with
+full text search on very few fields. Given this premise, ECS defaults
+all text indexing to `keyword` at the top level (with very few exceptions).
+Any use case that requires full text search indexing on additional fields
+can simply add a [multi-field](https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html)
+for full text search. Doing so does not conflict with ECS,
+as the canonical field name will remain `keyword` indexed.
 
-**Keyword only fields**
+ECS multi-field convention for text:
 
-The fields that only make sense as type `keyword` are not named `foo.raw`, the
-plain field (`foo`) will be of type `keyword`, with no nested field.
+* Canonical field: `myfield` is `keyword`
+* Multi-field: `myfield.text` is `text`
+
+#### Exceptions
+
+The only exceptions to this convention are fields `message` and `error.message`,
+which are indexed for full text search only, with no multi-field.
+These two fields don't follow the new convention because changing them was deemed
+too big of a breaking change, as both are widely used in Beats.
+
+Any future field indexed for full text search in ECS will, however,
+follow the multi-field convention, where `text` indexing is nested in the multi-field.
 
 ### IDs are keywords not integers
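
For readers comparing the two conventions described in the "Multi-fields text indexing" section of this diff, here is a minimal Python sketch that builds the corresponding mapping fragments as plain dictionaries. It is an illustration under stated assumptions, not the ECS template itself: the helper names (`default_dynamic_string_mapping`, `ecs_string_mapping`, `field_paths`) are hypothetical, `myfield` is the placeholder name used in the README, and the `ignore_above: 256` setting reflects what Elasticsearch's dynamic mapping generates for strings rather than anything stated in this commit.

```python
import json


def default_dynamic_string_mapping():
    """Mapping that Elasticsearch's dynamic mapping generates for a new string field.

    The canonical field is `text`; a `keyword` multi-field is nested under it.
    """
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }


def ecs_string_mapping(with_text_multi_field=False):
    """ECS-style mapping (hypothetical helper): canonical field is `keyword`.

    A `text` multi-field is added only when full text search is needed.
    """
    mapping = {"type": "keyword"}
    if with_text_multi_field:
        mapping["fields"] = {"text": {"type": "text"}}
    return mapping


def field_paths(name, mapping):
    """Return {queryable field path: index type} for a field and its multi-fields."""
    paths = {name: mapping["type"]}
    for sub_name, sub_mapping in mapping.get("fields", {}).items():
        paths[f"{name}.{sub_name}"] = sub_mapping["type"]
    return paths


# A `properties` fragment following the ECS convention, including the
# `message` exception (full text search only, no multi-field).
properties = {
    "myfield": ecs_string_mapping(with_text_multi_field=True),
    "message": {"type": "text"},
}

if __name__ == "__main__":
    print(json.dumps({"properties": properties}, indent=2))
    print(field_paths("myfield", default_dynamic_string_mapping()))
    print(field_paths("myfield", properties["myfield"]))
```

The `field_paths` helper only makes the naming difference visible: under the default convention aggregations target `myfield.keyword`, while under the ECS convention they target `myfield` directly and full text search, where added, uses `myfield.text`.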