Union types documentation (elastic#110183)

* Union types documentation
* Try remove asciidoc error
* Another attempt
* Using literal block
* Nicer formatting
* Remove partintro
* Small refinements
* Edits for clarity and style

---------

Co-authored-by: Marci W <[email protected]>
craigtaverner · Jul 16, 2024 · aef0850
1 parent 016360a
commit aef0850

Showing 4 changed files with 188 additions and 6 deletions.
diff --git a/docs/reference/esql/esql-limitations.asciidoc b/docs/reference/esql/esql-limitations.asciidoc
@@ -85,6 +85,11 @@ Some <<mapping-types,field types>> are not supported in all contexts:
 ** `cartesian_point`
 ** `cartesian_shape`
 
+In addition, when <<esql-multi-index, querying multiple indices>>,
+it's possible for the same field to be mapped to multiple types.
+These fields cannot be directly used in queries or returned in results,
+unless they're <<esql-multi-index-union-types, explicitly converted to a single type>>.
+
 [discrete]
 [[esql-_source-availability]]
 === _source availability

diff --git a/docs/reference/esql/esql-multi-index.asciidoc b/docs/reference/esql/esql-multi-index.asciidoc
@@ -0,0 +1,175 @@
+[[esql-multi-index]]
+=== Using {esql} to query multiple indices
+++++
+<titleabbrev>Using {esql} to query multiple indices</titleabbrev>
+++++
+
+With {esql}, you can execute a single query across multiple indices, data streams, or aliases.
+To do so, list the targets separated by commas, or use wildcards and date arithmetic to match several names at once. The following example uses a comma-separated list and a wildcard:
+
+[source,esql]
+----
+FROM employees-00001,other-employees-*
+----
+
+Use the format `<remote_cluster_name>:<target>` to <<esql-cross-clusters, query data streams and indices
+on remote clusters>>:
+
+[source,esql]
+----
+FROM cluster_one:employees-00001,cluster_two:other-employees-*
+----
+
+[discrete]
+[[esql-multi-index-invalid-mapping]]
+=== Field type mismatches
+
+When querying multiple indices, data streams, or aliases, you might find that the same field is mapped to different types in different indices.
+For example, consider two indices with the following field mappings:
+
+*index: events_ip*
+```
+{
+  "mappings": {
+    "properties": {
+      "@timestamp":     { "type": "date" },
+      "client_ip":      { "type": "ip" },
+      "event_duration": { "type": "long" },
+      "message":        { "type": "keyword" }
+    }
+  }
+}
+```
+
+*index: events_keyword*
+```
+{
+  "mappings": {
+    "properties": {
+      "@timestamp":     { "type": "date" },
+      "client_ip":      { "type": "keyword" },
+      "event_duration": { "type": "long" },
+      "message":        { "type": "keyword" }
+    }
+  }
+}
+```
+
+When you query either index on its own, for example with `FROM events_ip`, each column is returned with its mapped type:
+
+[source.merge.styled,esql]
+----
+FROM events_ip
+| SORT @timestamp DESC
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+@timestamp:date | client_ip:ip | event_duration:long | message:keyword
+2023-10-23T13:55:01.543Z | 172.21.3.15  | 1756467 | Connected to 10.1.0.1
+2023-10-23T13:53:55.832Z | 172.21.3.15  | 5033755 | Connection error
+2023-10-23T13:52:55.015Z | 172.21.3.15  | 8268153 | Connection error
+|===
+
+Note how the `client_ip` column is correctly identified as type `ip`, and all values are displayed.
+However, if the query targets both indices with `FROM events_*`, the mappings conflict: the type of the `client_ip` column cannot be determined,
+so it is reported as `unsupported` and all of its values are returned as `null`.
+
+[[query-unsupported]]
+[source.merge.styled,esql]
+----
+FROM events_*
+| SORT @timestamp DESC
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+@timestamp:date | client_ip:unsupported | event_duration:long | message:keyword
+2023-10-23T13:55:01.543Z | null  | 1756467 | Connected to 10.1.0.1
+2023-10-23T13:53:55.832Z | null  | 5033755 | Connection error
+2023-10-23T13:52:55.015Z | null  | 8268153 | Connection error
+2023-10-23T13:51:54.732Z | null  | 725448  | Connection error
+2023-10-23T13:33:34.937Z | null  | 1232382 | Disconnected
+2023-10-23T12:27:28.948Z | null  | 2764889 | Connected to 10.1.0.2
+2023-10-23T12:15:03.360Z | null  | 3450233 | Connected to 10.1.0.3
+|===
+
+In addition, if the query refers to this unsupported field directly, the query fails:
+
+[source.merge.styled,esql]
+----
+FROM events_*
+| KEEP @timestamp, client_ip, event_duration, message
+| SORT @timestamp DESC
+----
+
+[source,bash]
+----
+Cannot use field [client_ip] due to ambiguities being mapped as 
+[2] incompatible types:
+    [ip] in [events_ip],
+    [keyword] in [events_keyword]
+----
+
+[discrete]
+[[esql-multi-index-union-types]]
+=== Union types
+
+{esql} can resolve <<esql-multi-index-invalid-mapping, field type mismatches>>. When the same field is mapped to different types in different indices,
+the type of the field is understood to be a _union_ of the various types in the index mappings.
+As seen in the preceding examples, this _union type_ cannot be used in the results,
+and cannot be referred to by the query,
+unless it's passed to a type conversion function that accepts all the types in the _union_ and converts the field
+to a single type. {esql} offers a suite of <<esql-type-conversion-functions,type conversion functions>> to achieve this.
+
+In the preceding examples, the query can use a command like `EVAL client_ip = TO_IP(client_ip)` to resolve
+the union of `ip` and `keyword` to just `ip`.
+You can also use the type-conversion syntax `EVAL client_ip = client_ip::IP`.
+Alternatively, the query could use <<esql-to_string,`TO_STRING`>> to convert all supported types into `keyword`.
+
+For example, the <<query-unsupported,query>> that returned `client_ip:unsupported` with `null` values can be fixed using the `TO_IP` function or the equivalent `field::ip` syntax.
+The same change also resolves the error returned when the query refers to `client_ip` directly.
+As long as the only reference to the original field is to pass it to a conversion function that resolves the type ambiguity, no error results.
+
+[source.merge.styled,esql]
+----
+FROM events_*
+| EVAL client_ip = TO_IP(client_ip)
+| KEEP @timestamp, client_ip, event_duration, message
+| SORT @timestamp DESC
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+@timestamp:date | client_ip:ip | event_duration:long | message:keyword
+2023-10-23T13:55:01.543Z | 172.21.3.15  | 1756467 | Connected to 10.1.0.1
+2023-10-23T13:53:55.832Z | 172.21.3.15  | 5033755 | Connection error
+2023-10-23T13:52:55.015Z | 172.21.3.15  | 8268153 | Connection error
+2023-10-23T13:51:54.732Z | 172.21.3.15  | 725448  | Connection error
+2023-10-23T13:33:34.937Z | 172.21.0.5   | 1232382 | Disconnected
+2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2
+2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3
+|===
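+
+The other two options mentioned above can be sketched in the same way.
+These queries are illustrative only and assume the same example `events_*` indices.
+The first uses the `::` type-conversion syntax to resolve the union to `ip`:
+
+[source,esql]
+----
+FROM events_*
+// Resolve the ip/keyword union to ip with the type-conversion syntax
+| EVAL client_ip = client_ip::IP
+| KEEP @timestamp, client_ip, event_duration, message
+| SORT @timestamp DESC
+----
+
+The second uses `TO_STRING`, so the `client_ip` column is expected to come back as `keyword` rather than `ip`:
+
+[source,esql]
+----
+FROM events_*
+// Resolve the ip/keyword union to keyword instead of ip
+| EVAL client_ip = TO_STRING(client_ip)
+| KEEP @timestamp, client_ip, event_duration, message
+| SORT @timestamp DESC
+----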
+
+[discrete]
+[[esql-multi-index-index-metadata]]
+=== Index metadata
+
+It can be helpful to know the particular index from which each row is sourced.
+To get this information, use the <<esql-metadata-fields,`METADATA`>> option on the <<esql-from,`FROM`>> command.
+
+[source.merge.styled,esql]
+----
+FROM events_* METADATA _index
+| EVAL client_ip = TO_IP(client_ip)
+| KEEP _index, @timestamp, client_ip, event_duration, message
+| SORT @timestamp DESC
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+_index:keyword | @timestamp:date | client_ip:ip | event_duration:long | message:keyword
+events_ip | 2023-10-23T13:55:01.543Z | 172.21.3.15  | 1756467 | Connected to 10.1.0.1
+events_ip | 2023-10-23T13:53:55.832Z | 172.21.3.15  | 5033755 | Connection error
+events_ip | 2023-10-23T13:52:55.015Z | 172.21.3.15  | 8268153 | Connection error
+events_keyword | 2023-10-23T13:51:54.732Z | 172.21.3.15  | 725448  | Connection error
+events_keyword | 2023-10-23T13:33:34.937Z | 172.21.0.5   | 1232382 | Disconnected
+events_keyword | 2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2
+events_keyword | 2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3
+|===
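+
+The `_index` metadata field can also be used for grouping.
+As a minimal sketch, assuming the same example indices, the following query counts the rows that come from each index.
+The column name `row_count` is chosen here for illustration.
+Because the query never refers to `client_ip`, no type conversion is needed:
+
+[source,esql]
+----
+FROM events_* METADATA _index
+// Count rows per source index; the conflicting client_ip field is not referenced
+| STATS row_count = COUNT(*) BY _index
+----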
diff --git a/docs/reference/esql/esql-using.asciidoc b/docs/reference/esql/esql-using.asciidoc
@@ -12,6 +12,9 @@ and set up alerts.
 Using {esql} in {elastic-sec} to investigate events in Timeline, create
 detection rules, and build {esql} queries using Elastic AI Assistant.
 
+<<esql-multi-index>>::
+Using {esql} to query multiple indices and resolve field type mismatches.
+
 <<esql-cross-clusters>>::
 Using {esql} to query across multiple clusters.
 
@@ -21,5 +24,6 @@ Using the <<tasks,task management API>> to list and cancel {esql} queries.
 include::esql-rest.asciidoc[]
 include::esql-kibana.asciidoc[]
 include::esql-security-solution.asciidoc[]
+include::esql-multi-index.asciidoc[]
 include::esql-across-clusters.asciidoc[]
 include::task-management.asciidoc[]
diff --git a/docs/reference/esql/source-commands/from.asciidoc b/docs/reference/esql/source-commands/from.asciidoc
@@ -58,24 +58,22 @@ today's index:
 FROM <logs-{now/d}>
 ----
 
-Use comma-separated lists or wildcards to query multiple data streams, indices,
-or aliases:
+Use comma-separated lists or wildcards to <<esql-multi-index, query multiple data streams, indices,
+or aliases>>:
 
 [source,esql]
 ----
 FROM employees-00001,other-employees-*
 ----
 
-Use the format `<remote_cluster_name>:<target>` to query data streams and indices
-on remote clusters:
+Use the format `<remote_cluster_name>:<target>` to <<esql-cross-clusters, query data streams and indices
+on remote clusters>>:
 
 [source,esql]
 ----
 FROM cluster_one:employees-00001,cluster_two:other-employees-*
 ----
 
-See <<esql-cross-clusters, using {esql} across clusters>>.
-
 Use the optional `METADATA` directive to enable <<esql-metadata-fields,metadata fields>>:
 
 [source,esql]