Reflect latest changes in synthetic source documentation (elastic#109501)
lkts committed Jul 4, 2024
1 parent 9e9033b commit 3939271
Showing 4 changed files with 70 additions and 42 deletions.
14 changes: 14 additions & 0 deletions docs/changelog/109501.yaml
@@ -0,0 +1,14 @@
pr: 109501
summary: Reflect latest changes in synthetic source documentation
area: Mapping
type: enhancement
issues: []
highlight:
  title: Synthetic `_source` improvements
  body: |-
    There are multiple improvements to synthetic `_source` functionality:
    * Synthetic `_source` is now supported for all field types including `nested` and `object`. `object` fields are supported with `enabled` set to `false`.
    * Synthetic `_source` can be enabled together with `ignore_malformed` and `ignore_above` parameters for all field types that support them.
  notable: false
3 changes: 2 additions & 1 deletion docs/reference/data-streams/tsds.asciidoc
@@ -53,8 +53,9 @@ shard segments by `_tsid` and `@timestamp`.
documents, the document `_id` is a hash of the document's dimensions and
`@timestamp`. A TSDS doesn't support custom document `_id` values.


* A TSDS uses <<synthetic-source,synthetic `_source`>>, and as a result is
subject to a number of <<synthetic-source-restrictions,restrictions>>.
subject to some <<synthetic-source-restrictions,restrictions>> and <<synthetic-source-modifications,modifications>> applied to the `_source` field.

NOTE: A time series index can contain fields other than dimensions or metrics.

12 changes: 6 additions & 6 deletions docs/reference/mapping/fields/source-field.asciidoc
@@ -6,11 +6,11 @@ at index time. The `_source` field itself is not indexed (and thus is not
searchable), but it is stored so that it can be returned when executing
_fetch_ requests, like <<docs-get,get>> or <<search-search,search>>.

If disk usage is important to you then have a look at
<<synthetic-source,synthetic `_source`>> which shrinks disk usage at the cost of
only supporting a subset of mappings and slower fetches or (not recommended)
<<disable-source-field,disabling the `_source` field>> which also shrinks disk
usage but disables many features.
If disk usage is important to you, then consider the following options:

- Using <<synthetic-source,synthetic `_source`>>, which reconstructs source content at the time of retrieval instead of storing it on disk. This shrinks disk usage, at the cost of slower access to `_source` in <<docs-get,Get>> and <<search-search,Search>> queries.
- <<disable-source-field,Disabling the `_source` field completely>>. This shrinks disk
usage but disables features that rely on `_source`.
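
As a rough illustration of the first option, the sketch below builds a create-index request body that turns on synthetic `_source`. The `mappings._source.mode` key follows the mapping syntax used in the Elasticsearch 8.x documentation around the time of this commit. The index name `idx` and the `kwd` field are placeholders, and the helper name is hypothetical. Treat it as an assumption to check against the version you run.

```python
import json


def synthetic_source_index_body(properties):
    """Build a create-index body with synthetic `_source` enabled.

    Assumes the `mappings._source.mode` syntax; `properties` is an
    ordinary field-mapping dict supplied by the caller.
    """
    return {
        "mappings": {
            "_source": {"mode": "synthetic"},
            "properties": properties,
        }
    }


# Hypothetical example: a single keyword field named "kwd".
body = synthetic_source_index_body({"kwd": {"type": "keyword"}})
print("PUT idx")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into a console request or sent with any HTTP client.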

include::synthetic-source.asciidoc[]

@@ -43,7 +43,7 @@ available then a number of features are not supported:
* The <<docs-update,`update`>>, <<docs-update-by-query,`update_by_query`>>,
and <<docs-reindex,`reindex`>> APIs.
* In the {kib} link:{kibana-ref}/discover.html[Discover] application, field data will not be displayed.
* In the {kib} link:{kibana-ref}/discover.html[Discover] application, field data will not be displayed.
* On the fly <<highlighting,highlighting>>.
83 changes: 48 additions & 35 deletions docs/reference/mapping/fields/synthetic-source.asciidoc
@@ -28,45 +28,22 @@ PUT idx

While this on the fly reconstruction is *generally* slower than saving the source
documents verbatim and loading them at query time, it saves a lot of storage
space.
space. Additional latency can be avoided by not loading `_source` field in queries when it is not needed.
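
One way to avoid that latency is to ask the search API not to return `_source` and to request individual fields instead. The sketch below builds such a request body using the `_source` and `fields` search parameters. The helper name and the `kwd` field are hypothetical.

```python
import json


def search_body_without_source(query, fields):
    """Build a search body that skips `_source` and fetches only `fields`."""
    return {
        "query": query,
        "_source": False,        # do not load (or reconstruct) `_source`
        "fields": list(fields),  # return only these fields instead
    }


# Hypothetical example: match everything, return only the "kwd" field.
body = search_body_without_source({"match_all": {}}, ["kwd"])
print(json.dumps(body, indent=2))
```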

[[synthetic-source-fields]]
===== Supported fields
Synthetic `_source` is supported by all field types. Depending on implementation details, field types have different properties when used with synthetic `_source`.

<<synthetic-source-fields-native-list, Most field types>> construct synthetic `_source` using existing data, most commonly <<doc-values,`doc_values`>> and <<stored-fields, stored fields>>. For these field types, no additional space is needed to store the contents of `_source` field. Due to the storage layout of <<doc-values,`doc_values`>>, the generated `_source` field undergoes <<synthetic-source-modifications, modifications>> compared to original document.

For all other field types, the original value of the field is stored as is, in the same way as the `_source` field in non-synthetic mode. In this case there are no modifications and field data in `_source` is the same as in the original document. Similarly, malformed values of fields that use <<ignore-malformed,`ignore_malformed`>> or <<ignore-above,`ignore_above`>> need to be stored as is. This approach is less storage efficient since data needed for `_source` reconstruction is stored in addition to other data required to index the field (like `doc_values`).

[[synthetic-source-restrictions]]
===== Synthetic `_source` restrictions

There are a couple of restrictions to be aware of:
Synthetic `_source` cannot be used together with field mappings that use <<copy-to,`copy_to`>>.

* When you retrieve synthetic `_source` content it undergoes minor
<<synthetic-source-modifications,modifications>> compared to the original JSON.
* Synthetic `_source` can be used with indices that contain only these field
types:

** <<aggregate-metric-double-synthetic-source, `aggregate_metric_double`>>
** {plugins}/mapper-annotated-text-usage.html#annotated-text-synthetic-source[`annotated-text`]
** <<binary-synthetic-source,`binary`>>
** <<boolean-synthetic-source,`boolean`>>
** <<numeric-synthetic-source,`byte`>>
** <<date-synthetic-source,`date`>>
** <<date-nanos-synthetic-source,`date_nanos`>>
** <<dense-vector-synthetic-source,`dense_vector`>>
** <<numeric-synthetic-source,`double`>>
** <<flattened-synthetic-source, `flattened`>>
** <<numeric-synthetic-source,`float`>>
** <<geo-point-synthetic-source,`geo_point`>>
** <<geo-shape-synthetic-source,`geo_shape`>>
** <<numeric-synthetic-source,`half_float`>>
** <<histogram-synthetic-source,`histogram`>>
** <<numeric-synthetic-source,`integer`>>
** <<ip-synthetic-source,`ip`>>
** <<keyword-synthetic-source,`keyword`>>
** <<numeric-synthetic-source,`long`>>
** <<range-synthetic-source,`range` types>>
** <<numeric-synthetic-source,`scaled_float`>>
** <<search-as-you-type-synthetic-source,`search_as_you_type`>>
** <<numeric-synthetic-source,`short`>>
** <<text-synthetic-source,`text`>>
** <<token-count-synthetic-source,`token_count`>>
** <<version-synthetic-source,`version`>>
** <<wildcard-synthetic-source,`wildcard`>>
Some field types have additional restrictions. These restrictions are documented in the **synthetic `_source`** section of the field type's <<mapping-types,documentation>>.

[[synthetic-source-modifications]]
===== Synthetic `_source` modifications
@@ -178,4 +155,40 @@ that ordering.

[[synthetic-source-modifications-ranges]]
====== Representation of ranges
Range field vales (e.g. `long_range`) are always represented as inclusive on both sides with bounds adjusted accordingly. See <<range-synthetic-source-inclusive, examples>>.
Range field values (e.g. `long_range`) are always represented as inclusive on both sides with bounds adjusted accordingly. See <<range-synthetic-source-inclusive, examples>>.
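
To make the bound adjustment concrete, here is a minimal sketch for integer-valued ranges such as `long_range`: an exclusive bound is moved by one so that the same set of values is described with inclusive bounds. The function name is hypothetical. The sketch omits missing bounds from its output and does not model how Elasticsearch represents unbounded sides or non-integer range types.

```python
def to_inclusive_long_range(rng):
    """Rewrite gt/lt bounds of an integer range as gte/lte."""
    out = {}
    if "gte" in rng:
        out["gte"] = rng["gte"]
    elif "gt" in rng:
        out["gte"] = rng["gt"] + 1  # x > 200 is the same as x >= 201
    if "lte" in rng:
        out["lte"] = rng["lte"]
    elif "lt" in rng:
        out["lte"] = rng["lt"] - 1  # x < 300 is the same as x <= 299
    return out


print(to_inclusive_long_range({"gt": 200, "lt": 300}))
# prints {'gte': 201, 'lte': 299}
```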

[[synthetic-source-precision-loss-for-point-types]]
====== Reduced precision of `geo_point` values
Values of `geo_point` fields are represented in synthetic `_source` with reduced precision. See <<geo-point-synthetic-source, examples>>.
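
The effect can be pictured as snapping each coordinate to a fixed grid. The sketch below assumes a 32-bit grid per coordinate, in the style of Lucene's point encoding. The diff itself does not spell out the encoding, so the constants and function names here are illustrative assumptions, not the actual implementation.

```python
import math

BITS = 32  # assumed number of bits per coordinate
LAT_STEP = 180.0 / (1 << BITS)  # grid spacing for latitude, in degrees
LON_STEP = 360.0 / (1 << BITS)  # grid spacing for longitude, in degrees


def quantize(value, step):
    """Snap a coordinate down to the nearest multiple of `step`."""
    return math.floor(value / step) * step


def round_trip(lat, lon):
    """Return the coordinates as they would read back from the grid."""
    return quantize(lat, LAT_STEP), quantize(lon, LON_STEP)


lat, lon = round_trip(41.12, -71.34)
print(lat, lon)  # close to, but generally not exactly, the input values
```

Under these assumptions the difference from the original value stays below one grid step, which is on the order of 1e-7 degrees.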


[[synthetic-source-fields-native-list]]
===== Field types that support synthetic source with no storage overhead
The following field types support synthetic source using data from <<doc-values,`doc_values`>> or <<stored-fields, stored fields>>, and require no additional storage space to construct the `_source` field.

NOTE: If you enable the <<ignore-malformed,`ignore_malformed`>> or <<ignore-above,`ignore_above`>> settings, then additional storage is required to store ignored field values for these types.

** <<aggregate-metric-double-synthetic-source, `aggregate_metric_double`>>
** {plugins}/mapper-annotated-text-usage.html#annotated-text-synthetic-source[`annotated-text`]
** <<binary-synthetic-source,`binary`>>
** <<boolean-synthetic-source,`boolean`>>
** <<numeric-synthetic-source,`byte`>>
** <<date-synthetic-source,`date`>>
** <<date-nanos-synthetic-source,`date_nanos`>>
** <<dense-vector-synthetic-source,`dense_vector`>>
** <<numeric-synthetic-source,`double`>>
** <<flattened-synthetic-source, `flattened`>>
** <<numeric-synthetic-source,`float`>>
** <<geo-point-synthetic-source,`geo_point`>>
** <<numeric-synthetic-source,`half_float`>>
** <<histogram-synthetic-source,`histogram`>>
** <<numeric-synthetic-source,`integer`>>
** <<ip-synthetic-source,`ip`>>
** <<keyword-synthetic-source,`keyword`>>
** <<numeric-synthetic-source,`long`>>
** <<range-synthetic-source,`range` types>>
** <<numeric-synthetic-source,`scaled_float`>>
** <<numeric-synthetic-source,`short`>>
** <<text-synthetic-source,`text`>>
** <<version-synthetic-source,`version`>>
** <<wildcard-synthetic-source,`wildcard`>>
