Convert bucket aggs docs to runtime fields (backport #71202) (#71248)
This replaces the `script` docs for bucket aggregations with runtime
fields. We expect runtime fields to be nicer to work with because you
can also fetch them or filter on them. We expect them to be faster
because they don't need this sort of `instanceof` tree:
https://github.com/elastic/elasticsearch/blob/a92a647b9f17d1bddf5c707490a19482c273eda3/server/src/main/java/org/elasticsearch/search/aggregations/support/values/ScriptDoubleValues.java#L42

Relates to #69291

Co-authored-by: Adam Locke <[email protected]>
nik9000 and Adam Locke authored Apr 2, 2021
1 parent 42cd343 commit 1b35100
Showing 8 changed files with 282 additions and 213 deletions.
3 changes: 3 additions & 0 deletions docs/build.gradle
@@ -141,6 +141,9 @@ Closure setupMyIndex = { String name, int count ->
type: keyword
message:
type: text
fields:
keyword:
type: keyword
user:
properties:
id:
91 changes: 39 additions & 52 deletions docs/reference/aggregations.asciidoc
@@ -330,79 +330,66 @@ the aggregated field.
[[use-scripts-in-an-agg]]
=== Use scripts in an aggregation

Some aggregations support <<modules-scripting,scripts>>. You can
use a `script` to extract or generate values for the aggregation:
When a field doesn't exactly match the aggregation you need, you
should aggregate on a <<runtime,runtime field>>:

[source,console]
----
GET /my-index-000001/_search
GET /my-index-000001/_search?size=0
{
"runtime_mappings": {
"message.length": {
"type": "long",
"script": "emit(doc['message.keyword'].value.length())"
}
},
"aggs": {
"my-agg-name": {
"message_length": {
"histogram": {
"interval": 1000,
"script": {
"source": "doc['my-field'].value.length()"
}
"interval": 10,
"field": "message.length"
}
}
}
}
----
// TEST[setup:my_index]
// TEST[s/my-field/http.request.method/]

If you also specify a `field`, the `script` modifies the field values used in
the aggregation. The following aggregation uses a script to modify `my-field`
values:

[source,console]
////
[source,console-result]
----
GET /my-index-000001/_search
{
"aggs": {
"my-agg-name": {
"histogram": {
"field": "my-field",
"interval": 1000,
"script": "_value / 1000"
}
"timed_out": false,
"took": "$body.took",
"_shards": {
"total": 1,
"successful": 1,
"failed": 0,
"skipped": 0
},
"hits": "$body.hits",
"aggregations": {
"message_length": {
"buckets": [
{
"key": 30.0,
"doc_count": 5
}
]
}
}
}
----
// TEST[setup:my_index]
// TEST[s/my-field/http.response.bytes/]

Some aggregations only work on specific data types. Use the `value_type`
parameter to specify a data type for a script-generated value or an unmapped
field. `value_type` accepts the following values:
////

* `boolean`
* `date`
* `double`, used for all floating-point numbers
* `long`, used for all integers
* `ip`
* `string`
Scripts calculate field values dynamically, which adds a little
overhead to the aggregation. In addition to the time spent calculating,
some aggregations like <<search-aggregations-bucket-terms-aggregation,`terms`>>
and <<search-aggregations-bucket-filters-aggregation,`filters`>> can't use
some of their optimizations with runtime fields. Overall, the performance cost
of using a runtime field varies from aggregation to aggregation.
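
As a rough illustration of what the `message_length` example computes, the following Python sketch mimics the runtime field (the length of each message) and the `histogram` bucketing with `interval: 10`. The helper names `message_length` and `histogram_buckets` and the sample messages are hypothetical and are not taken from `my-index-000001`. The sketch assumes the default `offset` of 0, in which case each bucket key is `floor(value / interval) * interval`.

```python
import math
from collections import Counter


def message_length(message):
    # Mirrors the runtime field script:
    #   emit(doc['message.keyword'].value.length())
    # For ASCII text, Python's len() matches Java's String.length().
    return len(message)


def histogram_buckets(values, interval):
    # Bucket key with the default offset of 0:
    #   floor(value / interval) * interval
    counts = Counter(math.floor(v / interval) * interval for v in values)
    # Return buckets ordered by ascending key.
    return [{"key": float(k), "doc_count": n} for k, n in sorted(counts.items())]


# Hypothetical sample messages, used only for illustration.
messages = [
    "GET /search HTTP/1.1 200 1070000",
    "GET /index.html HTTP/1.1 200 512",
    "POST /login HTTP/1.1 302 0",
]

lengths = [message_length(m) for m in messages]
for bucket in histogram_buckets(lengths, interval=10):
    print(bucket)
```

Because the length is computed per document at search time, this work is repeated on every request, which is the overhead the paragraph above describes.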

[source,console]
----
GET /my-index-000001/_search
{
"aggs": {
"my-agg-name": {
"histogram": {
"field": "my-field",
"interval": 1000,
"script": "_value / 1000",
"value_type": "long"
}
}
}
}
----
// TEST[setup:my_index]
// TEST[s/my-field/http.response.bytes/]
// TODO when we have calculated fields we can link to them here.

[discrete]
[[agg-caches]]
@@ -251,12 +251,6 @@ instead of the usual 24 hours for other buckets. The same is true for shorter in
like 12h. Here, we will have only an 11h bucket on the morning of 27 March when the
DST shift happens.

==== Scripts

Like the normal <<search-aggregations-bucket-datehistogram-aggregation, `date_histogram`>>, this aggregation supports both
document-level scripts and value-level scripts. However, it does not support the `min_doc_count`,
`extended_bounds`, `hard_bounds` and `order` parameters.

==== Minimum Interval parameter

The `minimum_interval` allows the caller to specify the minimum rounding interval that should be used.
115 changes: 92 additions & 23 deletions docs/reference/aggregations/bucket/composite-aggregation.asciidoc
@@ -18,7 +18,7 @@ a composite bucket.

//////////////////////////
[source,js]
[source,console]
--------------------------------------------------
PUT /sales
{
@@ -72,7 +72,6 @@ POST /sales/_bulk?refresh
{"index":{"_id":4}}
{"product": "apocalypse now", "price": "10", "timestamp": "2017-05-11T08:35"}
-------------------------------------------------
// NOTCONSOLE
// TESTSETUP
//////////////////////////
@@ -121,7 +120,7 @@ The `sources` parameter can be any of the following types:
===== Terms

The `terms` value source is equivalent to a simple `terms` aggregation.
The values are extracted from a field or a script exactly like the `terms` aggregation.
The values are extracted from a field exactly like the `terms` aggregation.

Example:

@@ -142,33 +141,66 @@ GET /_search
}
--------------------------------------------------

Like the `terms` aggregation it is also possible to use a script to create the values for the composite buckets:
Like the `terms` aggregation, it's possible to use a
<<runtime,runtime field>> to create values for the composite buckets:

[source,console]
--------------------------------------------------
[source,console,id=composite-aggregation-terms-runtime-field-example]
----
GET /_search
{
"runtime_mappings": {
"day_of_week": {
"type": "keyword",
"script": """
emit(doc['timestamp'].value.dayOfWeekEnum
.getDisplayName(TextStyle.FULL, Locale.ROOT))
"""
}
},
"size": 0,
"aggs": {
"my_buckets": {
"composite": {
"sources": [
{
"product": {
"terms": {
"script": {
"source": "doc['product'].value",
"lang": "painless"
}
}
"dow": {
"terms": { "field": "day_of_week" }
}
}
]
}
}
}
}
--------------------------------------------------
----

////
[source,console-result]
----
{
"timed_out": false,
"took": "$body.took",
"_shards": {
"total": 1,
"successful": 1,
"failed": 0,
"skipped": 0
},
"hits": "$body.hits",
"aggregations": {
"my_buckets": {
"after_key": { "dow": "Wednesday" },
"buckets": [
{ "key": { "dow": "Monday" }, "doc_count": 1 },
{ "key": { "dow": "Thursday" }, "doc_count": 1 },
{ "key": { "dow": "Tuesday" }, "doc_count": 2 },
{ "key": { "dow": "Wednesday" }, "doc_count": 1 }
]
}
}
}
----
////
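
To make the `day_of_week` runtime field more concrete, here is a minimal Python sketch of the same idea: derive the full English day name from each timestamp, then count documents per name. The helpers `day_of_week` and `terms_buckets` are hypothetical names. Only the first timestamp appears in the `sales` setup shown above; the others are made up for illustration. A fixed list of names stands in for `Locale.ROOT` so that the output does not depend on the system locale.

```python
from collections import Counter
from datetime import datetime

# Fixed English names, standing in for
# getDisplayName(TextStyle.FULL, Locale.ROOT) in the Painless script.
DAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def day_of_week(timestamp):
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    return DAY_NAMES[datetime.fromisoformat(timestamp).weekday()]


def terms_buckets(values):
    # Count documents per value and order buckets by key, as the
    # alphabetical bucket order in the example response suggests.
    counts = Counter(values)
    return [{"key": {"dow": k}, "doc_count": counts[k]} for k in sorted(counts)]


# The first timestamp is from the setup above; the rest are hypothetical.
timestamps = ["2017-05-11T08:35", "2017-05-08T09:00", "2017-05-09T16:53"]

for bucket in terms_buckets(day_of_week(t) for t in timestamps):
    print(bucket)
```

Sorting by key explains why `Thursday` appears before `Tuesday` in the example response: the keys are compared as strings, not as calendar days.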

[[_histogram]]
===== Histogram
@@ -197,25 +229,35 @@ GET /_search
}
--------------------------------------------------

The values are built from a numeric field or a script that return numerical values:
Like the `histogram` aggregation, it's possible to use a
<<runtime,runtime field>> to create values for the composite buckets:

[source,console]
--------------------------------------------------
[source,console,id=composite-aggregation-histogram-runtime-field-example]
----
GET /_search
{
"runtime_mappings": {
"price.discounted": {
"type": "double",
"script": """
double price = doc['price'].value;
if (doc['product'].value == 'mad max') {
price *= 0.8;
}
emit(price);
"""
}
},
"size": 0,
"aggs": {
"my_buckets": {
"composite": {
"sources": [
{
"histo": {
"price": {
"histogram": {
"interval": 5,
"script": {
"source": "doc['price'].value",
"lang": "painless"
}
"field": "price.discounted"
}
}
}
@@ -224,7 +266,34 @@ GET /_search
}
}
}
--------------------------------------------------
----

////
[source,console-result]
----
{
"timed_out": false,
"took": "$body.took",
"_shards": {
"total": 1,
"successful": 1,
"failed": 0,
"skipped": 0
},
"hits": "$body.hits",
"aggregations": {
"my_buckets": {
"after_key": { "price": 20.0 },
"buckets": [
{ "key": { "price": 10.0 }, "doc_count": 2 },
{ "key": { "price": 15.0 }, "doc_count": 1 },
{ "key": { "price": 20.0 }, "doc_count": 2 }
]
}
}
}
----
////
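
The `price.discounted` example applies the discount before the value reaches the histogram source, so bucketing sees only the discounted price. The Python sketch below walks through that order of operations. The helpers `discounted_price` and `bucket_key` are hypothetical names, and apart from the `apocalypse now` document the sample rows are invented, since the full `sales` data set is not shown here.

```python
import math


def discounted_price(product, price):
    # Mirrors the runtime field script: 20% off for 'mad max' only.
    if product == "mad max":
        price *= 0.8
    return price


def bucket_key(value, interval):
    # Histogram source key, assuming no offset:
    #   floor(value / interval) * interval
    return float(math.floor(value / interval) * interval)


# ("apocalypse now", 10.0) comes from the setup above;
# the other rows are hypothetical.
docs = [
    ("apocalypse now", 10.0),
    ("mad max", 27.0),
    ("rocky", 27.0),
]

for product, price in docs:
    value = discounted_price(product, price)
    print(product, bucket_key(value, interval=5))
```

With these made-up rows, the two documents priced at 27 land in different buckets because only one of them is discounted, which is the kind of per-document logic a plain `field` reference cannot express.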

[[_date_histogram]]
===== Date histogram