ESQL: extend BUCKET with spans. Turn it into a grouping function (elastic#107272)

This extends the `BUCKET` function to accept a two-parameter invocation:
the first parameter remains as is, while the second is a span. The span
can be numeric (floating point) if the first argument is numeric, or a
date period or time duration if the first argument is a date.

Also, the function can now be invoked with the alias BIN.

Additionally, the function has been turned into a grouping-only function
and thus can only be used within a `STATS` command.
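
The two-parameter form described above could look roughly like the queries below. This is a sketch, not taken from the changed spec files: the `employees` index and its `salary` and `hire_date` fields are assumed names for illustration, and the span values are arbitrary.

```esql
// Assumed index and field names; numeric field with a floating point span.
FROM employees
| STATS c = COUNT(*) BY b = BUCKET(salary, 5000.)
| SORT b
```

```esql
// Assumed index and field names; date field with a date period span.
FROM employees
| STATS hires = COUNT(*) BY week = BUCKET(hire_date, 1 week)
| SORT week
```

Both queries call `BUCKET` inside `STATS ... BY`, in line with the grouping-only restriction. Per the message above, `BIN` should be accepted in place of `BUCKET` in either query.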
bpintea authored Apr 16, 2024
1 parent 9626615 commit a2c2e8f
Showing 22 changed files with 995 additions and 508 deletions.
5 changes: 5 additions & 0 deletions docs/changelog/107272.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
+pr: 107272
+summary: "ESQL: extend BUCKET with spans"
+area: ES|QL
+type: enhancement
+issues: []
6 changes: 3 additions & 3 deletions docs/reference/esql/esql-get-started.asciidoc
@@ -248,22 +248,22 @@ For example, to create hourly buckets for the data on October 23rd:

[source,esql]
----
-include::{esql-specs}/date.csv-spec[tag=gs-bucket]
+include::{esql-specs}/bucket.csv-spec[tag=gs-bucket]
----

Combine `BUCKET` with <<esql-stats-by>> to create a histogram. For example,
to count the number of events per hour:

[source,esql]
----
-include::{esql-specs}/date.csv-spec[tag=gs-bucket-stats-by]
+include::{esql-specs}/bucket.csv-spec[tag=gs-bucket-stats-by]
----

Or the median duration per hour:

[source,esql]
----
-include::{esql-specs}/date.csv-spec[tag=gs-bucket-stats-by-median]
+include::{esql-specs}/bucket.csv-spec[tag=gs-bucket-stats-by-median]
----

[discrete]
22 changes: 11 additions & 11 deletions docs/reference/esql/functions/bucket.asciidoc
@@ -35,11 +35,11 @@ in monthly buckets:

[source.merge.styled,esql]
----
-include::{esql-specs}/date.csv-spec[tag=docsBucketMonth]
+include::{esql-specs}/bucket.csv-spec[tag=docsBucketMonth]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
-include::{esql-specs}/date.csv-spec[tag=docsBucketMonth-result]
+include::{esql-specs}/bucket.csv-spec[tag=docsBucketMonth-result]
|===

The goal isn't to provide *exactly* the target number of buckets, it's to pick a
@@ -51,11 +51,11 @@ Combine `BUCKET` with

[source.merge.styled,esql]
----
-include::{esql-specs}/date.csv-spec[tag=docsBucketMonthlyHistogram]
+include::{esql-specs}/bucket.csv-spec[tag=docsBucketMonthlyHistogram]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
-include::{esql-specs}/date.csv-spec[tag=docsBucketMonthlyHistogram-result]
+include::{esql-specs}/bucket.csv-spec[tag=docsBucketMonthlyHistogram-result]
|===

NOTE: `BUCKET` does not create buckets that don't match any documents.
@@ -66,11 +66,11 @@ at most 100 buckets in a year results in weekly buckets:

[source.merge.styled,esql]
----
-include::{esql-specs}/date.csv-spec[tag=docsBucketWeeklyHistogram]
+include::{esql-specs}/bucket.csv-spec[tag=docsBucketWeeklyHistogram]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
-include::{esql-specs}/date.csv-spec[tag=docsBucketWeeklyHistogram-result]
+include::{esql-specs}/bucket.csv-spec[tag=docsBucketWeeklyHistogram-result]
|===

NOTE: `BUCKET` does not filter any rows. It only uses the provided range to
@@ -83,11 +83,11 @@ salary histogram:

[source.merge.styled,esql]
----
-include::{esql-specs}/ints.csv-spec[tag=docsBucketNumeric]
+include::{esql-specs}/bucket.csv-spec[tag=docsBucketNumeric]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
-include::{esql-specs}/ints.csv-spec[tag=docsBucketNumeric-result]
+include::{esql-specs}/bucket.csv-spec[tag=docsBucketNumeric-result]
|===

Unlike the earlier example that intentionally filters on a date range, you
@@ -102,17 +102,17 @@ per hour:

[source.styled,esql]
----
-include::{esql-specs}/date.csv-spec[tag=docsBucketLast24hr]
+include::{esql-specs}/bucket.csv-spec[tag=docsBucketLast24hr]
----

Create monthly buckets for the year 1985, and calculate the average salary by
hiring month:

[source.merge.styled,esql]
----
-include::{esql-specs}/date.csv-spec[tag=bucket_in_agg]
+include::{esql-specs}/bucket.csv-spec[tag=bucket_in_agg]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
-include::{esql-specs}/date.csv-spec[tag=bucket_in_agg-result]
+include::{esql-specs}/bucket.csv-spec[tag=bucket_in_agg-result]
|===
2 changes: 1 addition & 1 deletion docs/reference/esql/functions/description/bucket.asciidoc
@@ -2,4 +2,4 @@

*Description*

-Creates human-friendly buckets and returns a datetime value for each row that corresponds to the resulting bucket the row falls into.
+Creates groups of values - buckets - out of a datetime or numeric input. The size of the buckets can either be provided directly, or chosen based on a recommended count and values range.