Add docs for aggs filtering (#116681) (#117335)
Add documentation for aggs filtering (the WHERE in STATS command).

Fixes: #115083
bpintea authored Nov 22, 2024
1 parent efaf72d commit c02c55e
Showing 2 changed files with 75 additions and 10 deletions.
49 changes: 39 additions & 10 deletions docs/reference/esql/processing-commands/stats.asciidoc
@@ -1,16 +1,18 @@
[discrete]
[[esql-stats-by]]
-=== `STATS ... BY`
+=== `STATS`

-The `STATS ... BY` processing command groups rows according to a common value
+The `STATS` processing command groups rows according to a common value
and calculates one or more aggregated values over the grouped rows.

**Syntax**

[source,esql]
----
-STATS [column1 =] expression1[, ..., [columnN =] expressionN]
-[BY grouping_expression1[, ..., grouping_expressionN]]
+STATS [column1 =] expression1 [WHERE boolean_expression1][,
+...,
+[columnN =] expressionN [WHERE boolean_expressionN]]
+[BY grouping_expression1[, ..., grouping_expressionN]]
----

*Parameters*
@@ -28,14 +30,18 @@ An expression that computes an aggregated value.
An expression that outputs the values to group by.
If its name coincides with one of the computed columns, that column will be ignored.

+`boolean_expressionX`::
+The condition that must be met for a row to be included in the evaluation of `expressionX`.
+
NOTE: Individual `null` values are skipped when computing aggregations.

*Description*

-The `STATS ... BY` processing command groups rows according to a common value
-and calculate one or more aggregated values over the grouped rows. If `BY` is
-omitted, the output table contains exactly one row with the aggregations applied
-over the entire dataset.
+The `STATS` processing command groups rows according to a common value
+and calculates one or more aggregated values over the grouped rows. For the
+calculation of each aggregated value, the rows in a group can be filtered with
+`WHERE`. If `BY` is omitted, the output table contains exactly one row with
+the aggregations applied over the entire dataset.

The following <<esql-agg-functions,aggregation functions>> are supported:

@@ -90,6 +96,29 @@ include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues]
include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues-result]
|===

+To filter the rows that go into an aggregation, use the `WHERE` clause:
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/stats.csv-spec[tag=aggFiltering]
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+include::{esql-specs}/stats.csv-spec[tag=aggFiltering-result]
+|===
+
+The aggregations can be mixed, with and without a filter and grouping is
+optional as well:
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/stats.csv-spec[tag=aggFilteringNoGroup]
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+include::{esql-specs}/stats.csv-spec[tag=aggFilteringNoGroup-result]
+|===
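
Reviewer's note: the per-aggregation filtering documented above can be emulated outside Elasticsearch to check one's reading of the semantics. The following is a minimal Python sketch, not ES|QL and not how Elasticsearch implements it. The rows, the `born` field, and the `stats_by` and `avg` helpers are all hypothetical. It also assumes that a group still appears in the output when one of its filters matches no rows, with a null (`None`) value for that aggregate.

```python
from collections import defaultdict

# Hypothetical rows; this is NOT the `employees` test dataset.
rows = [
    {"gender": "F", "born": 1955, "salary": 50000},
    {"gender": "F", "born": 1962, "salary": 40000},
    {"gender": "M", "born": 1958, "salary": 30000},
    {"gender": "M", "born": 1953, "salary": 50000},
]


def avg(values):
    """Mean of the non-null values, or None when nothing qualifies."""
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


def stats_by(rows, key, aggs):
    """Hypothetical helper: aggs maps column -> (agg_fn, field, predicate or None)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)  # grouping sees every row
    out = {}
    for group_key, members in groups.items():
        out[group_key] = {
            # Each aggregation applies its own filter to the group's rows.
            name: fn([r[field] for r in members if pred is None or pred(r)])
            for name, (fn, field, pred) in aggs.items()
        }
    return out


result = stats_by(rows, "gender", {
    "avg50s": (avg, "salary", lambda r: r["born"] < 1960),
    "avg60s": (avg, "salary", lambda r: r["born"] >= 1960),
})
print(result)
```

The point of the sketch is that the predicate is attached to each aggregation rather than to the group: both columns are computed over the same groups, but each one sees a different subset of rows.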

[[esql-stats-mv-group]]
If the grouping key is multivalued then the input row is in all groups:

@@ -109,7 +138,7 @@ It's also possible to group by multiple values:
include::{esql-specs}/stats.csv-spec[tag=statsGroupByMultipleValues]
----

-If the all grouping keys are multivalued then the input row is in all groups:
+If all the grouping keys are multivalued then the input row is in all groups:

[source.merge.styled,esql]
----
@@ -121,7 +150,7 @@ include::{esql-specs}/stats.csv-spec[tag=multi-mv-group-result]
|===

Both the aggregating functions and the grouping expressions accept other
-functions. This is useful for using `STATS...BY` on multivalue columns.
+functions. This is useful for using `STATS` on multivalue columns.
For example, to calculate the average salary change, you can use `MV_AVG` to
first average the multiple values per employee, and use the result with the
`AVG` function:
@@ -2348,6 +2348,42 @@ v:integer | job_positions:keyword
10094 | Accountant
;

+docsStatsWithSimpleFiltering
+required_capability: per_agg_filtering
+// tag::aggFiltering[]
+FROM employees
+| STATS avg50s = AVG(salary)::LONG WHERE birth_date < "1960-01-01",
+avg60s = AVG(salary)::LONG WHERE birth_date >= "1960-01-01"
+BY gender
+| SORT gender
+// end::aggFiltering[]
+| WHERE gender IS NOT NULL
+;
+
+// tag::aggFiltering-result[]
+avg50s:long |avg60s:long |gender:keyword
+55462 |46637 |F
+48279 |44879 |M
+// end::aggFiltering-result[]
+;
+
+docsStatsWithFilteringNoGroups
+required_capability: per_agg_filtering
+// tag::aggFilteringNoGroup[]
+FROM employees
+| EVAL Ks = salary / 1000 // thousands
+| STATS under_40K = COUNT(*) WHERE Ks < 40,
+inbetween = COUNT(*) WHERE 40 <= Ks AND Ks < 60,
+over_60K = COUNT(*) WHERE 60 <= Ks,
+total = COUNT(*)
+// end::aggFilteringNoGroup[]
+;
+
+// tag::aggFilteringNoGroup-result[]
+under_40K:long |inbetween:long |over_60K:long |total:long
+36 |39 |25 |100
+// end::aggFilteringNoGroup-result[]
+;
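
Reviewer's note: the ungrouped test above mixes three filtered counts with one unfiltered count. The following is a minimal Python sketch of that pattern, not ES|QL. The salary list and the `count_where` helper are hypothetical, and the sketch assumes that `salary / 1000` on integer values truncates, which is why it uses floor division.

```python
# Hypothetical salaries; this is NOT the `employees` test dataset.
salaries = [25000, 38500, 41000, 59999, 60000, 74000]


def count_where(values, predicate=None):
    """Emulates COUNT(*) over the rows that pass the optional filter."""
    return sum(1 for v in values if predicate is None or predicate(v))


# Stand-in for `EVAL Ks = salary / 1000`, assuming integer division.
ks = [s // 1000 for s in salaries]

summary = {
    "under_40K": count_where(ks, lambda k: k < 40),
    "inbetween": count_where(ks, lambda k: 40 <= k < 60),
    "over_60K": count_where(ks, lambda k: 60 <= k),
    "total": count_where(ks),  # no filter: every row counts
}
print(summary)
```

Because the three predicates partition the value range, the filtered counts add up to the unfiltered total, which is also true of the expected result row in the test (36 + 39 + 25 = 100).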

statsWithFiltering
required_capability: per_agg_filtering
