SQL: Implement FIRST/LAST aggregate functions (#37936)
FIRST and LAST can be used with one argument and work similarly to MIN
and MAX but they are implemented using a Top Hits aggregation and
therefore can also operate on keyword fields. When a second argument is
provided then they return the first/last value of the first arg when its
values are ordered ascending/descending (respectively) by the values of
the second argument. Currently, because they rely on a Top Hits
aggregation, FIRST and LAST cannot be used in the HAVING clause of a
GROUP BY query to filter on the results of the aggregation.

Closes: #35639
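
The semantics described above can be illustrated with a small in-memory model. This is only a sketch of the documented examples, not the Top Hits based implementation from this commit. The class `FirstLastSketch`, the `Row` type, and the `pick` helper are hypothetical names. The sketch assumes that NULL target values are skipped, that rows with a NULL ordering value sort after all others, and that ties on the ordering column are broken by the target column in the same direction. Null handling in the real implementation may differ.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class FirstLastSketch {

    /** One row of the example table t; a null field models SQL NULL. */
    static final class Row {
        final Integer a;
        final Integer b;

        Row(Integer a, Integer b) {
            this.a = a;
            this.b = b;
        }
    }

    /** Example table used for FIRST in the docs added by this commit. */
    static final List<Row> FIRST_TABLE = Arrays.asList(
        new Row(100, 1), new Row(200, 1), new Row(1, 2), new Row(2, 2),
        new Row(10, null), new Row(20, null), new Row(null, null));

    /** Example table used for LAST in the docs added by this commit. */
    static final List<Row> LAST_TABLE = Arrays.asList(
        new Row(10, 1), new Row(20, 1), new Row(1, 2), new Row(2, 2),
        new Row(100, null), new Row(200, null), new Row(null, null));

    /**
     * Models FIRST (ascending = true) or LAST (ascending = false) over column a.
     * With useOrderingColumn = true the rows are ordered by column b first,
     * otherwise only by column a.
     */
    static Integer pick(List<Row> rows, boolean useOrderingColumn, boolean ascending) {
        Comparator<Integer> direction = ascending
            ? Comparator.<Integer>naturalOrder()
            : Comparator.<Integer>reverseOrder();
        // Assumption: NULL ordering values sort last in both directions.
        Comparator<Row> order = Comparator
            .<Row, Integer>comparing(r -> useOrderingColumn ? r.b : r.a, Comparator.nullsLast(direction))
            .thenComparing(r -> r.a, direction);
        return rows.stream()
            .filter(r -> r.a != null) // Assumption: NULL target values are skipped.
            .min(order)               // The smallest element is the first row in the chosen order.
            .map(r -> r.a)
            .orElse(null);
    }

    public static void main(String[] args) {
        System.out.println("FIRST(a)    = " + pick(FIRST_TABLE, false, true));
        System.out.println("FIRST(a, b) = " + pick(FIRST_TABLE, true, true));
        System.out.println("LAST(a)     = " + pick(LAST_TABLE, false, false));
        System.out.println("LAST(a, b)  = " + pick(LAST_TABLE, true, false));
    }
}
```

Under these assumptions the four calls reproduce the values shown in the documentation tables of this commit: 1, 100, 200 and 2.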
matriv committed Jan 31, 2019
1 parent 93ac858 commit 0b794d4
Showing 34 changed files with 1,196 additions and 98 deletions.
198 changes: 198 additions & 0 deletions docs/reference/sql/functions/aggs.asciidoc
@@ -113,6 +113,196 @@ Returns the total number of _distinct non-null_ values in input values.
include-tagged::{sql-specs}/docs.csv-spec[aggCountDistinct]
--------------------------------------------------

[[sql-functions-aggs-first]]
===== `FIRST/FIRST_VALUE`

.Synopsis:
[source, sql]
----------------------------------------------
FIRST(field_name<1>[, ordering_field_name]<2>)
----------------------------------------------

*Input*:

<1> target field for the aggregation
<2> optional field used for ordering

*Output*: same type as the input

.Description:

Returns the first **non-NULL** value (if such exists) of the `field_name` input column sorted by
the `ordering_field_name` column. If `ordering_field_name` is not provided, only the `field_name`
column is used for the sorting. E.g.:

[cols="<,<"]
|===
s| a | b

| 100 | 1
| 200 | 1
| 1 | 2
| 2 | 2
| 10 | null
| 20 | null
| null | null
|===

[source, sql]
----------------------
SELECT FIRST(a) FROM t
----------------------

will result in:
[cols="<"]
|===
s| FIRST(a)
| 1
|===

and

[source, sql]
-------------------------
SELECT FIRST(a, b) FROM t
-------------------------

will result in:
[cols="<"]
|===
s| FIRST(a, b)
| 100
|===


["source","sql",subs="attributes,macros"]
-----------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[firstWithOneArg]
-----------------------------------------------------------

["source","sql",subs="attributes,macros"]
--------------------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[firstWithOneArgAndGroupBy]
--------------------------------------------------------------------

["source","sql",subs="attributes,macros"]
-----------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[firstWithTwoArgs]
-----------------------------------------------------------

["source","sql",subs="attributes,macros"]
---------------------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[firstWithTwoArgsAndGroupBy]
---------------------------------------------------------------------

`FIRST_VALUE` is a name alias and can be used instead of `FIRST`, e.g.:

["source","sql",subs="attributes,macros"]
--------------------------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[firstValueWithTwoArgsAndGroupBy]
--------------------------------------------------------------------------

[NOTE]
`FIRST` cannot be used in a `HAVING` clause.
[NOTE]
`FIRST` cannot be used with columns of type <<text, `text`>> unless
the field is also <<before-enabling-fielddata,saved as a keyword>>.

[[sql-functions-aggs-last]]
===== `LAST/LAST_VALUE`

.Synopsis:
[source, sql]
--------------------------------------------------
LAST(field_name<1>[, ordering_field_name]<2>)
--------------------------------------------------

*Input*:

<1> target field for the aggregation
<2> optional field used for ordering

*Output*: same type as the input

.Description:

The inverse of <<sql-functions-aggs-first>>. Returns the last **non-NULL** value (if such exists) of the
`field_name` input column, that is, the value that comes first when the rows are sorted descending by the
`ordering_field_name` column. If `ordering_field_name` is not provided, only the `field_name` column is
used for the sorting. E.g.:

[cols="<,<"]
|===
s| a | b

| 10 | 1
| 20 | 1
| 1 | 2
| 2 | 2
| 100 | null
| 200 | null
| null | null
|===

[source, sql]
------------------------
SELECT LAST(a) FROM t
------------------------

will result in:
[cols="<"]
|===
s| LAST(a)
| 200
|===

and

[source, sql]
------------------------
SELECT LAST(a, b) FROM t
------------------------

will result in:
[cols="<"]
|===
s| LAST(a, b)
| 2
|===


["source","sql",subs="attributes,macros"]
-----------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[lastWithOneArg]
-----------------------------------------------------------

["source","sql",subs="attributes,macros"]
-------------------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[lastWithOneArgAndGroupBy]
-------------------------------------------------------------------

["source","sql",subs="attributes,macros"]
-----------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[lastWithTwoArgs]
-----------------------------------------------------------

["source","sql",subs="attributes,macros"]
--------------------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[lastWithTwoArgsAndGroupBy]
--------------------------------------------------------------------

`LAST_VALUE` is a name alias and can be used instead of `LAST`, e.g.:

["source","sql",subs="attributes,macros"]
-------------------------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[lastValueWithTwoArgsAndGroupBy]
-------------------------------------------------------------------------

[NOTE]
`LAST` cannot be used in a `HAVING` clause.
[NOTE]
`LAST` cannot be used with columns of type <<text, `text`>> unless
the field is also <<before-enabling-fielddata,saved as a keyword>>.

[[sql-functions-aggs-max]]
===== `MAX`

@@ -137,6 +327,10 @@ Returns the maximum value across input values in the field `field_name`.
include-tagged::{sql-specs}/docs.csv-spec[aggMax]
--------------------------------------------------

[NOTE]
`MAX` on a field of type <<text, `text`>> or <<keyword, `keyword`>> is translated into
<<sql-functions-aggs-last>> and therefore cannot be used in a `HAVING` clause.

[[sql-functions-aggs-min]]
===== `MIN`

@@ -161,6 +355,10 @@ Returns the minimum value across input values in the field `field_name`.
include-tagged::{sql-specs}/docs.csv-spec[aggMin]
--------------------------------------------------

[NOTE]
`MIN` on a field of type <<text, `text`>> or <<keyword, `keyword`>> is translated into
<<sql-functions-aggs-first>> and therefore cannot be used in a `HAVING` clause.

[[sql-functions-aggs-sum]]
===== `SUM`

7 changes: 7 additions & 0 deletions docs/reference/sql/limitations.asciidoc
@@ -90,3 +90,10 @@ include-tagged::{sql-specs}/docs.csv-spec[limitationSubSelectRewritten]

But, if the sub-select would include a `GROUP BY` or `HAVING` or the enclosing `SELECT` would be more complex than `SELECT X
FROM (SELECT ...) WHERE [simple_condition]`, this is currently **un-supported**.

[float]
=== Using <<sql-functions-aggs-first, `FIRST`>>/<<sql-functions-aggs-last,`LAST`>> aggregation functions in the `HAVING` clause

Using `FIRST` and `LAST` in the `HAVING` clause is not supported. The same applies to
<<sql-functions-aggs-min,`MIN`>> and <<sql-functions-aggs-max,`MAX`>> when their target column
is of type <<keyword, `keyword`>> as they are internally translated to `FIRST` and `LAST`.
@@ -169,7 +169,6 @@ public void testThatRepositoryRecoversEmptyIndexBasedOnLeaderSettings() throws I
assertNotEquals(leaderMetadata.getIndexUUID(), followerMetadata.getIndexUUID());
}

-@AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/38100")
public void testDocsAreRecovered() throws Exception {
String leaderClusterRepoName = CcrRepository.NAME_PREFIX + "leader_cluster";
String leaderIndex = "index1";
@@ -316,7 +315,6 @@ public void testRateLimitingIsEmployed() throws Exception {
}
}

-@AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/38027")
public void testIndividualActionsTimeout() throws Exception {
ClusterUpdateSettingsRequest settingsRequest = new ClusterUpdateSettingsRequest();
TimeValue timeValue = TimeValue.timeValueMillis(100);
@@ -31,6 +31,10 @@ public void testShowFunctions() throws IOException {
assertThat(readLine(), containsString(HEADER_SEPARATOR));
assertThat(readLine(), RegexMatcher.matches("\\s*AVG\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*COUNT\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*FIRST\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*FIRST_VALUE\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*LAST\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*LAST_VALUE\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*MAX\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*MIN\\s*\\|\\s*AGGREGATE\\s*"));
String line = readLine();
@@ -58,6 +62,8 @@ public void testShowFunctions() throws IOException {
public void testShowFunctionsLikePrefix() throws IOException {
assertThat(command("SHOW FUNCTIONS LIKE 'L%'"), RegexMatcher.matches("\\s*name\\s*\\|\\s*type\\s*"));
assertThat(readLine(), containsString(HEADER_SEPARATOR));
assertThat(readLine(), RegexMatcher.matches("\\s*LAST\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*LAST_VALUE\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*LEAST\\s*\\|\\s*CONDITIONAL\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*LOG\\s*\\|\\s*SCALAR\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*LOG10\\s*\\|\\s*SCALAR\\s*"));
73 changes: 73 additions & 0 deletions x-pack/plugin/sql/qa/src/main/resources/agg.csv-spec
@@ -373,3 +373,76 @@ SELECT COUNT(ALL last_name)=COUNT(ALL first_name) AS areEqual, COUNT(ALL first_n
---------------+---------------+---------------
false |90 |100
;

topHitsWithOneArgAndGroupBy
schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name) as first, LAST(first_name) as last FROM test_emp GROUP BY gender ORDER BY gender;

gender | first | last
---------------+---------------+---------------
null | Berni | Patricio
F | Alejandro | Xinglin
M | Amabile | Zvonko
;

topHitsWithTwoArgsAndGroupBy
schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name, birth_date) as first, LAST(first_name, birth_date) as last FROM test_emp GROUP BY gender ORDER BY gender;

gender | first | last
---------------+---------------+---------------
null | Lillian | Eberhardt
F | Sumant | Valdiodio
M | Remzi | Hilari
;

topHitsWithTwoArgsAndGroupByWithNullsOnTargetField
schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name, birth_date) AS first, LAST(first_name, birth_date) AS last FROM test_emp WHERE emp_no BETWEEN 10025 AND 10035 GROUP BY gender ORDER BY gender;

gender | first | last
---------------+---------------+---------------
F | null | Divier
M | null | Domenick
;

topHitsWithTwoArgsAndGroupByWithNullsOnSortingField
schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name, birth_date) AS first, LAST(first_name, birth_date) AS last FROM test_emp WHERE emp_no BETWEEN 10047 AND 10052 GROUP BY gender ORDER BY gender;

gender | first | last
---------------+---------------+---------------
F | Basil | Basil
M | Hidefumi | Heping
;

topHitsWithTwoArgsAndGroupByWithNullsOnTargetAndSortingField
schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name, birth_date) AS first, LAST(first_name, birth_date) AS last FROM test_emp WHERE emp_no BETWEEN 10037 AND 10052 GROUP BY gender ORDER BY gender;

gender | first | last
---------------+-------------+-----------------
F | Basil | Weiyi
M | Hidefumi | null
;

topHitsWithTwoArgsAndGroupByWithAllNullsOnTargetField
schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name, birth_date) AS first, LAST(first_name, birth_date) AS last FROM test_emp WHERE emp_no BETWEEN 10030 AND 10037 GROUP BY gender ORDER BY gender;

gender | first | last
---------------+---------------+---------------
F | null | null
M | null | null
;

topHitsOnDatetime
schema::gender:s|first:i|last:i
SELECT gender, month(first(birth_date, languages)) first, month(last(birth_date, languages)) last FROM test_emp GROUP BY gender ORDER BY gender;

gender | first | last
---------------+---------------+---------------
null | 1 | 10
F | 4 | 6
M | 1 | 4
;
8 changes: 6 additions & 2 deletions x-pack/plugin/sql/qa/src/main/resources/command.csv-spec
@@ -8,8 +8,12 @@ SHOW FUNCTIONS;

name:s | type:s
AVG |AGGREGATE
-COUNT |AGGREGATE
-MAX |AGGREGATE
+COUNT |AGGREGATE
+FIRST |AGGREGATE
+FIRST_VALUE |AGGREGATE
+LAST |AGGREGATE
+LAST_VALUE |AGGREGATE
+MAX |AGGREGATE
MIN |AGGREGATE
SUM |AGGREGATE
KURTOSIS |AGGREGATE