Integrate Elasticsearch Query Language, ES|QL #98309

ChrisHegarty · 2023-08-09T09:16:44Z

Integrate the Elasticsearch Query Language, ES|QL.

This Pull Request offers a consolidated view over the commits of the ES|QL feature branch, allowing reviewers to review and test them together, in one place.

Upon integration into main this PR will merge the commits as-is (not squashed). This will effectively preserve the history of the ES|QL code.

In SQL, `AVG(foo)` is `null` if there are no values for `foo`. Same for `MIN(foo)` and `MAX(foo)`. In fact, the only functions that don't return `null` on empty inputs seem to be `COUNT` and `COUNT(DISTINCT`. This flips our non-grouping aggs to have the same behavior because it's both more expected and fits better with other things we're building. This *is* different from Elasticsearch's aggs, but it's different in a good way. It also lines up more closely with the way that our grouping aggs work.

This also revives the broken `AggregatorBenchmark` so that I could get performance figures for this change, and the change is within the margin of error:

```
(blockType)      (grouping)  (op)  Mode  Cnt  Before          After           Units
vector_longs     none        sum   avgt    7  0.440 ± 0.017   0.397 ± 0.003   ns/op
half_null_longs  none        sum   avgt    7  5.785 ± 0.022   5.861 ± 0.134   ns/op
```

I expected a small slowdown on the `half_null_longs` line and I do see one, but it is within the margin of error. Either way, that line isn't nearly as optimized yet. We'll loop back around to it eventually.

Closes ESQL-1297
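A minimal sketch of the SQL-style semantics described above, assuming the input arrives as a boxed `Long[]` where `null` marks a missing value. The `SqlStyleAggs` class and its methods are hypothetical; the real aggregators operate on blocks, not arrays.

```java
/** Hypothetical non-grouping aggs: AVG/MAX yield null on empty input, COUNT yields 0. */
final class SqlStyleAggs {
    private SqlStyleAggs() {}

    /** Average of the non-null values, or null if there were none. */
    static Double avg(Long[] values) {
        long sum = 0;
        long count = 0;
        for (Long v : values) {
            if (v != null) {
                sum += v;
                count++;
            }
        }
        return count == 0 ? null : (double) sum / count;
    }

    /** Maximum of the non-null values, or null if there were none. */
    static Long max(Long[] values) {
        Long max = null;
        for (Long v : values) {
            if (v != null && (max == null || v > max)) {
                max = v;
            }
        }
        return max;
    }

    /** COUNT is the exception: it counts non-null values and is 0 on empty input. */
    static long count(Long[] values) {
        long count = 0;
        for (Long v : values) {
            if (v != null) {
                count++;
            }
        }
        return count;
    }
}
```

For example, `avg(new Long[0])` and `max(new Long[] {null})` are both `null`, while `count(new Long[0])` is `0`.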

🤖 ESQL: Merge upstream

This rigs the exceptions caught by `warnExceptions` in `Evaluator` to emit warnings through Elasticsearch's warnings system, the same way that the conversion functions work. Closes ESQL-1211
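A rough sketch of the pattern, assuming a per-position evaluator: exceptions of an expected type become a warning plus a `null` result instead of failing the whole query. The `evalWithWarnings` helper is hypothetical, and the plain `List<String>` merely stands in for Elasticsearch's warnings system.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class WarningEvaluator {
    private WarningEvaluator() {}

    /**
     * Applies fn to every input. Exceptions of type warnOn are recorded as warnings and
     * produce a null output; any other exception still propagates.
     */
    static <T, R> List<R> evalWithWarnings(
        List<T> inputs, Function<T, R> fn, Class<? extends RuntimeException> warnOn, List<String> warnings) {
        List<R> out = new ArrayList<>(inputs.size());
        for (T in : inputs) {
            try {
                out.add(fn.apply(in));
            } catch (RuntimeException e) {
                if (warnOn.isInstance(e) == false) {
                    throw e;
                }
                warnings.add(e.getClass().getName() + ": " + e.getMessage());
                out.add(null);
            }
        }
        return out;
    }
}
```

Usage, in the spirit of a bad `date_parse` input: `evalWithWarnings(List.of("2023-08-09", "not-a-date"), s -> LocalDate.parse(s), DateTimeParseException.class, warnings)` returns the parsed date and `null`, and leaves one entry in `warnings`.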

Convert Avg into a SurrogateExpression and introduce a dedicated rule for handling surrogate AggregateFunctions. Remove the Avg implementation. Use sum instead of avg in some planning tests. Add a dataType case for the Div operator. Relates ESQL-747
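To illustrate the idea of a surrogate, here is a toy expression tree in which `Avg` has no evaluator of its own and is instead rewritten into `Sum / Count`. Every type below is hypothetical; these are not the ES|QL `SurrogateExpression` or `Avg` classes, and the real optimizer rule does more than a single rewrite step.

```java
// Hypothetical mini expression tree; NOT the ES|QL classes from this PR.
interface Expr {}

record FieldRef(String name) implements Expr {}

record Sum(Expr field) implements Expr {}

record Count(Expr field) implements Expr {}

record Div(Expr left, Expr right) implements Expr {}

/** An expression that can be replaced by an equivalent combination of simpler ones. */
interface SurrogateExpression extends Expr {
    Expr surrogate();
}

/** AVG(x) is planned as SUM(x) / COUNT(x) instead of having its own implementation. */
record Avg(Expr field) implements SurrogateExpression {
    @Override
    public Expr surrogate() {
        return new Div(new Sum(field), new Count(field));
    }
}

final class SubstituteSurrogates {
    private SubstituteSurrogates() {}

    /** One rewrite step of a hypothetical optimizer rule: replace surrogates, keep everything else. */
    static Expr apply(Expr e) {
        return e instanceof SurrogateExpression s ? s.surrogate() : e;
    }
}
```

A benefit of this design is that only `Sum` and `Count` need aggregator implementations; `Avg` falls out of them, which is also why a `Div` needs a proper data type case.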

Euler's number.

This enables the `min` and `max` aggs on `date` fields without enabling any of the other numeric aggregates on `date`s. It also adds a fairly paranoid test that `sum` is not enabled on `date`s because that doesn't make a whole lot of sense. Closes ESQL-1247
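Conceptually this is a per-aggregation allow-list of input types. The sketch below is hypothetical (the `DataType` enum and `AggTypeCheck` class are illustrative, not the ES|QL type resolution code) and only mirrors the rule stated here: `min` and `max` accept dates, while `sum` and `avg` do not.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical type enum; the real ES|QL type system is richer.
enum DataType { INTEGER, LONG, DOUBLE, DATE, KEYWORD }

final class AggTypeCheck {
    private AggTypeCheck() {}

    private static final Set<DataType> NUMERIC = EnumSet.of(DataType.INTEGER, DataType.LONG, DataType.DOUBLE);

    /** min and max only need an ordering, so dates are allowed. */
    private static final Set<DataType> ORDERED =
        EnumSet.of(DataType.INTEGER, DataType.LONG, DataType.DOUBLE, DataType.DATE);

    private static final Map<String, Set<DataType>> SUPPORTED = Map.of(
        "min", ORDERED,
        "max", ORDERED,
        "sum", NUMERIC, // summing dates doesn't make sense
        "avg", NUMERIC
    );

    static boolean supports(String agg, DataType type) {
        Set<DataType> types = SUPPORTED.get(agg);
        return types != null && types.contains(type);
    }
}
```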

🤖 ESQL: Merge upstream

This PR adds a new method that allows appending a block to a builder at a single position. This method is required to support multi-values in the enrich lookup. Ideally, the new method should be unified with the `copyFrom` method, but I will address it as a follow-up to reduce the complexity of this PR. Closes ESQL-1280
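To show what "appending a block at a single position" means, here is a toy long block that stores a `firstValueIndexes` offset array, so each position holds zero (null), one, or many values. All names, including `appendBlockAsSinglePosition`, are hypothetical and are not the method added by this PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy block: position p owns values[firstValueIndexes[p] .. firstValueIndexes[p + 1]). */
final class MiniLongBlock {
    final long[] values;
    final int[] firstValueIndexes; // length = positionCount + 1

    MiniLongBlock(long[] values, int[] firstValueIndexes) {
        this.values = values;
        this.firstValueIndexes = firstValueIndexes;
    }

    int positionCount() { return firstValueIndexes.length - 1; }

    int valueCount(int position) { return firstValueIndexes[position + 1] - firstValueIndexes[position]; }

    long value(int position, int i) { return values[firstValueIndexes[position] + i]; }

    static final class Builder {
        private final List<Long> values = new ArrayList<>();
        private final List<Integer> firstValueIndexes = new ArrayList<>(List.of(0));

        Builder appendLong(long v) {
            values.add(v);
            firstValueIndexes.add(values.size());
            return this;
        }

        Builder appendNull() {
            firstValueIndexes.add(values.size()); // zero values at this position
            return this;
        }

        /** Appends every value of every position in block as ONE (possibly multi-valued) position. */
        Builder appendBlockAsSinglePosition(MiniLongBlock block) {
            for (int p = 0; p < block.positionCount(); p++) {
                for (int i = 0; i < block.valueCount(p); i++) {
                    values.add(block.value(p, i));
                }
            }
            firstValueIndexes.add(values.size()); // an empty or all-null block yields a null position
            return this;
        }

        MiniLongBlock build() {
            long[] v = values.stream().mapToLong(Long::longValue).toArray();
            int[] f = firstValueIndexes.stream().mapToInt(Integer::intValue).toArray();
            return new MiniLongBlock(v, f);
        }
    }
}
```

In an enrich lookup, several matches for one input row would collapse into one multi-valued position this way.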

🤖 ESQL: Merge upstream

Co-authored-by: Abdon Pijpelink <[email protected]>

🤖 ESQL: Merge upstream

This adds an explicit test that makes sure that grouping aggs (`STATS FOO(x) BY y`) *mostly* return `null` if they receive only null values. We have code for this scattered around the grouping aggs, but we were only testing it *sometimes*. This tests it all the time. Also! The behavior wasn't *quite* consistent. `COUNT` and `COUNT(DISTINCT` style aggs should return `0`, but they didn't all do that. Non-`COUNT` style aggs should return `null`, and they all did, except `SUM` on doubles.
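The expected grouping behavior, sketched over plain Java collections (the `GroupingAggs` class and its `Row` record are hypothetical): a group that only sees `null` values still appears in the output, with `null` for `MAX` and `0` for `COUNT`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupingAggs {
    private GroupingAggs() {}

    /** One input row: a group key and a possibly-null value. */
    record Row(String group, Long value) {}

    /** MAX(value) BY group: a group with only null values maps to null. */
    static Map<String, Long> maxBy(List<Row> rows) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Row r : rows) {
            if (!result.containsKey(r.group())) {
                result.put(r.group(), null); // register the group even if its value is null
            }
            if (r.value() != null) {
                Long current = result.get(r.group());
                if (current == null || r.value() > current) {
                    result.put(r.group(), r.value());
                }
            }
        }
        return result;
    }

    /** COUNT(value) BY group: a group with only null values maps to 0, never null. */
    static Map<String, Long> countBy(List<Row> rows) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Row r : rows) {
            result.merge(r.group(), r.value() == null ? 0L : 1L, Long::sum);
        }
        return result;
    }
}
```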

🤖 ESQL: Merge upstream

returns current datetime

This implements the `MV_DEDUPE` function that removes duplicates from multivalued fields. It wasn't strictly in our list of things we need in the first release, but I'm grabbing this now because I realized I needed very similar infrastructure when I was trying to build grouping by multivalued fields. In fact, I realized that I could use our stringtemplate code generation to generate most of the complex parts. This generates the actual body of `MV_DEDUPE`'s implementation and the body of the `Block`-accepting `BlockHash` implementations. It'll be useful in the final step for grouping by multivalued fields. I also got pretty curious about whether the `O(n^2)` or the `O(n*log(n))` algorithm for deduplication is faster. I'd been assuming that for all reasonably sized inputs the `O(n^2)` bubble-sort-like selection algorithm was faster. So I measured it. And it's mostly true: even for `BytesRef`, if you have a dozen entries the selection algorithm is faster. Lower overhead and stuff. Anyway, to measure it I had to implement the copy-and-sort `O(n*log(n))` algorithm. So while I was there I plugged it in and selected it in cases where the number of inputs is large and the selection algorithm is likely to be slower.
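A minimal sketch of the two deduplication strategies over a `long[]` and of picking between them by input size. The `SORT_THRESHOLD` value is a made-up placeholder, not the cutoff chosen in the PR, and the code is hand-written rather than stringtemplate-generated.

```java
import java.util.Arrays;

final class MvDedupe {
    private MvDedupe() {}

    // Hypothetical cutoff for this sketch; the real threshold came from benchmarks.
    static final int SORT_THRESHOLD = 32;

    static long[] dedupe(long[] values) {
        return values.length < SORT_THRESHOLD ? dedupeQuadratic(values) : dedupeSorted(values);
    }

    /** O(n^2): compare each value against the ones kept so far. Low overhead for small n. */
    static long[] dedupeQuadratic(long[] values) {
        long[] out = new long[values.length];
        int size = 0;
        outer:
        for (long v : values) {
            for (int i = 0; i < size; i++) {
                if (out[i] == v) {
                    continue outer; // already kept
                }
            }
            out[size++] = v;
        }
        return Arrays.copyOf(out, size);
    }

    /** O(n log n): copy, sort, then compact adjacent duplicates in place. */
    static long[] dedupeSorted(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int size = 0;
        for (int i = 0; i < sorted.length; i++) {
            if (size == 0 || sorted[i] != sorted[size - 1]) {
                sorted[size++] = sorted[i];
            }
        }
        return Arrays.copyOf(sorted, size);
    }
}
```

Note that the two paths return values in different orders (first-seen versus ascending), so callers of this sketch shouldn't rely on output order.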

[DOCS] Clarify the field order for RENAME

🤖 ESQL: Merge upstream

What happened?

🤖 ESQL: Merge upstream

elasticsearchmachine · 2023-08-15T10:15:30Z

Pinging @elastic/es-ql (Team:QL)

elasticsearchmachine · 2023-08-15T10:15:31Z

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

This adds support for null group keys in all groupings that are backed by the PackedValuesBlockHash. This means the only groupings that don't support null keys at the moment are long/long pairs, and long/bytes_ref pairs. Those are coming next.
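One way to picture null group keys: treat `null` as an ordinary key component when assigning dense group ids. This hypothetical `NullableKeyHash` uses a record key in a `HashMap` (record equality and hashing handle `null` components) instead of whatever packed encoding `PackedValuesBlockHash` uses internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical hash assigning dense group ids to (long, string) keys, either of which may be null. */
final class NullableKeyHash {
    /** Record equals/hashCode tolerate null components, so (null, "x") is a valid, distinct key. */
    record Key(Long a, String b) {}

    private final Map<Key, Integer> ids = new HashMap<>();
    private final List<Key> keys = new ArrayList<>();

    /** Returns the group id for the key, assigning the next free id if the key is new. */
    int add(Long a, String b) {
        Key key = new Key(a, b);
        Integer id = ids.get(key);
        if (id == null) {
            id = keys.size();
            ids.put(key, id);
            keys.add(key);
        }
        return id;
    }

    Key key(int groupId) {
        return keys.get(groupId);
    }

    int groupCount() {
        return keys.size();
    }
}
```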

When multivalued fields are loaded from Lucene they are in sorted order, but we weren't taking advantage of that fact. Now we are! It's much faster, even for fast operations like `mv_min`:

```
(operation)       Mode  Cnt  Score   Error  Units
mv_min            avgt    7  3.820 ± 0.070  ns/op
mv_min_ascending  avgt    7  1.979 ± 0.130  ns/op
```

We still have code to run in non-sorted mode because conversion functions and a few other things don't load in sorted order. I've also expanded the parameterized tests for the `MV_` functions because, well, I needed to expand them at least a little to test this change. And I just kept going and improved as many tests as I could.
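The core of the speedup is that, for values known to arrive in ascending order, the minimum is simply the first value and the maximum the last. A minimal sketch; the `MvMin` class and the `ascending` flag are hypothetical, and the real code works on blocks rather than arrays.

```java
/** Hypothetical helpers over one position's values; ascending says whether they arrive sorted. */
final class MvMin {
    private MvMin() {}

    /** Assumes at least one value; null positions are handled elsewhere. */
    static long mvMin(long[] values, boolean ascending) {
        if (ascending) {
            return values[0]; // sorted input: O(1)
        }
        long min = values[0];
        for (int i = 1; i < values.length; i++) {
            min = Math.min(min, values[i]);
        }
        return min;
    }

    static long mvMax(long[] values, boolean ascending) {
        if (ascending) {
            return values[values.length - 1]; // sorted input: O(1)
        }
        long max = values[0];
        for (int i = 1; i < values.length; i++) {
            max = Math.max(max, values[i]);
        }
        return max;
    }
}
```

The unsorted path is still needed for inputs that aren't loaded in sorted order, such as the output of conversion functions.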

elasticsearchmachine · 2023-08-16T08:01:41Z

Hi @ChrisHegarty, I've created a changelog YAML for you.

This commit adds a new TransportVersion for EsqlFeatureSetUsage.

* Sqrt function for ESQL: introduces a unary scalar function for square root, which is a thin wrapper over the `java.lang.Math` implementation (`Math.sqrt`).
* Fix area for ESQL integration changelog.
* Restore changelog.
* Restore area in changelog.

Enable unary arithmetic negations in expressions, like `eval x = -y`. Support integer, long and double arguments. Changes:

* Add mapper for Neg expressions.
* Add tests.
* Disallow negating unsigned longs during query verification.
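A minimal sketch of the two parts, assuming a hypothetical `NumericType` enum: a verification-time check that rejects `unsigned_long`, and per-type negation. Using `Math.negateExact` so that negating `Integer.MIN_VALUE` or `Long.MIN_VALUE` fails loudly is an assumption of this sketch; the PR text doesn't say how overflow is handled.

```java
// Hypothetical type enum for this sketch.
enum NumericType { INTEGER, LONG, DOUBLE, UNSIGNED_LONG }

final class Negation {
    private Negation() {}

    /** Verification-time check: an error message if the type can't be negated, otherwise null. */
    static String verify(NumericType type) {
        return type == NumericType.UNSIGNED_LONG ? "cannot negate unsigned_long" : null;
    }

    /** Runtime negation; overflow throws ArithmeticException (an assumption of this sketch). */
    static Object negate(NumericType type, Object value) {
        String error = verify(type);
        if (error != null) {
            throw new IllegalArgumentException(error);
        }
        if (type == NumericType.INTEGER) {
            return Math.negateExact((Integer) value);
        }
        if (type == NumericType.LONG) {
            return Math.negateExact((Long) value);
        }
        return -((Double) value); // double negation cannot overflow
    }
}
```

Rejecting `unsigned_long` up front keeps the runtime path simple: a negated unsigned value usually has no representation in the same type.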

Remove unused aggregations or eval fields that are not needed in the final result. For example, the query `from employees | stats c = count(salary), min = min(salary) | eval x = emp_no | keep c` is now optimized as `from employees | stats c = count(salary) | keep c`, since neither the `min` nor the `x` field is actually needed.
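A simplified sketch of the pruning, assuming each `STATS`/`EVAL` step is modelled as a list of named columns together with the columns they read. It walks the steps from last to first and keeps a column only if something downstream needs it. The `PruneUnusedColumns` class and `Column` record are hypothetical, not the optimizer rule from the PR.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PruneUnusedColumns {
    private PruneUnusedColumns() {}

    /** One output column of a STATS or EVAL step: its name, source text, and the columns it reads. */
    record Column(String name, String expression, Set<String> reads) {}

    /**
     * Walks the steps from last to first, keeping a column only if a later step (or the final
     * KEEP) needs it. Conservative: needed only grows, so it never drops a referenced column.
     */
    static List<List<Column>> prune(List<List<Column>> steps, Set<String> keep) {
        Set<String> needed = new HashSet<>(keep);
        List<List<Column>> pruned = new ArrayList<>();
        for (int i = steps.size() - 1; i >= 0; i--) {
            List<Column> kept = new ArrayList<>();
            for (Column c : steps.get(i)) {
                if (needed.contains(c.name())) {
                    kept.add(c);
                }
            }
            for (Column c : kept) {
                needed.addAll(c.reads()); // inputs of surviving columns are needed upstream
            }
            pruned.add(0, kept); // restore original step order
        }
        return pruned;
    }
}
```

Applied to the example query, with `keep c` as the final projection, `min` and `x` are dropped and only `c = count(salary)` survives.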

ChrisHegarty · 2023-08-17T12:59:33Z

@elasticsearchmachine rerun elasticsearch-ci/bwc

elasticsearchmachine · 2023-08-17T17:36:02Z

@ChrisHegarty according to this PR's labels, I need to update the changelog YAML, but I can't because the PR is closed. Please either update the changelog yourself on the appropriate branch, or adjust the labels. Specifically:

The PR is labelled release highlight but the changelog has no highlight section

nik9000 and others added 30 commits June 21, 2023 11:00

Merge pull request ESQL-1301 from elastic/main

da1d54b

🤖 ESQL: Merge upstream

Merge pull request ESQL-1305 from elastic/main

50a66f0

🤖 ESQL: Merge upstream

Emit warnings from bad date_parse (ESQL-1303)

1a1cdcb

Introduce SurrogateExpression (ESQL-1285)

fa20e28

Create e() function (ESQL-1304)

35fddc2

Merge pull request ESQL-1309 from elastic/main

7b664ea

🤖 ESQL: Merge upstream

Merge pull request ESQL-1310 from elastic/main

935e377

🤖 ESQL: Merge upstream

Merge pull request ESQL-1312 from elastic/main

edfeae2

🤖 ESQL: Merge upstream

Add docs for ENRICH command (ESQL-1313)

79596cc

Make Median a surrogate expression (ESQL-1307)

dad814f

Add query parameters to ESQL (ESQL-1308)

fe8a6c4

Merge pull request ESQL-1314 from elastic/main

5e09373

🤖 ESQL: Merge upstream

Merge pull request ESQL-1318 from elastic/main

15bb6f3

🤖 ESQL: Merge upstream

Merge pull request ESQL-1319 from elastic/main

4030a5e

🤖 ESQL: Merge upstream

Merge pull request ESQL-1320 from elastic/main

06ad8ec

🤖 ESQL: Merge upstream

[DOCS] Clarify the order for RENAME

f5c590b

Remove unnecessary space

73147db

Merge pull request ESQL-1323 from elastic/main

a4c4f77

🤖 ESQL: Merge upstream

Merge pull request ESQL-1325 from elastic/main

04f6b4e

🤖 ESQL: Merge upstream

Pick changes upstream

33ad0e1

Implement now() function (ESQL-1172)

c2c0b0f

Merge pull request ESQL-1322 from abdonpijpelink/rename-clarify-order

ce64080

Merge pull request ESQL-1330 from elastic/main

ceec5ab

🤖 ESQL: Merge upstream

Fix compilation

dfb8188

Merge pull request ESQL-1332 from elastic/main

8cb3cb8

🤖 ESQL: Merge upstream

ChrisHegarty added v8.11.0 and removed WIP labels Aug 15, 2023

nik9000 and others added 3 commits August 15, 2023 16:52

Merge upstream

1071740

ChrisHegarty added the >feature label Aug 16, 2023

ChrisHegarty and others added 14 commits August 16, 2023 09:01

Update docs/changelog/98309.yaml

2f016e5

Post-merge fix - update TransportResponseHandler.Empty::handleResponse

d03db64

Merge upstream

798ccf9

Fix changelog area

2e9ea71

Post-merge fix - update TransportResponseHandler.Empty::handleResponse (again)

062cf0e

Declare and assign TransportVersion for EsqlFeatureSetUsage (#98525)

f462e60

Merge upstream

0a5e0d3

Sqrt function for ESQL (#98449)

b498ce9

Merge upstream

109c2b6

Merge upstream

d7c0f62

ESQL: replace the is_null function with IS NULL and IS NOT NULL predicates (#98412)

014bd33

Merge upstream

21dcb75

ChrisHegarty merged commit de87554 into main Aug 17, 2023

ChrisHegarty deleted the feature/esql branch August 17, 2023 13:05

ChrisHegarty restored the feature/esql branch August 17, 2023 13:15

costin added the release highlight label Aug 17, 2023

mattc58 removed the release highlight label Oct 4, 2023
