Integrate Elasticsearch Query Language, ES|QL #98309

Merged
merged 1,354 commits into from
Aug 17, 2023


@ChrisHegarty ChrisHegarty commented Aug 9, 2023

Integrate the Elasticsearch Query Language, ES|QL.

This Pull Request offers a consolidated view of the commits on the ES|QL feature branch, so that they can be reviewed and tested together in one place.

Upon integration into main this PR will merge the commits as-is (not squashed). This will effectively preserve the history of the ES|QL code.

nik9000 and others added 30 commits June 21, 2023 11:00
In SQL `AVG(foo)` is `null` if there are no values for `foo`. Same for
`MIN(foo)` and `MAX(foo)`. In fact, the only functions that don't return
`null` on empty inputs seem to be `COUNT` and `COUNT(DISTINCT`.

This flips our non-grouping aggs to have the same behavior because it's
both more expected and fits better with other things we're building.
This *is* different from Elasticsearch's aggs. But it's different in a
good way. It also lines up more closely with the way that our grouping
aggs work.

This also revives the broken `AggregatorBenchmark` so that I could get
performance figures for this change. And it's within the margin of
error:

```
    (blockType)  (grouping)  (op)  Mode  Cnt   Before         After         Units
   vector_longs        none   sum  avgt    7   0.440 ± 0.017  0.397 ± 0.003 ns/op
half_null_longs        none   sum  avgt    7   5.785 ± 0.022  5.861 ± 0.134 ns/op
```

I expected a small slowdown on the `half_null_longs` line and see it,
but it is within the margin of error. Either way, that line isn't nearly
as optimized. We'll loop back around to it eventually.

Closes ESQL-1297
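
A minimal sketch of the semantics described above, not the ES|QL aggregator code. The class `EmptyInputAggs` and its methods are hypothetical names made up for illustration. It only shows that `SUM` and `MAX` yield `null` when they see no values, while `COUNT` yields `0`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical helpers for illustration; these are not ES|QL classes.
final class EmptyInputAggs {
    // SUM follows the SQL convention: null when no non-null value was seen.
    static Long sum(List<Long> values) {
        Long total = null;
        for (Long v : values) {
            if (v != null) {
                total = (total == null) ? v : total + v;
            }
        }
        return total;
    }

    // MAX behaves the same way as SUM on empty input.
    static Long max(List<Long> values) {
        Long best = null;
        for (Long v : values) {
            if (v != null && (best == null || v > best)) {
                best = v;
            }
        }
        return best;
    }

    // COUNT is the exception: it returns 0, never null.
    static long count(List<Long> values) {
        return values.stream().filter(Objects::nonNull).count();
    }

    public static void main(String[] args) {
        List<Long> empty = List.of();
        List<Long> onlyNulls = Arrays.asList(null, null);
        System.out.println(sum(empty));           // null
        System.out.println(max(onlyNulls));       // null
        System.out.println(count(onlyNulls));     // 0
        System.out.println(sum(List.of(2L, 3L))); // 5
    }
}
```
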
This rigs the exceptions caught by `warnExceptions` in `Evaluator` to
emit warnings through Elasticsearch's warnings system, the same way that
the conversion functions work.

Closes ESQL-1211
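
Roughly, the idea can be sketched like this. Everything here is an assumption for illustration: `WarnOnExceptionSketch`, its `warnings` list (a stand-in for Elasticsearch's warnings system), and the choice to emit `null` for the failing position are not taken from the real `Evaluator` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongUnaryOperator;

// Hypothetical stand-in for a generated evaluator; not the ES|QL implementation.
final class WarnOnExceptionSketch {
    // Stand-in for the warnings system: collected messages instead of response headers.
    final List<String> warnings = new ArrayList<>();

    // Applies fn to every position. A failing position becomes null (an assumption)
    // and a warning is recorded instead of failing the whole query.
    Long[] evaluate(long[] input, LongUnaryOperator fn) {
        Long[] out = new Long[input.length];
        for (int i = 0; i < input.length; i++) {
            try {
                out[i] = fn.applyAsLong(input[i]);
            } catch (ArithmeticException e) {
                warnings.add(e.getClass().getName() + ": " + e.getMessage());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        WarnOnExceptionSketch evaluator = new WarnOnExceptionSketch();
        // Integer division by zero throws ArithmeticException for the second value.
        Long[] result = evaluator.evaluate(new long[] {5, 0}, v -> 100 / v);
        System.out.println(java.util.Arrays.toString(result));
        System.out.println(evaluator.warnings);
    }
}
```
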
Convert Avg into a SurrogateExpression and introduce dedicated rule
 for handling surrogate AggregateFunction
Remove Avg implementation
Use sum instead of avg in some planning tests
Add dataType case for Div operator

Relates ESQL-747
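
The surrogate idea can be sketched as a tiny rewrite rule. This is an illustration under assumptions: the nested types (`Expr`, `Field`, `Sum`, `Count`, `Div`, `Avg`) and the `apply` method are simplified, hypothetical stand-ins, and the real ES|QL classes and rule differ.

```java
// Hypothetical, simplified expression model; not the ES|QL classes.
final class AvgSurrogate {
    interface Expr {}
    record Field(String name) implements Expr {}
    record Sum(Expr arg) implements Expr {}
    record Count(Expr arg) implements Expr {}
    record Div(Expr left, Expr right) implements Expr {}

    // Avg has no implementation of its own; it only knows its replacement.
    record Avg(Expr arg) implements Expr {
        Expr surrogate() {
            return new Div(new Sum(arg), new Count(arg));
        }
    }

    // The dedicated rule: swap an aggregate for its surrogate when it has one.
    static Expr apply(Expr e) {
        return (e instanceof Avg avg) ? avg.surrogate() : e;
    }

    public static void main(String[] args) {
        System.out.println(apply(new Avg(new Field("salary"))));
    }
}
```

With this shape only `Sum`, `Count` and `Div` need an execution path, which is consistent with removing the `Avg` implementation.
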
This enables the `min` and `max` aggs on `date` fields without enabling
any of the other numeric aggregates on `date`s. It also adds a fairly
paranoid test that `sum` is not enabled on `date`s because that doesn't
make a whole lot of sense.

Closes ESQL-1247
This PR adds a new method that allows appending a block to a builder at
a single position. This method is required to support multi-values in
the enrich lookup. Ideally, the new method should be unified with the
`copyFrom` method, but I will address it as a follow-up to reduce the
complexity of this PR.

Closes ESQL-1280
This adds an explicit test that makes sure that grouping aggs (`STATS
FOO(x) BY y`) *mostly* return `null` if they receive only null values.
We have code for this scattered around the grouping aggs but we were
only testing it *sometimes*. This tests it all the time. Also! The
behavior wasn't *quite* consistent. `COUNT` and `COUNT(DISTINCT` style
aggs should return `0` but they didn't all do that. And non-`COUNT`
style aggs should return `null` and they all did that. Except `SUM`
on doubles.
returns current datetime
This implements the `MV_DEDUPE` function that removes duplicates from
multivalued fields. It wasn't strictly in our list of things we need in
the first release, but I'm grabbing this now because I realized I needed
very similar infrastructure when I was trying to build grouping by
multivalued fields. In fact, I realized that I could use our
stringtemplate code generation to generate most of the complex parts.
This generates the actual body of `MV_DEDUPE`'s implementation and the
body of the `Block` accepting `BlockHash` implementations. It'll be
useful in the final step for grouping by multivalued fields.

I also got pretty curious about whether the `O(n^2)` or `O(n*log(n))`
algorithm for deduplication is faster. I'd been assuming that for all
reasonable sized inputs the `O(n^2)` bubble sort looking selection
algorithm was faster. So I measured it. And it's mostly true - even for
`BytesRef` if you have a dozen entries the selection algorithm is
faster. Lower overhead and stuff. Anyway, to measure it I had to
implement the copy-and-sort `O(n*log(n))` algorithm. So while I was
there I plugged it in and selected it in cases where the number of
inputs is large and the selection algorithm is likely to be slower.
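
For illustration, the two strategies could look something like the sketch below. This is not the generated `MV_DEDUPE` code: the class name, the method names and the `SORT_THRESHOLD` value are all assumptions (the text above only says the sort is picked when the number of inputs is large). Note that in this sketch the selection variant keeps first-seen order while the copy-and-sort variant returns sorted output.

```java
import java.util.Arrays;

// Hypothetical sketch of both deduplication strategies; not the ES|QL implementation.
final class MvDedupeSketch {
    // Assumed cut-over point, chosen arbitrarily for the example.
    static final int SORT_THRESHOLD = 16;

    // O(n^2): keep a value only if it is not already among the kept values.
    static long[] dedupeBySelection(long[] values) {
        long[] kept = new long[values.length];
        int size = 0;
        outer:
        for (long v : values) {
            for (int i = 0; i < size; i++) {
                if (kept[i] == v) {
                    continue outer;
                }
            }
            kept[size++] = v;
        }
        return Arrays.copyOf(kept, size);
    }

    // O(n*log(n)): copy, sort, then drop values equal to the last kept one.
    static long[] dedupeByCopyAndSort(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int size = 0;
        for (long v : sorted) {
            if (size == 0 || v != sorted[size - 1]) {
                sorted[size++] = v;
            }
        }
        return Arrays.copyOf(sorted, size);
    }

    // Pick the low-overhead selection algorithm for small inputs.
    static long[] dedupe(long[] values) {
        return values.length < SORT_THRESHOLD ? dedupeBySelection(values) : dedupeByCopyAndSort(values);
    }

    public static void main(String[] args) {
        long[] values = {3, 1, 3, 2, 1};
        System.out.println(Arrays.toString(dedupeBySelection(values)));   // [3, 1, 2]
        System.out.println(Arrays.toString(dedupeByCopyAndSort(values))); // [1, 2, 3]
    }
}
```
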
@ChrisHegarty ChrisHegarty added v8.11.0 and removed WIP labels Aug 15, 2023
@elasticsearchmachine

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

nik9000 and others added 3 commits August 15, 2023 16:52
This adds support for null group keys in all groupings that are
backed by the PackedValuesBlockHash. This means the only groupings
that don't support null keys at the moment are long/long pairs,
and long/bytes_ref pairs. Those are coming next.
When multivalued fields are loaded from lucene they are in sorted order
but we weren't taking advantage of that fact. Now we are! It's much
faster, even for fast operations like `mv_min`:

```
     (operation)  Mode  Cnt  Score   Error  Units
          mv_min  avgt    7  3.820 ± 0.070  ns/op
mv_min_ascending  avgt    7  1.979 ± 0.130  ns/op
```

We still have code to run in non-sorted mode because conversion functions
and a few other things don't load in sorted order.

I've also expanded the parameterized tests for the `MV_` functions
because, well, I needed to expand them at least a little to test this
change. And I just kept going and improved as many tests as I could.
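
To see why sorted input helps, here is a minimal sketch. The class and method names are hypothetical and the real evaluators work on blocks rather than plain arrays; both methods assume at least one value.

```java
// Hypothetical sketch; not the ES|QL MV_MIN implementation.
final class MvMinSketch {
    // General case: values may arrive in any order, so every value is inspected.
    static long mvMin(long[] values) {
        long min = values[0];
        for (int i = 1; i < values.length; i++) {
            min = Math.min(min, values[i]);
        }
        return min;
    }

    // Ascending case: the minimum is simply the first value.
    static long mvMinAscending(long[] values) {
        return values[0];
    }

    public static void main(String[] args) {
        System.out.println(mvMin(new long[] {4, 2, 9}));          // 2
        System.out.println(mvMinAscending(new long[] {2, 4, 9})); // 2
    }
}
```

The unsorted path has to stay around for inputs such as the output of conversion functions, which is why both variants exist.
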
@elasticsearchmachine

Hi @ChrisHegarty, I've created a changelog YAML for you.

ChrisHegarty and others added 14 commits August 16, 2023 09:01
This commit adds a new TransportVersion for EsqlFeatureSetUsage.
* Sqrt function for ESQL

Introduces a unary scalar function for square root, which is a thin
wrapper over the Java `Math.sqrt` implementation.

* Fix area for ESQL integration changelog.

* Restore changelog.

* Restore area in changelog.
Enable unary arithmetic negations in expressions, like eval x = -y.
Support integer, long and double arguments.

Changes:
* Add mapper for Neg expressions.
* Add tests.
* Disallow negating unsigned longs during query verification.
Remove unused aggregations or eval fields that are not needed in the
 final result.
For example the query:

from employees
| stats c = count(salary), min = min(salary)
| eval x = emp_no
| keep c

is now optimized as:

from employees
| stats c = count(salary)
| keep c

since neither the min nor the x field is actually needed.
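
One way to picture the pruning is a backward pass over the pipeline that tracks which names are still needed. The sketch below is an assumption-heavy toy: `Def`, `Step` and `prune` are hypothetical, the optimizer's actual rule works on logical plans, and details such as `STATS` dropping its input columns are ignored.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of the pruning; not the ES|QL optimizer rule.
final class PruneUnusedSketch {
    // One named output, e.g. c = count(salary), plus the columns it reads.
    record Def(String name, String expression, Set<String> inputs) {}
    // A pipeline command (stats or eval) with the names it defines.
    record Step(String command, List<Def> defs) {}

    // Walk the pipeline backwards, keeping only definitions somebody downstream needs.
    static List<Step> prune(List<Step> steps, Set<String> keep) {
        Set<String> needed = new HashSet<>(keep);
        Deque<Step> result = new ArrayDeque<>();
        for (int i = steps.size() - 1; i >= 0; i--) {
            Step step = steps.get(i);
            List<Def> kept = step.defs().stream().filter(d -> needed.contains(d.name())).toList();
            // The kept names are now satisfied; their inputs become needed upstream.
            for (Def d : kept) {
                needed.remove(d.name());
            }
            for (Def d : kept) {
                needed.addAll(d.inputs());
            }
            if (!kept.isEmpty()) {
                result.addFirst(new Step(step.command(), kept));
            }
        }
        return List.copyOf(result);
    }

    public static void main(String[] args) {
        List<Step> query = List.of(
            new Step("stats", List.of(
                new Def("c", "count(salary)", Set.of("salary")),
                new Def("min", "min(salary)", Set.of("salary")))),
            new Step("eval", List.of(new Def("x", "emp_no", Set.of("emp_no")))));
        // Only the stats step with c = count(salary) survives "keep c".
        System.out.println(prune(query, Set.of("c")));
    }
}
```
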
@ChrisHegarty

@elasticsearchmachine rerun elasticsearch-ci/bwc

@ChrisHegarty ChrisHegarty merged commit de87554 into main Aug 17, 2023
@ChrisHegarty ChrisHegarty deleted the feature/esql branch August 17, 2023 13:05
@ChrisHegarty ChrisHegarty restored the feature/esql branch August 17, 2023 13:15
@elasticsearchmachine

@ChrisHegarty according to this PR's labels, I need to update the changelog YAML, but I can't because the PR is closed. Please either update the changelog yourself on the appropriate branch, or adjust the labels. Specifically:

  • The PR is labelled release highlight but the changelog has no highlight section

Labels
:Analytics/ES|QL AKA ESQL >feature Team:QL (Deprecated) Meta label for query languages team test-windows Trigger CI checks on Windows v8.11.0