Integrate Elasticsearch Query Language, ES|QL #98309
In SQL `AVG(foo)` is `null` if there are no values for `foo`. Same for `MIN(foo)` and `MAX(foo)`. In fact, the only functions that don't return `null` on empty inputs seem to be `COUNT` and `COUNT(DISTINCT`. This flips our non-grouping aggs to have the same behavior because it's both more expected and fits better with other things we're building. This *is* different from Elasticsearch's aggs. But it's different in a good way. It also lines up more closely with the way that our grouping aggs work.

This also revives the broken `AggregatorBenchmark` so that I could get performance figures for this change. And it's within the margin of error:
```
(blockType)      (grouping)  (op)  Mode  Cnt  Before         After          Units
vector_longs     none        sum   avgt    7  0.440 ± 0.017  0.397 ± 0.003  ns/op
half_null_longs  none        sum   avgt    7  5.785 ± 0.022  5.861 ± 0.134  ns/op
```
I expected a small slowdown on the `half_null_longs` line and see it, but it is within the margin of error. Either way, that line isn't nearly as optimized as the other one. We'll loop back around to it eventually.

Closes ESQL-1297
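The null-on-empty behavior described above can be illustrated with a minimal Java sketch. This is not the ES|QL aggregator code; the class `EmptyInputAggs` and its methods are hypothetical names chosen for illustration, and boxed `Long`/`Double` values stand in for nullable results.

```java
// Hypothetical illustration only; not the ES|QL implementation.
final class EmptyInputAggs {
    private EmptyInputAggs() {}

    /** Counts non-null values. An empty input yields 0, never null. */
    static long count(Long[] values) {
        long n = 0;
        for (Long v : values) {
            if (v != null) {
                n++;
            }
        }
        return n;
    }

    /** Smallest non-null value, or null when there is no value at all. */
    static Long min(Long[] values) {
        Long result = null;
        for (Long v : values) {
            if (v != null && (result == null || v < result)) {
                result = v;
            }
        }
        return result;
    }

    /** Average of the non-null values, or null when there is no value at all. */
    static Double avg(Long[] values) {
        long sum = 0;
        long n = 0;
        for (Long v : values) {
            if (v != null) {
                sum += v;
                n++;
            }
        }
        return n == 0 ? null : (double) sum / n;
    }
}
```

With this sketch, `avg` and `min` return `null` for an empty array or an array of only nulls, while `count` returns `0` in both cases.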
🤖 ESQL: Merge upstream
🤖 ESQL: Merge upstream
This rigs the exceptions caught by `warnExceptions` in `Evaluator` to emit warnings through Elasticsearch's warnings system, the same way that the conversion functions work. Closes ESQL-1211
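The general pattern of turning a caught exception into a warning plus a `null` result can be sketched as follows. This is a hedged illustration, not the generated `Evaluator` code: `WarnExceptionsSketch`, `evalAll`, and the in-memory `warnings` list are hypothetical stand-ins, and the real change routes warnings through Elasticsearch's warnings system rather than a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; the warnings list stands in for a real warnings system.
final class WarnExceptionsSketch {
    final List<String> warnings = new ArrayList<>();

    /**
     * Applies fn to every input. When fn throws an exception of type warnOn,
     * a warning is recorded and null is emitted for that position.
     * Any other runtime exception is rethrown unchanged.
     */
    <T, R> List<R> evalAll(List<T> inputs, Function<T, R> fn, Class<? extends RuntimeException> warnOn) {
        List<R> out = new ArrayList<>(inputs.size());
        for (T input : inputs) {
            try {
                out.add(fn.apply(input));
            } catch (RuntimeException e) {
                if (!warnOn.isInstance(e)) {
                    throw e;
                }
                warnings.add(e.getClass().getName() + ": " + e.getMessage());
                out.add(null);
            }
        }
        return out;
    }
}
```

For example, evaluating `Integer::parseInt` over `"1"`, `"x"`, `"3"` with `NumberFormatException.class` as the warned type would produce a `null` in the middle position and a single recorded warning.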
Convert Avg into a SurrogateExpression and introduce a dedicated rule for handling surrogate AggregateFunction
Remove Avg implementation
Use sum instead of avg in some planning tests
Add dataType case for Div operator
Relates ESQL-747
Euler's number.
This enables the `min` and `max` aggs on `date` fields without enabling any of the other numeric aggregates on `date`s. It also adds a fairly paranoid test that `sum` is not enabled on `date`s because that doesn't make a whole lot of sense. Closes ESQL-1247
🤖 ESQL: Merge upstream
🤖 ESQL: Merge upstream
This PR adds a new method that allows appending a block to a builder at a single position. This method is required to support multi-values in the enrich lookup. Ideally, the new method should be unified with the `copyFrom` method, but I will address it as a follow-up to reduce the complexity of this PR. Closes ESQL-1280
🤖 ESQL: Merge upstream
Co-authored-by: Abdon Pijpelink <[email protected]>
🤖 ESQL: Merge upstream
🤖 ESQL: Merge upstream
🤖 ESQL: Merge upstream
🤖 ESQL: Merge upstream
This adds an explicit test that makes sure that grouping aggs (`STATS FOO(x) BY y`) *mostly* return `null` if they receive only null values. We have code for this scattered around the grouping aggs but we were only testing it *sometimes*. This tests it all the time. Also! The behavior wasn't *quite* consistent. `COUNT` and `COUNT(DISTINCT` style aggs should return `0` but they didn't all do that. And non-`COUNT` style aggs should return `null` and they all did that, except `SUM` on doubles.
🤖 ESQL: Merge upstream
🤖 ESQL: Merge upstream
returns current datetime
This implements the `MV_DEDUPE` function that removes duplicates from multivalued fields. It wasn't strictly in our list of things we need in the first release, but I'm grabbing this now because I realized I needed very similar infrastructure when I was trying to build grouping by multivalued fields. In fact, I realized that I could use our stringtemplate code generation to generate most of the complex parts. This generates the actual body of `MV_DEDUPE`'s implementation and the body of the `Block` accepting `BlockHash` implementations. It'll be useful in the final step for grouping by multivalued fields.

I also got pretty curious about whether the `O(n^2)` or `O(n*log(n))` algorithm for deduplication is faster. I'd been assuming that for all reasonably sized inputs the `O(n^2)` selection algorithm, which looks a lot like bubble sort, was faster. So I measured it. And it's mostly true: even for `BytesRef`, if you have a dozen entries the selection algorithm is faster. Lower overhead and stuff. Anyway, to measure it I had to implement the copy-and-sort `O(n*log(n))` algorithm. So while I was there I plugged it in and selected it in cases where the number of inputs is large and the selection algorithm is likely to be slower.
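The two deduplication strategies compared above can be sketched in plain Java. This is a rough illustration under stated assumptions, not the generated `MV_DEDUPE` code: the class `MvDedupeSketch`, its method names, and the `SORT_THRESHOLD` value are all hypothetical, and the real cutover point is not given in the description.

```java
import java.util.Arrays;

// Hypothetical illustration only; names and threshold are assumptions.
final class MvDedupeSketch {
    /** Assumed cutover point; the real value is not stated in the PR text. */
    static final int SORT_THRESHOLD = 16;

    private MvDedupeSketch() {}

    /** O(n^2): keep a value only if it has not been seen earlier. Preserves first-seen order. */
    static long[] dedupeSelection(long[] values) {
        long[] out = new long[values.length];
        int size = 0;
        outer:
        for (long v : values) {
            for (int i = 0; i < size; i++) {
                if (out[i] == v) {
                    continue outer; // duplicate, skip it
                }
            }
            out[size++] = v;
        }
        return Arrays.copyOf(out, size);
    }

    /** O(n*log(n)): copy, sort, then drop adjacent duplicates. Returns values in sorted order. */
    static long[] dedupeCopyAndSort(long[] values) {
        if (values.length == 0) {
            return new long[0];
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int size = 1;
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i] != sorted[size - 1]) {
                sorted[size++] = sorted[i];
            }
        }
        return Arrays.copyOf(sorted, size);
    }

    /** Picks the low-overhead scan for small inputs and copy-and-sort for large ones. */
    static long[] dedupe(long[] values) {
        return values.length < SORT_THRESHOLD ? dedupeSelection(values) : dedupeCopyAndSort(values);
    }
}
```

Note that in this sketch the two strategies return the surviving values in different orders (first-seen versus sorted), so a caller that cares about order would need to account for that.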
[DOCS] Clarify the field order for RENAME
🤖 ESQL: Merge upstream
What happened?
🤖 ESQL: Merge upstream
Pinging @elastic/es-ql (Team:QL)
Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)
This adds support for null group keys in all groupings that are backed by the PackedValuesBlockHash. This means the only groupings that don't support null keys at the moment are long/long pairs, and long/bytes_ref pairs. Those are coming next.
When multivalued fields are loaded from lucene they are in sorted order but we weren't taking advantage of that fact. Now we are! It's much faster, even for fast operations like `mv_min`:
```
(operation)       Mode  Cnt  Score    Error  Units
mv_min            avgt    7  3.820 ± 0.070  ns/op
mv_min_ascending  avgt    7  1.979 ± 0.130  ns/op
```
We still have code to run in non-sorted mode because conversion functions and a few other things don't load in sorted order.

I've also expanded the parameterized tests for the `MV_` functions because, well, I needed to expand them at least a little to test this change. And I just kept going and improved as many tests as I could.
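The idea of exploiting sorted input can be shown with a small Java sketch. It assumes the caller knows whether the values are in ascending order and passes that as a flag; `MvMinSketch` and the `ascending` parameter are hypothetical names, not the actual ES|QL block API.

```java
// Hypothetical illustration only; the ascending flag is an assumed input.
final class MvMinSketch {
    private MvMinSketch() {}

    /**
     * Returns the minimum of a non-empty multivalue.
     * When the values are known to be ascending, the first value is the minimum,
     * so the scan over the remaining values is skipped entirely.
     */
    static long min(long[] values, boolean ascending) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        if (ascending) {
            return values[0];
        }
        long min = values[0];
        for (int i = 1; i < values.length; i++) {
            min = Math.min(min, values[i]);
        }
        return min;
    }
}
```

The non-sorted path stays in place for inputs that were not loaded in sorted order, which matches the description's note that conversion functions and a few other things still need it.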
Hi @ChrisHegarty, I've created a changelog YAML for you.
This commit adds a new TransportVersion for EsqlFeatureSetUsage.
* Sqrt function for ESQL. Introduces a unary scalar function for square root, which is a thin wrapper over the Java `Math` implementation.
* Fix area for ESQL integration changelog.
* Restore changelog.
* Restore area in changelog.
Enable unary arithmetic negations in expressions, like `eval x = -y`. Support integer, long and double arguments.
Changes:
* Add mapper for Neg expressions.
* Add tests.
* Disallow negating unsigned longs during query verification.
Remove aggregations and eval fields that are not needed in the final result. For example, the query `from employees | stats c = count(salary), min = min(salary) | eval x = emp_no | keep c` is now optimized to `from employees | stats c = count(salary) | keep c`, since neither the `min` nor the `x` field is actually needed.
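One way to picture this kind of pruning is a backward pass over the pipeline that tracks which field names are still needed. The sketch below is an assumption-laden illustration, not the optimizer rule from the PR: `PruneUnusedSketch` is a hypothetical class, and each command is modeled simply as a map from a defined field name to the set of field names it reads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; commands are modeled as "defined field -> fields it reads".
final class PruneUnusedSketch {
    private PruneUnusedSketch() {}

    /**
     * Walks the commands from last to first. A definition survives only if a later
     * command (or the final keep list) needs it; its inputs then become needed upstream.
     * Commands left with no surviving definitions are dropped.
     */
    static List<Map<String, Set<String>>> prune(List<Map<String, Set<String>>> commands, Set<String> kept) {
        Set<String> needed = new HashSet<>(kept);
        List<Map<String, Set<String>>> result = new ArrayList<>();
        for (int i = commands.size() - 1; i >= 0; i--) {
            Map<String, Set<String>> survivors = new LinkedHashMap<>();
            for (Map.Entry<String, Set<String>> field : commands.get(i).entrySet()) {
                if (needed.contains(field.getKey())) {
                    survivors.put(field.getKey(), field.getValue());
                }
            }
            // The surviving names are satisfied here; what they read is needed upstream.
            needed.removeAll(survivors.keySet());
            for (Set<String> inputs : survivors.values()) {
                needed.addAll(inputs);
            }
            if (!survivors.isEmpty()) {
                result.add(survivors);
            }
        }
        Collections.reverse(result);
        return result;
    }
}
```

Modeling the example query as a `stats` command defining `c` and `min` followed by an `eval` command defining `x`, with `c` as the only kept field, leaves a single command that defines only `c`.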
@elasticsearchmachine rerun elasticsearch-ci/bwc
@ChrisHegarty according to this PR's labels, I need to update the changelog YAML, but I can't because the PR is closed. Please either update the changelog yourself on the appropriate branch, or adjust the labels. Specifically:
Integrate the Elasticsearch Query Language, ES|QL.
This Pull Request offers a consolidated view of the commits on the ES|QL feature branch, so that they can be reviewed and tested together in one place.
Upon integration into main, this PR will merge the commits as-is (not squashed). This will effectively preserve the history of the ES|QL code.