ESQL: Extend STATS command to support aggregate expressions #104958

Merged (8 commits) on Feb 6, 2024

costin
Member

@costin costin commented Jan 31, 2024

Previously only aggregate functions (max/sum/etc..) were allowed inside
the stats command. This PR allows expressions involving one or multiple
aggregates to be used, such as:

 stats x = avg(salary % 3) + max(emp_no),
       y = min(emp_no / 3) + 10 - median(salary)
       by z = languages % 2

@elasticsearchmachine
Collaborator

Hi @costin, I've created a changelog YAML for you.

@elasticsearchmachine added the Team:Analytics label Jan 31, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@costin costin force-pushed the esql/expressions-over-aggs branch 2 times, most recently from 2d8be06 to 9beaca9 Compare January 31, 2024 00:39
Member Author

@costin costin Jan 31, 2024

The core of this PR - the changes are larger because:

  1. the ReplaceDuplicateAggWithEval has been removed and incorporated into the new rule.
  2. the behavior of CombineProjections has been fixed when dealing with a Project/Aggregate, simplifying the clean-up.

Comment on lines -143 to -146
new ReplaceDuplicateAggWithEval(),
// pushing down limits again, because ReplaceDuplicateAggWithEval could create new Project nodes that can still be optimized
new PushDownAndCombineLimits(),
new ReplaceLimitAndSortAsTopN()
Member Author

Removed ReplaceDuplicateAgg and pushing down of limits again.

Comment on lines +150 to +154
// first extract nested aggs top-level - this simplifies the rest of the rules
new ReplaceStatsAggExpressionWithEval(),
// second extract nested aggs inside of them
new ReplaceStatsNestedExpressionWithEval(),
// lastly replace surrogate functions
Member Author

The new rule breaks down expressions over aggs into evals so the underlying stats only works on top-level aggregations.
While at it, it also handles duplicates to avoid repeated computation.
This keeps the following rule simple since it is guaranteed that an Aggregate will only contain AggregateFunctions, not expressions over them.
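
To make the idea concrete, here is a minimal sketch of splitting expressions over aggregates into a stats part and an eval part. It is a toy model, not the code from this PR: `StatsSplitSketch`, `Expr`, `Agg`, `Ref`, `BinOp`, `Split` and the `$$fn$N` naming are all hypothetical names invented for illustration, and the sketch assumes Java 16+ for records.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model for illustration only; none of these types are the Elasticsearch classes.
final class StatsSplitSketch {
    interface Expr {}
    record Agg(String fn, String arg) implements Expr {}               // e.g. sum(a)
    record Ref(String name) implements Expr {}                         // reference to a computed column
    record BinOp(String op, Expr left, Expr right) implements Expr {}  // e.g. left + right

    // What the aggregation computes, and what the eval on top of it computes.
    record Split(Map<String, Agg> stats, Map<String, Expr> evals) {}

    static Split split(Map<String, Expr> fields) {
        Map<Agg, String> seen = new LinkedHashMap<>();   // aggregate -> column that already computes it
        Map<String, Agg> stats = new LinkedHashMap<>();
        Map<String, Expr> evals = new LinkedHashMap<>();
        int[] counter = { 0 };                           // one counter shared by all generated names
        for (Map.Entry<String, Expr> field : fields.entrySet()) {
            if (field.getValue() instanceof Agg agg && seen.containsKey(agg) == false) {
                // A plain aggregate seen for the first time stays in the stats under its own name.
                seen.put(agg, field.getKey());
                stats.put(field.getKey(), agg);
            } else {
                // Expressions and duplicate aggregates become evals over the stats output.
                evals.put(field.getKey(), rewrite(field.getValue(), seen, stats, counter));
            }
        }
        return new Split(stats, evals);
    }

    // Replaces every aggregate inside the expression with a reference to a stats column,
    // creating a temporary column only if no existing one computes the same aggregate.
    private static Expr rewrite(Expr e, Map<Agg, String> seen, Map<String, Agg> stats, int[] counter) {
        if (e instanceof Agg agg) {
            String name = seen.get(agg);
            if (name == null) {
                name = "$$" + agg.fn() + "$" + counter[0]++;
                seen.put(agg, name);
                stats.put(name, agg);
            }
            return new Ref(name);
        }
        if (e instanceof BinOp b) {
            return new BinOp(b.op(), rewrite(b.left(), seen, stats, counter), rewrite(b.right(), seen, stats, counter));
        }
        return e;
    }

    public static void main(String[] args) {
        Map<String, Expr> fields = new LinkedHashMap<>();
        // Models: stats a = sum(a) + min(b), m = min(b)
        fields.put("a", new BinOp("+", new Agg("sum", "a"), new Agg("min", "b")));
        fields.put("m", new Agg("min", "b"));
        Split result = split(fields);
        System.out.println("stats: " + result.stats());
        System.out.println("eval:  " + result.evals());
    }
}
```

In this sketch the aggregation ends up with only plain aggregate functions, and `m` reuses the column already created for `min(b)` rather than computing it twice.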

@@ -239,15 +239,18 @@ protected LogicalPlan rule(Aggregate aggregate) {
// project away transient fields and re-enforce the original order using references (not copies) to the original aggs
// this works since the replaced aliases have their nameId copied to avoid having to update all references (which has
// a cascading effect)
- plan = new EsqlProject(source, plan, Expressions.asAttributes(aggs));
+ plan = new Project(source, plan, Expressions.asAttributes(aggs));
Member Author

Small tweak - no need to return EsqlProject instead of Project since the tree is already resolved.

}
}

return plan;
}

- static String temporaryName(NamedExpression agg, AggregateFunction af) {
- return "__" + agg.name() + "_" + af.functionName() + "@" + Integer.toHexString(af.hashCode());
+ static String temporaryName(Expression expression, AggregateFunction af) {
Member Author

I've updated the temporary name to make it a bit less confusing/repetitive.

Comment on lines -262 to 284
- if (lit.value() == null) {
+ Object value = lit.value();
+
+ if (value == null) {
      return lit;
  }
- if (lit.value() instanceof String s) {
+ if (value instanceof String s) {
      return Literal.of(lit, new BytesRef(s));
  }
- if (lit.value() instanceof List<?> l) {
+ if (value instanceof List<?> l) {
      if (l.isEmpty() || false == l.get(0) instanceof String) {
          return lit;
      }
-     return Literal.of(lit, l.stream().map(v -> new BytesRef((String) v)).toList());
+     List<BytesRef> byteRefs = new ArrayList<>(l.size());
+     for (Object v : l) {
+         byteRefs.add(new BytesRef(v.toString()));
+     }
+     return Literal.of(lit, byteRefs);
  }
  return lit;
  }
Member Author

Unrelated change - couldn't help but correct it to save the repeated method invocation and replace the noisy map with a good ol' reliable iteration.

Comment on lines 291 to 317
return p.withProjections(combineProjections(project.projections(), p.projections()));
} else if (child instanceof Aggregate a) {
project = p.withProjections(combineProjections(project.projections(), p.projections()));
child = project.child();
plan = project;
// don't return the plan since the grandchild (now child) might be an aggregate that could not be folded on the way up
// e.g. stats c = count(x) | project c, c as x | project x
// try to apply the rule again opportunistically as another node might be pushed in (a limit might be pushed in)
}
// check if the projection eliminates certain aggregates
// but be mindful of aliases to existing aggregates that we don't want to duplicate to avoid redundant work
if (child instanceof Aggregate a) {
var aggs = a.aggregates();
var newAggs = combineProjections(project.projections(), aggs);
var newGroups = replacePrunedAliasesUsedInGroupBy(a.groupings(), aggs, newAggs);
return new Aggregate(a.source(), a.child(), newGroups, newAggs);
var tuple = projectAggregations(project.projections(), aggs);
// project can be fully removed
if (tuple.v1().isEmpty()) {
var newAggs = tuple.v2();
var newGroups = replacePrunedAliasesUsedInGroupBy(a.groupings(), aggs, newAggs);
plan = new Aggregate(a.source(), a.child(), newGroups, newAggs);
}
Member Author

The gist of this change takes care of the scenario created by ReplaceStatsAggExpressionWithEval:

stats x = sum(), y = count()
project x, x as a, y

The combine rule previously would combine the project into the stats, which would duplicate the sum:

stats x = sum(), a = sum(), y = count()

It removes the project (which is cheap) but duplicates the sum (which is expensive and the reason we didn't want to duplicate it in the first place).

The rule thus tracks whether there's any new alias; however, it keeps removing unused aggregations and, when the project does only basic aliasing, removes the project.
So the following

stats x = sum(), y = count()
project x as a

becomes

stats a = sum()
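
The decision described above can be sketched with a toy model. This is not the `CombineProjections` or `projectAggregations` code from the PR: `CombineProjectionsSketch`, `Proj`, `Result` and `combine` are hypothetical names, groupings are ignored, every projection is assumed to reference an existing aggregate, and Java 16+ is assumed for records.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model for illustration only; aggregates are plain strings keyed by their output name.
final class CombineProjectionsSketch {
    record Proj(String name, String source) {}                      // "source AS name"
    // An empty project list means the project was folded into the aggregate.
    record Result(List<Proj> project, Map<String, String> aggs) {}

    static Result combine(List<Proj> projections, Map<String, String> aggs) {
        // Count how many projections read each aggregate.
        Map<String, Integer> uses = new LinkedHashMap<>();
        for (Proj p : projections) {
            uses.merge(p.source(), 1, Integer::sum);
        }
        // Folding a second reference into the aggregate would compute it twice.
        boolean needsProject = uses.values().stream().anyMatch(n -> n > 1);

        Map<String, String> newAggs = new LinkedHashMap<>();
        if (needsProject) {
            // Keep the (cheap) project, but still drop aggregates nobody reads.
            for (Map.Entry<String, String> agg : aggs.entrySet()) {
                if (uses.containsKey(agg.getKey())) {
                    newAggs.put(agg.getKey(), agg.getValue());
                }
            }
            return new Result(List.copyOf(projections), newAggs);
        }
        // Pure renaming/pruning: fold the project into the aggregate.
        for (Proj p : projections) {
            newAggs.put(p.name(), aggs.get(p.source()));
        }
        return new Result(List.of(), newAggs);
    }

    public static void main(String[] args) {
        Map<String, String> aggs = new LinkedHashMap<>();
        aggs.put("x", "sum()");
        aggs.put("y", "count()");
        // project x as a: the project disappears and y is pruned
        System.out.println(combine(List.of(new Proj("a", "x")), aggs));
        // project x, x as a, y: the project stays so sum() is computed once
        System.out.println(combine(List.of(new Proj("x", "x"), new Proj("a", "x"), new Proj("y", "y")), aggs));
    }
}
```

The trade-off modeled here is the one stated above: a project is cheap, so it is kept whenever removing it would duplicate an aggregate.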

Comment on lines -369 to +425
- if (e instanceof Alias a) {
-     return new Alias(a.source(), a.name(), a.qualifier(), trimAliases(a.child()), a.id());
- }
- return trimAliases(e);
+ return e instanceof Alias a ? a.replaceChild(trimAliases(a.child())) : trimAliases(e);
Member Author

Ternary operator ❤️

Comment on lines 1138 to 1186
- Alias newAlias = new Alias(k.source(), temporaryName(agg, af), null, k, null, true);
+ Alias newAlias = new Alias(k.source(), temporaryName(k, af), null, k, null, true);
Member Author

Updated the name generation strategy to be a bit more meaningful.

* becomes
* stats a = min(x), c = count(*) by g | eval b = a, d = c | keep a, b, c, d, g
*/
static class ReplaceStatsAggExpressionWithEval extends OptimizerRules.OptimizerRule<Aggregate> {
Member Author

The core of this PR - breaks down the expression over aggs and adds an eval lazily only for the fields that have an expression over aggregate functions.

Contributor

We could improve this a bit further for a referencing case: just like we now support | eval x = field, y = x + 1, we could now (but don't yet) support something like: | stats x = max(field), y = x + min(field) -- this now fails because column x isn't known.

Member Author

That would be a nice little feature - raised #105102 as a follow-up.

* eval b = a, d = c
* keep a, b, c, d, g
*/
static class ReplaceDuplicateAggWithEval extends OptimizerRules.OptimizerRule<Aggregate> {
Member Author

Handled by ReplaceStatsAggExpressionWithEval

@costin costin force-pushed the esql/expressions-over-aggs branch from 9beaca9 to 47389a0 Compare January 31, 2024 00:58

nestedAggsNoGrouping
FROM employees
| STATS x = AVG(salary) /2 + MAX(salary), a = AVG(salary), m = MAX(salary)
Contributor

@leemthompo leemthompo Feb 1, 2024

@costin I understand that we tag some of these tests to be included as examples in the docs. Just wondering what the workflow was with Abdon to add these tags? Was it just simply a ping to alert the writer that we want this example in the docs? :)

(Might need to be slightly more explicit about the workflow in general just because I don't have that rhythm yet)

Member Author

The tests above are meant for internal consumption, hence the ping - better to create other tests that fit the general dataset and the rest of the examples in the docs, and add them in a separate PR after this one gets merged.
See the previous PRs authored by Abdon.

@elasticsearchmachine
Collaborator

Hi @costin, I've created a changelog YAML for you.

Contributor

@astefan astefan left a comment

LGTM
Left only minor comments; the one about an additional test is perhaps the most important.

Contributor

@alex-spies alex-spies left a comment

Gave this a first round, focusing on tests this time. I'll give this another go tomorrow.

Two observations:

  • I think there are bugs for the following cases: stats max(l) by l=languages (verification exception) and, more severely, stats max(languages) + languages by l = languages (NPE: Cannot invoke "org.elasticsearch.xpack.esql.planner.Layout$ChannelAndType.channel()" because the return value of "org.elasticsearch.xpack.esql.planner.Layout.get(org.elasticsearch.xpack.ql.expression.NameId)" is null); the latter works fine without the alias.
  • This allows shenanigans like stats max(languages) + languages by languages (using a grouping in the expression), but not stats languages + 1 by languages; although that may be something for a follow-up, if we want to allow this.

Other than that I have mostly minor remarks; the tests do not fully assert the (complex) expressions that are being constructed, maybe we should be stricter there.

|stats x by 1
"""));

assertThat(e.getMessage(), containsString("aggregate function"));
Contributor

Suuuuper nit:
Shouldn't the error message be "expected an aggregate function or group" here as well? "[x] is not an aggregate function" technically implies that this should be replaced by an agg function.

* \_Eval[[____x_AVG@9efc3cf3_SUM@daf9f221{r}#18 / ____x_AVG@9efc3cf3_COUNT@53cd08ed{r}#19 AS __x_AVG@9efc3cf3, __x_AVG@
* 9efc3cf3{r}#16 / 2[INTEGER] + __x_MAX@475d0e4d{r}#17 AS x]]
* \_Limit[500[INTEGER]]
* \_Aggregate[[],[SUM(salary{f}#11) AS ____x_AVG@9efc3cf3_SUM@daf9f221, COUNT(salary{f}#11) AS ____x_AVG@9efc3cf3_COUNT@53cd0
Contributor

Nit, low prio: The generated attribute names make optimized plans pretty hard to read/grasp, due to the leading underscores and @9efc3cf3 thingies. Maybe we could streamline the names?

E.g.

 \_Aggregate[[],[SUM(salary{f}#11) AS x_AVG_SUM, COUNT(salary{f}#11) AS x_AVG_COUNT, MAX(salary{f}#11) AS x_MAX

We could still append a number in case of conflict.

Member Author

I've revisited the naming strategy to rely on a counter across the entire rule - hopefully this avoids clashes and improves readability.

Comment on lines +2945 to +2948
// sum/count to compute avg
var div = as(fields.get(0).child(), Div.class);
// avg + max
var add = as(fields.get(1).child(), Add.class);
Contributor

We're technically not asserting the whole expression here - maybe better to just assert the expression string?

Applies to all added tests in this file.

* PERCENTILE(salary{f}#1928,50[INTEGER]) AS __y_MEDIAN@705fccec]]
* \_Eval[[languages{f}#1926 % 2[INTEGER] AS z,
* salary{f}#1928 % 3[INTEGER] AS ____x_AVG@e03a7a5c_AVG@e03a7a5c,
* emp_no{f}#1923 / 3[INTEGER] AS ____y_MIN@80cee21c_MIN@80cee21c]]
Contributor

nit: the attribute names are constructed from the aggs where they will be used, but not from the expression where they will be used; this makes it a bit hard to read this bottom-up (and top-down is also hard). I think it will also lead to tricky attribute names in cases like y = min(emp_no /3) - min(emp_no + 2), as both attributes will be called ____y_MIN@...._MIN@.....
Maybe this would be more consistent?

Suggested change
- * emp_no{f}#1923 / 3[INTEGER] AS ____y_MIN@80cee21c_MIN@80cee21c]]
+ * emp_no{f}#1923 / 3[INTEGER] AS y_SUB1_MIN]]

(resp. ...SUB0... for the left hand side)

Member

@fang-xing-esql fang-xing-esql left a comment

LGTM in general; this was mostly educational for me. I left two minor suggestions on the test cases and a question on the CombineProjections rule.

|stats 1
"""));

assertThat(e.getMessage(), containsString("expected an aggregate function or group"));
Member

The asserts with the same exception messages can be factored out.

e = min(salary),
f = max(salary),
g = max(salary)
by w = languages % 2
Member

Perhaps one of these test cases can be modified to use different groupings, for example, by auto_bucket(emp_no, 10, 1, 10000), to increase the coverage a bit more.

* \_Eval[[languages{f}#37 % 2[INTEGER] AS w]]
* \_EsRelation[test][_meta_field{f}#40, emp_no{f}#34, first_name{f}#35, ..]
*/
public void testStatsExpOverAggsWithScalarAndDuplicateAggs() {
Member

Do we want to support removing duplicated expressions over aggregations? Like below:

| stats x = avg(salary) /2 + max(salary) , y = avg(salary) /2 + max(salary)

It seems like we recalculate the expression part for both x and y, while the duplicated aggregations (avg and max) are not recalculated. Detecting equivalent expressions over aggregations could be more complicated than detecting equivalent aggregations. Perhaps this is because CombineProjections does not check the pattern created by ReplaceStatsAggExpressionWithEval: CombineProjections checks project over project, whereas this case has the pattern project over eval over project over eval. Just noting it here; not sure if it's worth supporting.

Contributor

I think this would be covered as part of our plan to eliminate common (sub-) expressions: #103301

Contributor

@alex-spies alex-spies left a comment

Gave it another round, focusing on the optimizer code this time. I didn't find anything not already covered by others' remarks. Looks good, very neat feature!

Comment on lines +1222 to +1230
* Replace nested expressions over aggregates with synthetic eval post the aggregation
* stats a = sum(a) + min(b) by x
* becomes
* stats a1 = sum(a), a2 = min(b) by x | eval a = a1 + a2 | keep a, x
*
* Since the logic is very similar, this rule also handles duplicate aggregate functions to avoid duplicate compute
* stats a = min(x), b = min(x), c = count(*), d = count() by g
* becomes
* stats a = min(x), c = count(*) by g | eval b = a, d = c | keep a, b, c, d, g
Contributor

++ to the examples in the javadoc, very useful.

Contributor

@bpintea bpintea left a comment

Left a more relevant note in a comment and some smaller nits.
Otherwise, while not fundamental, this PR together with #104387 makes writing ESQL really "liberating" -- very nice.

Comment on lines 250 to 253
String name = expression instanceof NamedExpression ne
? ne.name()
: expression.nodeName() + "@" + Integer.toHexString(expression.hashCode());
return "__" + name + "_" + af.functionName() + "@" + Integer.toHexString(af.hashCode());
Contributor

nit: not sure if this makes it easier to read, but an alternative to unify the way the nodes are named:

Suggested change
- String name = expression instanceof NamedExpression ne
- ? ne.name()
- : expression.nodeName() + "@" + Integer.toHexString(expression.hashCode());
- return "__" + name + "_" + af.functionName() + "@" + Integer.toHexString(af.hashCode());
+ Function<Expression, String> nf = e -> (e == af ? af.functionName() : e.nodeName()) + "@" + Integer.toHexString(e.hashCode());
+ String name = expression instanceof NamedExpression ne ? ne.name() : nf.apply(expression);
+ return "__" + name + "_" + nf.apply(af);

@costin
Member Author

costin commented Feb 6, 2024

  • I think there are bugs for the following cases: stats max(l) by l=languages (verification exception) and, more severely, stats max(languages) + languages by l = languages (NPE: Cannot invoke "org.elasticsearch.xpack.esql.planner.Layout$ChannelAndType.channel()" because the return value of "org.elasticsearch.xpack.esql.planner.Layout.get(org.elasticsearch.xpack.ql.expression.NameId)" is null); the latter works fine without the alias.
  • This allows shenanigans like stats max(languages) + languages by languages (using a grouping in the expression), but not stats languages + 1 by languages; although that may be something for a follow-up, if we want to allow this.

There's a subtle bug when dealing with an aliased grouping which sometimes popped up because the validation allowed the grouping column in some queries.
I've raised #105172 as a follow-up and in the meantime improved the validator to not allow this scenario.

@@ -205,7 +204,7 @@ protected LogicalPlan rule(Aggregate aggregate) {
var attr = aggFuncToAttr.get(af);
// the agg doesn't exist in the Aggregate, create an alias for it and save its attribute
if (attr == null) {
- var temporaryName = temporaryName(agg, af);
+ var temporaryName = temporaryName(af, agg, counter[0]++);
Member Author

Changed the strategy to indicate the inner and outer expression, as that can differ across rules: in some an aggregation is the inner expression while in others it is the outer one.

Comment on lines +248 to +252
static String temporaryName(Expression inner, Expression outer, int suffix) {
String in = toString(inner);
String out = toString(outer);
return "$$" + in + "$" + out + "$" + suffix;
}
Member Author

Opted for $$a$ instead of _ as that was used to replace spaces.
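
As a rough illustration of this naming scheme, the sketch below combines the `"$$" + in + "$" + out + "$" + suffix` format from the diff with a counter that is incremented on every call. The PR's `toString(Expression)` helper is not shown in the conversation, so `label` is a hypothetical stand-in that only replaces spaces with underscores (an assumption based on the comment above), and plain strings stand in for expressions.

```java
// Toy sketch for illustration only; not the Elasticsearch implementation.
final class TemporaryNameSketch {
    // Hypothetical stand-in for the PR's toString(Expression) helper, which is not shown.
    static String label(String text) {
        return text.replace(' ', '_');
    }

    // Same shape as the format string in the diff: $$<inner>$<outer>$<suffix>
    static String temporaryName(String inner, String outer, int suffix) {
        return "$$" + label(inner) + "$" + label(outer) + "$" + suffix;
    }

    public static void main(String[] args) {
        int[] counter = { 0 };  // shared across the whole rule, as in counter[0]++ above
        // Identical inputs still produce distinct names because the suffix differs.
        System.out.println(temporaryName("MAX", "avg(salary) + m", counter[0]++));
        System.out.println(temporaryName("MAX", "avg(salary) + m", counter[0]++));
    }
}
```

Because the suffix comes from a single counter, two temporaries built from the same pair of expressions cannot clash, which is the property the hash-based names were previously used for.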

Comment on lines +3081 to +3091
* Project[[a{r}#5, b{r}#9, $$max(salary)_+_3>$COUNT$2{r}#46 AS d, $$count(salary)_->$MIN$3{r}#47 AS e, $$avg(salary)_+_m
* >$MAX$1{r}#45 AS g]]
* \_Eval[[$$$$avg(salary)_+_m>$AVG$0$SUM$0{r}#48 / $$max(salary)_+_3>$COUNT$2{r}#46 AS $$avg(salary)_+_m>$AVG$0, $$avg(
* salary)_+_m>$AVG$0{r}#44 + $$avg(salary)_+_m>$MAX$1{r}#45 AS a, $$avg(salary)_+_m>$MAX$1{r}#45 + 3[INTEGER] +
* 3.141592653589793[DOUBLE] + $$max(salary)_+_3>$COUNT$2{r}#46 AS b]]
* \_Limit[500[INTEGER]]
* \_Aggregate[[w{r}#28],[SUM(salary{f}#39) AS $$$$avg(salary)_+_m>$AVG$0$SUM$0, MAX(salary{f}#39) AS $$avg(salary)_+_m>$MAX$1
* , COUNT(salary{f}#39) AS $$max(salary)_+_3>$COUNT$2, MIN(salary{f}#39) AS $$count(salary)_->$MIN$3]]
* \_Eval[[languages{f}#37 % 2[INTEGER] AS w]]
* \_EsRelation[test][_meta_field{f}#40, emp_no{f}#34, first_name{f}#35, ..]
*/
Member Author

An example of the new naming strategy.

@costin costin merged commit ac09d75 into elastic:main Feb 6, 2024
15 checks passed
@costin costin deleted the esql/expressions-over-aggs branch February 6, 2024 03:08
dej611 added a commit to elastic/kibana that referenced this pull request Feb 9, 2024
## Summary

Sync with elastic/elasticsearch#104958 for
support of builtin fn in STATS
  * validation ✅ 
  * autocomplete ✅ 
  * also fixed `STATS BY <field>` syntax


![new_stats](https://github.com/elastic/kibana/assets/924948/735f9842-b1d3-4aa0-9d51-4b2f9b136ed3)


Sync with elastic/elasticsearch#104913 for new
`log` function
  * validation ✅  - also warning for negative values
  * autocomplete ✅ 

![add_log](https://github.com/elastic/kibana/assets/924948/146b945d-a23b-45ec-9df2-2d2b291e883b)

Sync with elastic/elasticsearch#105064 for
removal of `PROJECT` command
  * validation ✅  (both new and legacy syntax supported)
  * autocomplete ✅  (will only suggest new syntax)


![remove_project](https://github.com/elastic/kibana/assets/924948/b6f40afe-a26d-4917-b7a1-d8ae97c5368b)

Sync with elastic/elasticsearch#105221 for
removal of mandatory brackets for `METADATA` command option
* validation ✅ (added warning deprecation message when using brackets)
  * autocomplete ✅ 

![fix_metadata](https://github.com/elastic/kibana/assets/924948/c65db176-dd94-45f3-9524-45453e62f51a)


Sync with elastic/elasticsearch#105224 for
change of syntax for ENRICH ccq mode
  * validation ✅ 
* autocomplete ✅ (not directly promoted, the user has to type `_` to
trigger it)
  * hover ✅ 
  * code actions ✅ 

![fix_ccq_enrich](https://github.com/elastic/kibana/assets/924948/0900edd9-a0a7-4ac8-bc12-e39a72359984)

![fix_ccq_enrich_2](https://github.com/elastic/kibana/assets/924948/74b0908f-d385-4723-b3d4-c09108f47a73)


Do not merge until those 5 get merged.

Additional things in this PR:
* Added more tests for `callbacks` not passed scenario
  * covered more cases like those with `dissect`
* Added more tests for signature params number (calling a function with
an extra arg should return an error)
* Cleaned up some more unused code
* Improved messages on too many arguments for functions

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
CoenWarmer pushed a commit to CoenWarmer/kibana that referenced this pull request Feb 15, 2024
fkanout pushed a commit to fkanout/kibana that referenced this pull request Mar 4, 2024
elasticsearchmachine pushed a commit that referenced this pull request Dec 3, 2024
Fix #117770
Fix #117784

#117699 made changes to how
we plan aggregations which were supposed to only trigger when a query
contained a `CATEGORIZE`, but accidentally changed a code path that
seems to only be required for interoperability with pre-8.13 nodes.
Because of this, we didn't notice failing tests until the periodic bwc
tests ran.

The code this PR fixes addresses situations where `Aggregate` plan nodes
contained _aliases inside the aggregates_. On `main` and `8.x`, this is
effectively an illegal state: since
#104958, aliases in the
aggregates become `Eval` nodes before and after the `Aggregate` node.

But here, on 8.x, we'll just fix this code path so that it behaves
exactly as before #117699.

If this passes the full-bwc test, I plan to forward-port this by
removing the obsolete code path on `main`.
Labels
:Analytics/ES|QL, >enhancement, ES|QL-ui, Team:Analytics, v8.13.0

7 participants