Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Add boolean support to Max and Min aggs #110527

Merged
merged 18 commits into from
Jul 10, 2024

Conversation

ivancea
Copy link
Contributor

@ivancea ivancea commented Jul 5, 2024

  • Added support for Booleans on Max and Min
  • Added some helper methods to BitArray (set(index, value) and fill(from, to, value)). This way, the container is more similar to other BigArrays, and it's easier to work with

Part of #110346, as Max and Min are dependencies of Top.

@ivancea ivancea added >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Jul 5, 2024
@ivancea ivancea requested a review from nik9000 July 5, 2024 14:44
@ivancea ivancea requested a review from a team as a code owner July 5, 2024 14:44
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @ivancea, I've created a changelog YAML for you.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are too simple and basic.
Please, add more csv tests that use more complex queries, to also cover complexities associated with plan optimizations. One example:
from employees | eval x = salary is not null | where emp_no > 10050 | stats min(salary is not null), max(x), count(*) by gender

@@ -31,6 +31,44 @@ MIN(languages):integer
// end::min-result[]
;

maxOfBoolean
required_capability: agg_max_min_boolean_support
from employees | stats s = max(still_hired);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test and from employees | stats s = min(still_hired) can be a single one, no need to run two queries that do max and min separately.

The same goes for the next pair of queries.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are better, but it can be even better.
When I suggested one query, I was hoping you can come up with more queries in the same idea.

For example: from employees | eval x = salary is not null | where emp_no > 10050 | stats min_salary = min(salary is not null), max_salary = max(x), max_case = max(case(languages > 2, true, false)) by languages which uses the case function.

In employees.csv we have a multi-value boolean field - is_rehired - you can use that and combine it with mv_* functions and feed the result back to min and max (max(mv_min(is_rehired))) with variations.

ivancea added 5 commits July 9, 2024 11:35
# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Max.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Min.java
@ivancea
Copy link
Contributor Author

ivancea commented Jul 9, 2024

@astefan Changes made, and merged with main for new tests:

  1. In another PR, aggregation tests were merged for Max and Min. So I included there the boolean cases. It should currently cover an interesting set of cases, and potentially, it would cover much more (grouping for example isn't ready yet, but should be soon).
  2. Added some extra CSV cases you commented in the same query.
    I understand that the initial test I did was clearly not enough. But I would stop here for this. Specially with the new non-CSV tests. We could be thinking in new cases for hours otherwise.
    If we want to test a known bunch of cases for functions using the full planner like we do here, I would consider either adding them to the "abstract function test cases", or adding another suite somewhere. Maybe even a list of "you should test this" for CSV function tests in docs we can refer to?
  3. About mixing min and max tests, I mixed each function cases in one, instead of mixing the functions. I did this because existing cases do it this way, and to keep different function tests separated. I could mix all "MaxOf" into a single one as I did in Top, but I would avoid doing it unless required (?)

@astefan
Copy link
Contributor

astefan commented Jul 9, 2024

Specially with the new non-CSV tests.

I regard the tests in AbstractFunctionTestCase or AbstractAggregationTestCase and those in csv-spec as completely different and complementary. The former tests the basic functionality of a function, the latter tests the same but in more complex ways that highlight the analyzer, the optimizer, the local optimizer plus nested calls between AggregatorFunctionSuppliers and ExpressionEvaluator.

I understand that the initial test I did was clearly not enough.
...
We could be thinking in new cases for hours otherwise

When adding support to functions (any type - aggregation, scalar, predicates) there should be a human reasonable amount of time spent coming up with some (not all) more complex queries that highlight the new functionality in the form of csv-spec tests. We do have a good coverage and you could take inspiration from other queries we came up with over time.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ivancea ivancea added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jul 9, 2024
@ivancea
Copy link
Contributor Author

ivancea commented Jul 9, 2024

@elasticmachine update branch

@ivancea
Copy link
Contributor Author

ivancea commented Jul 10, 2024

@elasticmachine update branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) ES|QL-ui Impacts ES|QL UI >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) test-update-serverless v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants