ESQL: Add pre and post filter for grouping operator #111439

Open
costin opened this issue Jul 30, 2024 · 4 comments
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@costin
Member

costin commented Jul 30, 2024

Description

The grouping (STATS) command can be quite expensive, both when processing incoming data (creating groups) and when producing output (number of buckets).
This can be alleviated by allowing pre- and post-filters on the grouping command, both for individual aggs and for grouping keys, so that data is dropped as soon as it is read or produced.
Example of pre-filter (see #110821):

FROM index | STATS a_avg = AVG(a) WHERE a > 10, avg = AVG(a)  BY g

Example of post-filter:

FROM index | STATS c = count(*) by g | WHERE c > 10
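
To make the difference between the two filter kinds concrete, here is a minimal plain-Java sketch of what the two example queries compute. It is only a model of the semantics, not ES|QL engine code: `StatsFilterSketch`, `Row`, `GroupState`, and both method names are hypothetical, and returning `null` for an average with no matching rows is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; every name here is hypothetical, not an Elasticsearch type.
final class StatsFilterSketch {
    /** One input row: grouping key g and value a. */
    static final class Row {
        final String g;
        final double a;
        Row(String g, double a) { this.g = g; this.a = a; }
    }

    /** Running state per group for AVG(a) WHERE a > 10, AVG(a) and COUNT(*). */
    static final class GroupState {
        double filteredSum;
        long filteredCount;
        double sum;
        long count;

        /** Assumption: an average over zero matching rows is reported as null. */
        Double filteredAvg() {
            return filteredCount == 0 ? null : filteredSum / filteredCount;
        }

        double avg() { return sum / count; }
    }

    /** Pre-filter: the predicate is checked per aggregate while rows are being read. */
    static Map<String, GroupState> aggregate(List<Row> rows) {
        Map<String, GroupState> groups = new LinkedHashMap<>();
        for (Row r : rows) {
            GroupState s = groups.computeIfAbsent(r.g, k -> new GroupState());
            s.sum += r.a;          // unfiltered aggregate sees every row
            s.count++;
            if (r.a > 10) {        // filtered aggregate sees only matching rows
                s.filteredSum += r.a;
                s.filteredCount++;
            }
        }
        return groups;
    }

    /** Post-filter: whole groups are dropped after aggregation (WHERE c > minCount). */
    static List<String> groupsWithCountAbove(Map<String, GroupState> groups, long minCount) {
        List<String> kept = new ArrayList<>();
        groups.forEach((g, s) -> {
            if (s.count > minCount) {
                kept.add(g);
            }
        });
        return kept;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("x", 5), new Row("x", 15), new Row("y", 4));
        Map<String, GroupState> groups = aggregate(rows);
        groups.forEach((g, s) ->
            System.out.println(g + " a_avg=" + s.filteredAvg() + " avg=" + s.avg()));
        System.out.println(groupsWithCountAbove(groups, 1));
    }
}
```

The pre-filter affects only the aggregate it is attached to, so the group itself still exists even when no row passes. The post-filter removes entire groups from the output.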
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 30, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000
Member

nik9000 commented Jul 30, 2024

From a compute engine standpoint it feels like we could do:

-void addRawInput(Page page);
+void addRawInput(Page page, BooleanVector);

It should be easy to get the code generation to build that, and it'd be super easy to specialize into constantAll, constantNone, and variable. We can make those BooleanVectors out of the expression trees easily enough.
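
A rough sketch of how the three specializations could look for a single sum aggregator. This is not the actual compute engine code: `Mask` is a hypothetical stand-in for the `BooleanVector` parameter, a `long[]` stands in for a `Page`, and `MaskedSumSketch` is an invented name.

```java
// Hypothetical stand-ins only: Mask plays the role of the BooleanVector argument
// and long[] the role of a Page. None of this is the real Elasticsearch API.
final class MaskedSumSketch {
    /** Minimal per-position filter mask. */
    interface Mask {
        boolean get(int position);

        /** True when every position passes, so the unfiltered loop can be reused. */
        default boolean isConstantAll() { return false; }

        /** True when no position passes, so the page can be skipped entirely. */
        default boolean isConstantNone() { return false; }
    }

    static final Mask ALL = new Mask() {
        @Override public boolean get(int position) { return true; }
        @Override public boolean isConstantAll() { return true; }
    };

    static final Mask NONE = new Mask() {
        @Override public boolean get(int position) { return false; }
        @Override public boolean isConstantNone() { return true; }
    };

    /** Variable mask backed by one boolean per position. */
    static Mask of(boolean... values) {
        return position -> values[position];
    }

    private long sum;

    /** Analogue of addRawInput(Page, BooleanVector) for one long column. */
    void addRawInput(long[] values, Mask mask) {
        if (mask.isConstantNone()) {
            return;                                  // constantNone: nothing to add
        }
        if (mask.isConstantAll()) {
            for (long v : values) {                  // constantAll: plain unfiltered loop
                sum += v;
            }
            return;
        }
        for (int p = 0; p < values.length; p++) {    // variable: test each position
            if (mask.get(p)) {
                sum += values[p];
            }
        }
    }

    long sum() { return sum; }

    public static void main(String[] args) {
        MaskedSumSketch agg = new MaskedSumSketch();
        long[] page = {1, 20, 3, 40};
        agg.addRawInput(page, ALL);
        agg.addRawInput(page, NONE);
        agg.addRawInput(page, of(false, true, false, true));
        System.out.println(agg.sum());
    }
}
```

The point of dispatching on the constant cases first is that an aggregate without a filter, or with a filter that matches nothing in a page, never pays for the per-position check.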

@costin
Member Author

costin commented Aug 9, 2024

Since the boolean is used only for filtering, there's no need for multivalue (MV) or null handling. How about using a simple bitset instead?
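
For illustration, a minimal sketch of the bitset variant using the JDK's `java.util.BitSet`. The class and method names are hypothetical, a `long[]` stands in for a block of values, and the engine might well use its own bitset type instead.

```java
import java.util.BitSet;

// Hypothetical sketch: a set bit means "this position passes the filter".
// There is no third state, so nulls and multivalues never enter the mask.
final class BitSetFilterSketch {
    /** Evaluates value > threshold once per position and records the result. */
    static BitSet buildMask(long[] values, long threshold) {
        BitSet mask = new BitSet(values.length);
        for (int p = 0; p < values.length; p++) {
            if (values[p] > threshold) {
                mask.set(p);
            }
        }
        return mask;
    }

    /** Sums only the selected positions, jumping over runs of cleared bits. */
    static long sumSelected(long[] values, BitSet mask) {
        long sum = 0;
        for (int p = mask.nextSetBit(0); p >= 0; p = mask.nextSetBit(p + 1)) {
            sum += values[p];
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] values = {5, 15, 8, 30};
        BitSet mask = buildMask(values, 10);
        System.out.println(mask + " -> " + sumSelected(values, mask));
    }
}
```

With a bitset, the constant cases can still be detected cheaply: `mask.isEmpty()` corresponds to constantNone, and `mask.cardinality() == values.length` to constantAll.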

@nik9000
Member

nik9000 commented Sep 3, 2024

For pre-filtering:
