Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ESQL] Adding a Lucene min/max operator #113785

Merged
merged 8 commits into from
Nov 14, 2024
Merged

Conversation

iverase
Copy link
Contributor

@iverase iverase commented Sep 30, 2024

This commit introduces an ESQL lucene min/max operator. It is still not hook to the language but ready for doing so.

The operator only optimise the computation of the min/max value if the field contains a BKD tree, no deletes and we are visiting all documents for the segment. In that case it can take the value for the tree metadata, otherwise it uses the doc values to find the value.

I have label the PR as a non issue because it is not used by the language yet.

relates #99838

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 30, 2024
@iverase
Copy link
Contributor Author

iverase commented Sep 30, 2024

@elasticmachine update branch

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For those following along at home, this is going to be very useful when there isn't a filter but there are deleted documents in some segments - just not all segments. In that case we can't find the min or max at plan time. This can mostly use the fast path then.

It's also fairly useful when we have a filter. Instead of loading blocks of documents and passing those documents to field loading and then passing those fields to MAX we can short cut that into a fairly tight loop.


Page page = null;
// emit only one page
if (remainingDocs <= 0 && pagesEmitted == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to confirm - this exists so that we can be called over and over and over again and we will transfer control back to the main loop after each slice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must say I just follow the implementation in LuceneCountOperator:

I don't really follow all this protection, we have doneCollecting, remainingDocs and pagesEmitted which sort of signal if we are done or not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's about yielding after performing "some work" and letting the driver have another iteration so it can update status and check cancellation and things. I don't know precisely where it came from, but that's my guess of it.

@nik9000 nik9000 self-requested a review October 10, 2024 19:38
@iverase
Copy link
Contributor Author

iverase commented Nov 5, 2024

Hey @nik9000! what are we missing here to move it forward?

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! It's been a long time since I looked at this. Sorry. I had dropped your review request through the cracks.

I think we should get this in, maybe with a slight testing change, and then see how we can build on it.

@iverase iverase added auto-backport Automatically create backport pull requests when merged >non-issue and removed >non-issue >enhancement labels Nov 14, 2024
@iverase iverase merged commit f400839 into elastic:main Nov 14, 2024
16 checks passed
@iverase iverase deleted the minmaxoperator branch November 14, 2024 14:31
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

iverase added a commit to iverase/elasticsearch that referenced this pull request Nov 14, 2024
This operator only optimises the computation of the min/max value if the field contains a BKD tree, 
no deletes and we are visiting all documents for the segment. Otherwise it computes the value 
iterating on a tight loop.
elasticsearchmachine pushed a commit that referenced this pull request Nov 14, 2024
This operator only optimises the computation of the min/max value if the field contains a BKD tree, 
no deletes and we are visiting all documents for the segment. Otherwise it computes the value 
iterating on a tight loop.
salvatore-campagna pushed a commit to salvatore-campagna/elasticsearch that referenced this pull request Nov 18, 2024
This operator only optimises the computation of the min/max value if the field contains a BKD tree, 
no deletes and we are visiting all documents for the segment. Otherwise it computes the value 
iterating on a tight loop.
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
This operator only optimises the computation of the min/max value if the field contains a BKD tree, 
no deletes and we are visiting all documents for the segment. Otherwise it computes the value 
iterating on a tight loop.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Compute Engine Analytics in ES|QL auto-backport Automatically create backport pull requests when merged >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.17.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants