[ESQL] Adding a Lucene min/max operator #113785
Pinging @elastic/es-analytical-engine (Team:Analytics)
@elasticmachine update branch
For those following along at home, this is going to be very useful when there isn't a filter but there are deleted documents in some segments - just not all segments. In that case we can't find the min or max at plan time. This can mostly use the fast path then.
It's also fairly useful when we have a filter. Instead of loading blocks of documents, passing those documents to field loading, and then passing those fields to MAX, we can shortcut all of that into a fairly tight loop.
...n/esql/compute/src/test/java/org/elasticsearch/compute/lucene/LuceneMaxLonOperatorTests.java
...n/esql/compute/src/test/java/org/elasticsearch/compute/lucene/LuceneMinLonOperatorTests.java
...n/esql/compute/src/test/java/org/elasticsearch/compute/lucene/LuceneMinOperatorTestCase.java
Page page = null;
// emit only one page
if (remainingDocs <= 0 && pagesEmitted == 0) {
I want to confirm - this exists so that we can be called over and over and over again and we will transfer control back to the main loop after each slice.
I must say I just followed the implementation in LuceneCountOperator (line 140 in 21ccde5):
if (remainingDocs <= 0 && pagesEmitted == 0) {
I don't really follow all this protection: we have doneCollecting, remainingDocs and pagesEmitted, which all sort of signal whether we are done or not?
I believe it's about yielding after performing "some work" and letting the driver have another iteration so it can update status and check cancellation and things. I don't know precisely where it came from, but that's my guess of it.
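To make the yielding idea concrete, here is a minimal sketch of an operator that does a bounded amount of work per call and emits exactly one result at the end. It is not the code from this PR: `EmitOnceSketch`, its `slices` array and the `Long` result (standing in for a `Page`) are all hypothetical, and only the `pagesEmitted == 0` guard mirrors the snippet under review.

```java
final class EmitOnceSketch {
    // Hypothetical input: each inner array stands in for one slice of documents.
    private final long[][] slices;
    private int nextSlice = 0;
    private int pagesEmitted = 0;
    private long max = Long.MIN_VALUE;

    EmitOnceSketch(long[][] slices) {
        this.slices = slices;
    }

    boolean isFinished() {
        return pagesEmitted > 0;
    }

    /**
     * Processes at most one slice per call. Returns null while work remains,
     * which hands control back to the caller, and returns the result exactly once.
     */
    Long getOutput() {
        if (isFinished()) {
            return null;
        }
        if (nextSlice < slices.length) {
            for (long value : slices[nextSlice]) {
                max = Math.max(max, value);
            }
            nextSlice++;
        }
        // emit only one page, and only once every slice has been consumed
        if (nextSlice >= slices.length && pagesEmitted == 0) {
            pagesEmitted++;
            return max;
        }
        return null;
    }

    public static void main(String[] args) {
        EmitOnceSketch op = new EmitOnceSketch(new long[][] {{1, 5}, {9}, {3}});
        int calls = 0;
        while (!op.isFinished()) {
            Long out = op.getOutput();
            calls++;
            // A real driver could update status or check for cancellation here.
            System.out.println("call " + calls + " -> " + out);
        }
    }
}
```

The loop in `main` plays the role of the driver: between calls it is free to report progress or stop early, which is the behaviour guessed at above.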
...plugin/esql/compute/src/main/java/org/elasticsearch/compute/lucene/LuceneMinMaxOperator.java
...plugin/esql/compute/src/main/java/org/elasticsearch/compute/lucene/LuceneMinMaxOperator.java
...n/esql/compute/src/test/java/org/elasticsearch/compute/lucene/LuceneMaxOperatorTestCase.java
...n/esql/compute/src/test/java/org/elasticsearch/compute/lucene/LuceneMaxOperatorTestCase.java
...plugin/esql/compute/src/main/java/org/elasticsearch/compute/lucene/LuceneMinMaxOperator.java
Hey @nik9000! What are we missing here to move this forward?
Wow! It's been a long time since I looked at this. Sorry. Your review request had slipped through the cracks.
I think we should get this in, maybe with a slight testing change, and then see how we can build on it.
💚 Backport successful
This operator only optimises the computation of the min/max value if the field has a BKD tree, the segment has no deletes, and we are visiting all documents in the segment. Otherwise it computes the value by iterating in a tight loop.
This commit introduces an ESQL Lucene min/max operator. It is not yet hooked up to the language, but it is ready for that.
The operator only optimises the computation of the min/max value if the field has a BKD tree, the segment has no deletes, and we are visiting all documents in the segment. In that case it can take the value from the tree metadata; otherwise it uses the doc values to find the value.
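The decision described above can be sketched roughly as follows. This is a simplified model rather than the operator itself: `MinMaxSketch`, `Segment`, `treeMax` and `visitsAllDocs` are hypothetical names, plain arrays stand in for doc values and live docs, and filtering is not modelled. The point it illustrates is that the tree metadata covers every indexed value, including deleted ones, so it is only trustworthy when nothing is deleted or filtered out.

```java
import java.util.List;

final class MinMaxSketch {

    /** Hypothetical stand-in for a segment; none of this is a Lucene or ESQL type. */
    record Segment(long[] values, boolean[] live, long treeMax) {

        /** Models index time: the metadata covers every indexed value, deleted or not. */
        static Segment of(long[] values, boolean[] live) {
            long max = Long.MIN_VALUE;
            for (long value : values) {
                max = Math.max(max, value);
            }
            return new Segment(values, live, max);
        }

        boolean hasDeletes() {
            for (boolean isLive : live) {
                if (!isLive) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Returns the max over live documents, or Long.MIN_VALUE if there are none. */
    static long max(List<Segment> segments, boolean visitsAllDocs) {
        long result = Long.MIN_VALUE;
        for (Segment segment : segments) {
            if (visitsAllDocs && !segment.hasDeletes()) {
                // Fast path: the metadata is exact when nothing is deleted or filtered out.
                result = Math.max(result, segment.treeMax());
            } else {
                // Slow path: a tight loop over per-document values, skipping deleted docs.
                // (A real filter would also skip non-matching docs; omitted in this sketch.)
                for (int doc = 0; doc < segment.values().length; doc++) {
                    if (segment.live()[doc]) {
                        result = Math.max(result, segment.values()[doc]);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Segment clean = Segment.of(new long[] {2, 7}, new boolean[] {true, true});
        Segment withDelete = Segment.of(new long[] {1, 9, 4}, new boolean[] {true, false, true});
        System.out.println(max(List.of(clean, withDelete), true));
    }
}
```

In `main`, the second segment's metadata would report 9, but that value belongs to a deleted document, so the sketch falls back to the loop for that segment while still using the metadata for the first one.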
I have labelled the PR as a non-issue because it is not used by the language yet.
relates #99838