-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Syntax support and operator for count all #99602
Conversation
578701b
to
959254d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @costin. I left some comments for the count operator. I can push these changes if you prefer.
// check to not go over limit | ||
var count = Math.min(leafCount, remainingDocs); | ||
totalHits += count; | ||
remainingDocs -= count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to advance the position of the scorer after we use the shortcut.
Page page = null; | ||
if (remainingDocs <= 0 || scorer.isDone()) { | ||
pagesEmitted++; | ||
page = new Page(1, IntBlock.newConstantBlockWith(totalHits, 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This operator should emit only one page.
.../plugin/esql/compute/src/main/java/org/elasticsearch/compute/lucene/LuceneCountOperator.java
Outdated
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
Outdated
Show resolved
Hide resolved
.../plugin/esql/compute/src/main/java/org/elasticsearch/compute/lucene/LuceneCountOperator.java
Show resolved
Hide resolved
9467a49
to
76e56ee
Compare
Use internal aggs when pushing down count
a4f4f86
to
b8b429f
Compare
Pinging @elastic/es-ql (Team:QL) |
Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL) |
b8b429f
to
2952bf9
Compare
Relates #99459 |
2952bf9
to
f95e368
Compare
* 1. the count as a long (0 if no doc is seen) | ||
* 2. a bool flag (seen) that's always true meaning that the group (all items) always exists | ||
*/ | ||
public class LuceneCountOperator extends LuceneOperator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is very similar to LuceneSourceOperator - I tried combining the two but there's not much code reuse and the inner state and semantics are fairly different.
Weight weight() { | ||
return weight; | ||
} | ||
|
||
int position() { | ||
return position; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made these accessible for the count source.
private PhysicalOperation planEsQueryNode(EsQueryExec esQuery, LocalExecutionPlannerContext context) { | ||
if (esQuery.query() == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The query check and initialization happens inside physicalOperatorProviders method.
EsPhysicalOperationProviders esProvider = (EsPhysicalOperationProviders) physicalOperationProviders; | ||
|
||
Function<SearchContext, Query> querySupplier = EsPhysicalOperationProviders.querySupplier(statsQuery.query()); | ||
|
||
Expression limitExp = statsQuery.limit(); | ||
int limit = limitExp != null ? (Integer) limitExp.fold() : NO_LIMIT; | ||
final LuceneOperator.Factory luceneFactory = new LuceneCountOperator.Factory( | ||
esProvider.searchContexts(), | ||
querySupplier, | ||
context.dataPartitioning(), | ||
context.taskConcurrency(), | ||
limit | ||
); | ||
|
||
Layout.Builder layout = new Layout.Builder(); | ||
layout.append(statsQuery.outputSet()); | ||
int instanceCount = Math.max(1, luceneFactory.taskConcurrency()); | ||
context.driverParallelism(new DriverParallelism(DriverParallelism.Type.DATA_PARALLELISM, instanceCount)); | ||
return PhysicalOperation.fromSource(luceneFactory, layout.build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've copied the code from esSource for now - this is candidate for future factoring.
This issue is more visible with
both fail with that NPE with the difference that these types of queries are more likely than |
|
However from your comments I agree that this is confusing - I've pushed a PR that adds that in along with some (your) tests. |
} | ||
|
||
@Override | ||
public int intermediateBlockCount() { | ||
return intermediateStateDesc().size(); | ||
} | ||
|
||
private int blockIndex() { | ||
return countAll ? 0 : channels.get(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nik9000 not sure if this is the best way to count things in a block but it seems to be working.
@@ -51,33 +57,35 @@ public int intermediateBlockCount() { | |||
|
|||
@Override | |||
public AddInput prepareProcessPage(SeenGroupIds seenGroupIds, Page page) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nik9000 likewise here
@@ -508,7 +508,7 @@ from employees | stats c = count(*) by gender | sort gender; | |||
c:l | gender:s | |||
33 | F | |||
57 | M | |||
10 | null | |||
#10 | null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nik9000 There's a discrepancy between EsqlActionIT (running in Lucene) and CsvTests - the former don't have a null group for gender while the latter do hence why the same csv-spec fails for one but passes for the other.
I've commented this line to make EsqlActionIT pass but it will make CsvTests fail.
Since the count works inside the group, I wonder if null group is either discarded when querying against Lucene or if the underlying filter doesn't work properly (ignores missing values somehow).
Any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR, this was caused by #100109
@@ -517,7 +517,7 @@ from employees | stats c = count(*), min = min(emp_no) by gender | sort gender; | |||
c:l | min:i | gender:s | |||
33 | 10002 | F | |||
57 | 10001 | M | |||
10 | 10010 | null | |||
#10 | 10010 | null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
@@ -526,5 +526,5 @@ from employees | stats min = min(salary) by gender | eval x = min + 1 | stats c | |||
c:l | gender:s | |||
1 | F | |||
1 | M | |||
1 | null | |||
#1 | null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
...sql/compute/src/main/java/org/elasticsearch/compute/aggregation/CountAggregatorFunction.java
Show resolved
Hide resolved
...in/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/ExpressionBuilder.java
Outdated
Show resolved
Hide resolved
Introduce physical plan for representing query stats Use internal aggs when pushing down count Add support for count all outside Lucene
Introduce physical plan for representing query stats Use internal aggs when pushing down count Add support for count all outside Lucene
Introduce support for COUNT(*) and along with a dedicated Lucene source for pushing it down instead of reading all the items one by one.
This PR doesn't handle COUNT( * ) outside Lucene (on blocks) - will take care of that (if needed) in a separate PR.
This PR takes advantage of certain stats that can be implemented at Lucene level through a query even when a filter exists (whether it is specified in the query or outside) - in this PR, count().
When just a count() aggregation is encountered (regardless of whether a filter exists or not), the Physical optimizer will convert it into a dedicated source (LuceneCountSource) which will return the intermediate aggregation state (the count and seen bool).
There are several improvements to be done in the future: