-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Reduce the number of Evals ReplaceMissingFieldWithNull creates #104586
Conversation
Improve ReplaceMissingFieldWithNull to create just one eval for the missing value and have the rest point to it. This reduces the amount of EvalOperators created in the pipeline. Fix elastic#104583
Hi @costin, I've created a changelog YAML for you. |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works and I couldn't find any queries that would break this. LGTM!
...sql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java
Show resolved
Hide resolved
|
||
java.util.function.Function<ReferenceAttribute, Expression> replaceReference = r -> collectRefs.resolve(r, r); | ||
|
||
// collect aliases bottom-up | ||
plan.forEachExpressionUp(Alias.class, a -> { | ||
var c = a.child(); | ||
if (c.foldable()) { | ||
collectRefs.put(a.toAttribute(), c); | ||
boolean shouldCollect = c.foldable(); | ||
// try to resolve the expression based on an existing foldables | ||
if (shouldCollect == false) { | ||
c = c.transformUp(ReferenceAttribute.class, replaceReference); | ||
shouldCollect = c.foldable(); | ||
} | ||
if (shouldCollect) { | ||
collectRefs.put(a.toAttribute(), Literal.of(c)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we need to make changes to PropagateEvalFoldables
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same ^^. Is it to accelerate the resolution? Wouldn't the first c.foldable()
eventually return true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to mention this - while running debugging I've noticed this rule is doing only one transform per run causing the optimizer to run multiple times. So I've modified it so it can do the replacement in one go instead of multiple method calls.
An example is this:
eval x = 1
eval y = x
eval z = x + 3
eval w = z + y
Previously the rule had to run 4 times, now it does it in the first run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to reproduce the behavior reported in the original issue and, given the description of the issue and the one of the ReplaceMissingFieldWithNull
rule, I failed to repro.
Debugging the code on a single debug-mode node showed that "missing" means "the field doesn't exist in the mapping" of an index, which is different than (what I originally thought) missing from the documents.
This means that the reproduction step involves at least two nodes, one index having its shards (probably only one) assigned to one node, the other index having the shard(s) on the second node. Also, one index has to have some fields that the other doesn't.
Given this, I would adjust the description of the rule to be a bit more exact and potentially add one-two comments in the code of the rule as well, to make sure there are no confusions about what "missing" means.
I would love to see an IT test with this one in action since it's changing the way the projected fields are treated (now having multiple aliases pointing to the same field = null
expression). And the whole point of this rule is to help expedite queries that can end up not actually touching the indices. This means a query like from test* | project field3
with field3 "missing" from one of the indices would not even touch those shards on potentially several nodes and just return null
. But the end result of those queries needs to be the exepected one.
We do have a multi-node
qa project where this could potentially be tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are to keep the rule, I think this is a better solution over the existing one.
@@ -111,6 +111,8 @@ public Object fold() { | |||
|
|||
@Override | |||
public ExpressionEvaluator.Factory toEvaluator(Function<Expression, ExpressionEvaluator.Factory> toEvaluator) { | |||
// force datatype initialization | |||
var dataType = dataType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering why this wasn't needed so far.
Nit: maybe just calling the function ignoring the return and not shadowing the instance variable might be cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a bug - the node was mutated but the new instance did not have its dataType()) called.
For example, the node was serialized, deserialized than the execution kicked in - since nobody called dataType(), the value was not initialized throwing a NPE - see the CI failures.
|
||
java.util.function.Function<ReferenceAttribute, Expression> replaceReference = r -> collectRefs.resolve(r, r); | ||
|
||
// collect aliases bottom-up | ||
plan.forEachExpressionUp(Alias.class, a -> { | ||
var c = a.child(); | ||
if (c.foldable()) { | ||
collectRefs.put(a.toAttribute(), c); | ||
boolean shouldCollect = c.foldable(); | ||
// try to resolve the expression based on an existing foldables | ||
if (shouldCollect == false) { | ||
c = c.transformUp(ReferenceAttribute.class, replaceReference); | ||
shouldCollect = c.foldable(); | ||
} | ||
if (shouldCollect) { | ||
collectRefs.put(a.toAttribute(), Literal.of(c)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same ^^. Is it to accelerate the resolution? Wouldn't the first c.foldable()
eventually return true?
Fix subtle error in LocalPhysicalPlanOptimizer test Add more unit tests
// the real execution breaks the plan at the exchange and then decouples the plan | ||
// this is of no use in the unit tests, which checks the plan as a whole instead of each | ||
// individually hence why here the plan is kept as is | ||
|
||
var logicalTestOptimizer = new LocalLogicalPlanOptimizer(new LocalLogicalOptimizerContext(config, searchStats)); | ||
var physicalTestOptimizer = new TestLocalPhysicalPlanOptimizer(new LocalPhysicalOptimizerContext(config, searchStats), true); | ||
var l = PlannerUtils.localPlan(plan, logicalTestOptimizer, physicalTestOptimizer); | ||
var l = PlannerUtils.localPlan(physicalPlan, logicalTestOptimizer, physicalTestOptimizer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a bug that caused a lot of head scratching - the other tests where not failing since the local plan was not influenced for them.
💔 Backport failed
You can use sqren/backport to manually backport by running |
…lastic#104586) Improve ReplaceMissingFieldWithNull to create just one eval (per datatype) for the missing value and have the rest point to it. This reduces the amount of EvalOperators created in the pipeline. Preserve type information (one null eval per dataType) Fix subtle error in LocalPhysicalPlanOptimizer test Fix elastic#104583 (cherry picked from commit d6f900c)
…104586) (#104669) Improve ReplaceMissingFieldWithNull to create just one eval (per datatype) for the missing value and have the rest point to it. This reduces the amount of EvalOperators created in the pipeline. Preserve type information (one null eval per dataType) Fix subtle error in LocalPhysicalPlanOptimizer test Fix #104583 (cherry picked from commit d6f900c)
Improve ReplaceMissingFieldWithNull to create just one eval for the
missing value and have the rest point to it. This reduces the amount
of EvalOperators created in the pipeline.
Fix #104583