Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Reduce the number of Evals ReplaceMissingFieldWithNull creates #104586

Merged
merged 7 commits into from
Jan 24, 2024

Conversation

costin
Copy link
Member

@costin costin commented Jan 20, 2024

Improve ReplaceMissingFieldWithNull to create just one eval for the
missing value and have the rest point to it. This reduces the amount
of EvalOperators created in the pipeline.

Fix #104583

Improve ReplaceMissingFieldWithNull to create just one eval for the
 missing value and have the rest point to it. This reduces the amount
 of EvalOperators created in the pipeline.

Fix elastic#104583
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 20, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @costin, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@costin costin changed the title Reduce the number of Evals ReplaceMissingFieldWithNull creates ESQL: Reduce the number of Evals ReplaceMissingFieldWithNull creates Jan 22, 2024
Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works and I couldn't find any queries that would break this. LGTM!

Comment on lines +409 to +422

java.util.function.Function<ReferenceAttribute, Expression> replaceReference = r -> collectRefs.resolve(r, r);

// collect aliases bottom-up
plan.forEachExpressionUp(Alias.class, a -> {
var c = a.child();
if (c.foldable()) {
collectRefs.put(a.toAttribute(), c);
boolean shouldCollect = c.foldable();
// try to resolve the expression based on an existing foldables
if (shouldCollect == false) {
c = c.transformUp(ReferenceAttribute.class, replaceReference);
shouldCollect = c.foldable();
}
if (shouldCollect) {
collectRefs.put(a.toAttribute(), Literal.of(c));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we need to make changes to PropagateEvalFoldables?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same ^^. Is it to accelerate the resolution? Wouldn't the first c.foldable() eventually return true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to mention this - while running debugging I've noticed this rule is doing only one transform per run causing the optimizer to run multiple times. So I've modified it so it can do the replacement in one go instead of multiple method calls.
An example is this:

eval x = 1
eval y = x
eval z = x + 3
eval w = z + y

Previously the rule had to run 4 times, now it does it in the first run.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to reproduce the behavior reported in the original issue and, given the description of the issue and the one of the ReplaceMissingFieldWithNull rule, I failed to repro.

Debugging the code on a single debug-mode node showed that "missing" means "the field doesn't exist in the mapping" of an index, which is different than (what I originally thought) missing from the documents.

This means that the reproduction step involves at least two nodes, one index having its shards (probably only one) assigned to one node, the other index having the shard(s) on the second node. Also, one index has to have some fields that the other doesn't.
Given this, I would adjust the description of the rule to be a bit more exact and potentially add one-two comments in the code of the rule as well, to make sure there are no confusions about what "missing" means.

I would love to see an IT test with this one in action since it's changing the way the projected fields are treated (now having multiple aliases pointing to the same field = null expression). And the whole point of this rule is to help expedite queries that can end up not actually touching the indices. This means a query like from test* | project field3 with field3 "missing" from one of the indices would not even touch those shards on potentially several nodes and just return null. But the end result of those queries needs to be the exepected one.

We do have a multi-node qa project where this could potentially be tested.

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are to keep the rule, I think this is a better solution over the existing one.

@@ -111,6 +111,8 @@ public Object fold() {

@Override
public ExpressionEvaluator.Factory toEvaluator(Function<Expression, ExpressionEvaluator.Factory> toEvaluator) {
// force datatype initialization
var dataType = dataType();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering why this wasn't needed so far.
Nit: maybe just calling the function ignoring the return and not shadowing the instance variable might be cleaner.

Copy link
Member Author

@costin costin Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug - the node was mutated but the new instance did not have its dataType()) called.
For example, the node was serialized, deserialized than the execution kicked in - since nobody called dataType(), the value was not initialized throwing a NPE - see the CI failures.

Comment on lines +409 to +422

java.util.function.Function<ReferenceAttribute, Expression> replaceReference = r -> collectRefs.resolve(r, r);

// collect aliases bottom-up
plan.forEachExpressionUp(Alias.class, a -> {
var c = a.child();
if (c.foldable()) {
collectRefs.put(a.toAttribute(), c);
boolean shouldCollect = c.foldable();
// try to resolve the expression based on an existing foldables
if (shouldCollect == false) {
c = c.transformUp(ReferenceAttribute.class, replaceReference);
shouldCollect = c.foldable();
}
if (shouldCollect) {
collectRefs.put(a.toAttribute(), Literal.of(c));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same ^^. Is it to accelerate the resolution? Wouldn't the first c.foldable() eventually return true?

@costin costin requested a review from fang-xing-esql January 23, 2024 01:23
// the real execution breaks the plan at the exchange and then decouples the plan
// this is of no use in the unit tests, which checks the plan as a whole instead of each
// individually hence why here the plan is kept as is

var logicalTestOptimizer = new LocalLogicalPlanOptimizer(new LocalLogicalOptimizerContext(config, searchStats));
var physicalTestOptimizer = new TestLocalPhysicalPlanOptimizer(new LocalPhysicalOptimizerContext(config, searchStats), true);
var l = PlannerUtils.localPlan(plan, logicalTestOptimizer, physicalTestOptimizer);
var l = PlannerUtils.localPlan(physicalPlan, logicalTestOptimizer, physicalTestOptimizer);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug that caused a lot of head scratching - the other tests where not failing since the local plan was not influenced for them.

@costin costin merged commit d6f900c into elastic:main Jan 24, 2024
15 checks passed
@costin costin deleted the fix/104583 branch January 24, 2024 00:40
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.12 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 104586

costin added a commit to costin/elasticsearch that referenced this pull request Jan 24, 2024
…lastic#104586)

Improve ReplaceMissingFieldWithNull to create just one eval (per
 datatype) for the missing value and have the rest point to it. This
 reduces the amount of EvalOperators created in the pipeline.
Preserve type information (one null eval per dataType)
Fix subtle error in LocalPhysicalPlanOptimizer test

Fix elastic#104583

(cherry picked from commit d6f900c)
elasticsearchmachine pushed a commit that referenced this pull request Jan 24, 2024
…104586) (#104669)

Improve ReplaceMissingFieldWithNull to create just one eval (per
 datatype) for the missing value and have the rest point to it. This
 reduces the amount of EvalOperators created in the pipeline.
Preserve type information (one null eval per dataType)
Fix subtle error in LocalPhysicalPlanOptimizer test

Fix #104583

(cherry picked from commit d6f900c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.12.1 v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ReplaceMissingFieldWithNull rule lead to huge number of eval operators
5 participants