Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReplaceMissingFieldWithNull rule lead to huge number of eval operators #104583

Closed
dnhatn opened this issue Jan 19, 2024 · 8 comments · Fixed by #104586
Closed

ReplaceMissingFieldWithNull rule lead to huge number of eval operators #104583

dnhatn opened this issue Jan 19, 2024 · 8 comments · Fixed by #104586
Assignees
Labels
:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@dnhatn
Copy link
Member

dnhatn commented Jan 19, 2024

There is an OOM report from a simple query FROM some-index | LIMIT 10. I found in the heap dump that some Drivers has more 16K eval operators. I believe the rule ReplaceMissingFieldWithNull could translate missing fields to an Eval. This lead to many eval operators if the target indices have many missing fields. Since 8.12, ValuesSourceReaderOperator can now read multiple fields at the same time. I think we need to combine these Literals in a single EvalOperator like we do with ValuesSourceReaderOperator.

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 19, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000
Copy link
Member

nik9000 commented Jan 19, 2024

And the field loading infrastructure detects missing fields and loads them with null without a lot of ceremony either.

@dnhatn
Copy link
Member Author

dnhatn commented Jan 19, 2024

@nik9000 Do you mean that we should just remove that rule and have the ValuesSourceReaderOperator take care of this instead?

@costin costin self-assigned this Jan 19, 2024
@nik9000
Copy link
Member

nik9000 commented Jan 20, 2024

Maybe so! I don't have the code in front of me and I think it'd be useful to make sure we can merge the reads into a single operator run. But presuming we can do that I think that's fine.

costin added a commit to costin/elasticsearch that referenced this issue Jan 20, 2024
Improve ReplaceMissingFieldWithNull to create just one eval for the
 missing value and have the rest point to it. This reduces the amount
 of EvalOperators created in the pipeline.

Fix elastic#104583
@alex-spies
Copy link
Contributor

@nik9000 Do you mean that we should just remove that rule and have the ValuesSourceReaderOperator take care of this instead?

If that's easily possible, I think that'd be the best way to go about this.
#104586 fixes the problem at hand, so I think it's no problem to merge it - but removing ReplaceMissingFieldWithNull would lead to a less complex solution IMHO.

@nik9000
Copy link
Member

nik9000 commented Jan 22, 2024

Go ahead and merge, yeah. That get's us out of trouble. But it'd be nice to have a good story around why it's better to do this with eval. The readers are indeed pretty good at this.

@astefan
Copy link
Contributor

astefan commented Jan 22, 2024

The presence of the eval can simplify some queries and transform them in a local relation (not even reaching the Lucene level). I don't think it's so easy to determine if the presence of the rule is more beneficial than just having good readers. This might be one of those "features" that can prove their usefulness (or not) with time and edge cases. The logic behind it is pretty strong and I wouldn't discard it without a very good reason.

@luigidellaquila
Copy link
Contributor

luigidellaquila commented Jan 22, 2024

My 2 cents, optimizations tend to increase complexity: every optimization is one more piece of code to maintain, one more moving part that can interact/overlap/clash with other components, and a potential source of bugs.
Probably this specific case is particularly unlucky, because all the three above seem to happen at the same time.
I think any optimization we have in the code should be justified by a quantitative analysis of the benefits it brings (eg. in terms of saved query execution time in common scenarios). If the performance benefits are not measurable, then it's better to avoid it.

costin added a commit that referenced this issue Jan 24, 2024
…104586)

Improve ReplaceMissingFieldWithNull to create just one eval (per
 datatype) for the missing value and have the rest point to it. This
 reduces the amount of EvalOperators created in the pipeline.
Preserve type information (one null eval per dataType)
Fix subtle error in LocalPhysicalPlanOptimizer test

Fix #104583
costin added a commit to costin/elasticsearch that referenced this issue Jan 24, 2024
…lastic#104586)

Improve ReplaceMissingFieldWithNull to create just one eval (per
 datatype) for the missing value and have the rest point to it. This
 reduces the amount of EvalOperators created in the pipeline.
Preserve type information (one null eval per dataType)
Fix subtle error in LocalPhysicalPlanOptimizer test

Fix elastic#104583

(cherry picked from commit d6f900c)
elasticsearchmachine pushed a commit that referenced this issue Jan 24, 2024
…104586) (#104669)

Improve ReplaceMissingFieldWithNull to create just one eval (per
 datatype) for the missing value and have the rest point to it. This
 reduces the amount of EvalOperators created in the pipeline.
Preserve type information (one null eval per dataType)
Fix subtle error in LocalPhysicalPlanOptimizer test

Fix #104583

(cherry picked from commit d6f900c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants