ESQL: Visit stored fields once #101322

Closed
nik9000 opened this issue Oct 25, 2023 · 7 comments
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:QL (Deprecated) Meta label for query languages team

Comments

@nik9000
Member

nik9000 commented Oct 25, 2023

Description

ESQL loads stored fields like they are doc values. The interface assumes that it can load fields "column-wise". But stored fields are a "row-wise" store! So ESQL loads stored fields by visiting each row once per value. That's terribly slow because the "visiting" involves decompressing whole blocks of values with a dictionary.

We could make this so so so much faster if we loaded stored fields in their own operator with their own interface, so that we could load them row-wise. Like, visit each document one time! Just like the fetch phase. We could do similar things for synthetic _source too one day!
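
To make the difference concrete, here is a toy sketch. It is not ESQL's actual code: `ToyStoredFields`, `load_column_wise`, and `load_row_wise` are hypothetical names made up for illustration, and the `visits` counter only stands in for the block decompression described above.

```python
class ToyStoredFields:
    """Hypothetical row-wise store: one visit returns a whole document."""

    def __init__(self, rows):
        self.rows = rows   # list of dicts, one per document
        self.visits = 0    # stands in for the number of block decompressions

    def visit(self, doc_id):
        self.visits += 1
        return self.rows[doc_id]


def load_column_wise(store, doc_ids, fields):
    """Load one field at a time, so every document is visited once per field."""
    return {f: [store.visit(d)[f] for d in doc_ids] for f in fields}


def load_row_wise(store, doc_ids, fields):
    """Visit every document once and pull all requested fields out of it."""
    columns = {f: [] for f in fields}
    for d in doc_ids:
        row = store.visit(d)
        for f in fields:
            columns[f].append(row[f])
    return columns


rows = [{"a": i, "b": str(i)} for i in range(4)]
col_store, row_store = ToyStoredFields(rows), ToyStoredFields(rows)
same = load_column_wise(col_store, range(4), ["a", "b"]) == load_row_wise(
    row_store, range(4), ["a", "b"]
)
print(same, col_store.visits, row_store.visits)
```

Both loaders return the same columns. With two fields and four documents, the column-wise loader makes eight visits and the row-wise loader makes four.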

@nik9000 nik9000 added >enhancement needs:triage Requires assignment of a team area label :Analytics/ES|QL AKA ESQL labels Oct 25, 2023
@elasticsearchmachine elasticsearchmachine added Team:QL (Deprecated) Meta label for query languages team and removed needs:triage Requires assignment of a team area label labels Oct 25, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine
Collaborator

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

@nik9000
Member Author

nik9000 commented Oct 25, 2023

The performance we get from this varies vastly depending on how many stored fields you load. If you load none it obviously won't do anything. If you load two this'll make ESQL like twice as fast. Or, like, 1.8 times as fast. Visiting stored fields is, compared to most of what we do, super duper slow!
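
A back-of-the-envelope model of that claim, not a measurement: assume each document costs a fixed amount of non-stored-field work (`other_cost`) plus one `visit_cost` per stored-field visit. The function name and both cost parameters are hypothetical, and `other_cost=0.25` is picked only because it reproduces the 1.8 figure for two fields.

```python
def estimated_speedup(num_stored_fields, visit_cost, other_cost):
    """Ratio of per-document time before (one visit per field) to after (one visit total)."""
    if num_stored_fields == 0:
        return 1.0  # nothing to visit, so nothing changes
    before = other_cost + num_stored_fields * visit_cost
    after = other_cost + visit_cost
    return before / after


# Costs are in arbitrary units; only their ratio matters in this model.
for fields in (0, 1, 2, 4):
    print(fields, round(estimated_speedup(fields, visit_cost=1.0, other_cost=0.25), 2))
```

Under this model the speedup approaches the number of stored fields as visits dominate the per-document cost, and stays at 1.0 for zero or one stored field.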

@nik9000
Member Author

nik9000 commented Oct 31, 2023

Some complexities here - our field loading infrastructure is capable of loading fields from several indices at once. It has a "fast path" that it can follow when we load a single index in ascending order. It has a "slow path" that it can use when it loads from either more than one index or when the fields aren't in order. The tricky bit is the slow path. I think the simplest way to deal with this may be to separate the code for loading from the slow path and the fast path. Or, rather, I've tried not doing that and it is a snarly mess. I'll try the separation the next chance I get.
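
A minimal sketch of what that separation could look like, under loud assumptions: positions are modelled as hypothetical `(index, doc)` pairs, "ascending" is taken to mean non-decreasing doc ids, and the slow path shown here (read in sorted order, then restore the caller's order) is just one plausible approach, not necessarily what the real code does. All function names are made up.

```python
def is_fast_path(positions):
    """True when everything comes from one index and doc ids never go backwards."""
    indices = {index for index, _ in positions}
    docs = [doc for _, doc in positions]
    return len(indices) <= 1 and all(a <= b for a, b in zip(docs, docs[1:]))


def load_fast(read, positions):
    """Fast path: read straight through in the order given."""
    return [read(index, doc) for index, doc in positions]


def load_slow(read, positions):
    """Slow path (assumed approach): read in sorted order, then restore caller order."""
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    out = [None] * len(positions)
    for i in order:
        out[i] = read(*positions[i])
    return out


def load(read, positions):
    """Dispatch to one of two separate code paths instead of interleaving them."""
    if is_fast_path(positions):
        return load_fast(read, positions)
    return load_slow(read, positions)


def read(index, doc):
    """Hypothetical reader that just labels what it was asked for."""
    return f"{index}:{doc}"


print(load(read, [(0, 1), (0, 2), (0, 5)]))  # single index, ascending: fast path
print(load(read, [(1, 3), (0, 7), (0, 2)]))  # two indices, out of order: slow path
```

Keeping `load_fast` and `load_slow` as separate functions means the fast path stays a plain loop, and all the reordering bookkeeping lives in one place.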

@nik9000
Member Author

nik9000 commented Nov 15, 2023

I've made a PR (#102192) that teaches the field loading infrastructure to load many fields at once, which we can use to bunch up our visits to stored fields. I'll need some help on the planner side to modify the plan to do the actual bunching.

@luigidellaquila
Contributor

Happy to help on the planner side, let's take it off-line

@costin costin added this to the 8.12 milestone Nov 16, 2023
pull bot pushed a commit to Samboski1/elasticsearch that referenced this issue Nov 20, 2023
This modifies ESQL to load a list of fields at one time which is especially
effective when loading from stored fields or _source because it allows
visiting the stored fields one time.

Part of elastic#101322
@nik9000
Member Author

nik9000 commented Nov 21, 2023

#102408 got it.

@nik9000 nik9000 closed this as completed Nov 21, 2023
4 participants