[7.17] Avoid doing I/O when fetching min and max for keyword fields (#92026) #92865

Merged
javanna merged 1 commit into elastic:7.17 from backport/7.17/92026 on Jan 12, 2023

@javanna javanna commented Jan 12, 2023

When the primary sort is on a date, numeric or keyword field, the can_match phase retrieves the field's min and max values and sorts the shards accordingly (ascending or descending, depending on the sort order), so that they are queried in that order. This lets async search expose incremental results in that same order, and enables optimizations built on top of this behaviour (#51852).
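
To make the shard ordering concrete, here is a minimal sketch of the idea. It is not the Elasticsearch implementation: `ShardRange` and `orderShards` are hypothetical names, and the choice to order by min for ascending sorts and by max (reversed) for descending sorts is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; these names are illustrative, not Elasticsearch APIs.
final class ShardOrderingSketch {

    /** Min and max of the primary sort field on one shard. */
    static final class ShardRange {
        final int shardId;
        final String min;
        final String max;

        ShardRange(int shardId, String min, String max) {
            this.shardId = shardId;
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Returns shard ids in the order they would be queried.
     * Assumption: ascending sorts compare shard minimums,
     * descending sorts compare shard maximums in reverse.
     */
    static List<Integer> orderShards(List<ShardRange> shards, boolean ascending) {
        Comparator<ShardRange> cmp = ascending
            ? Comparator.comparing((ShardRange s) -> s.min)
            : Comparator.comparing((ShardRange s) -> s.max).reversed();
        List<ShardRange> sorted = new ArrayList<>(shards);
        sorted.sort(cmp);
        List<Integer> ids = new ArrayList<>();
        for (ShardRange s : sorted) {
            ids.add(s.shardId);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<ShardRange> shards = List.of(
            new ShardRange(0, "m", "z"),
            new ShardRange(1, "a", "f"),
            new ShardRange(2, "g", "l"));
        System.out.println("asc:  " + orderShards(shards, true));
        System.out.println("desc: " + orderShards(shards, false));
    }
}
```

The point of the sketch is only that the ordering depends on per-shard min and max, so fetching those values cheaply matters for every sorted search.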

For fields with points we call `getMinPackedValue` and `getMaxPackedValue`, while for keyword fields we call `Terms#getMin` and `Terms#getMax`. Elasticsearch uses `FilterTerms` implementations to cancel queries as well as to track field usage. These filter implementations should delegate their `getMin` and `getMax` calls to the wrapped `Terms` instance, which leverages the min and max cached by the block tree. Otherwise the values are always retrieved from the index, which does I/O and slows down the can_match phase.
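
The delegation issue can be illustrated with a self-contained model. This is a sketch under stated assumptions, not the code changed in this PR: `SimpleTerms`, `CachedTerms`, `NonDelegatingFilterTerms` and `DelegatingFilterTerms` are hypothetical stand-ins for Lucene's `Terms`, a block-tree-backed `Terms`, and `FilterTerms` subclasses, and the `iteratorCalls` counter merely simulates index I/O.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for Lucene's Terms / FilterTerms; not the real API.
final class TermsDelegationSketch {

    /** Simplified model of the terms dictionary for one field. */
    abstract static class SimpleTerms {
        abstract Iterator<String> iterator();

        // Default: derive min by reading the dictionary (models index I/O).
        String getMin() {
            Iterator<String> it = iterator();
            return it.hasNext() ? it.next() : null;
        }

        // Default: derive max by walking all terms (models index I/O).
        String getMax() {
            String last = null;
            for (Iterator<String> it = iterator(); it.hasNext(); ) {
                last = it.next();
            }
            return last;
        }
    }

    /** Models a block-tree-backed field whose min and max are cached. */
    static final class CachedTerms extends SimpleTerms {
        private final List<String> sorted;
        int iteratorCalls = 0; // counts simulated I/O

        CachedTerms(List<String> sortedTerms) {
            this.sorted = List.copyOf(sortedTerms);
        }

        @Override
        Iterator<String> iterator() {
            iteratorCalls++;
            return sorted.iterator();
        }

        @Override
        String getMin() {
            return sorted.isEmpty() ? null : sorted.get(0);
        }

        @Override
        String getMax() {
            return sorted.isEmpty() ? null : sorted.get(sorted.size() - 1);
        }
    }

    /** Wrapper that does not forward getMin/getMax and so inherits the slow defaults. */
    static class NonDelegatingFilterTerms extends SimpleTerms {
        protected final SimpleTerms in;

        NonDelegatingFilterTerms(SimpleTerms in) {
            this.in = in;
        }

        @Override
        Iterator<String> iterator() {
            return in.iterator();
        }
    }

    /** Wrapper that forwards getMin/getMax to the wrapped instance. */
    static final class DelegatingFilterTerms extends NonDelegatingFilterTerms {
        DelegatingFilterTerms(SimpleTerms in) {
            super(in);
        }

        @Override
        String getMin() {
            return in.getMin();
        }

        @Override
        String getMax() {
            return in.getMax();
        }
    }

    public static void main(String[] args) {
        CachedTerms a = new CachedTerms(List.of("apple", "kiwi", "pear"));
        SimpleTerms slow = new NonDelegatingFilterTerms(a);
        System.out.println(slow.getMin() + ".." + slow.getMax() + " reads=" + a.iteratorCalls);

        CachedTerms b = new CachedTerms(List.of("apple", "kiwi", "pear"));
        SimpleTerms fast = new DelegatingFilterTerms(b);
        System.out.println(fast.getMin() + ".." + fast.getMax() + " reads=" + b.iteratorCalls);
    }
}
```

Both wrappers return the same min and max; only the delegating one avoids touching the simulated dictionary, which is the behaviour the description above asks of the `FilterTerms` implementations.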

@javanna javanna added the >bug, :Search/Search, backport and v7.17.9 labels Jan 12, 2023
@javanna javanna force-pushed the backport/7.17/92026 branch from ea8bb5c to 8242f13 January 12, 2023 11:04
@javanna javanna merged commit 3c88579 into elastic:7.17 Jan 12, 2023
@javanna javanna deleted the backport/7.17/92026 branch January 12, 2023 12:49