Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always rewrite search shard request outside of the search thread pool #51708

Merged
merged 12 commits into from
Feb 6, 2020

Conversation

jimczi
Copy link
Contributor

@jimczi jimczi commented Jan 30, 2020

This change ensures that the rewrite of the shard request is executed in the network thread or in the refresh listener when waiting for an active shard. This allows queries that rewrite to match_no_docs to bypass the search thread pool entirely even if the can_match phase was skipped (pre_filter_shard_size > number of shards).

Coordinating nodes don't have the ability to create empty responses so this change also ensures that at least one shard creates a full empty response while the other can return null ones. This is needed since creating true empty responses on shards require to create concrete aggregators which would be too costly to build on a network thread. We should move this functionality to aggregation builders in a follow up but that would be a much bigger change.

This change is also important for #49601 since we want to add the ability to use the result of other shards to rewrite the request of subsequent ones. For instance if the first M shards have their top N computed, the top worst document in the global queue can be pass to subsequent shards that can then rewrite to match_no_docs if they can guarantee that they don't have any document better than the provided one.

This change ensures that the rewrite of the shard request is executed in the network thread or in the refresh listener
when waiting for an active shard. This allows queries that rewrite to match_no_docs to bypass the search thread pool
entirely even if the can_match phase was skipped (pre_filter_shard_size > number of shards). Coordinating nodes
don't have the ability to create empty responses so this change also ensures that at least one shard creates a full empty
response while the other can return null ones. This is needed since creating true empty responses on shards require to create
concrete aggregators which would be too costly to build on a network thread. We should move this functionality to aggregation
builders in a follow up but that would be a much bigger change.
This change is also important for elastic#49601 since we want to add the ability to use the result of other shards to rewrite the request
of subsequent ones. For instance if the first M shards have their top N computed, the top worst document in the global queue can be pass
to subsequent shards that can then rewrite to match_no_docs if they can guarantee that they don't have any document better than the provided
one.
@jimczi jimczi added >enhancement :Search/Search Search-related issues that do not fall into other categories :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. v8.0.0 v7.7.0 labels Jan 30, 2020
@jimczi jimczi requested review from jpountz and dnhatn January 30, 2020 21:55
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Engine)

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this PR, and it looks great. Thanks, Jim. I left a question to discuss.

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor nits, LGTM.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this change could help a lot in certain cases.

@jimczi jimczi merged commit eb69c6f into elastic:master Feb 6, 2020
@jimczi jimczi deleted the rewrite_shard_request_no_rejection branch February 6, 2020 07:55
@jimczi
Copy link
Contributor Author

jimczi commented Feb 6, 2020

Thanks @dnhatn and @jpountz !

jimczi added a commit to jimczi/elasticsearch that referenced this pull request Feb 6, 2020
…elastic#51708)

This change ensures that the rewrite of the shard request is executed in the network thread or in the refresh listener when waiting for an active shard. This allows queries that rewrite to match_no_docs to bypass the search thread pool entirely even if the can_match phase was skipped (pre_filter_shard_size > number of shards). Coordinating nodes don't have the ability to create empty responses so this change also ensures that at least one shard creates a full empty response while the other can return null ones. This is needed since creating true empty responses on shards require to create concrete aggregators which would be too costly to build on a network thread. We should move this functionality to aggregation builders in a follow up but that would be a much bigger change.
This change is also important for elastic#49601 since we want to add the ability to use the result of other shards to rewrite the request of subsequent ones. For instance if the first M shards have their top N computed, the top worst document in the global queue can be pass to subsequent shards that can then rewrite to match_no_docs if they can guarantee that they don't have any document better than the provided one.
jimczi added a commit that referenced this pull request Feb 6, 2020
…#51708) (#51979)

This change ensures that the rewrite of the shard request is executed in the network thread or in the refresh listener when waiting for an active shard. This allows queries that rewrite to match_no_docs to bypass the search thread pool entirely even if the can_match phase was skipped (pre_filter_shard_size > number of shards). Coordinating nodes don't have the ability to create empty responses so this change also ensures that at least one shard creates a full empty response while the other can return null ones. This is needed since creating true empty responses on shards require to create concrete aggregators which would be too costly to build on a network thread. We should move this functionality to aggregation builders in a follow up but that would be a much bigger change.
This change is also important for #49601 since we want to add the ability to use the result of other shards to rewrite the request of subsequent ones. For instance if the first M shards have their top N computed, the top worst document in the global queue can be pass to subsequent shards that can then rewrite to match_no_docs if they can guarantee that they don't have any document better than the provided one.
jimczi added a commit that referenced this pull request Feb 6, 2020
jimczi added a commit that referenced this pull request Feb 6, 2020
jimczi added a commit that referenced this pull request Feb 6, 2020
dnhatn added a commit that referenced this pull request Feb 8, 2020
We need to either exclude null responses from the scroll search response 
or always create a search context for every target shards, although that
scroll query can be written to match_no_docs. Otherwise, we won't find
search_context for subsequent scroll requests.

This commit implements the latter option as it's less error-prone.

Relates #51708
dnhatn added a commit that referenced this pull request Feb 8, 2020
We need to either exclude null responses from the scroll search response
or always create a search context for every target shards, although that
scroll query can be written to match_no_docs. Otherwise, we won't find
search_context for subsequent scroll requests.

This commit implements the latter option as it's less error-prone.

Relates #51708
dnhatn added a commit that referenced this pull request Feb 10, 2020
We might leak a searcher if the target shard is removed (i.e., its index 
is deleted) or relocated while we are creating a SearchContext from a
SearchRewriteContext.

Relates #51708
Closes #52021

I labelled this non-issue for an unreleased bug introduced in #51708.
dnhatn added a commit that referenced this pull request Feb 10, 2020
We might leak a searcher if the target shard is removed (i.e., its index
is deleted) or relocated while we are creating a SearchContext from a
SearchRewriteContext.

Relates #51708
Closes #52021

I labelled this non-issue for an unreleased bug introduced in #51708.
jimczi added a commit that referenced this pull request Mar 17, 2020
This commit, built on top of #51708, allows to modify shard search requests based on informations collected on other shards. It is intended to speed up sorted queries on time-based indices. For queries that are only interested in the top documents.

This change will rewrite the shard queries to match none if the bottom sort value computed in prior shards is better than all values in the shard.
For queries that mix top documents and aggregations this change will reset the size of the top documents to 0 instead of rewriting to match none.
This means that we don't need to keep a search context open for this shard since we know in advance that it doesn't contain any competitive hit.
jimczi added a commit that referenced this pull request Mar 18, 2020
This commit, built on top of #51708, allows to modify shard search requests based on informations collected on other shards. It is intended to speed up sorted queries on time-based indices. For queries that are only interested in the top documents.

This change will rewrite the shard queries to match none if the bottom sort value computed in prior shards is better than all values in the shard.
For queries that mix top documents and aggregations this change will reset the size of the top documents to 0 instead of rewriting to match none.
This means that we don't need to keep a search context open for this shard since we know in advance that it doesn't contain any competitive hit.
ywelsch added a commit that referenced this pull request Jan 13, 2022
Field level security was interacting in bad ways with the can-match phase on frozen tier shards (interaction between
FieldSubsetReader and RewriteCachingDirectoryReader). This made can-match phase fail, which in the normal case
would result in extra load on the frozen tier, and in the extreme case (in interaction with #51708) made searches fail.

This is a bug that was indirectly introduced by #78988.

Closes #82044
ywelsch added a commit to ywelsch/elasticsearch that referenced this pull request Jan 13, 2022
Field level security was interacting in bad ways with the can-match phase on frozen tier shards (interaction between
FieldSubsetReader and RewriteCachingDirectoryReader). This made can-match phase fail, which in the normal case
would result in extra load on the frozen tier, and in the extreme case (in interaction with elastic#51708) made searches fail.

This is a bug that was indirectly introduced by elastic#78988.

Closes elastic#82044
ywelsch added a commit to ywelsch/elasticsearch that referenced this pull request Jan 13, 2022
Field level security was interacting in bad ways with the can-match phase on frozen tier shards (interaction between
FieldSubsetReader and RewriteCachingDirectoryReader). This made can-match phase fail, which in the normal case
would result in extra load on the frozen tier, and in the extreme case (in interaction with elastic#51708) made searches fail.

This is a bug that was indirectly introduced by elastic#78988.

Closes elastic#82044
ywelsch added a commit that referenced this pull request Jan 14, 2022
Field level security was interacting in bad ways with the can-match phase on frozen tier shards (interaction between
FieldSubsetReader and RewriteCachingDirectoryReader). This made can-match phase fail, which in the normal case
would result in extra load on the frozen tier, and in the extreme case (in interaction with #51708) made searches fail.

This is a bug that was indirectly introduced by #78988.

Closes #82044
ywelsch added a commit that referenced this pull request Jan 14, 2022
Field level security was interacting in bad ways with the can-match phase on frozen tier shards (interaction between
FieldSubsetReader and RewriteCachingDirectoryReader). This made can-match phase fail, which in the normal case
would result in extra load on the frozen tier, and in the extreme case (in interaction with #51708) made searches fail.

This is a bug that was indirectly introduced by #78988.

Closes #82044

Co-authored-by: Elastic Machine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement :Search/Search Search-related issues that do not fall into other categories v7.7.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants