Move the state of search requests to the coordinator node #46523

jimczi · 2019-09-10T08:38:57Z

This is a meta issue to track the tasks needed to move the state of a search to the coordinator node.
Today the initial phase of any search creates a SearchContext on each node that holds a shard selected for the request. This SearchContext is then used as the state on each shard for the subsequent phases. This issue proposes to replace the per-shard SearchContext with a ReaderContext that only keeps track of the index reader to use for the entire lifecycle of a search request, and to move all of the search state to the coordinating node.
To achieve this we need to re-create the search context for each phase based on the results of the previous phase. The state of the previous phase can be passed through the result of the phase and added to the request of the next one in order to be able to rebuild the search state.
Here is a list (hopefully exhaustive) of the tasks that need to be done to achieve this:

Replace the SearchContext with QueryShardContext when building aggregator factories (Replace the SearchContext with QueryShardContext when building aggregator factories #46527)
Replace the SearchContext with QueryShardContext when building collapse context (Replace the SearchContext with QueryShardContext when building collapsing context #46543)
Delay the creation of the inner hits sub-context to the fetch subphase and add a validation step when creating the inner hits builder (Delay the creation of SubSearchContext to the FetchSubPhase #46598)
Cleanup SubSearchContext and add a way to clone a SearchContext
Immutable SearchContext that can be fully built from a ShardSearchRequest
Add the rewritten ShardSearchRequest to QuerySearchResult and ShardFetchRequest(+bwc).
Re-create the SearchContext on each phase in the SearchService and register a simple ReaderContext in the initial phase that can be used to build the SearchContext in the subsequent phase. The bwc layer must handle nodes in previous versions and scrolls so a special context could be used to reference old style requests.
Add a way to create and delete a ReaderContext that can be used as a point in time reader for multiple search requests.
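
To make the target design concrete, here is a minimal sketch of the flow described above: the data node keeps only a reader handle between phases, and the rewritten request travels back to the coordinator in the query result so that the fetch phase can rebuild its context from it. All class and method names below (`PhaseStateSketch`, `ShardService`, `executeQuery`, `executeFetch`) are hypothetical and are not the actual Elasticsearch classes; the "rewrite" step is a stand-in.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy model of the proposal; not the real Elasticsearch types.
final class PhaseStateSketch {

    /** Everything needed to rebuild a per-phase context on the data node. */
    static final class ShardRequest {
        final String query;
        final int size;

        ShardRequest(String query, int size) {
            this.query = query;
            this.size = size;
        }
    }

    /** Query phase result: carries the rewritten request back to the coordinator. */
    static final class QueryResult {
        final long readerId;
        final ShardRequest rewrittenRequest;
        final int[] topDocIds;

        QueryResult(long readerId, ShardRequest rewrittenRequest, int[] topDocIds) {
            this.readerId = readerId;
            this.rewrittenRequest = rewrittenRequest;
            this.topDocIds = topDocIds;
        }
    }

    /** Data node side: the only state kept between phases is the reader handle. */
    static final class ShardService {
        private final Map<Long, String> readers = new HashMap<>();
        private long nextId = 0;

        QueryResult executeQuery(ShardRequest request) {
            long readerId = ++nextId;
            readers.put(readerId, "reader-" + readerId); // register the reader handle
            // Stand-in for query rewriting; the phase context is discarded afterwards.
            ShardRequest rewritten =
                new ShardRequest(request.query.trim().toLowerCase(Locale.ROOT), request.size);
            int[] topDocIds = new int[Math.min(request.size, 3)];
            for (int i = 0; i < topDocIds.length; i++) {
                topDocIds[i] = i;
            }
            return new QueryResult(readerId, rewritten, topDocIds);
        }

        String executeFetch(long readerId, ShardRequest rewrittenRequest, int[] docIds) {
            // The context is rebuilt from the request; only the reader is looked up.
            String reader = readers.remove(readerId);
            if (reader == null) {
                throw new IllegalStateException("No reader context for id " + readerId);
            }
            return "fetched " + docIds.length + " docs for [" + rewrittenRequest.query
                + "] using " + reader;
        }

        int openReaders() {
            return readers.size();
        }
    }
}
```

In this sketch the coordinator would call `executeQuery`, keep the returned `QueryResult`, and pass its fields back into `executeFetch`; releasing the reader at fetch time is a simplification that ignores scrolls and the bwc layer.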

There are plenty of follow ups that we could do once we move the state of the request to the coordinator node. For instance we could create a single reader context per directory reader based on sequence numbers that we could check when a replica fails in order to move the search to a different node if another replica is at the same checkpoint (always the case for read-only indices/frozen indices).
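
As a rough illustration of that follow-up idea, the check could look like the sketch below: when a shard copy fails, look for another copy that sits at the same checkpoint. The `ShardCopy` type, its fields, and `findEquivalentCopy` are hypothetical names made up for this example, and the real decision would involve more than a checkpoint comparison.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; none of these names exist in Elasticsearch.
final class CheckpointFailoverSketch {

    /** A shard copy identified by its node and the checkpoint its reader was opened at. */
    static final class ShardCopy {
        final String nodeId;
        final long checkpoint;

        ShardCopy(String nodeId, long checkpoint) {
            this.nodeId = nodeId;
            this.checkpoint = checkpoint;
        }
    }

    /** Returns another copy on a different node that is at the same checkpoint, if any. */
    static Optional<ShardCopy> findEquivalentCopy(ShardCopy failed, List<ShardCopy> copies) {
        return copies.stream()
            .filter(copy -> !copy.nodeId.equals(failed.nodeId))
            .filter(copy -> copy.checkpoint == failed.checkpoint)
            .findFirst();
    }
}
```

For read-only or frozen indices every copy would be at the same checkpoint, so the lookup would succeed whenever a second copy exists.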

Closes #26472

elasticmachine · 2019-09-10T08:38:59Z

Pinging @elastic/es-search

…ator factories This commit replaces the `SearchContext` with the `QueryShardContext` when building aggregator factories. Aggregator factories are part of the `SearchContext` so they shouldn't require a `SearchContext` to create them. The main changes here are the signatures of `AggregationBuilder#build` that now takes a `QueryShardContext` and `AggregatorFactory#createInternal` that passes the `SearchContext` to build the `Aggregator`. Relates elastic#46523

…sing context This commit replaces the `SearchContext` with the `QueryShardContext` when building the collapsing context. The collapse context is part of the `SearchContext`, so it shouldn't require a `SearchContext` to create one. Relates elastic#46523

This change adds an IndexSearcher and the node's BigArrays in the QueryShardContext. It's a spin off of elastic#46527 as this change is required to allow aggregation builder to solely use the query shard context. Relates elastic#46523

This change delays the creation of the SubSearchContext for nested and parent/child inner_hits to the fetch sub phase in order to ensure that a SearchContext can be built entirely from a QueryShardContext. This commit also adds a validation step to the inner hits builder that ensures that we fail the request early if the inner hits path is invalid. Relates elastic#46523

This commit removes the SearchContextException in favor of a simpler SearchException that doesn't leak the SearchContext. Relates elastic#46523

This commit replaces the SearchContext used in AbstractQueryTestCase with a QueryShardContext in order to reduce the visibility of search contexts. Relates elastic#46523

Today built-in highlighters and plugins have access to the SearchContext through the highlighter context. However, most of the information exposed in the SearchContext is not needed, and a QueryShardContext would be enough to perform highlighting. This change replaces the SearchContext with the information that is strictly required by highlighters: a QueryShardContext and the SearchContextHighlight. This reduces the exposure of the complex SearchContext and removes the need to clone it in the percolator sub phase. Relates elastic#47198 Relates elastic#46523

With this change, we partially move the state of SearchContext to ReaderContext. This is another step allowing us to move the state of search to the coordinating node. We will need several follow-ups to move the entire search state to the coordinating node. Relates #46523

This commit moves the states of search to the coordinating node instead of keeping them in the data node. Relates #46523

This commit introduces a new API that manages point-in-times in x-pack basic. Elasticsearch pit (point in time) is a lightweight view into the state of the data as it existed when initiated. A search request by default executes against the most recent point in time. In some cases, it is preferred to perform multiple search requests using the same point in time. For example, if refreshes happen between search_after requests, then the results of those requests might not be consistent as changes happening between searches are only visible to the more recent point in time.

A point in time must be opened before being used in search requests. The `keep_alive` parameter tells Elasticsearch how long it should keep a point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response from the above request includes an `id`, which should be passed to the `id` of the `pit` parameter of search requests.

```
POST /_search
{
  "query": { "match" : { "title" : "elasticsearch" } },
  "pit": {
    "id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
    "keep_alive": "1m"
  }
}
```

Point-in-times are automatically closed when the `keep_alive` has elapsed. However, keeping point-in-times has a cost; hence, point-in-times should be closed as soon as they are no longer used in search requests.

```
DELETE /_pit
{
  "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```

#### Notable works in this change:

- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966

Relates #46523 Relates #26472

Co-authored-by: Jim Ferenczi <[email protected]>
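
The search_after case mentioned in the commit message above can be sketched as a small helper that builds the body of each page request against the same point in time. This is only an illustration: the class and method names are made up, the `@timestamp` sort field and the page size are assumptions, and the helper does no JSON escaping, so it only suits simple values such as the base64 pit id.

```java
// Hypothetical helper; builds request bodies only and sends nothing.
final class PitSearchBodySketch {

    /**
     * Builds a POST /_search body that reuses an open point in time.
     * searchAfterJson is the JSON array of sort values from the last hit
     * of the previous page, or null for the first page.
     */
    static String searchBody(String pitId, String keepAlive, String searchAfterJson) {
        StringBuilder body = new StringBuilder();
        body.append("{\"size\":100,");
        body.append("\"query\":{\"match\":{\"title\":\"elasticsearch\"}},");
        body.append("\"pit\":{\"id\":\"").append(pitId)
            .append("\",\"keep_alive\":\"").append(keepAlive).append("\"},");
        // Assumed sort field; search_after needs a sort to page over.
        body.append("\"sort\":[{\"@timestamp\":\"asc\"}]");
        if (searchAfterJson != null) {
            body.append(",\"search_after\":").append(searchAfterJson);
        }
        body.append("}");
        return body.toString();
    }
}
```

Because every page carries the same `pit` id, refreshes that happen between two page requests do not change which documents the later pages see.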

jimczi · 2020-09-08T11:05:46Z

The feature was merged in #61062, hence closing.

jimczi added :Search/Search Search-related issues that do not fall into other categories Meta labels Sep 10, 2019

jimczi mentioned this issue Sep 10, 2019

Replace the SearchContext with QueryShardContext when building aggregator factories #46527

Merged

jimczi mentioned this issue Sep 10, 2019

Replace the SearchContext with QueryShardContext when building collapsing context #46543

Merged

jimczi mentioned this issue Sep 11, 2019

Add more context to QueryShardContext #46584

Merged

jimczi mentioned this issue Sep 11, 2019

Delay the creation of SubSearchContext to the FetchSubPhase #46598

Merged

jimczi added the 7x label Sep 12, 2019

This was referenced Sep 13, 2019

[ML] Consider using search_after instead of scroll in datafeeds #29781

Open

[ML] Make sort order for datafeeds deterministic #39187

Open

martijnvg mentioned this issue Sep 16, 2019

Prevent fielddata loading for _id #43599

Closed

jimczi added a commit to jimczi/elasticsearch that referenced this issue Sep 23, 2019

Replace SearchContextException with SearchException

159f64a

This commit removes the SearchContextException in favor of a simpler SearchException that doesn't leak the SearchContext. Relates elastic#46523

jimczi mentioned this issue Sep 23, 2019

Replace SearchContextException with SearchException #46965

Merged

jimczi mentioned this issue Oct 8, 2019

Remove the SearchContext from the highlighter context #47733

Merged

This was referenced Nov 26, 2019

Supporting EQL in Elasticsearch #49581

Closed

Modify EQL execution to use point in time reader #49628

Closed

polyfractal removed the 7x label Dec 12, 2019

dnhatn assigned jimczi and dnhatn Jan 22, 2020

dnhatn mentioned this issue Jan 22, 2020

Cut over from SearchContext to ReaderContext #51282

Merged

dnhatn mentioned this issue Feb 25, 2020

Move states of search to coordinating node #52741

Merged

dnhatn added a commit that referenced this issue Mar 3, 2020

Move states of search to coordinating node (#52741)

206381e

This commit moves the states of search to the coordinating node instead of keeping them in the data node. Relates #46523

rjernst added the Team:Search Meta label for search team label May 4, 2020

dnhatn mentioned this issue May 9, 2020

Introduce search context - point in time view of indices #56480

Closed

dnhatn mentioned this issue Aug 12, 2020

Introduce point in time APIs in x-pack basic #61062

Merged

dnhatn mentioned this issue Sep 2, 2020

Introduce point in time APIs in x-pack basic #61872

Closed

bpintea mentioned this issue Sep 2, 2020

SQL: replace the scroll with PIT for data batching #61873

Closed

jimczi closed this as completed Sep 8, 2020

Mpdreamz mentioned this issue Nov 16, 2020

7.10.1 Meta Ticket elastic/elasticsearch-net#5096

Closed

61 tasks

jakelandis mentioned this issue Dec 2, 2020

Very large scroll search (i.e. reindex) can gradually slow down #65780

Closed
