Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move the state of search requests to the coordinator node #46523

Closed
8 tasks done
jimczi opened this issue Sep 10, 2019 · 2 comments
Closed
8 tasks done

Move the state of search requests to the coordinator node #46523

jimczi opened this issue Sep 10, 2019 · 2 comments
Assignees
Labels
Meta :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@jimczi
Copy link
Contributor

jimczi commented Sep 10, 2019

This is a meta issue to track the tasks to move the state of a search in the coordinator node.
Today the initial phase of any search creates a SearchContext on each node that contains a shard selected for the request. This SearchContext is then used as a state on each shard for the subsequent phases. This issue proposes to move from a SearchContext on each shard to a ReaderContext that would keep track of the index reader that should be used for the entire lifecycle of a search request and to move all the state of the search to the coordinating node.
To achieve this we need to re-create the search context for each phase based on the results of the previous phase. The state of the previous phase can be passed through the result of the phase and added to the request of the next one in order to be able to rebuild the search state.
Here is a list (hopefully exhaustive) of the tasks that need to be done to achieve this:

There are plenty of follow ups that we could do once we move the state of the request to the coordinator node. For instance we could create a single reader context per directory reader based on sequence numbers that we could check when a replica fails in order to move the search to a different node if another replica is at the same checkpoint (always the case for read-only indices/frozen indices).

Closes #26472

@jimczi jimczi added :Search/Search Search-related issues that do not fall into other categories Meta labels Sep 10, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

jimczi added a commit to jimczi/elasticsearch that referenced this issue Sep 10, 2019
…ator factories

This commit replaces the `SearchContext` with the `QueryShardContext` when building aggregator factories.
Aggregator factories are part of the `SearchContext` so they shouldn't require a `SearchContext` to create them.
The main changes here are the signatures of `AggregationBuilder#build` that now takes a `QueryShardContext` and
`AggregatorFactory#createInternal` that passes the `SearchContext` to build the `Aggregator`.

Relates elastic#46523
jimczi added a commit to jimczi/elasticsearch that referenced this issue Sep 10, 2019
…sing context

This commit replaces the `SearchContext` with the `QueryShardContext` when building collapsing conteext
Collapse context is part of the `SearchContext` so it shouldn't require a `SearchContext` to create one.

Relates elastic#46523
jimczi added a commit to jimczi/elasticsearch that referenced this issue Sep 11, 2019
This change adds an IndexSearcher and the node's BigArrays in the QueryShardContext.
It's a spin off of elastic#46527 as this change is required to allow aggregation builder to solely use the
query shard context.

Relates elastic#46523
jimczi added a commit that referenced this issue Sep 11, 2019
This change adds an IndexSearcher and the node's BigArrays in the QueryShardContext.
It's a spin off of #46527 as this change is required to allow aggregation builder to solely use the
query shard context.

Relates #46523
jimczi added a commit that referenced this issue Sep 11, 2019
…sing context (#46543)

This commit replaces the `SearchContext` with the `QueryShardContext` when building collapsing conteext
Collapse context is part of the `SearchContext` so it shouldn't require a `SearchContext` to create one.

Relates #46523
jimczi added a commit that referenced this issue Sep 11, 2019
This change adds an IndexSearcher and the node's BigArrays in the QueryShardContext.
It's a spin off of #46527 as this change is required to allow aggregation builder to solely use the
query shard context.

Relates #46523
jimczi added a commit that referenced this issue Sep 11, 2019
…sing context (#46543)

This commit replaces the `SearchContext` with the `QueryShardContext` when building collapsing conteext
Collapse context is part of the `SearchContext` so it shouldn't require a `SearchContext` to create one.

Relates #46523
jimczi added a commit that referenced this issue Sep 11, 2019
…ator factories (#46527)

This commit replaces the `SearchContext` with the `QueryShardContext` when building aggregator factories. Aggregator factories are part of the `SearchContext` so they shouldn't require a `SearchContext` to create them.
The main changes here are the signatures of `AggregationBuilder#build` that now takes a `QueryShardContext` and `AggregatorFactory#createInternal` that passes the `SearchContext` to build the `Aggregator`.

Relates #46523
jimczi added a commit to jimczi/elasticsearch that referenced this issue Sep 11, 2019
This change delays the creation of the SubSearchContext for nested and parent/child inner_hits
to the fetch sub phase in order to ensure that a SearchContext can built entirely from a
QueryShardContext. This commit also adds a validation step to the inner hits builder that ensures
that we fail the request early if the inner hits path is invalid.

Relates elastic#46523
jimczi added a commit that referenced this issue Sep 11, 2019
…ator factories (#46527)

This commit replaces the `SearchContext` with the `QueryShardContext` when building aggregator factories. Aggregator factories are part of the `SearchContext` so they shouldn't require a `SearchContext` to create them.
The main changes here are the signatures of `AggregationBuilder#build` that now takes a `QueryShardContext` and `AggregatorFactory#createInternal` that passes the `SearchContext` to build the `Aggregator`.

Relates #46523
@jimczi jimczi added the 7x label Sep 12, 2019
jimczi added a commit that referenced this issue Sep 12, 2019
This change delays the creation of the SubSearchContext for nested and parent/child inner_hits
to the fetch sub phase in order to ensure that a SearchContext can built entirely from a
QueryShardContext. This commit also adds a validation step to the inner hits builder that ensures that we fail the request early if the inner hits path is invalid.

Relates #46523
jimczi added a commit that referenced this issue Sep 12, 2019
This change delays the creation of the SubSearchContext for nested and parent/child inner_hits
to the fetch sub phase in order to ensure that a SearchContext can built entirely from a
QueryShardContext. This commit also adds a validation step to the inner hits builder that ensures that we fail the request early if the inner hits path is invalid.

Relates #46523
jimczi added a commit to jimczi/elasticsearch that referenced this issue Sep 23, 2019
This commit removes the SearchContextException in favor of a simpler
SearchException that doesn't leak the SearchContext.

Relates elastic#46523
jimczi added a commit to jimczi/elasticsearch that referenced this issue Sep 23, 2019
This commit replaces the SearchContext used in AbstractQueryTestCase with
a QueryShardContext in order to reduce the visibility of search contexts.

Relates elastic#46523
jimczi added a commit to jimczi/elasticsearch that referenced this issue Oct 8, 2019
Today built-in highlighter and plugins have access to the SearchContext through the
highlighter context. However most of the information exposed in the SearchContext are not needed and a QueryShardContext
would be enough to perform highlighting. This change replaces the SearchContext by the informations that are absolutely
required by highlighter: a QueryShardContext and the SearchContextHighlight. This change allows to reduce the exposure of the
complex SearchContext and remove the needs to clone it in the percolator sub phase.

Relates elastic#47198
Relates elastic#46523
jimczi added a commit that referenced this issue Oct 9, 2019
Today built-in highlighter and plugins have access to the SearchContext through the
highlighter context. However most of the information exposed in the SearchContext are not needed and a QueryShardContext
would be enough to perform highlighting. This change replaces the SearchContext by the informations that are absolutely
required by highlighter: a QueryShardContext and the SearchContextHighlight. This change allows to reduce the exposure of the
complex SearchContext and remove the needs to clone it in the percolator sub phase.

Relates #47198
Relates #46523
jimczi added a commit that referenced this issue Oct 10, 2019
Today built-in highlighter and plugins have access to the SearchContext through the
highlighter context. However most of the information exposed in the SearchContext are not needed and a QueryShardContext
would be enough to perform highlighting. This change replaces the SearchContext by the informations that are absolutely
required by highlighter: a QueryShardContext and the SearchContextHighlight. This change allows to reduce the exposure of the
complex SearchContext and remove the needs to clone it in the percolator sub phase.

Relates #47198
Relates #46523
@polyfractal polyfractal removed the 7x label Dec 12, 2019
dnhatn added a commit that referenced this issue Jan 27, 2020
With this change, we partially move the state of SearchContext to 
ReaderContext. This is another step allowing us to move the state of
search to the coordinating node.

We will need several follow-ups to move the entire search state to the 
coordinating node.

Relates #46523
dnhatn added a commit that referenced this issue Feb 22, 2020
With this change, we partially move the state of SearchContext to
ReaderContext. This is another step allowing us to move the state of
search to the coordinating node.

We will need several follow-ups to move the entire search state to the
coordinating node.

Relates #46523
dnhatn added a commit that referenced this issue Mar 3, 2020
This commit moves the states of search to the coordinating node instead 
of keeping them in the data node. 

Relates #46523
@rjernst rjernst added the Team:Search Meta label for search team label May 4, 2020
dnhatn added a commit that referenced this issue Aug 25, 2020
This commit introduces a new API that manages point-in-times in x-pack 
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.

A point in time must be opened before being used in search requests. The 
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response from the above request includes a `id`, which should be 
passed to the `id` of the `pit` parameter of search requests.

```
POST /_search
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "pit": {
            "id":  "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
            "keep_alive": "1m"
    }
}
```

Point-in-times are automatically closed when the `keep_alive` is 
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.

```
DELETE /_pit
{
    "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```

#### Notable works in this change:

- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966

Relates #46523
Relates #26472

Co-authored-by: Jim Ferenczi <[email protected]>
@jimczi
Copy link
Contributor Author

jimczi commented Sep 8, 2020

The feature was merged in #61062, hence closing.

@jimczi jimczi closed this as completed Sep 8, 2020
dnhatn added a commit to dnhatn/elasticsearch that referenced this issue Sep 10, 2020
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.

A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response from the above request includes a `id`, which should be
passed to the `id` of the `pit` parameter of search requests.

```
POST /_search
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "pit": {
            "id":  "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
            "keep_alive": "1m"
    }
}
```

Point-in-times are automatically closed when the `keep_alive` is
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.

```
DELETE /_pit
{
    "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```

#### Notable works in this change:

- Move the search state to the coordinating node: elastic#52741
- Allow searches with a specific reader context: elastic#53989
- Add the ability to acquire readers in IndexShard: elastic#54966

Relates elastic#46523
Relates elastic#26472

Co-authored-by: Jim Ferenczi <[email protected]>
dnhatn added a commit that referenced this issue Sep 10, 2020
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.

A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response from the above request includes a `id`, which should be
passed to the `id` of the `pit` parameter of search requests.

```
POST /_search
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "pit": {
            "id":  "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
            "keep_alive": "1m"
    }
}
```

Point-in-times are automatically closed when the `keep_alive` is
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.

```
DELETE /_pit
{
    "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```

#### Notable works in this change:

- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966

Relates #46523
Relates #26472

Co-authored-by: Jim Ferenczi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Meta :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team
Projects
None yet
Development

No branches or pull requests

5 participants