Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for local cluster alias to SearchRequest #36997

Merged

Conversation

javanna
Copy link
Member

@javanna javanna commented Dec 27, 2018

With the upcoming cross-cluster search alternate execution mode, the CCS node will be able to split a CCS request into multiple search requests, one per remote cluster involved. In order to do that, the CCS node has to be able to signal to each remote cluster that such sub-requests are part of a CCS request. Each cluster does not know about the other clusters involved, and does not know either what alias it is given in the CCS node, hence the CCS coordinating node needs to be able to provide the alias as part of the search request so that it is used as index prefix in the returned search hits.

The cluster alias is a notion that's already supported in the search shards iterator and search shard target, but it is currently used in CCS as both index prefix and connection lookup key when fanning out to all the shards. With CCS alternate execution mode the provided cluster alias needs to be used only as index prefix, as shards are local to each cluster hence no cluster alias should be used for connection lookups.

The local cluster alias can be set to the SearchRequest at the transport layer only. A new package private constructor is added that takes the cluster alias as an argument, as well as a package private getter method.

Relates to #32125

With the upcoming cross-cluster search alternate execution mode, the CCS
 node will be able to split a CCS request into multiple search requests,
 one per remote cluster involved. In order to do that, the CCS node has
to be able to signal to each remote cluster that such sub-requests are
part of a CCS request. Each cluster does not know about the other
clusters involved, and does not know either what alias it is given in
the CCS node, hence the CCS coordinating node needs to be able to provide
the alias as part of the search request so that it is used as index prefix
in the returned search hits.

The cluster alias is a notion that's already supported in the search shards
iterator and search shard target, but it is currently used in CCS as both
 index prefix and connection lookup key when fanning out to all the shards.
With CCS alternate execution mode the provided cluster alias needs to be
used only as index prefix, as shards are local to each cluster hence no
cluster alias should be used for connection lookups. Such distinction is
introduced through a new class called CCSInfo that replaces the previous
cluster alias optional string and allows to set the cluster alias as well
as a boolean flag that signals the current CCS execution mode.

The cluster alias can be set to the SearchRequest at the transport layer
only, and its getter/setter methods are package private.

Relates to elastic#32125
@javanna javanna added >enhancement :Search/Search Search-related issues that do not fall into other categories v7.0.0 v6.7.0 labels Dec 27, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

Simplify changes by making connection lookup aware of the search request cluster alias, so there is no need to distinguish between index prefix and connection alias, which means no serialization changes required besides adding support for cluster alias to the search request.
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a minor comment regarding naming, LGTM otherwise

@@ -62,6 +62,8 @@
public static final int DEFAULT_PRE_FILTER_SHARD_SIZE = 128;
public static final int DEFAULT_BATCHED_REDUCE_SIZE = 512;

private final String clusterAlias;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to localClusterAlias ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@javanna javanna changed the title Add support for cluster alias to SearchRequest Add support for local cluster alias to SearchRequest Dec 28, 2018
@javanna javanna merged commit 51fe20e into elastic:master Dec 28, 2018
javanna added a commit to javanna/elasticsearch that referenced this pull request Dec 28, 2018
With elastic#36997 we added the ability to provide a cluster alias with a
SearchRequest.

The next step is to disable the final reduction whenever a cluster alias
is provided with the SearchRequest. A cluster alias will be provided
when executing a cross-cluster search request with alternate execution
mode, where each cluster does its own reduction locally. In order for
the CCS node to be able to later perform an additional reduction of the
results, we need to make sure that all the needed info stays available.
This means that terms aggregations can be reduced but not pruned, and
pipeline aggs should not be executed. The final reduction will happen
later in the CCS coordinating node.

Relates to elastic#36997 & elastic#32125
javanna added a commit that referenced this pull request Dec 28, 2018
With #36997 we added the ability to provide a cluster alias with a
SearchRequest.

The next step is to disable the final reduction whenever a cluster alias
is provided with the SearchRequest. A cluster alias will be provided
when executing a cross-cluster search request with alternate execution
mode, where each cluster does its own reduction locally. In order for
the CCS node to be able to later perform an additional reduction of the
results, we need to make sure that all the needed info stays available.
This means that terms aggregations can be reduced but not pruned, and
pipeline aggs should not be executed. The final reduction will happen
later in the CCS coordinating node.

Relates to #36997 & #32125
javanna added a commit to javanna/elasticsearch that referenced this pull request Dec 28, 2018
With elastic#36997 and elastic#37000 we added the ability to provide a cluster alias with a SearchRequest and to perform non-final reduction when a cluster alias is set to the incoming search request.

With CCS soon supporting local reduction on each remote cluster, we also need to prevent shard_size of terms aggs to be adjusted on the data nodes. We may end up querying a single shard per cluster, but later we will have to reduce results coming from multiple cluster hence we still have to collect more buckets than needed to guarantee some level of precision.

This is done by adding the ability to override the shard_size on the coordinating node as part of the rewrite phase. This will be done only
when searching against multiple clusters, meaning when using CCS with alternate execution mode. Setting the shard_size explicitly on the coord node means that it will not be overridden later on the data nodes. The shard_size will still be set on the data nodes like before in all other cases (local search, or CCS with ordinary execution mode).

 Relates to elastic#32125
javanna added a commit to javanna/elasticsearch that referenced this pull request Jan 3, 2019
With elastic#36997 we added support for providing a local cluster alias with a
`SearchRequest`. We intended to make sure that when provided as part of
a search request, the cluster alias would never be used for connection
lookups. Yet due to a bug we would still end up looking up the
connection from the remote ones.

This commit adds a test to make sure that whenever we set the cluster
alias to the `SearchRequest` (which can only be done at transport), such
alias is used as index prefix in the returned hits. No errors are thrown
despite no remote clusters are configured indicating that such alias is
never used for connection look-ups.

Also, we add explicit support for the empty cluster alias when printing
 out index names through `RemoteClusterAware#buildRemoteIndexName`.
In fact we don't want to print out `:index` when the cluster alias is
set to empty string, but rather `index`. Yet, the semantic of empty
string is different compared to `null` as it will still disable final
reduction. This will be used in CCS when searching against remote
clusters as well as the local one, the local one will have empty prefix
yet it will need to disable final reduction so that its results will be
properly merged with the ones coming from the remote clusters.
javanna added a commit that referenced this pull request Jan 4, 2019
With #36997 we added support for providing a local cluster alias with a
`SearchRequest`. We intended to make sure that when provided as part of
a search request, the cluster alias would never be used for connection
lookups. Yet due to a bug we would still end up looking up the
connection from the remote ones.

This commit adds a test to make sure that whenever we set the cluster
alias to the `SearchRequest` (which can only be done at transport), such
alias is used as index prefix in the returned hits. No errors are thrown
despite no remote clusters are configured indicating that such alias is
never used for connection look-ups.

Also, we add explicit support for the empty cluster alias when printing
 out index names through `RemoteClusterAware#buildRemoteIndexName`.
In fact we don't want to print out `:index` when the cluster alias is
set to empty string, but rather `index`. Yet, the semantic of empty
string is different compared to `null` as it will still disable final
reduction. This will be used in CCS when searching against remote
clusters as well as the local one, the local one will have empty prefix
yet it will need to disable final reduction so that its results will be
properly merged with the ones coming from the remote clusters.
javanna added a commit to javanna/elasticsearch that referenced this pull request Jan 7, 2019
javanna added a commit that referenced this pull request Jan 7, 2019
With the upcoming cross-cluster search alternate execution mode, the CCS
 node will be able to split a CCS request into multiple search requests,
 one per remote cluster involved. In order to do that, the CCS node has
to be able to signal to each remote cluster that such sub-requests are
part of a CCS request. Each cluster does not know about the other
clusters involved, and does not know either what alias it is given in
the CCS node, hence the CCS coordinating node needs to be able to provide
the alias as part of the search request so that it is used as index prefix
in the returned search hits.

The cluster alias is a notion that's already supported in the search shards
iterator and search shard target, but it is currently used in CCS as both
 index prefix and connection lookup key when fanning out to all the shards.
With CCS alternate execution mode the provided cluster alias needs to be
used only as index prefix, as shards are local to each cluster hence no
cluster alias should be used for connection lookups.

The local cluster alias can be set to the SearchRequest at the transport layer
only, and its constructor/getter methods are package private.

Relates to #32125
javanna added a commit that referenced this pull request Jan 7, 2019
With #36997 we added the ability to provide a cluster alias with a
SearchRequest.

The next step is to disable the final reduction whenever a cluster alias
is provided with the SearchRequest. A cluster alias will be provided
when executing a cross-cluster search request with alternate execution
mode, where each cluster does its own reduction locally. In order for
the CCS node to be able to later perform an additional reduction of the
results, we need to make sure that all the needed info stays available.
This means that terms aggregations can be reduced but not pruned, and
pipeline aggs should not be executed. The final reduction will happen
later in the CCS coordinating node.

Relates to #36997 & #32125
javanna added a commit that referenced this pull request Jan 7, 2019
With #36997 we added support for providing a local cluster alias with a
`SearchRequest`. We intended to make sure that when provided as part of
a search request, the cluster alias would never be used for connection
lookups. Yet due to a bug we would still end up looking up the
connection from the remote ones.

This commit adds a test to make sure that whenever we set the cluster
alias to the `SearchRequest` (which can only be done at transport), such
alias is used as index prefix in the returned hits. No errors are thrown
despite no remote clusters are configured indicating that such alias is
never used for connection look-ups.

Also, we add explicit support for the empty cluster alias when printing
 out index names through `RemoteClusterAware#buildRemoteIndexName`.
In fact we don't want to print out `:index` when the cluster alias is
set to empty string, but rather `index`. Yet, the semantic of empty
string is different compared to `null` as it will still disable final
reduction. This will be used in CCS when searching against remote
clusters as well as the local one, the local one will have empty prefix
yet it will need to disable final reduction so that its results will be
properly merged with the ones coming from the remote clusters.
javanna added a commit that referenced this pull request Jan 7, 2019
Version needs to be updated after backporting #36997 & #37142
where we added support for providing and serializing localClusterAlias
as well ass absoluteStartMillis.

Relates to #36997 & #37142
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories v6.7.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants