-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CCS: don't proxy requests for already connected node #31273
CCS: don't proxy requests for already connected node #31273
Conversation
Cross-cluster search selects a subset of nodes for each remote cluster and sends requests only to them, which will act as a proxy and properly redirect such requests to the target nodes that hold the relevant data. What happens today is that every time we send a request to a remote cluster, it will be sent to the next node in the proxy list (in round-robin fashion), regardless of whether the target node is already amongst the ones that we are connected to. In case for instance we need to send a shard search request to a data node that's also one of the selected proxy nodes, we may end up sending the request to it through one of the other proxy nodes. This commit optimizes this case to make sure that whenever we are already connected to a remote node, we will send a direct request rather than using the next proxy node. There is a side-effect to this, which is that round-robin will be a bit unbalanced as the data nodes that are also selected as proxies will receive more requests.
Pinging @elastic/es-search-aggs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some suggestions LGTM
*/ | ||
Transport.Connection getConnection(DiscoveryNode remoteClusterNode) { | ||
DiscoveryNode discoveryNode = connectedNodes.get(); | ||
if (connectedNodes.contains(remoteClusterNode)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we use transportService .nodeConnected
instead. I mean if we for instance have the local cluster configured as a remote cluster we get this optimization as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, great catch!
|
||
@Override | ||
public void sendRequest(long requestId, String action, TransportRequest request, TransportRequestOptions options) | ||
static class ProxyConnection implements Transport.Connection { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make the class final?
@@ -612,7 +622,7 @@ int getNumNodesConnected() { | |||
return connectedNodes.size(); | |||
} | |||
|
|||
private static class ConnectedNodes implements Supplier<DiscoveryNode> { | |||
private static class ConnectedNodes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also final?
* elastic/master: (40 commits) [DOC] Extend SQL docs Immediately flush channel after writing to buffer (elastic#31301) [DOCS] Shortens ML API intros Use quotes in the call invocation (elastic#31249) move security ingest processors to a sub ingest directory (elastic#31306) Add 5.6.11 version constant. Fix version detection. SQL: Whitelist SQL utility class for better scripting (elastic#30681) [Docs] All Rollup docs experimental, agg limitations, clarify DeleteJob (elastic#31299) CCS: don't proxy requests for already connected node (elastic#31273) Mute ScriptedMetricAggregatorTests testSelfReferencingAggStateAfterMap [test] opensuse packaging turn up debug logging Add unreleased version 6.3.1 Removes experimental tag from scripted_metric aggregation (elastic#31298) [Rollup] Metric config parser must use builder so validation runs (elastic#31159) [ML] Check licence when datafeeds use cross cluster search (elastic#31247) Add notion of internal index settings (elastic#31286) Test: Remove broken yml test feature (elastic#31255) REST hl client: cluster health to default to cluster level (elastic#31268) [ML] Update test thresholds to account for changes to memory control (elastic#31289) ...
* elastic/master: (29 commits) [DOC] Extend SQL docs Immediately flush channel after writing to buffer (elastic#31301) [DOCS] Shortens ML API intros Use quotes in the call invocation (elastic#31249) move security ingest processors to a sub ingest directory (elastic#31306) Add 5.6.11 version constant. Fix version detection. SQL: Whitelist SQL utility class for better scripting (elastic#30681) [Docs] All Rollup docs experimental, agg limitations, clarify DeleteJob (elastic#31299) CCS: don't proxy requests for already connected node (elastic#31273) Mute ScriptedMetricAggregatorTests testSelfReferencingAggStateAfterMap [test] opensuse packaging turn up debug logging Add unreleased version 6.3.1 Removes experimental tag from scripted_metric aggregation (elastic#31298) [Rollup] Metric config parser must use builder so validation runs (elastic#31159) [ML] Check licence when datafeeds use cross cluster search (elastic#31247) Add notion of internal index settings (elastic#31286) Test: Remove broken yml test feature (elastic#31255) REST hl client: cluster health to default to cluster level (elastic#31268) [ML] Update test thresholds to account for changes to memory control (elastic#31289) ...
* master: Remove RestGetAllAliasesAction (#31308) Temporary fix for broken build Reenable Checkstyle's unused import rule (#31270) Remove remaining unused imports before merging #31270 Fix non-REST doc snippet [DOC] Extend SQL docs Immediately flush channel after writing to buffer (#31301) [DOCS] Shortens ML API intros Use quotes in the call invocation (#31249) move security ingest processors to a sub ingest directory (#31306) Add 5.6.11 version constant. Fix version detection. SQL: Whitelist SQL utility class for better scripting (#30681) [Docs] All Rollup docs experimental, agg limitations, clarify DeleteJob (#31299) CCS: don't proxy requests for already connected node (#31273) Mute ScriptedMetricAggregatorTests testSelfReferencingAggStateAfterMap [test] opensuse packaging turn up debug logging Add unreleased version 6.3.1 Removes experimental tag from scripted_metric aggregation (#31298) [Rollup] Metric config parser must use builder so validation runs (#31159) [ML] Check licence when datafeeds use cross cluster search (#31247) Add notion of internal index settings (#31286) Test: Remove broken yml test feature (#31255) REST hl client: cluster health to default to cluster level (#31268) [ML] Update test thresholds to account for changes to memory control (#31289) Log warnings when cluster state publication failed to some nodes (#31233) Fix AntFixture waiting condition (#31272) Ignore numeric shard count if waiting for ALL (#31265) [ML] Implement new rules design (#31110) index_prefixes back-compat should test 6.3 (#30951) Core: Remove plain execute method on TransportAction (#30998) Update checkstyle to 8.10.1 (#31269) Set analyzer version in PreBuiltAnalyzerProviderFactory (#31202) Modify pipelining handlers to require full requests (#31280) Revert upgrade to Netty 4.1.25.Final (#31282) Use armored input stream for reading public key (#31229) Fix Netty 4 Server Transport tests. Again. REST hl client: adjust wait_for_active_shards param in cluster health (#31266) REST high-level Client: remove deprecated API methods (#31200) [DOCS] Mark SQL feature as experimental [DOCS] Updates machine learning custom URL screenshots (#31222) Fix naming conventions check for XPackTestCase Fix security Netty 4 transport tests Fix race in clear scroll (#31259) [DOCS] Clarify audit index settings when remote indexing (#30923) Delete typos in SAML docs (#31199) REST high-level client: add Cluster Health API (#29331) [ML][TEST] Mute tests using rules (#31204) Support RequestedAuthnContext (#31238) SyncedFlushResponse to implement ToXContentObject (#31155) Add Get Aliases API to the high-level REST client (#28799) Remove some line length supressions (#31209) Validate xContentType in PutWatchRequest. (#31088) [INGEST] Interrupt the current thread if evaluation grok expressions take too long (#31024) Suppress extras FS on caching directory tests Revert "[DOCS] Added 6.3 info & updated the upgrade table. (#30940)" Revert "Fix snippets in upgrade docs" Fix snippets in upgrade docs [DOCS] Added 6.3 info & updated the upgrade table. (#30940) LLClient: Support host selection (#30523) Upgrade to Netty 4.1.25.Final (#31232) Enable custom credentials for core REST tests (#31235) Move ESIndexLevelReplicationTestCase to test framework (#31243) Encapsulate Translog in Engine (#31220) HLRest: Add get index templates API (#31161) Remove all unused imports and fix CRLF (#31207) [Tests] Fix self-referencing tests [TEST] Fix testRecoveryAfterPrimaryPromotion [Docs] Remove mention pattern files in Grok processor (#31170) Use stronger write-once semantics for Azure repository (#30437) Don't swallow exceptions on replication (#31179) Limit the number of concurrent requests per node (#31206) Call ensureNoSelfReferences() on _agg state variable after scripted metric agg script executions (#31044) Move java version checker back to its own jar (#30708) [test] add fix for rare virtualbox error (#31212)
Cross-cluster search selects a subset of nodes for each remote cluster and sends requests only to them, which will act as a proxy and properly redirect such requests to the target nodes that hold the relevant data. What happens today is that every time we send a request to a remote cluster, it will be sent to the next node in the proxy list (in round-robin fashion), regardless of whether the target node is already amongst the ones that we are connected to. In case for instance we need to send a shard search request to a data node that's also one of the selected proxy nodes, we may end up sending the request to it through one of the other proxy nodes. This commit optimizes this case to make sure that whenever we are already connected to a remote node, we will send a direct request rather than using the next proxy node. There is a side-effect to this, which is that round-robin will be a bit unbalanced as the data nodes that are also selected as proxies will receive more requests.
* 6.x: Upgrade to Lucene-7.4.0-snapshot-518d303506 (#31360) [ML] Implement new rules design (#31110) (#31294) Remove RestGetAllAliasesAction (#31308) CCS: don't proxy requests for already connected node (#31273) Rankeval: Fold template test project into main module (#31203) [Docs] Remove reference to repository-s3 plugin creating an S3 bucket (#31359) More detailed tracing when writing metadata (#31319) Add details section for dcg ranking metric (#31177)
Cross-cluster search selects a subset of nodes for each remote cluster
and sends requests only to them, which will act as a proxy and properly
redirect such requests to the target nodes that hold the relevant data.
What happens today is that every time we send a request to a remote
cluster, it will be sent to the next node in the proxy list
(in round-robin fashion), regardless of whether the target node is
already amongst the ones that we are connected to. In case for instance
we need to send a shard search request to a data node that's also one of
the selected proxy nodes, we may end up sending the request to it
through one of the other proxy nodes.
This commit optimizes this case to make sure that whenever we are
already connected to a remote node, we will send a direct request rather
than using the next proxy node.
There is a side-effect to this, which is that round-robin will be a bit
unbalanced as the data nodes that are also selected as proxies will
receive more requests.