-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure we don't use a remote profile if cluster name matches #31331
Conversation
If we are running into a race condition between a node being configured to be a remote node for cross cluster search etc. and that node joining the cluster we might connect to that node with a remote profile. If that node now joins the cluster it connected to it as a CCS remote node we use the wrong profile and can't use bulk connections etc. anymore. This change uses the remote profile only if we connect to a node that has a different cluster name than the local cluster. This is not a perfect fix for this situation but is the safe option while potentially only loose a small optimization of using less connections per node which is small anyways since we only connect to a small set of nodes. Closes elastic#29321
Pinging @elastic/es-core-infra |
private ConnectionProfile getRemoteProfile(ClusterName name) { | ||
// we can only compare the cluster name to make a decision if we should use a remote profile | ||
// we can't use a cluster UUID here since we could be connecting to that remote cluster before | ||
// the remote node has joined it cluster and have a cluster UUID. The fact that we just loose a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/it cluster/its cluster also s/loose/lose
// we can only compare the cluster name to make a decision if we should use a remote profile | ||
// we can't use a cluster UUID here since we could be connecting to that remote cluster before | ||
// the remote node has joined it cluster and have a cluster UUID. The fact that we just loose a | ||
// rather smallish optimization on the connection layer under certain situations were remote clusters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/were/where
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with this @s1monw thanks, would it be possible to add a test, it seems like the issue that's getting fixed is quite nasty.
@javanna I pushed some tests I don't think I can integration test this in a sane way |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - I mainly have a question about connecting with a null connection profile vs. not connecting at all and letting the normal cluster formation process create the connection.
if (nodePredicate.test(handshakeNode) && connectedNodes.size() < maxNumRemoteConnections) { | ||
transportService.connectToNode(handshakeNode, remoteProfile); | ||
transportService.connectToNode(handshakeNode, getRemoteProfile(handshakeResponse.getClusterName())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my main question is, do we want to connect? I guess I don't see any functional difference. It just seems kind of weird that a "normal" cluster node connection is initiated from the RemoteClusterConnection
and not the normal cluster formation work. And would be a catch in the future if we ever made a change to how normal cluster connections work.
Is there maybe a different race condition concern if we do not initiate the connection here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well I think if that cluster has coincidently the same name we would never instantiate this connection? If we don't connect, when would we connect to it? I can think of way more complicated things like waiting for the remote cluster to be formed etc. and then compare cluster UUIDs but that would require the local cluster to be formed and the remote one which is a complicating many many things. I think we can investigate this going forward ie. to only start connections once we have formed a cluster and never establish a connection to a node that isn't part of a cluster but this would be too complex to backport and this is a bug I want to be fixed in 5.6.x. I will create a followup issue for this and look into it what it takes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right. That makes sense.
If we are running into a race condition between a node being configured to be a remote node for cross cluster search etc. and that node joining the cluster we might connect to that node with a remote profile. If that node now joins the cluster it connected to it as a CCS remote node we use the wrong profile and can't use bulk connections etc. anymore. This change uses the remote profile only if we connect to a node that has a different cluster name than the local cluster. This is not a perfect fix for this situation but is the safe option while potentially only loose a small optimization of using less connections per node which is small anyways since we only connect to a small set of nodes. Closes #29321
If we are running into a race condition between a node being configured to be a remote node for cross cluster search etc. and that node joining the cluster we might connect to that node with a remote profile. If that node now joins the cluster it connected to it as a CCS remote node we use the wrong profile and can't use bulk connections etc. anymore. This change uses the remote profile only if we connect to a node that has a different cluster name than the local cluster. This is not a perfect fix for this situation but is the safe option while potentially only loose a small optimization of using less connections per node which is small anyways since we only connect to a small set of nodes. Closes #29321
…#31331) If we are running into a race condition between a node being configured to be a remote node for cross cluster search etc. and that node joining the cluster we might connect to that node with a remote profile. If that node now joins the cluster it connected to it as a CCS remote node we use the wrong profile and can't use bulk connections etc. anymore. This change uses the remote profile only if we connect to a node that has a different cluster name than the local cluster. This is not a perfect fix for this situation but is the safe option while potentially only loose a small optimization of using less connections per node which is small anyways since we only connect to a small set of nodes. Closes elastic#29321
If we are running into a race condition between a node being configured to be a remote node for cross cluster search etc. and that node joining the cluster we might connect to that node with a remote profile. If that node now joins the cluster it connected to it as a CCS remote node we use the wrong profile and can't use bulk connections etc. anymore. This change uses the remote profile only if we connect to a node that has a different cluster name than the local cluster. This is not a perfect fix for this situation but is the safe option while potentially only loose a small optimization of using less connections per node which is small anyways since we only connect to a small set of nodes. Closes #29321
This change prevent remote cluster connections to be established to nodes that have not yet joined a cluster and don't have a cluster UUID. This allows to effectivly detect nodes that are part of the local cluster. To compare the local cluster UUID to the remote nodes cluster UUID we need to wait until we recovered a state and a master is elected before we can connect to remote clusters. Relates to elastic#31331
* 6.x: Add get stored script and delete stored script to high level REST API Increasing skip version for failing test on 6.x Skip get_alias tests for 5.x (#31397) Fix defaults in GeoShapeFieldMapper output (#31302) Test: better error message on failure Mute DefaultShardsIT#testDefaultShards test Fix reference to XContentBuilder.string() (#31337) [DOCS] Adds monitoring breaking change (#31369) [DOCS] Adds security breaking change (#31375) [DOCS] Backports breaking change (#31373) RestAPI: Reject forcemerge requests with a body (#30792) Docs: Use the default distribution to test docs (#31251) Use system context for cluster state update tasks (#31241) [DOCS] Adds testing for security APIs (#31345) [DOCS] Removes ML item from release highlights [DOCS] Removes breaking change (#31376) REST high-level client: add validate query API (#31077) Move language analyzers from server to analysis-common module. (#31300) Expose lucene's RemoveDuplicatesTokenFilter (#31275) [Test] Fix :example-plugins:rest-handler on Windows Delete typos in SAML docs (#31199) Ensure we don't use a remote profile if cluster name matches (#31331) Test: Skip alias tests that failed all weekend [DOCS] Fix version in SQL JDBC Maven template [DOCS] Improve install and setup section for SQL JDBC Add ingest-attachment support for per document `indexed_chars` limit (#31352) SQL: Fix rest endpoint names in node stats (#31371) [DOCS] Fixes small issue in release notes Support for remote path in reindex api Closes #22913 [ML] Put ML filter API response should contain the filter (#31362) Remove trial status info from start trial doc (#31365) [DOCS] Added links in breaking changes pages [DOCS] Adds links to release notes and highlights Docs: Document changes in rest client QA: Fix tribe tests to use node selector REST Client: NodeSelector for node attributes (#31296) LLClient: Fix assertion on windows LLClient: Support host selection (#30523) Add QA project and fixture based test for discovery-ec2 plugin (#31107) [ML] Hold ML filter items in sorted set (#31338) [ML] Add description to ML filters (#31330)
* master: Add get stored script and delete stored script to high level REST API - post backport fix Add get stored script and delete stored script to high level REST API (#31355) Core: Combine Action and GenericAction (#31405) Fix reference to XContentBuilder.string() (#31337) Avoid sending duplicate remote failed shard requests (#31313) Fix defaults in GeoShapeFieldMapper output (#31302) RestAPI: Reject forcemerge requests with a body (#30792) Packaging: Remove windows bin files from the tar distribution (#30596) Docs: Use the default distribution to test docs (#31251) [DOCS] Adds testing for security APIs (#31345) Clarify that IP range data can be specified in CIDR notation. (#31374) Use system context for cluster state update tasks (#31241) Percentile/Ranks should return null instead of NaN when empty (#30460) REST high-level client: add validate query API (#31077) Move language analyzers from server to analysis-common module. (#31300) [Test] Fix :example-plugins:rest-handler on Windows Expose lucene's RemoveDuplicatesTokenFilter (#31275) Reload secure settings for plugins (#31383) Remove some cases in FieldTypeLookupTests that are no longer relevant. (#31381) Ensure we don't use a remote profile if cluster name matches (#31331) [TEST] Double write alias fault (#30942) [DOCS] Fix version in SQL JDBC Maven template [DOCS] Improve install and setup section for SQL JDBC SQL: Fix rest endpoint names in node stats (#31371) Support for remote path in reindex api - post backport fix Closes #22913 [ML] Put ML filter API response should contain the filter (#31362) Support for remote path in reindex api (#31290) Add byte array pooling to nio http transport (#31349) Remove trial status info from start trial doc (#31365) [DOCS] Adds links to release notes and highlights add is-write-index flag to aliases (#30942) Add rollover-creation-date setting to rolled over index (#31144) [ML] Hold ML filter items in sorted set (#31338) [Tests] Fix edge case in ScriptedMetricAggregatorTests (#31357)
If we are running into a race condition between a node being configured
to be a remote node for cross cluster search etc. and that node joining
the cluster we might connect to that node with a remote profile. If that
node now joins the cluster it connected to it as a CCS remote node we use
the wrong profile and can't use bulk connections etc. anymore. This change
uses the remote profile only if we connect to a node that has a different cluster
name than the local cluster. This is not a perfect fix for this situation but
is the safe option while potentially only loose a small optimization of using
less connections per node which is small anyways since we only connect to a
small set of nodes.
Closes #29321