-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bind to non-localhost for transport in some cases #82973
Bind to non-localhost for transport in some cases #82973
Conversation
When enrolling a new node to an existing cluster, we sometimes need to bind transport to non-localhost addresse. If the other nodes of the cluster are on different hosts than this node, then the default configuration of binding transport layer to localhost will prevent this node to join the cluster even after "successful" enrollment. We check the non-localhost transport addresses that we receive during enrollment and if any of these are not in the list of non-localhost IP addresses that we gather from all interfaces of the current host, we assume that at least some other node in the cluster runs on another host. - If we don't find any non localhost addresses on all interfaces of the current host, we don't attempt to bind to non localhost addresses as this would result on an error on startup. - If the transport layer addresses we found out in enrollment are all localhost, we cannot be sure where we are still on the same host, but we assume that as it is safer to do so and do not bind to non localhost for this node either.
… when enrolling, not in an initial node
Pinging @elastic/es-security (Team:Security) |
…st-on-enrolling-nodes
…st-on-enrolling-nodes
boolean inEnrollmentMode, | ||
List<String> transportStringAddresses, | ||
InetAddress[] localIP | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've found the method just a bit hard to follow.
I believe conceptually it goes over the non-loopback node transport (publish) addresses, which it gets from the node enroll API, and checks if all of them are an addresses on the current host (Network.getAllAddresses
).
We can agree to a rewrite in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can agree to a rewrite in a follow-up.
lets do it now! This is a small enough PR, no need to do the work again in a follow up. Do you have any concrete suggestions on what to change ? Comments ? Anything you'd prefer to change in code ?
Your description is spot on:
- If this is not an enrolling node, we should only bind to localhost ( default behavior )
- We get all non loopback addresses of the current host.
a. If there are all localhost, we bind this node to localhost too for the transport layer, otherwise the node will fail to even start - We get all transport addresses we found out in the enrollement token, the URI - InetAddress.getByName dance is necessary because we have strings of
host:port
and we want to get the host
a. If any of these, is not an IP address that we have on the current host, we assume there is a node of the cluster that is not on the current host, and as such we bind transport to non-localhost so that it can talk to that node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a concrete suggestion:
static boolean anyRemoteHostNodeAddress(List<String> allNodesTransportPublishAddresses, InetAddress[] allHostAddresses) {
final List<InetAddress> allAddressesList = Arrays.asList(allHostAddresses);
for (String nodeStringAddress : allNodesTransportPublishAddresses) {
try {
final URI uri = new URI("http://" + nodeStringAddress);
final InetAddress nodeAddress = InetAddress.getByName(uri.getHost());
if (false == nodeAddress.isLoopbackAddress() && false == allAddressesList.contains(nodeAddress)) {
// this node's address is on a remote host
return true;
}
} catch (URISyntaxException | UnknownHostException e) {
// I don't know what to do with invalid ip addresses here. Ignoring
}
}
return false;
}
I think this captures the point 3 above. We do the String->InetAddress conversion in multiple places, and would be reasonable to factor out, but we don't have to deal with that now.
Point 2. I think is covered by the hostSettingValue
method, which will avoid to bind to _site_
in any case, if the host doesn't have "site" addresses configured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point 2. I think is covered by the hostSettingValue method, which will avoid to bind to site in any case, if the host doesn't have "site" addresses configured.
good point!
Here is a concrete suggestion:
Sounds good to me, I incorporated your suggestion
@@ -325,7 +325,7 @@ private static void outputInformationToConsole( | |||
+ ", using the enrollment token that you generated." | |||
); | |||
} else if (Strings.isEmpty(nodeEnrollmentToken)) { | |||
builder.append(infoBullet + " Configure other nodes to join this cluster:"); | |||
builder.append(infoBullet + " Configure other nodes to join this cluster:"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand you're tackling #82740 (comment) .
If it is important to you, take care to align the leading spacing of subsequent rows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is important to you, take care to align the leading spacing of subsequent rows.
I don't think that matters much ( in my opinion at least ). I just wanted to make sure that the words do not collide with the unicode symbols ( not sure if we can fix that universally either but I thought I'd try and we can always iterate based on feedback )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Hi @jkakavas, I've created a changelog YAML for you. |
When enrolling a new node to an existing cluster, we sometimes need to bind transport to non-localhost addresse. If the other nodes of the cluster are on different hosts than this node, then the default configuration of binding transport layer to localhost will prevent this node to join the cluster even after "successful" enrollment. We check the non-localhost transport addresses that we receive during enrollment and if any of these are not in the list of non-localhost IP addresses that we gather from all interfaces of the current host, we assume that at least some other node in the cluster runs on another host.
💚 Backport successful
|
When enrolling a new node to an existing cluster, we sometimes need to bind transport to non-localhost addresse. If the other nodes of the cluster are on different hosts than this node, then the default configuration of binding transport layer to localhost will prevent this node to join the cluster even after "successful" enrollment. We check the non-localhost transport addresses that we receive during enrollment and if any of these are not in the list of non-localhost IP addresses that we gather from all interfaces of the current host, we assume that at least some other node in the cluster runs on another host.
* upstream/master: (762 commits) [DOCS] Add note to that log4j customization is outside the support scope (elastic#82668) Batch Index Settings Update Requests (elastic#82896) [DOCS] Delete pipeline containing stored script (elastic#83102) Try again to fix changelog areas after reorg (elastic#83100) Bind to non-localhost for transport in some cases (elastic#82973) [DOCS] Reuse multi-level `join` warning (elastic#82976) Remove unnecessary CopyOnWriteHashMap class (elastic#83040) Adjust changelog categories after reorg (elastic#83087) [DOCS] Fix typo in `action.destructive_requires_name` breaking change (elastic#83085) Stack Monitoring: Add Enterprise Search monitoring index templates (elastic#82743) [DOCS] Fix stored script example snippet (elastic#83056) [DOCS] Re-add network traffic para to `term` query (elastic#83047) [DOCS] Rename example stored script (elastic#83054) [ML][DOCS] Add Trained model APIs to the REST APIs index (elastic#82791) [ML] Update running process when global calendar changes (elastic#83044) [Transform] Fix condition on which the transform stops processing buckets (elastic#82852) [DOCS] Fixes field names in ML sum functions. (elastic#83048) [ML] fix NLP tokenization never_split handling around punctuation (elastic#82982) Construct dynamic updates directly via object builders (elastic#81449) Emit trace.id into audit logs (elastic#82849) ... # Conflicts: # client/rest-high-level/src/test/java/org/elasticsearch/client/IndicesClientIT.java # client/rest-high-level/src/test/java/org/elasticsearch/client/documentation/ILMDocumentationIT.java # server/src/main/java/org/elasticsearch/action/admin/indices/rollover/Condition.java # server/src/test/java/org/elasticsearch/action/admin/indices/rollover/ConditionTests.java # x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/RolloverActionTests.java # x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/TimeseriesLifecycleTypeTests.java # x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/WaitForRolloverReadyStepTests.java
When enrolling a new node to an existing cluster, we sometimes need to
bind transport to non-localhost addresse. If the other nodes of the
cluster are on different hosts than this node, then the default
configuration of binding transport layer to localhost will prevent this
node to join the cluster even after "successful" enrollment.
We check the non-localhost transport addresses that we receive during
enrollment and if any of these are not in the list of non-localhost IP
addresses that we gather from all interfaces of the current host, we
assume that at least some other node in the cluster runs on another host.
current host, we don't attempt to bind to non localhost addresses as this
would result on an error on startup.
localhost, we cannot be sure where we are still on the same host, but we
assume that as it is safer to do so and do not bind to non localhost for
this node either.