Log when probe succeeds but full connection fails #51304
It is permitted for nodes to accept transport connections at addresses other than their publish address, which allows a good deal of flexibility when configuring discovery. However, it is not unusual for users to misconfigure nodes to pick a publish address which is inaccessible to other nodes. We see this happen a lot if the nodes are on different networks separated by a proxy, or if the nodes are running in Docker with the wrong kind of network config. In this case we offer no useful feedback to the user unless they enable TRACE-level logs. It's particularly tricky to diagnose because if we test connectivity between the nodes (using their discovery addresses) then all will appear well. This commit adds a WARN-level log if this kind of misconfiguration is detected: the probe connection has succeeded (to indicate that we are really talking to a healthy Elasticsearch node) but the followup connection attempt fails. It also tidies up some loose ends in `HandshakingTransportAddressConnector`, removing some TODOs that need not be completed, and registering its accidentally-unregistered timeout settings.
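The situation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in this PR: `ProbeThenConnectSketch`, its `connect` method, the `reachable` predicate, and the wording of the warning are all hypothetical, and a plain list stands in for the real logger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names are Elasticsearch APIs.
final class ProbeThenConnectSketch {
    /** Collected warning lines, standing in for a WARN-level logger. */
    final List<String> warnings = new ArrayList<>();

    /**
     * @param discoveryAddress address used for the probe connection
     * @param publishAddress   address the remote node reports as its publish address
     * @param reachable        stand-in for "can a connection be opened to this address?"
     * @return true if the full connection was established
     */
    boolean connect(String discoveryAddress, String publishAddress, Predicate<String> reachable) {
        if (!reachable.test(discoveryAddress)) {
            // The probe itself failed, so there is no evidence of a healthy node here.
            return false;
        }
        if (reachable.test(publishAddress)) {
            return true;
        }
        // Probe succeeded but the follow-up connection failed: the case worth a warning.
        warnings.add("WARN: probe of [" + discoveryAddress + "] succeeded but connecting to"
            + " publish address [" + publishAddress + "] failed");
        return false;
    }
}
```

With a predicate that only accepts addresses on one network, a node probed at `10.0.0.5:9300` but publishing `172.17.0.2:9300` produces exactly one warning, while a node whose publish address is reachable produces none.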
Pinging @elastic/es-distributed (:Distributed/Network)
One comment/question, looks fine in general :)
logger.trace("[{}] full connection successful: {}", thisConnectionAttempt, remoteNode);
listener.onResponse(remoteNode);
}));
transportService.connectToNode(remoteNode, ActionListener.wrap(ignored -> {
Why move to `wrap` here? We (mostly Henning :)) are currently trying to reduce the number of instances of passing broken listeners to transport APIs that don't handle their own exceptions, and this seems like a step in the wrong direction. Can we fix the listener to handle its exception instead?
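One possible reading of "a listener that handles its own exception" is sketched below. It is an assumption about the intent of this comment, not code from the PR: `SketchListener` and `SelfHandlingListenerSketch` are hypothetical names, and the interface is a stand-in rather than the Elasticsearch `ActionListener`.

```java
// Hypothetical stand-in for an asynchronous callback; not the Elasticsearch ActionListener.
interface SketchListener<T> {
    void onResponse(T response);

    void onFailure(Exception e);
}

final class SelfHandlingListenerSketch {
    /**
     * Returns a listener whose onResponse never throws to its caller: any
     * exception raised while handling the response goes to the delegate's onFailure.
     */
    static <T> SketchListener<T> selfHandling(SketchListener<T> delegate) {
        return new SketchListener<T>() {
            @Override
            public void onResponse(T response) {
                try {
                    delegate.onResponse(response);
                } catch (Exception e) {
                    // Handle the failure here instead of leaking it into the calling API.
                    delegate.onFailure(e);
                }
            }

            @Override
            public void onFailure(Exception e) {
                delegate.onFailure(e);
            }
        };
    }
}
```

The point of the sketch is only that the API invoking `onResponse` never has to deal with an exception thrown by the callback.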
LGTM
The following settings are not exposed to users in 7.6 and earlier:

- `discovery.probe.connect_timeout`
- `discovery.probe.handshake_timeout`

This was addressed in 7.7 (elastic#51304) but the docs for older versions suggest incorrectly that these settings are available. This commit removes the docs for these settings in the affected versions to avoid confusion.
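To illustrate why an unregistered setting is effectively unavailable, here is a minimal sketch. It assumes that a key which was never registered is rejected as unknown when it appears in the configuration. `RegisteredSettingsSketch` and its `validate` method are hypothetical and are not the Elasticsearch settings infrastructure; only the two setting names come from the text above.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of setting registration; not the Elasticsearch settings infrastructure.
final class RegisteredSettingsSketch {
    private final Set<String> registeredKeys;

    RegisteredSettingsSketch(Set<String> registeredKeys) {
        this.registeredKeys = Set.copyOf(registeredKeys);
    }

    /** Rejects any configured key that was never registered. */
    void validate(Map<String, String> configured) {
        for (String key : configured.keySet()) {
            if (!registeredKeys.contains(key)) {
                throw new IllegalArgumentException("unknown setting [" + key + "]");
            }
        }
    }
}
```

Under this assumption, a registry that omits `discovery.probe.connect_timeout` rejects a configuration containing it, and one that includes both probe timeout keys accepts it.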